The academic community is facing a disruptive change due to generative AI. Students are using generative AI tools for coursework, and academic integrity is therefore tested as never before. Meanwhile, teachers are at odds over what to do; while some resist the change, others are more willing to adapt. One suggested solution is to use AI detectors.
Author: Hamid Guedra
Do AI detectors work?
But they are useless. They can only give you answers. –Pablo Picasso on computers (Ratcliffe 2018)
ChatGPT and other generative AI tools have disrupted academia during the past year and a half. Some voices resist this ongoing change, whereas others are more in favor of adaptation (Mucharraz y Cano et al. 2023). Questions abound: How much are students already using generative AI tools for coursework? To what extent and under what circumstances should students be allowed to use generative AI tools, if at all? If the use of generative AI tools is integrated into teaching, how is their efficient and ethical use best instructed?
Some in the academic community are doing their best to figure out how to rein in unethical uses of generative AI. One solution has been to use AI detectors. This, however, raises a question: “Do AI detectors work?” The simple answer is: “No, they don’t.” There is already a growing body of research finding AI detectors to be unreliable and inaccurate, easy to manipulate, and even biased against the writing style of non-native English writers (OpenAI n.d.; Myers 2023; Weber-Wulff et al. 2023; Perkins et al. 2024).
I decided to test some available AI detectors and introduce here two simple experiments to illustrate how unreliable AI detectors can be.
Experiment 1: Having to prove a negative
In the first experiment, I used GPTZero (2024), which claims to be “[m]ore than an AI detector. Preserve what’s human.” I wanted to test how GPTZero would rate a short two-sentence passage I had written in a recent blog post (see Guedra 2024). I copied the passage into the detector, and GPTZero gave it an 84% probability of being AI-generated (Image 1). This is a false positive.
Image 1. GPTZero falsely detects AI-generated writing with an 84% probability (All screenshots by Hamid Guedra)
Although my writing may appear robotic sometimes, I did not use generative AI to write the passage. If I were a student, I would feel wrongly accused if my instructors asked me to explain the result to them (cf. Fowler 2023). This would be a case of trying to prove a negative—very difficult, even if not entirely impossible.
Perhaps my robotic writing style is not the whole story here. Image 2 shows another take on the passage. The probability has dropped 56 percentage points, from 84% to 28%. Why such a drop? I changed the transitional word “therefore” to “and” at the beginning of the second sentence in the passage. And, yes, before old-school grammar sticklers get upset about this, beginning a sentence with “and” is fine (Merriam-Webster 2024).
Image 2. GPTZero falsely detects AI-generated writing with a 28% probability
I continued the experiment. Image 3 shows my third take on the passage. The probability has now dropped a total of 82 percentage points from the first take, from 84% to 2%. Why the massive change? This time, I deleted the transitional word “therefore” altogether instead of replacing it. Other than that, it was the same passage.
Image 3. GPTZero detects AI-generated writing with only a 2% probability
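The arithmetic behind the three takes can be checked with a short sketch. The scores are the ones shown in Images 1 to 3; the function names and the "relative drop" measure are illustrative additions, not anything GPTZero reports.

```python
# AI-probability scores (in percent) that GPTZero reported for the three
# takes on the same two-sentence passage, as read from Images 1-3.
scores = {
    "take 1 (original, with 'therefore')": 84,
    "take 2 ('therefore' changed to 'and')": 28,
    "take 3 ('therefore' deleted)": 2,
}


def point_drop(before: float, after: float) -> float:
    """Drop between two scores, in percentage points."""
    return before - after


def relative_drop(before: float, after: float) -> float:
    """Drop as a share of the starting score, in percent (illustrative)."""
    return 100 * (before - after) / before


baseline = scores["take 1 (original, with 'therefore')"]
for label, score in scores.items():
    print(
        f"{label}: {score}% "
        f"(down {point_drop(baseline, score)} points, "
        f"{relative_drop(baseline, score):.1f}% of the original score)"
    )
```

Editing a single word accounts for the whole difference in each case, which is the point of the experiment.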
This is somewhat alarming. In English for academic and professional writing courses, students are commonly taught to use transitional words and expressions, cohesive devices, to show how parts of their writing connect and to show logical progression. In English composition terms, this is a key element of cohesion and flow: “Cohesion […] is the glue that sticks everything together” (Petelin 2016, 124).
Of course, the amount of text in this experiment was minuscule, making the likelihood of false positives higher than with more text. This is likely the reason GPTZero was only “moderately confident” (Image 1 and Image 3) or “uncertain” (Image 2) in its analyses of the different versions of the passage.
Experiment 2: Gaming AI detectors is easy
My second experiment was different. I aimed to test whether it would be possible to deceive AI detectors by simply prompting a generative AI to rewrite an already AI-generated essay as if the essay were written by a certain kind of student. I used Google’s Gemini Advanced (Gemini 2024) to do this.
First, I prompted Gemini to generate a 500-word argumentative essay about market segmentation, which it did in five seconds. GPTZero successfully flagged this basic version of the essay as AI-written with 96% probability. I continued and prompted Gemini to rewrite the essay as if it were written by an English as a Foreign Language (EFL) student. I did not edit Gemini’s output but submitted the EFL student version of the essay to GPTZero. I also uploaded the essay to Turnitin on LAB/LUT’s Moodle platform and to QuillBot’s (QuillBot 2024) online AI detector.
How well did these three detector tools manage to flag the 100% AI-generated EFL student version of the essay? Not at all well. GPTZero was “highly confident” that the essay was written by a human (Image 4).
Image 4. GPTZero fails to detect AI-generated writing
Like GPTZero, Turnitin’s AI detector failed to flag the EFL student version of the essay as AI-written, giving it a 0% probability (Image 5).
Image 5. Turnitin fails to detect AI-generated writing
QuillBot’s AI detector, however, managed much better, giving the EFL student version of the essay a 73% probability of being AI-generated (Image 6).
Image 6. QuillBot successfully detects AI-generated writing
I continued the experiment a little further to test whether my observation about transitional words in the first experiment would perhaps prove true. Deleting a transitional word here and there in the EFL student version of the essay did indeed lower QuillBot’s probability percentage slightly, from 73% to 69%.
Out of curiosity, I used Gemini again to generate another EFL version of the essay and uploaded this to QuillBot’s AI detector. For some reason, this time the detector failed and gave this second version of the essay only an 8% probability of being AI-generated (Image 7).
Image 7. QuillBot fails to detect AI-generated writing
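The numeric results of the second experiment can be tallied in a brief sketch. Every text in the list is fully AI-generated, so any run that scores below the cut-off is a missed detection. The 50% cut-off, the `Run` class, and the `missed` helper are assumptions made for illustration; none of the detectors documents such a threshold here. GPTZero's verdict on the EFL version is left out because it was reported as a confidence label rather than a percentage.

```python
from dataclasses import dataclass

# Hypothetical cut-off: a text counts as "flagged" at or above this score.
FLAG_THRESHOLD = 50


@dataclass
class Run:
    detector: str
    text: str
    ai_probability: int  # percent, as reported by the detector


# All texts below were generated by Gemini; the scores come from the article.
runs = [
    Run("GPTZero", "basic essay", 96),
    Run("Turnitin", "EFL version 1", 0),
    Run("QuillBot", "EFL version 1", 73),
    Run("QuillBot", "EFL version 1, some transitional words deleted", 69),
    Run("QuillBot", "EFL version 2", 8),
]


def missed(all_runs: list[Run], threshold: int = FLAG_THRESHOLD) -> list[Run]:
    """Return the runs in which AI-generated text scored below the cut-off."""
    return [run for run in all_runs if run.ai_probability < threshold]


for run in missed(runs):
    print(f"{run.detector} missed '{run.text}' ({run.ai_probability}%)")
```

The outcome depends on where the cut-off is placed: a stricter threshold would also count the 73% and 69% runs as misses.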
Finally, it is worth noting that because the EFL student version of the essay was riddled with writing mistakes, it would be unlikely to earn the highest marks for English composition. The essay could, however, still earn a passing grade, making the use of generative AI tools very cost-effective for some students.
Conclusion
Instead of going away, generative AI tools are likely here to stay. Trying to resist the changes they bring might prove futile; instead, adaptation could be a better strategy. However, while some educators are doing their best to adapt and also maintain academic integrity in the process, it is unlikely that policing students’ coursework with AI detectors would prove effective. Since AI detectors seem to be highly unreliable and gaming them takes little to no effort, teachers should use them with caution—if at all.
References
Fowler, J. 2023. What to do when you’re accused of AI cheating. The Washington Post. Cited 27 March 2024. Available at https://www.washingtonpost.com/technology/2023/08/14/prove-false-positive-ai-detection-turnitin-gptzero/
Gemini. 2024. Cited 24 March 2024. Available at https://gemini.google.com/
GPTZero. 2024. Cited 24 March 2024. Available at https://gptzero.me/
Guedra, H. 2024. Teaching critical AI literacy is a must. LAB Focus. Cited 24 March 2024. Available at https://blogit.lab.fi/labfocus/en/teaching-critical-ai-literacy-is-a-must/
Merriam-Webster. 2024. Is it Ever Okay to Start a Sentence With ’And’? Merriam-Webster, Inc. Cited 24 March 2024. Available at https://www.merriam-webster.com/grammar/words-to-not-begin-sentences-with
Mucharraz y Cano, Y., Venuti, F. & Herrera Martinez, R. 2023. ChatGPT and AI Text Generators: Should Academia Adapt or Resist? Harvard Business Publishing. Cited 27 March 2024. Available at https://hbsp.harvard.edu/inspiring-minds/chatgpt-and-ai-text-generators-should-academia-adapt-or-resist
Myers, A. 2023. AI-Detectors Biased Against Non-Native English Writers. Stanford HAI. Cited 27 March 2024. Available at https://hai.stanford.edu/news/ai-detectors-biased-against-non-native-english-writers
OpenAI. n.d. How can educators respond to students presenting AI-generated content as their own? Cited 27 March 2024. Available at https://help.openai.com/en/articles/8313351-how-can-educators-respond-to-students-presenting-ai-generated-content-as-their-own
Perkins, M., Roe, J., Binh, H.V., Postma, D., Hickerson, D., McGaughran, J. & Huy, Q.K. 2024. GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education. A preprint. Cited 5 April 2024. Available at https://arxiv.org/abs/2403.19148
Petelin, R. 2016. How Writing Works. Crows Nest (NSW), Australia: Allen & Unwin.
QuillBot. 2024. Free AI detector. Cited 24 March 2024. Available at https://quillbot.com/ai-content-detector
Ratcliffe, S. 2018. Oxford Essential Quotations. 6th Edition. Cited 28 March 2024. Available at https://doi.org/10.1093/acref/9780191866692.001.0001
Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O., Šigut, P. & Waddington, L. 2023. Testing of detection tools for AI-generated text. International Journal for Educational Integrity. Vol 19. Cited 24 March 2024. Available at https://doi.org/10.1007/s40979-023-00146-z
Author
Hamid Guedra is a Senior Lecturer and teaches English for professional and academic purposes at the Language Centre of LAB University of Applied Sciences and LUT University. Although perhaps a bit robotic sometimes, Hamid is not a generative AI tool.
Illustration: Ideogram.ai, edited by Hamid Guedra
Published 5.4.2024
Reference to this article
Guedra, H. 2024. Testing the Reliability of AI Detectors: Having to Prove a Negative and Gaming AI Detectors. LAB Pro. Cited and date of citation. Available at https://www.labopen.fi/en/lab-pro/testing-the-reliability-of-ai-detectors-having-to-prove-a-negative-and-gaming-ai-detectors/