ReText.AI

AI Text Detection: How ReText.AI Analyzed 13,000 Theses

Olga Shkryaba
Published: June 10, 2026
ReText.AI analyzed 12,996 graduation theses from 2013 to 2025, totaling more than 590 million characters. This article explains how AI text detection was performed, how the AI text detector worked, and how this differs from plagiarism checking.
Contents:
AI Text Detection in Graduation Theses: How ReText.AI Analyzed 13,000 Documents
Quick summary: what AI text detection showed
What we studied: AI text detection in graduation theses
What corpus was included in the study
How we prepared texts for AI detection
How to check text for AI: why we analyzed paragraphs, not the whole document
How the AI text detector worked
How AI share was calculated
What AI detection showed by year
Which sections receive the AI label more often
How AI text detection differs from plagiarism checking
Methodology limitations
Main conclusion of the study

AI Text Detection in Graduation Theses: How ReText.AI Analyzed 13,000 Documents

The ReText.AI team conducted a study to see how academic writing has changed after the widespread adoption of neural networks.

We analyzed 12,996 graduation theses from 2013 to 2025. The corpus included more than 590 million characters. The texts were not checked as whole documents, but paragraph by paragraph: this makes it possible to see more precisely which parts of a paper look human-written and which parts look like fragments generated or substantially rewritten by a language model.

In this article, we explain how AI text detection was performed, which fragments were excluded from the analysis, how AI share was calculated, and why the results should be read as corpus-level statistics rather than as evaluations of individual papers.

Quick summary: what AI text detection showed

The main finding of the study is that after 2022, AI share in graduation theses began to grow noticeably.

According to ReText.AI:

  • AI share increased from 9.9% in 2022 to 42.3% in 2025;
  • in 2025, AI share was highest in conclusions, at around 56%;
  • in introductions, AI share was around 49%;
  • in the main body, the figure was lower, at around 41%;
  • the share of papers with almost no AI-like patterns decreased: in 2022, such papers accounted for around 70%, while in 2025 the figure was around 23%;
  • if the trend continues, AI share may reach the 50–60% range across the corpus in 2026.
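The article does not say how the 2026 forecast was produced. As a rough arithmetic check only, a naive straight-line extrapolation through the two published points (9.9% in 2022 and 42.3% in 2025) adds about 10.8 percentage points per year. This is our assumption, not the study's method, and it ignores the 2023 and 2024 values, which are not given:

```python
def linear_projection(year_a: int, share_a: float,
                      year_b: int, share_b: float,
                      target_year: int) -> float:
    """Naive two-point linear extrapolation (an illustrative assumption,
    not the study's forecasting method)."""
    slope = (share_b - share_a) / (year_b - year_a)  # percentage points per year
    return share_b + slope * (target_year - year_b)


# Published figures: 9.9% in 2022 and 42.3% in 2025.
projection_2026 = linear_projection(2022, 9.9, 2025, 42.3, 2026)
print(f"{projection_2026:.1f}%")  # 53.1%
```

A fit that also used the intermediate years could give a different number.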

These figures show a broader shift in academic writing: neural networks have moved from being an experiment to becoming part of the process of drafting, editing, and structuring academic materials.

AI text detection chart showing AI share dynamics in graduation theses from 2013 to 2025 based on ReText.AI research

After 2022, the indicator starts rising faster: from 9.9% in 2022 to 42.3% in 2025.

What we studied: AI text detection in graduation theses

Neural networks have already become part of text work. They are used for drafts, editing, translation, structuring, shortening, and formulating introductions and conclusions.

This is especially noticeable in academic texts, where different sections of a paper are written differently. Introductions and conclusions more often consist of standard wording, while the main body usually requires more data, analysis, and the author's own conclusions.

The goal of the study was not simply to find “texts from AI.” We wanted to look at the dynamics:

  • how AI share changed by year;
  • in which sections AI-like patterns appear more often;
  • how an AI text detector reacts to academic style;
  • where background detections (AI labels on text likely written without AI) may appear;
  • how visible the shift is after the mass adoption of LLM tools.

What corpus was included in the study

The study used 12,996 graduation theses from the period 2013–2025.

After preparation and filtering, 590,944,775 characters were included in the analysis. For each paper, AI share was calculated — the share of text that the AI detector classified as similar to machine generation or LLM-based rewriting.

Infographic with key parameters of the ReText.AI AI text detection study: 12,996 graduation theses, 2013–2025, and more than 590 million characters analyzed

The average AI share across the entire corpus was 14.7%, but the change by year is more important than the overall average. The sample includes many papers from before 2022, when modern LLM tools were not yet widely available. That is why the earlier years help estimate the baseline: how often the detector reacts to formal academic style on its own.
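A minimal sketch of the by-year view and the pre-2022 baseline. It assumes each paper is represented as a (year, AI share) pair and that yearly figures are plain means of per-paper shares; the article does not say whether its 14.7% average is per-paper or character-weighted, and the sample values below are invented for illustration:

```python
from collections import defaultdict
from statistics import mean


def yearly_means(papers: list[tuple[int, float]]) -> dict[int, float]:
    """Mean per-paper AI share for each year, from (year, ai_share) pairs."""
    by_year: dict[int, list[float]] = defaultdict(list)
    for year, share in papers:
        by_year[year].append(share)
    return {year: mean(shares) for year, shares in sorted(by_year.items())}


def baseline(yearly: dict[int, float], before_year: int = 2022) -> float:
    """Average of the yearly means for years before `before_year`, i.e. the
    detector's background rate on formal academic style."""
    return mean(share for year, share in yearly.items() if year < before_year)


# Invented example data, not figures from the study.
papers = [(2019, 0.05), (2019, 0.07), (2021, 0.06), (2025, 0.40), (2025, 0.44)]
yearly = yearly_means(papers)
print(yearly)            # per-year means
print(baseline(yearly))  # background level before 2022
```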

How we prepared texts for AI detection

Before the analysis, the texts were cleaned of fragments that could distort the result.

The following were not included in the check:

  • title pages;
  • abstracts;
  • tables of contents;
  • reference lists;
  • appendices;
  • acknowledgments;
  • internship reports;
  • captions for figures and tables;
  • formula fragments;
  • service text and fragments that were too short.
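The exclusion step could look like the sketch below. It assumes, hypothetically, that each document has already been split into blocks tagged with a section type; the tag names are invented for illustration, since the study does not publish how sections were identified. Short fragments are handled separately by a length threshold.

```python
# Hypothetical section tags mirroring the exclusion list; not the study's labels.
EXCLUDED_SECTIONS = frozenset({
    "title_page", "abstract", "table_of_contents", "references",
    "appendix", "acknowledgments", "internship_report",
    "caption", "formula", "service_text",
})


def keep_substantive(blocks: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop (section_tag, text) blocks whose tag is on the exclusion list."""
    return [(tag, text) for tag, text in blocks if tag not in EXCLUDED_SECTIONS]


blocks = [
    ("title_page", "(title page text)"),
    ("introduction", "(introduction text)"),
    ("references", "(reference list)"),
    ("main_body", "(main body text)"),
]
print([tag for tag, _ in keep_substantive(blocks)])  # ['introduction', 'main_body']
```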

Only paragraphs of at least 500 characters were used for the analysis. Short phrases often do not give the AI detector enough context and may produce less stable estimates.

We also excluded papers where too little substantive text remained after cleaning: for example, fewer than 10 suitable paragraphs or too little main-body content.
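The two numeric thresholds stated here (paragraphs of at least 500 characters, at least 10 suitable paragraphs per paper) can be expressed as a small filter. Counting characters with `len()` after stripping surrounding whitespace is our assumption, and the unspecified main-body threshold is left out:

```python
MIN_PARAGRAPH_CHARS = 500      # from the article
MIN_PARAGRAPHS_PER_PAPER = 10  # from the article's example threshold


def suitable_paragraphs(paragraphs: list[str]) -> list[str]:
    """Keep paragraphs long enough to give the detector sufficient context."""
    return [p for p in paragraphs if len(p.strip()) >= MIN_PARAGRAPH_CHARS]


def paper_is_usable(paragraphs: list[str]) -> bool:
    """A paper stays in the corpus only if enough suitable paragraphs remain."""
    return len(suitable_paragraphs(paragraphs)) >= MIN_PARAGRAPHS_PER_PAPER


long_paragraph = "x" * 600
print(paper_is_usable([long_paragraph] * 10 + ["Too short."]))  # True
print(paper_is_usable([long_paragraph] * 9))                    # False
```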

How to check text for AI: why we analyzed paragraphs, not the whole document

A single graduation thesis can consist of very different fragments. For example, the introduction may be written in a template-like way, the main body may be more author-driven, and the conclusion may again use standard academic language.

If the whole document is checked at once, these differences are smoothed out. That is why we used paragraph-level analysis.

Each paragraph was checked separately. This makes it possible to see not only the overall score for a paper, but also the distribution: which parts of the text are more likely to receive an AI label and which parts look more natural to the detector.
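A minimal sketch of paragraph-level checking. Splitting on blank lines and the `classify` callable are assumptions: the toy classifier below is a placeholder for a real detector and does not reflect how the study's model decides.

```python
from typing import Callable


def split_paragraphs(text: str) -> list[str]:
    """Split a document on blank lines (an assumed paragraph delimiter)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def label_paragraphs(text: str, classify: Callable[[str], str]) -> list[tuple[int, str]]:
    """Run the classifier on each paragraph separately, keeping its position."""
    return [(i, classify(p)) for i, p in enumerate(split_paragraphs(text))]


def toy_classify(paragraph: str) -> str:
    # Placeholder rule for demonstration only.
    return "ai" if paragraph.startswith("In conclusion") else "human"


doc = "The relevance of the topic ...\n\nRegression results in Table 3 ...\n\nIn conclusion, ..."
print(label_paragraphs(doc, toy_classify))
# [(0, 'human'), (1, 'human'), (2, 'ai')]
```

Keeping the paragraph index shows where in the document the AI labels cluster, which a single whole-document score would hide.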

How the AI text detector worked

The study used an LLM detector trained to distinguish human-written texts from fragments that resemble language-model generation.

Each paragraph was passed to the detector individually. As output, the model classified the fragment as human text or AI text. If a paragraph was classified as AI, the presumed group of generator models was also recorded.
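One way to record the detector's output per paragraph is sketched below. The `Detection` type, the label strings, and the generator group names are hypothetical; the article does not name the groups or describe the output format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    label: str                             # "human" or "ai"
    generator_group: Optional[str] = None  # recorded only for AI paragraphs

    def __post_init__(self) -> None:
        if self.label not in ("human", "ai"):
            raise ValueError(f"unknown label: {self.label!r}")
        if self.label == "human" and self.generator_group is not None:
            raise ValueError("a generator group is recorded only for AI paragraphs")


def generator_group_counts(detections: list[Detection]) -> Counter:
    """Tally presumed generator groups among AI-labeled paragraphs."""
    return Counter(d.generator_group for d in detections
                   if d.label == "ai" and d.generator_group is not None)


results = [Detection("human"), Detection("ai", "group_a"),
           Detection("ai", "group_b"), Detection("ai", "group_a")]
print(generator_group_counts(results))  # Counter({'group_a': 2, 'group_b': 1})
```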

It is important to keep in mind that an AI text detector does not evaluate how a document was actually created. It evaluates linguistic features: structure, repetition, predictability of wording, academic templating, and other stylistic patterns.

That is why the study results were used only for aggregated analytics: by years, sections, and groups of texts.

How AI share was calculated

AI share was calculated by characters, not by the number of paragraphs.

Formula:

AI share = characters in paragraphs classified as AI / total characters in analyzed paragraphs

For example, if after cleaning a paper contained 100,000 characters of substantive text and 25,000 characters were in paragraphs that the detector classified as AI, the AI share of that paper was 25%.
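The character-based formula translates directly into code. Representing each analyzed paragraph as a (character count, is-AI) pair is an assumption made for brevity; the example reproduces the 100,000 / 25,000 case from the text:

```python
def ai_share(paragraphs: list[tuple[int, bool]]) -> float:
    """Characters in AI-labeled paragraphs divided by all analyzed characters."""
    total_chars = sum(chars for chars, _ in paragraphs)
    ai_chars = sum(chars for chars, is_ai in paragraphs if is_ai)
    return ai_chars / total_chars if total_chars else 0.0


# 25,000 characters labeled AI out of 100,000 analyzed characters.
print(f"{ai_share([(25_000, True), (75_000, False)]):.0%}")  # 25%
```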

This approach makes the estimate more stable: long substantive paragraphs have a stronger influence on the final score than short fragments.
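To see how character weighting differs from simply counting paragraphs, compare both on three paragraphs of unequal length (the lengths are invented for illustration):

```python
def share_by_characters(paragraphs: list[tuple[int, bool]]) -> float:
    """Character-weighted AI share, as used in the study."""
    total = sum(chars for chars, _ in paragraphs)
    return sum(chars for chars, is_ai in paragraphs if is_ai) / total


def share_by_count(paragraphs: list[tuple[int, bool]]) -> float:
    """Unweighted alternative: fraction of paragraphs labeled AI."""
    return sum(1 for _, is_ai in paragraphs if is_ai) / len(paragraphs)


# Two long human-written paragraphs and one shorter AI-labeled one.
sample = [(2_400, False), (2_000, False), (600, True)]
print(f"by characters: {share_by_characters(sample):.0%}")  # by characters: 12%
print(f"by paragraphs: {share_by_count(sample):.0%}")       # by paragraphs: 33%
```

The 600-character AI paragraph is a third of the paragraphs but only 12% of the text, so the longer substantive paragraphs dominate the score.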

What AI detection showed by year

The main result of the study is a noticeable increase in AI share after 2022.

In papers written before the mass adoption of modern LLM tools, the detector also found individual AI-like fragments. This is the expected baseline: formal academic style, template-like introductory wording, translations, and standard phrases can look “machine-like” to the model.

But after 2022, the picture changes: AI share starts growing much faster. According to the study, in 2025 the indicator reached 42.3% for paragraphs of at least 500 characters.

Chart showing AI share distribution in graduation theses: around 70% of papers were in the 0–10% AI-share range in 2022 compared with around 23% in 2025

In 2022, around 70% of papers were in the 0–10% AI-share range. In 2025, around 23% remained in that range: this shows that AI-like patterns became noticeably more common.
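The 0–10% figure is the fraction of papers whose AI share falls in that bin. A minimal sketch, assuming a half-open bin [0%, 10%) since the article does not say which side includes exactly 10%; the shares are invented:

```python
def fraction_in_bin(shares: list[float], low: float = 0.0, high: float = 0.10) -> float:
    """Fraction of papers with low <= AI share < high (half-open bin, assumed)."""
    if not shares:
        return 0.0
    return sum(low <= s < high for s in shares) / len(shares)


# Invented per-paper AI shares for one year.
shares_one_year = [0.00, 0.04, 0.08, 0.25, 0.60]
print(fraction_in_bin(shares_one_year))  # 0.6
```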

Which sections receive the AI label more often

We also compared the introduction, the main body, and the conclusion separately.

The highest AI-share values were more often found in introductions and conclusions. These sections usually contain more universal wording: relevance, goals, objectives, general conclusions, transitions, and summaries.

In 2025, AI share in conclusions was around 56%, in introductions around 49%, and in the main body around 41%.
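Per-section shares can be computed with the same character-based formula, grouped by section. This sketch assumes characters are pooled across all papers in a year rather than averaged paper by paper, which the article does not specify; the section names and lengths are invented:

```python
from collections import defaultdict


def share_by_section(paragraphs: list[tuple[str, int, bool]]) -> dict[str, float]:
    """Character-weighted AI share per section from (section, chars, is_ai) triples."""
    total: dict[str, int] = defaultdict(int)
    ai: dict[str, int] = defaultdict(int)
    for section, chars, is_ai in paragraphs:
        total[section] += chars
        if is_ai:
            ai[section] += chars
    return {s: ai[s] / total[s] for s in total if total[s] > 0}


sample = [
    ("introduction", 800, True), ("introduction", 800, False),
    ("main_body", 1_000, True), ("main_body", 3_000, False),
    ("conclusion", 600, True), ("conclusion", 600, False),
]
print(share_by_section(sample))
# {'introduction': 0.5, 'main_body': 0.25, 'conclusion': 0.5}
```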

The main body looked less “generated” on average. A likely reason is that it contains more specifics: data, analysis, calculations, references, results, and author argumentation.

This is why AI text detection in academic writing requires careful interpretation: the detector may react not only to possible machine rewriting, but also to the standard wording of the genre, especially in introductions and conclusions.

AI text detector chart showing AI share by thesis section in 2025: conclusions around 56%, introductions around 49%, and main body around 41%

Conclusions and introductions more often receive AI-generation signals because these sections contain more standard academic wording.

How AI text detection differs from plagiarism checking

A plagiarism checker usually looks for matches with already published sources: websites, articles, student papers, and databases of academic work.

An AI detector solves a different task: it estimates how much a text resembles machine generation or rewriting.

That is why these checks cannot replace each other. A text can be original from the point of view of source overlap but still look AI-like to a detector. And the reverse is also possible: a text can be written by a person but contain matches with sources.

In our study, we analyzed signs of AI generation and LLM rewriting, not source overlaps.

Methodology limitations

The study has several important limitations.

First, an AI detector does not produce an absolute assessment. It works with probabilistic features of the text.

Second, academic style itself can increase the probability of detection, especially in introductions, conclusions, and fragments with standard wording.

Third, the detector may react differently to texts in different languages and to texts that have gone through translation or editing.

Fourth, AI share does not show exactly what role a neural network played: generation from scratch, editing, translation, paraphrasing, or help with individual phrases.

That is why the main value of the study is not in isolated percentages, but in comparing periods and large-scale trends.

Main conclusion of the study

The ReText.AI study showed that after the mass adoption of LLM tools, the share of fragments in academic texts that the AI text detector classifies as similar to machine generation or rewriting has grown noticeably.

The strongest growth is visible after 2022. AI-like fragments appear especially often in introductions and conclusions — sections with more standard academic wording.

At the same time, AI text detection results should be read carefully. They show not the “creation history” of a specific text, but linguistic features that become visible across a large corpus.

The main conclusion is not that neural networks have “replaced” authors, but that they have become part of academic writing. The next step is not to debate the fact of AI use itself, but to build clear rules: where neural networks are acceptable as an editing tool, how to disclose the use of AI tools, and how to distinguish help with text from replacing independent work.

Olga Shkryaba
Founder and CEO of Retext.ai