The Arms Race: Why AI Image Detection Gets Harder Every Year

By Maat Scan · May 26, 2026

A detector that scored 99% accuracy on a standard benchmark dropped below 62% when researchers tested it on deepfakes actually collected from social media in 2024.¹ That gap is not a calibration error or an edge case. It is the basic shape of this problem, and it has been widening every year.

What Detection Tools Look For

AI-generated images leave statistical traces. Diffusion models produce characteristic patterns in the high-frequency domain, visible in Fourier analysis but not to the naked eye. Real photographs carry sensor noise with specific spatial distributions; AI images tend to be slightly too clean, in ways that don't match any real camera sensor profile. Certain semantic inconsistencies persist in peripheral regions even when the central subject looks flawless.
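The "too clean" idea can be made concrete with a high-pass residual: subtract a local average from each pixel and measure how much fine-grained variation is left. What follows is a minimal sketch, not how any particular detector works. It assumes a grayscale image stored as a list of rows, uses a 3x3 mean filter as the smoother, and models sensor noise as Gaussian noise; all function names are illustrative.

```python
import random


def highpass_residual(img):
    """Subtract the 3x3 neighbourhood mean from every interior pixel."""
    height, width = len(img), len(img[0])
    residual = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            local_mean = sum(
                img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ) / 9.0
            row.append(img[y][x] - local_mean)
        residual.append(row)
    return residual


def residual_variance(img):
    """Variance of the high-pass residual: a crude measure of fine-grained noise."""
    values = [v for row in highpass_residual(img) for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def smooth_gradient(size):
    """Noise-free synthetic image: brightness rises linearly along both axes."""
    return [[float(x + y) for x in range(size)] for y in range(size)]


def add_sensor_noise(img, sigma, seed=0):
    """Add Gaussian noise as a stand-in for real sensor noise (an assumption)."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


clean = smooth_gradient(32)                 # plays the role of a "too clean" image
noisy = add_sensor_noise(clean, sigma=2.0)  # plays the role of a camera capture

print("residual variance, clean:", residual_variance(clean))
print("residual variance, noisy:", residual_variance(noisy))
```

The clean gradient leaves essentially no residual, while the noisy copy leaves a clearly non-zero one. Real detectors learn far richer statistics than a single variance, but the underlying signal is of this kind.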

Detection models are trained to spot these patterns. The problem is that the patterns change with every new generator. A CNN trained to recognize Stable Diffusion 1.5 artifacts may perform substantially worse on Flux.1, which was trained on a different dataset with a different architecture and leaves different traces.² Each new major model effectively resets the difficulty curve.

The Benchmark Trap

Accuracy numbers for deepfake detectors are almost always measured on academic datasets assembled from known generators, often the same ones used during training. Those numbers are real. They just don't predict field performance.

Deepfake-Eval-2024, a benchmark built from content actually circulating on social media in 2024, showed how large that gap is. State-of-the-art open-source detectors saw their AUC drop by 45% for images, 48% for audio, and 50% for video compared to their published benchmark scores.¹ The strongest commercial tools held up better, reaching 82% on in-the-wild content, but still fell well short of lab figures.
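For readers less familiar with AUC, it is the probability that a detector gives a randomly chosen fake a higher score than a randomly chosen real image. An AUC of 1.0 is perfect ranking and 0.5 is chance, so if the reported drops are read as relative, losing roughly half of a near-perfect AUC lands close to coin-flipping. The sketch below computes AUC by direct pairwise comparison; the score lists are made-up illustrations, not data from the benchmark.

```python
def auc(real_scores, fake_scores):
    """Fraction of (fake, real) pairs where the fake scores higher; ties count half."""
    wins = 0.0
    for fake in fake_scores:
        for real in real_scores:
            if fake > real:
                wins += 1.0
            elif fake == real:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


def relative_drop(before, after):
    """Drop expressed as a fraction of the starting value."""
    return (before - after) / before


# Hypothetical detector outputs (higher means "more likely AI-generated").
lab_real = [0.05, 0.10, 0.20, 0.30]
lab_fake = [0.70, 0.80, 0.90, 0.95]
wild_real = [0.20, 0.40, 0.55, 0.70]
wild_fake = [0.30, 0.50, 0.60, 0.80]

lab_auc = auc(lab_real, lab_fake)
wild_auc = auc(wild_real, wild_fake)
print(lab_auc, wild_auc, relative_drop(lab_auc, wild_auc))
```

In this toy data the lab scores separate cleanly while the in-the-wild scores overlap, which is the pattern the benchmark describes at much larger scale.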

Two factors explain the drop. First, social media re-encoding strips away many of the frequency-domain signals detectors rely on most; a JPEG compressed twice through platform pipelines looks different from the original. Second, new generator versions keep appearing faster than detection models can be retrained on them. A tool trained on last quarter's images may never see the generator responsible for this quarter's fakes.
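The first factor can be illustrated with a simplified, one-dimensional analogue of what a lossy codec does: transform a block of pixels into frequency coefficients, round them to coarse steps, and transform back. This is a sketch under stated assumptions, not a real JPEG implementation. The step sizes in `STEPS` are hypothetical, and the "generator fingerprint" is modelled as a faint alternating pattern added to a smooth ramp.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a 1-D block."""
    n = len(block)
    coeffs = []
    for k in range(n):
        total = sum(
            block[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    block = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        block.append(total)
    return block


# Hypothetical quantization steps: coarser for higher frequencies.
STEPS = [2, 2, 4, 8, 16, 24, 32, 40]


def reencode(block, steps=STEPS):
    """Round each frequency coefficient to its step size, then transform back."""
    quantized = [round(c / q) * q for c, q in zip(dct(block), steps)]
    return idct(quantized)


# Smooth ramp plus a faint alternating pattern standing in for a fingerprint.
block = [100 + 10 * i + 3 * (-1) ** i for i in range(8)]
before = dct(block)
after = dct(reencode(block))

print("highest-frequency coefficient before:", round(before[7], 2))
print("highest-frequency coefficient after: ", round(after[7], 2))
```

The average brightness of the block survives almost unchanged, while the highest-frequency coefficient, where the faint pattern lives, is rounded away entirely. A detector that relied on that coefficient has nothing left to read.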

Why the Gap Isn't Closing

A 2025 paper from researchers at ETH Zurich argued that AI image detection is structurally unwinnable, not for lack of effort, but because of how generator and discriminator training interact.³ Their analysis found that detection is hardest at the extremes of dataset complexity: simple datasets let generators train to near-perfection, leaving no residual errors to detect; highly diverse datasets camouflage imperfections through sheer variety. Only intermediate complexity creates conditions favorable to detection, and that window narrows as models improve.

Adversarial attacks make this worse. Researchers have shown that small, invisible perturbations added to AI-generated images can cause detectors to classify them as genuine, and these attacks survive common post-processing including JPEG compression and resizing.⁴ These attacks don't require access to the detector's internals; black-box versions that treat the detector as an opaque system work across multiple commercial tools simultaneously.
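Why can a perturbation be both tiny and effective? Because an image has thousands of pixels, and a small nudge to each one adds up. The sketch below uses a toy linear detector and a sign-gradient step, the idea behind the fast gradient sign method. It is a white-box illustration under loud assumptions: the detector, its weights, and the image are all invented, and this is not the attack from the cited paper, which also covers black-box settings.

```python
import math


def detector_score(pixels, weights, bias):
    """Toy linear detector: returns an estimated probability of 'AI-generated'."""
    logit = sum(p * w for p, w in zip(pixels, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def sign(value):
    return (value > 0) - (value < 0)


def evade(pixels, weights, epsilon):
    """Move every pixel by at most epsilon in the direction that lowers the score.

    For a linear model, the gradient of the logit with respect to each pixel
    is simply that pixel's weight, so the step is -epsilon * sign(weight).
    """
    return [
        min(1.0, max(0.0, p - epsilon * sign(w))) for p, w in zip(pixels, weights)
    ]


# Hypothetical 64x64 grayscale image (values in [0, 1]) and detector weights.
NUM_PIXELS = 64 * 64
weights = [0.05 if i % 2 == 0 else -0.05 for i in range(NUM_PIXELS)]
bias = 1.0
image = [0.5] * NUM_PIXELS

perturbed = evade(image, weights, epsilon=0.01)

score_before = detector_score(image, weights, bias)
score_after = detector_score(perturbed, weights, bias)
largest_change = max(abs(a - b) for a, b in zip(image, perturbed))
print(score_before, score_after, largest_change)
```

No pixel moves by more than 1% of the brightness range, yet the toy detector's verdict flips from "probably AI" to "probably real". Real detectors are nonlinear and real attacks are more involved, but the accumulation effect is the same.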

The Watermark Alternative

Detection tries to read traces left by generators. Watermarking takes the opposite approach: embed a signal at creation that survives downstream processing and can be checked later.

Google DeepMind's SynthID embeds imperceptible patterns in images generated by Google's tools. As of early 2026, over 10 billion pieces of content carry a SynthID watermark.⁵ The signal survives cropping, color adjustment, and screenshot recapture. The limitation is coverage: only Google's own models embed it. OpenAI, Midjourney, Stability AI, Flux, and most other major generators do not use SynthID, which means it cannot detect content they produce.

C2PA provenance standards work differently: instead of hiding a signal in pixels, they attach a cryptographic certificate to the file at creation time, signed by the camera or software that made it. This proves where an image came from rather than whether it was AI-generated. The two approaches are complementary. A valid C2PA credential from a trusted hardware source, combined with a high authenticity score from a detection tool, is stronger evidence than either alone. The obstacle, as we covered in the C2PA article, is that most platforms strip C2PA metadata during upload.
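The core mechanism, a signed claim that is bound to a hash of the image content, can be sketched in a few lines. This is an illustration of the idea only, not the C2PA format. Python's standard library has no public-key signatures, so an HMAC with a shared key stands in for the certificate-based signature a real credential would carry; `SIGNING_KEY`, `make_manifest`, and `verify` are all hypothetical names.

```python
import hashlib
import hmac
import json

# Stand-in for a signer's private key. Real credentials use certificates,
# not a shared secret; this is an assumption made to keep the sketch stdlib-only.
SIGNING_KEY = b"demo-key-not-a-real-certificate"


def make_manifest(image_bytes, producer):
    """Create a claim about the image and sign it."""
    claim = {
        "producer": producer,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify(image_bytes, manifest):
    """Check the signature first, then check that the content still matches."""
    if manifest is None:
        return "no credential"
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "bad signature"
    if hashlib.sha256(image_bytes).hexdigest() != manifest["claim"]["sha256"]:
        return "content changed"
    return "valid"


original = b"raw image bytes from the camera"
manifest = make_manifest(original, producer="ExampleCam")

print(verify(original, manifest))            # untouched file
print(verify(original + b"edit", manifest))  # pixels altered after signing
print(verify(original, None))                # metadata stripped on upload
```

The last case is the practical obstacle: once a platform strips the metadata, there is nothing left to verify, and the result says nothing either way about the image.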

What Realistic Defense Looks Like

The framing of an arms race implies that one side will eventually win. That is probably the wrong model. Detection accuracy against a specific, known generator can be very high. The problem is that the set of generators in active use keeps expanding, and detection models trained on last year's output are always catching last year's fakes.

A more useful frame: detection raises the cost of deception. An image that passes an automated check does not become trustworthy, but an image that fails one provides an early signal. Stacking multiple approaches (detection algorithms, provenance metadata, and human review for high-stakes content) produces better outcomes than any single method alone.
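One way to picture that stacking is a small triage rule. The sketch below is a hypothetical policy, not a recommendation or a description of any product: the `triage` function, its labels, and the 0.5 threshold are all assumptions. It encodes the asymmetry described above, where a failed check is a signal but a passed check is never treated as proof.

```python
def triage(ai_score, provenance, high_stakes, threshold=0.5):
    """Combine three layers of evidence into a next step.

    ai_score    -- detector's estimated probability that the image is AI-generated
    provenance  -- "valid", "invalid", or "missing" (e.g. stripped on upload)
    high_stakes -- whether a wrong call would be costly
    """
    # A failed automated check is an early signal worth acting on.
    if provenance == "invalid" or ai_score >= threshold:
        return "flagged"
    # Passing automated checks is not proof, so costly cases go to a person.
    if high_stakes:
        return "human review"
    # Low score plus a valid credential: two independent signals agree.
    if provenance == "valid":
        return "corroborated"
    # Low score alone, no credential: nothing suspicious, nothing confirmed.
    return "no signal"


print(triage(0.9, "missing", high_stakes=False))
print(triage(0.1, "valid", high_stakes=True))
print(triage(0.1, "valid", high_stakes=False))
print(triage(0.1, "missing", high_stakes=False))
```

Note that none of the outcomes is "trusted". The strongest label, "corroborated", only means that independent checks did not disagree.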

The EU AI Act's disclosure requirements, which begin phased enforcement in August 2026, push generators to mark their own output rather than leaving detection entirely to downstream tools. That shift in responsibility, from detector to creator, may matter more in practice than any improvement in classifier accuracy.

Sources

1. Chandra et al., "Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024," arXiv:2503.02857, March 2025.
2. "Detection Accuracy of Flux Dev Images," arXiv:2602.07814, 2026 (detection rates 18–30% on newer Flux models).
3. Aczel & Vettor, "The Unwinnable Arms Race of AI Image Detection," arXiv:2509.21135, September 2025.
4. Jia et al., "Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks," arXiv:2407.20836, July 2024.
5. Google DeepMind, "SynthID: Watermarking and Detecting AI-Generated Content," deepmind.google, 2025–2026.
