Frequency Analysis: How Researchers Spot AI Images at the Pixel Level

By Maat Scan · June 2, 2026

In 2019, researchers at Columbia University ran a GAN-generated image through a Fourier transform and found something the pixels alone would never reveal: a grid of evenly spaced spikes in the frequency spectrum, too regular to be natural, too faint for the eye to catch.¹ That one observation launched a detection approach now embedded in tools worldwide. Since then, the field has grown considerably more complicated.

The Frequency Layer Beneath Every Image

A digital image is a grid of pixel values. But it can also be described as a sum of wave patterns at different spatial frequencies, where "spatial frequency" means how quickly brightness changes across the image. Fine textures like hair or fabric are high-frequency. Gradual color shifts like a sky at dusk are low-frequency.

The Discrete Fourier Transform (DFT) converts an image from pixel space into this frequency representation. Real photographs follow a characteristic power law: spectral amplitude falls off roughly as 1/f, so low frequencies dominate and energy declines smoothly as frequency rises.² This pattern comes from physics: natural scenes have statistical regularities that cameras faithfully record. AI generators are trained to match what images look like, but the underlying process they use leaves different statistical traces.
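One way to see that falloff is to average the power spectrum over rings of equal spatial frequency and fit a line in log-log space. The sketch below is only illustrative: it uses NumPy, and because no photograph ships with this article it builds a synthetic stand-in (`synthetic_one_over_f`, a hypothetical helper) whose amplitude is shaped to fall as 1/f. Since power is amplitude squared, the fitted slope should come out near -2 for such an image.

```python
import numpy as np

def radial_power_spectrum(image):
    """Return the radially averaged power spectrum of a 2-D grayscale array."""
    # Shift the zero-frequency (DC) term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(spectrum) ** 2

    # Integer distance of every frequency bin from the centre.
    h, w = image.shape
    y, x = np.indices((h, w))
    radius = np.hypot(y - h // 2, x - w // 2).astype(int)

    # Average the power over rings of equal radius.
    ring_sum = np.bincount(radius.ravel(), weights=power.ravel())
    ring_count = np.bincount(radius.ravel())
    profile = ring_sum / ring_count
    # Keep only radii that lie fully inside the spectrum.
    return profile[: min(h, w) // 2]

def synthetic_one_over_f(size=128, seed=0):
    """Stand-in for a photograph: noise whose amplitude falls off as 1/f."""
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    f = np.hypot(fy, fx)
    f[0, 0] = 1.0  # avoid dividing by zero at DC
    shaped = np.fft.fft2(rng.standard_normal((size, size))) / f
    return np.real(np.fft.ifft2(shaped))

profile = radial_power_spectrum(synthetic_one_over_f())

# Slope of log-power against log-frequency, skipping the DC bin.
radii = np.arange(1, len(profile))
slope = np.polyfit(np.log(radii), np.log(profile[1:]), 1)[0]
print(f"fitted log-log slope: {slope:.2f}")
```

On a real photograph you would pass a grayscale array to `radial_power_spectrum` instead of the synthetic image. Integer binning of the radius slightly flattens the fit at the lowest frequencies, so treat the slope as a rough indicator rather than a precise measurement.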

Why GANs Leave Checkerboard Fingerprints

GAN generators build images from low-resolution latent codes up to full-size output using transposed convolution, the standard upsampling layer. That operation has a structural flaw: whenever the kernel size is not a multiple of the stride, the convolution kernel overlaps unevenly across the output, producing a repeating artifact at regular spatial intervals. In pixel space it looks like a faint grid. In the DFT it appears as discrete spikes at predictable frequency coordinates.¹
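The uneven overlap is easy to reproduce in one dimension. The sketch below is a toy, not code from any real generator: it hand-rolls a transposed convolution with a kernel of width 3 and a stride of 2, feeds it a constant input and a constant kernel, and shows that the output still alternates. Any pattern therefore comes purely from how the kernel footprints overlap. In two dimensions the same effect applies along both axes, which gives the grid.

```python
import numpy as np

def transposed_conv1d(signal, kernel, stride):
    """Plain transposed convolution: each input sample stamps a scaled kernel."""
    out = np.zeros((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        out[i * stride : i * stride + len(kernel)] += value * kernel
    return out

# Constant input, constant kernel: the only structure left is the overlap count.
overlap = transposed_conv1d(np.ones(32), np.ones(3), stride=2)

# Take 60 samples away from the two edge positions.
interior = overlap[2:62]
print(interior[:8])  # [2. 1. 2. 1. 2. 1. 2. 1.]

# The period-2 pattern shows up as a single spike at the highest frequency bin.
spectrum = np.abs(np.fft.rfft(interior - interior.mean()))
peak_bin = int(np.argmax(spectrum))
print(peak_bin, len(interior) // 2)  # 30 30
```

Even output positions are covered by two kernel footprints and odd positions by one, so the spike lands at the Nyquist bin. Other kernel and stride combinations would move the spike to other predictable positions.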

Training a classifier on these spectra rather than on pixels turned out to be remarkably effective. A frequency-domain CNN achieved 92.8% accuracy and an AUC of 0.95 on detecting GAN-generated images in 2024 tests, outperforming pixel-based classifiers on the same task.² Different GAN architectures produce spikes at different positions, which means the frequency spectrum doubles as a generator fingerprint: researchers can often identify not just whether an image is fake, but which model made it.

Diffusion Models Changed the Problem

Diffusion models don't use transposed convolution, so they don't produce checkerboard spikes. Detection tools trained to catch GAN artifacts miss most diffusion-generated content. That created a significant gap as Stable Diffusion, Midjourney, and DALL-E replaced GANs as the dominant generation method.

Diffusion models do leave traces, but subtler ones. Research has shown that diffusion-generated images exhibit progressively larger statistical differences from real photographs as you move from low to high frequencies.³ The high-frequency content is systematically underrepresented: the denoising process that generates the image cannot perfectly reproduce the fine-grain noise statistics of a real camera sensor.
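A crude way to quantify "underrepresented high-frequency content" is the share of spectral power above some cutoff frequency. The sketch below is an illustration under loud assumptions: the two test images are synthetic stand-ins, not camera or diffusion outputs, and the 0.25 cycles-per-pixel cutoff is an arbitrary choice. One image keeps its fine-grain noise, the other has the same noise passed through a 3x3 blur.

```python
import numpy as np

def high_freq_energy_fraction(image, cutoff=0.25):
    """Fraction of non-DC spectral power at radial frequencies above `cutoff`.

    Frequencies are in cycles per pixel, so the Nyquist limit is 0.5.
    """
    power = np.abs(np.fft.fft2(image - image.mean())) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.hypot(fy, fx)
    return power[radius > cutoff].sum() / power.sum()

def box_blur(image):
    """3x3 mean filter with wrap-around edges, built from shifted copies."""
    shifted = [np.roll(np.roll(image, dy, axis=0), dx, axis=1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return np.mean(shifted, axis=0)

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:128, 0:128]
scene = np.sin(2 * np.pi * xx / 64) + np.cos(2 * np.pi * yy / 32)  # smooth content
noise = 0.2 * rng.standard_normal(scene.shape)                      # fine-grain noise

camera_like = scene + noise              # assumed stand-in: noise left intact
smoothed_like = scene + box_blur(noise)  # assumed stand-in: noise attenuated

frac_camera = high_freq_energy_fraction(camera_like)
frac_smoothed = high_freq_energy_fraction(smoothed_like)
print(f"camera-like:   {frac_camera:.4f}")
print(f"smoothed-like: {frac_smoothed:.4f}")
```

The smoothed image ends up with a much smaller high-frequency share even though both look nearly identical at normal viewing size. Detectors described in the literature learn far richer statistics than this single ratio.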

A 2025 method called Frequency Forgery Clue enhancement (F²C) addressed this by applying a weighted filter to the Fourier spectrum that suppresses low-information frequency bands and amplifies the discriminative ones.³ A separate 2025 architecture called FreqCross found that synthetic images from Stable Diffusion 3.5 show characteristic spectral signatures in the 0.1 to 0.4 normalized frequency range, a band that natural photographs occupy differently.⁴
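The general idea of reweighting a spectrum can be sketched in a few lines. This is not the F²C or FreqCross implementation: it is a plain band-emphasis filter written for illustration. The function name `band_weighted`, the hard band edges, the `outside_weight` of 0.1, and the convention that a normalized frequency of 1.0 means the Nyquist limit are all assumptions made here.

```python
import numpy as np

def band_weighted(image, low=0.1, high=0.4, outside_weight=0.1):
    """Keep the [low, high] band at full strength and scale the rest down.

    Radii are normalised so that 1.0 is the Nyquist frequency along an axis
    (an assumed convention, not taken from the cited papers).
    """
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.hypot(fy, fx) / 0.5
    weights = np.where((radius >= low) & (radius <= high), 1.0, outside_weight)
    # The weights depend only on |frequency|, so a real input stays real.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * weights))

size = 160
x = np.tile(np.arange(size), (size, 1))

def wave(bin_index):
    """Horizontal sinusoid that sits exactly on one DFT bin."""
    return np.sin(2 * np.pi * bin_index * x / size)

# Three components at normalised frequencies 0.05, 0.25 and 0.8.
image = wave(4) + wave(20) + wave(64)
filtered = band_weighted(image)

# Only the middle component should survive at full amplitude.
expected = 0.1 * wave(4) + wave(20) + 0.1 * wave(64)
print(np.allclose(filtered, expected))  # True
```

A learned method would replace the fixed weights with ones tuned on data, and would feed the reweighted result to a classifier rather than inspect it directly.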

How JPEG Erases the Evidence

JPEG compression is itself a frequency-domain operation. It uses the Discrete Cosine Transform to divide image data into frequency bands, then discards the high-frequency information that human vision is least sensitive to. When a detector is looking for subtle high-frequency mismatches, JPEG often deletes exactly the signals it needs.
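The effect on a single 8x8 block can be sketched as follows. This is a simplified illustration, not a JPEG codec: the quantization steps are a made-up ramp that grows with frequency rather than the table from the JPEG standard, and color conversion, chroma subsampling, and entropy coding are left out. The block is a smooth horizontal gradient plus fine-grain noise, standing in for image content plus subtle high-frequency detail.

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector.
n = np.arange(N)
D = np.sqrt(2 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
D[0, :] = np.sqrt(1 / N)

def dct2(block):
    """2-D DCT of an 8x8 block."""
    return D @ block @ D.T

def idct2(coeffs):
    """Inverse 2-D DCT."""
    return D.T @ coeffs @ D

# Illustrative quantisation steps that grow with the frequency index u + v.
# This ramp is an assumption for the demo, not the standard JPEG table.
u, v = np.indices((N, N))
quant = 4.0 + 6.0 * (u + v)

rng = np.random.default_rng(0)
gradient = np.tile(np.linspace(0, 64, N), (N, 1))      # smooth content
block = gradient + 3.0 * rng.standard_normal((N, N))   # plus fine-grain noise

coeffs = dct2(block)
quantized = np.round(coeffs / quant) * quant            # the lossy step
reconstructed = idct2(quantized)

high = (u + v) >= 8                                     # highest-frequency corner
print(np.count_nonzero(coeffs[high]), np.count_nonzero(quantized[high]))  # 28 0
```

Every coefficient in the high-frequency corner carries some noise energy before quantization and none afterwards, while the gradient survives in the low-frequency coefficients. A detector that relies on that fine-grain energy has nothing left to measure.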

GAN checkerboard spikes partially survive mild compression, since they sit at specific positions the codec doesn't specifically target. Diffusion artifacts fare worse: the high-frequency statistical mismatch that makes detection possible is precisely what JPEG strips first. Images shared through social media are typically re-encoded, often more than once, before a viewer sees them, which explains a significant portion of the accuracy drop from lab benchmarks to real-world content that we documented in our earlier article, The Arms Race: Why AI Image Detection Gets Harder Every Year.

Frequency Fingerprinting

Detection is a binary question: real or fake? Fingerprinting goes further. Because each model's architecture leaves a distinctive spectral pattern, a classifier trained on multiple generators can often identify which specific tool produced a given image, even when the image itself has been cropped or lightly edited.
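In its simplest form, attribution can be a lookup: find where the strongest spectral spike sits and compare it with known positions. The sketch below is a toy under stated assumptions. `generator_A` and `generator_B` are made-up labels, their spike positions are invented, and the samples are white noise with a faint periodic pattern added. Real fingerprinting systems train classifiers on whole spectra rather than matching a single peak.

```python
import numpy as np

SIZE = 64
# Hypothetical generators and the spectral bin (ky, kx) where each leaves a spike.
KNOWN_PEAKS = {"generator_A": (16, 16), "generator_B": (8, 24)}

def fake_sample(name, rng, strength=0.5):
    """Stand-in output: white noise plus a faint periodic pattern for `name`."""
    ky, kx = KNOWN_PEAKS[name]
    yy, xx = np.mgrid[0:SIZE, 0:SIZE]
    pattern = np.cos(2 * np.pi * (ky * yy + kx * xx) / SIZE)
    return rng.standard_normal((SIZE, SIZE)) + strength * pattern

def strongest_peak(image):
    """Location of the largest non-DC bin in the half-plane real FFT."""
    # rfft2 keeps one copy of each conjugate pair, so the peak is unambiguous.
    mag = np.abs(np.fft.rfft2(image - image.mean()))
    ky, kx = np.unravel_index(np.argmax(mag), mag.shape)
    return int(ky), int(kx)

def attribute(image):
    """Match the strongest peak against the known fingerprints."""
    peak = strongest_peak(image)
    for name, known in KNOWN_PEAKS.items():
        if peak == known:
            return name
    return "unknown"

rng = np.random.default_rng(0)
guess_a = attribute(fake_sample("generator_A", rng))
guess_b = attribute(fake_sample("generator_B", rng))
print(guess_a, guess_b)  # generator_A generator_B
```

The added pattern has half the amplitude of the noise and is hard to see in pixel space, yet its spike stands far above the noise floor in the spectrum. That gap is what makes the spectrum usable as a fingerprint.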

This matters for forensics. When a manipulated image surfaces in a political campaign or a court case, knowing whether it came from Midjourney v7 or Flux.1 or a custom fine-tuned model is often more useful than a binary fake/real label.

The limitation is the same one facing all detection: newer models are increasingly trained on distributions that minimize spectral divergence from real photographs. Flux.1 already shows substantially smaller high-frequency deviations than Stable Diffusion 1.5, which is part of why its detection rate is 18 to 30%, versus more than 70% for older models.⁵ The frequency approach remains one of the strongest available signals, but it is not a stable one. As with other detection methods, the window between a new generator's release and a trained detector's response is measured in months, not days.

Sources

1. Zhang et al., "Detecting and Simulating Artifacts in GAN Fake Images," arXiv:1907.06515, 2019 (foundational checkerboard frequency analysis).
2. Szeghalmy & Fazekas, "Discrete Fourier Transform in Unmasking Deepfake Images: A Comparative Study of StyleGAN Creations," MDPI Information 15(11):711, November 2024.
3. "Enhancing Frequency Forgery Clues for Diffusion-Generated Image Detection," arXiv:2511.00429, November 2025.
4. Li et al., "FreqCross: A Multi-Modal Frequency-Spatial Fusion Network for Robust Detection of Stable Diffusion 3.5 Generated Images," arXiv:2507.02995, July 2025.
5. "Detection Accuracy of Flux Dev Images," arXiv:2602.07814, 2026 (Flux.1 detection rates 18–30%).
