How AI Image Detection Works
By Maat Scan · March 24, 2026
As of early 2026, roughly one in three images on major social platforms shows signs of AI augmentation.1 Meanwhile, humans asked to distinguish real photos from AI-generated ones got it right only 38% of the time in recent studies, worse than random guessing.2 That gap is what AI detection systems exist to close, and closing it has turned out to be much harder than it first appeared.
The Spectrum of "AI-Generated"
The term covers a broad range. At one end are images entirely synthesized by generative models like Midjourney, DALL-E, or Stable Diffusion, where no real photograph was involved at any stage. At the other end are lightly retouched photos where someone used a beauty filter or skin-smoothing app. In between sit heavily edited images, face swaps, inpainting (where specific regions are replaced by AI), and upscaled images.
Detection systems must handle all of these cases, which is why a single score rarely tells the full story. Maat Scan uses five separate signal dimensions to build a more complete picture.
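To make the structure concrete, here is a minimal sketch of how five per-dimension scores might be combined into a single estimate. The field names and weights are illustrative assumptions, not Maat Scan's actual formula, which is not public.

```python
# Illustrative sketch only: the weights and field names are hypothetical,
# not Maat Scan's actual scoring model.
from dataclasses import dataclass

@dataclass
class SignalScores:
    vlm: float       # VLM naturalness estimate, 0.0 (synthetic) to 1.0 (natural)
    texture: float   # skin/material micro-texture plausibility
    edges: float     # boundary and blending integrity
    geometry: float  # facial/body proportion plausibility
    metadata: float  # EXIF consistency

WEIGHTS = {"vlm": 0.40, "texture": 0.20, "edges": 0.15,
           "geometry": 0.15, "metadata": 0.10}

def naturalness(s: SignalScores) -> float:
    """Weighted average of the five dimensions; still a statistical estimate."""
    return sum(getattr(s, name) * w for name, w in WEIGHTS.items())

print(naturalness(SignalScores(vlm=0.3, texture=0.5, edges=0.7,
                               geometry=0.9, metadata=0.4)))  # 0.50
```

Even in this toy form, the per-dimension breakdown matters more than the aggregate: two images with the same overall score can get there for very different reasons.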
Signal 1: Visual Impression (VLM Analysis)
The first and most powerful signal comes from a Vision-Language Model (VLM), a large neural network trained on millions of real and AI-generated images. VLMs learn to recognize subtle statistical patterns that separate synthetic images from photographs taken with a physical camera.
Unlike simpler binary classifiers, VLMs understand context. They can recognize that a face with perfect symmetry and zero skin variation is statistically unusual for a real human, or that background objects carry a dreamlike quality inconsistent with optical lenses. The VLM component produces an overall naturalness estimate that anchors the full score.
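In code, querying such a classifier is a single call. The sketch below uses the Hugging Face transformers pipeline, which is a real API; the model identifier is a placeholder, since Maat Scan's own model is not public.

```python
# Sketch of a VLM-style classifier call. The pipeline API is real;
# the model id is a hypothetical placeholder.
from transformers import pipeline

detector = pipeline("image-classification",
                    model="example-org/ai-image-detector")  # hypothetical model

for result in detector("portrait.jpg"):
    # Each result is {"label": ..., "score": ...}, e.g. "synthetic": 0.87.
    print(f"{result['label']}: {result['score']:.2f}")
```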
Signal 2: Skin and Material Texture
Human skin has a characteristic micro-texture: subtle pores, fine hairs, variable lighting across small areas. Generative models, especially those optimized to produce flattering portraits, tend to produce skin that is unnaturally smooth. Texture analysis measures the frequency spectrum of fine detail in the skin and material regions of an image.
Heavy photo retouching, such as the skin-smoothing apps widely used in Japan, Korea, and elsewhere, produces similar signatures: smoothing removes natural texture variation. This signal is therefore useful both for detecting full AI generation and for identifying heavily edited photographs.
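A minimal version of this measurement is the share of spectral energy above a frequency cutoff, computed with a 2D FFT over a patch. The sketch below assumes fixed patch coordinates; a real system would first segment skin and material regions.

```python
# Minimal texture-analysis sketch: fraction of spectral energy in the
# high frequencies of a patch. Patch coordinates are assumed; real systems
# segment skin regions first.
import numpy as np
from PIL import Image

def high_freq_ratio(patch: np.ndarray, cutoff: float = 0.25) -> float:
    """Share of spectral energy beyond `cutoff` of the Nyquist frequency.
    Unnaturally smooth (generated or heavily retouched) skin scores low."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    # Normalized distance from the spectrum center (the DC component).
    dist = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    return float(spectrum[dist > cutoff].sum() / spectrum.sum())

gray = np.asarray(Image.open("portrait.jpg").convert("L"), dtype=np.float64)
skin_patch = gray[100:228, 100:228]  # assumed skin region
print(f"high-frequency energy ratio: {high_freq_ratio(skin_patch):.4f}")
```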
Signal 3: Edge Integrity and Background Warping
Generative models often struggle at transitions: the boundary between a person and their background, or between two objects. Inpainting tools that replace portions of an image can leave subtle inconsistencies in edge sharpness and local contrast. Detection algorithms analyze these boundary regions for blending artifacts, unnatural gradients, or warping that would not occur in a camera photograph.
Face-swap technologies used in deepfakes are particularly vulnerable to this analysis, since blending a synthetic face onto a real body requires pixel-level compositing that rarely achieves perfect consistency.
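A simple proxy for these inconsistencies is to compare local sharpness, measured as the variance of the Laplacian, between the subject and its surroundings. The region boxes below are assumed for illustration; production systems trace actual boundaries rather than using fixed crops.

```python
# Toy edge-integrity check: compare Laplacian variance (a standard
# sharpness proxy) across two regions. Region boxes are assumed.
import cv2
import numpy as np

def sharpness(gray: np.ndarray) -> float:
    """Variance of the Laplacian: higher means more fine edge detail."""
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

img = cv2.imread("portrait.jpg", cv2.IMREAD_GRAYSCALE)
subject = img[80:240, 120:280]      # assumed subject region
background = img[0:160, 400:560]    # assumed background region

ratio = sharpness(subject) / max(sharpness(background), 1e-9)
# A composited region (e.g. a swapped face) often differs markedly from its
# surroundings; a ratio far from 1.0 is one cue worth weighing.
print(f"sharpness ratio (subject/background): {ratio:.2f}")
```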
Signal 4: Facial and Body Geometry
Human faces follow well-established geometric constraints: the ratio of eye spacing to face width, the relationship between nose length and chin height, the placement of ears relative to the jawline. Generative models sometimes violate these constraints in subtle ways, particularly in hands, teeth, and ears, which have historically been difficult for these models to render correctly.
Detection systems locate facial landmarks and compute composite geometry scores from them. When proportions fall outside the normal distribution for real human faces, they contribute to a lower score. Maat Scan applies generous tolerances to avoid penalizing natural human variation, but systematic geometric anomalies remain detectable.
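As a toy version of this check, the sketch below converts one landmark ratio into a z-score against population statistics. Both the landmark coordinates and the mean and standard deviation are illustrative assumptions, not published anthropometric data.

```python
# Toy geometry check: z-score one facial ratio against assumed population
# statistics. Landmark coordinates and POP_MEAN/POP_STD are illustrative.
import math

landmarks = {  # (x, y) pixels, assumed to come from a face-landmark detector
    "left_eye": (210, 180), "right_eye": (290, 182),
    "left_face": (170, 230), "right_face": (330, 232),
}

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

ratio = (dist(landmarks["left_eye"], landmarks["right_eye"])
         / dist(landmarks["left_face"], landmarks["right_face"]))

POP_MEAN, POP_STD = 0.46, 0.04  # hypothetical population mean and std
z = (ratio - POP_MEAN) / POP_STD
# Generous tolerance: only flag proportions far outside natural variation.
verdict = "anomalous" if abs(z) > 3 else "within normal range"
print(f"eye-span/face-width = {ratio:.3f}, z = {z:+.2f} ({verdict})")
```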
Signal 5: Image Metadata (EXIF Data)
Every photo taken with a camera or smartphone embeds metadata in the file: the camera model, lens focal length, shutter speed, ISO setting, GPS coordinates, and a timestamp. Images generated by AI tools typically lack this data entirely, since they are synthetic outputs with no physical device behind them.
Missing camera metadata is not proof of AI generation on its own — social media platforms strip EXIF data when images are shared. But combined with other signals, it shifts the overall assessment. Conversely, the presence of editing software metadata (Adobe Photoshop or Lightroom tags, for example) is a direct indicator that the image was processed after capture.
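Reading this metadata takes a few lines with Pillow, whose getexif API is real; the interpretation heuristics below are our own illustration, not Maat Scan's internal logic.

```python
# Reading EXIF with Pillow (a real API); the heuristics are illustrative.
from PIL import Image
from PIL.ExifTags import TAGS

exif = Image.open("photo.jpg").getexif()
tags = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

has_camera = any(key in tags for key in ("Make", "Model"))
editor = tags.get("Software")  # e.g. "Adobe Photoshop ..." after editing

if not has_camera:
    print("No camera metadata: a weak signal, since platforms strip EXIF.")
if editor:
    print(f"Processed after capture by: {editor}")
```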
The Detection Gap
Accuracy varies sharply depending on which generator produced the image. Outputs from generators that were standard two or three years ago — older Midjourney versions, early DALL-E 3, original Stable Diffusion — are caught at 80 to 95% accuracy by leading detection tools.3 The newest generators are a different problem. A February 2026 benchmark study found that images from Flux Dev, Adobe Firefly v4, and Midjourney v7 were correctly identified only 18 to 30% of the time, worse than flipping a coin.4
A 2025 empirical study confirmed that detection performance degrades measurably with each new model generation, and that human raters lose reliability faster than automated systems do.5 Detection classifiers require regular retraining just to maintain their current accuracy, and that retraining cycle will not stop.
There will always be a category of images, particularly lightly edited real photographs, where assigning a confident score is genuinely hard. Maat Scan's scores are explicitly described as statistical estimates, not verdicts.
How to Use Detection Results
A low naturalness score is a signal worth investigating, not a conclusion. Responsible use means treating the score as one data point: look at which dimensions drove it, check the original source of the image, and consult other verification methods before making any consequential decision.
Detection tools are most valuable as a first-pass filter. They flag images that warrant closer inspection. They do not replace human judgment, and they should not be presented as doing so.
Sources
- Facia.ai / UC Berkeley, "AI Image Prevalence on Social Platforms," 2026.
- OpenPR.com, "Human AI Detection Accuracy Falls Below Chance," 2025.
- Imagera AI, "AI Image Detection Benchmark 2026," 2026.
- "Open-Source AI-Generated Image Detection Benchmark," arXiv:2602.07814, February 2026.
- "Empirical Study of AI Image Detection Across Model Generations," arXiv:2511.02791, November 2025.
- "AI-GenBench: Ongoing Benchmark for AI Image Detection," arXiv:2504.20865, April 2025.
