AI Media Forensics — manual
AI Media Forensics Manual
Practical Techniques for Detecting Synthetic Media
📑 Table of Contents
1. Introduction
Artificial Intelligence has enabled the creation of hyper-realistic synthetic media — images, video, audio, and text that can convincingly mimic authentic content. While AI brings innovation in media production, it also introduces risks: misinformation campaigns, reputational attacks, political manipulation, and cyber-enabled fraud.
This manual is designed for journalists, investigators, analysts, and digital forensic professionals. It delivers:
- A structured methodology for content verification.
- A multi-phase workflow that scales from rapid screening to evidentiary forensics.
- A toolkit of practical technologies aligned with OSINT and DFIR practices.
- Best practices for documentation, reporting, and transparency.
2. Analysis Models
The detection process is divided into four escalating phases:
- Rapid Triage (Initial Screening) – Quick suspicion check.
- Preliminary Verification (Lightweight Checks) – OSINT-based fast validation.
- Structured Forensic Analysis (In-Depth Review) – Comprehensive forensic-grade methods.
- Peer Review & Validation (Cross-Check) – Independent replication to reduce bias.
3. Detection Domains
How to use this section: Each domain targets a distinct failure mode common to synthetic media. Treat domains as independent lines of evidence. A single red flag rarely proves anything; two or more from different domains justifies escalation.
3.1 Anatomy & Object Integrity
Objective: Detect biological or object construction errors introduced by generative AI.
Indicators:
- Extra, missing, or fused fingers; malformed nails; symmetrical eyes without natural variation.
- Teeth rendered as uniform blocks or inconsistent with gum lines.
- Ears, earrings, or glasses distorted or asymmetrical.
- Clothing, fabric, or accessories with warped stitching, inconsistent patterns, or impossible geometry.
Checks:
- Zoom to 200–400% and scan hands, eyes, and teeth.
- Look for repeated face patterns in group shots.
- Compare mirrored body parts for natural asymmetry.
Tools: Forensically, magnifiers, reverse image search on cropped anomalies.
3.2 Geometry & Physics
Objective: Test whether light, perspective, and reflections obey physical laws.
Indicators:
- Shadows inconsistent with light sources or each other.
- Reflections missing in mirrors, water, or glass.
- Vanishing points misaligned; horizon misplaced.
- Object scale inconsistent with distance.
Checks:
- Use SunCalc to validate shadow length vs. claimed time/place.
- Draw vanishing lines to test perspective.
- Inspect reflections for parity and content.
Tools: SunCalc, Google Earth/Street View, Forensically.
3.3 Metadata & Technical Fingerprints
Objective: Analyze embedded metadata and camera/device signatures.
Indicators:
- Missing EXIF in photos that should contain it.
- Impossible timestamps or GPS coordinates.
- Software tags showing AI editors or generators.
- Uniform synthetic noise lacking natural PRNU (Photo Response Non-Uniformity).
Checks:
- Run ExifTool to review
Make/Model,DateTimeOriginal,GPSfields.
- Inspect compression signatures and quantization tables.
- Apply Noiseprint for sensor fingerprinting.
Tools: ExifTool, FotoForensics, Noiseprint.
3.4 Voice & Audio
Objective: Identify synthetic patterns in speech or environmental sound.
Indicators:
- Robotic cadence; unnatural prosody.
- Missing breathing, mouth clicks, or ambient noise.
- Spectrogram anomalies: clean high frequencies, banding.
Checks:
- Inspect spectrograms for unnatural frequency bands.
- Measure jitter/shimmer in Praat for vocal variation.
- Compare lip sync to phonemes in video.
3.5 Contextual Consistency
Objective: Confirm claimed time, place, and environment.
Indicators:
- Seasonal mismatch (snow vs. claimed summer).
- Buildings or skylines inconsistent with stated location.
- Weather contradicting meteorological records.
Checks:
- Validate shadows and lighting with SunCalc.
- Compare weather with Meteostat or OGIMET logs.
- Cross-reference landmarks via Google Earth or Street View.
Tools: Meteostat, OGIMET, Google Earth.
3.6 Behavioral & Social Signals
Objective: Assess realism of group dynamics and human behavior.
Indicators:
- Identical faces or clothing repeated in crowds.
- People ignoring focal events (all gazes in wrong direction).
- Uniform expressions or synchronized gestures.
Checks:
- Run face clustering to detect duplicates.
- Check gaze direction consistency.
- Observe micro-expressions and natural motion.
Tools: InVID, Forensically.
3.7 Textual AI Fingerprints
Objective: Detect linguistic artifacts of AI-generated text.
Indicators:
- Repetitive scaffolding or formulaic phrasing.
- Fabricated citations or unverifiable facts.
- Uniform sentence lengths and transitions.
Checks:
- Run AI detectors on samples.
- Perform stylometric comparison to known author texts.
- Spot-check quotes and references.
Tools: GLTR, DetectGPT, HuggingFace Models, JStylo.
3.8 Provenance & Watermarking
Objective: Identify provenance credentials or embedded watermarks.
Indicators:
- Valid C2PA signatures showing edit history.
- Invisible watermarks indicating AI generation.
Checks:
- Extract provenance JSON and verify signatures.
- Run SynthID or watermark scanners where available.
Tools: C2PA, Adobe Content Credentials, Google SynthID.
3.9 AI-vs-AI Detection
Objective: Apply specialized AI detectors trained to spot generative content.
Indicators:
- High detector confidence across multiple frames.
- Consistent outputs from different models.
Checks:
- Apply forensic CNNs (XceptionNet, FaceForensics++).
- Compare results across multiple detectors.
Tools: FaceForensics++, DFDC models, XceptionNet-based classifiers.
3.10 Cross-Modal & Narrative Consistency
Objective: Ensure all media modalities align with the narrative.
Indicators:
- Lip sync mismatch between audio and video.
- Weather sounds inconsistent with visual conditions.
- Narration contradicting imagery.
Checks:
- Align timestamps across text, audio, and video.
- Verify environment acoustics match visual context.
- Map camera positions vs. scene constraints.
Tools: CrossCheck, SensityAI, Reality Defender.
4. Step-by-Step Procedures. Step-by-Step Procedures
This section provides an operational, reproducible workflow from first contact with a file/link to an evidence‑grade conclusion. It is organized into four phases. Each phase includes objectives, inputs, actions, tools, outputs, and escalation criteria.
4.0 Pre‑Flight: OPSEC & Chain of Custody (CoC)
Objective: Preserve evidentiary integrity and avoid contaminating artifacts.
Inputs: Source URL, file(s), claims (who/what/where/when), stakeholder urgency.
Actions:
- Acquire original if possible (avoid platform‑compressed versions). Request raw files via secure channel.
- Hash immediately:
- Bash/macOS:
shasum -a 256 <file>
- PowerShell:
Get-FileHash <file> -Algorithm SHA256
- Bash/macOS:
- Snapshot context: copy URL, post ID, author handle, timestamps (include time zone), and a screenshot of the claim.
- Workspace: operate on a copy; never re‑encode originals. Record tool names & versions.
- Risk & scope: decide if this is routine verification or high‑stakes (elections, conflict, criminal case).
Output: Case record with IDs, hashes, source notes, and a plan for Phase 1.
4.1 Rapid Triage (Initial Screening)
Objective: Decide in seconds whether the material merits deeper checks.
Inputs: One image/video frame, short audio snippet, or text excerpt.
Actions (by media type):
- Image/Video frame:
- Anatomy & objects: hands, eyes, teeth, ears, accessories, signage, logos.
- Physics: shadow direction/length, reflections, specular highlights; lighting continuity.
- “Too perfect” test: cinematic composition, hyper‑clean surfaces, uniform faces.
- Anatomy & objects: hands, eyes, teeth, ears, accessories, signage, logos.
- Audio: listen for breath/pauses, monotone prosody, robotic shimmer at pitch changes.
- Text: repetitive phrasing, encyclopedic tone, confident statements without sources.
Common red flags: extra/merged fingers; mismatched shadows; mirrored or unreadable micro‑text; cloned textures; lip‑sync oddities; identical smiles.
False positives: heavy denoise/HDR; professional retouching; platform recompression; staged marketing visuals.
Output: Triage code — Green (plausible), Amber (suspicious), Red (multiple anomalies). Amber/Red → Phase 2.
4.2 Preliminary Verification (Lightweight Checks)
Objective: Use fast OSINT & basic forensic tools to confirm or challenge authenticity.
Inputs: Original (preferred) or best‑quality copy; claimed time/place/context.
Tools (typical): Google/Bing/Yandex Images; InVID‑WeVerify; ExifTool; Forensically / FotoForensics; Noiseprint; SunCalc; Timeanddate/Meteostat/OGIMET; Google Earth/Street View.
Step‑by‑step:
-
Reverse search (image/video):
- If video, extract 4–12 keyframes (InVID → Keyframes or
ffmpeg -i input.mp4 -vf fps=1 frames/f%04d.jpg).
- Search the full image plus cropped regions (faces, signs, skyline). Try horizontal flip when relevant.
- Compare hits: earlier appearances, different captions, stock/AI galleries.
- If video, extract 4–12 keyframes (InVID → Keyframes or
-
Metadata inspection (images/video/audio):
exiftool <file>→ reviewMake/Model,Software,DateTimeOriginal,GPS*.
- Red flags: missing EXIF in camera JPEGs, impossible timestamps, odd
Software(generator), GPS contradicting claim.
- Caveat: social sites often strip/alter EXIF.
-
Basic pixel forensics (images):
- ELA/Clone/Noise in Forensically/FotoForensics.
- Red flags: isolated high ELA around inserted objects; tiled repeats; uniform noise where natural variation is expected.
- Noiseprint/PRNU hint: lack of camera‑like noise structure can support suspicion.
- ELA/Clone/Noise in Forensically/FotoForensics.
-
Context cross‑check (all media):
- Place: landmark geometry in Google Earth/Street View; signage language & fonts.
- Time/lighting: SunCalc — does shadow azimuth/elevation match claimed date/time/location?
- Weather: compare precipitation/clouds/temperature with Timeanddate/Meteostat/OGIMET.
- Place: landmark geometry in Google Earth/Street View; signage language & fonts.
Evidence to capture: screenshots of reverse‑search results; EXIF dumps; ELA/Noise overlays; SunCalc and weather pages (PDFs or images).
Decision & escalation:
- Converging authentic signals → document as provisionally authentic.
- ≥2 independent inconsistencies → escalate to Phase 3.
4.3 Structured Forensic Analysis (In‑Depth Review)
Objective: Produce a defendable assessment using advanced methods across modalities.
Inputs: Highest‑quality media; claims; any prior investigative notes.
Modules & procedures:
A) Video Forensics
- Frame extraction:
- Constant rate:
ffmpeg -i in.mp4 -vf fps=5 frames/f_%05d.jpg
- Scene changes:
ffmpeg -i in.mp4 -vf "select='gt(scene,0.5)'" -vsync vfr scenes/s_%05d.jpg
- Constant rate:
- Temporal artifacts: look for warping/morphing around faces/hands; inconsistent motion blur; jitter on edges; rolling‑shutter realism during pans.
- Optical flow/consistency: check for motion coherence of shadows/reflections across frames.
B) Audio Forensics
- Spectrogram analysis (Audacity): View → Spectrogram; inspect harmonics, breath noise, plosives; spot copy‑paste bands.
- Prosody/phonation (Praat): measure pitch (F0), jitter/shimmer; overly uniform patterns suggest synthesis.
- Deepfake detectors: run Resemble Detect / Deepware; treat as supporting, not decisive.
- Physiological cues: where applicable, evaluate biometric pulse cues (e.g., FakeCatcher‑style signals) with caution.
C) Text Stylometry
- Establish a baseline from verified writings (if authorship is at issue).
- Analyze with JStylo (function words, POS patterns, sentence length variance).
- Cross‑check with GPTZero/DetectGPT/HuggingFace classifiers; corroborate with factual verification (quotes, sources, dates).
D) Contextual OSINT
- Geolocation: skyline line‑drawing; terrain/river bends; sign typography; street furniture; license plates.
- Chronology: construction timelines (bridges, towers), event schedules, transport GTFS feeds.
- Remote sensing: Sentinel Hub/NASA Worldview for cloud cover, snow extent, wildfire smoke on claimed dates.
E) Provenance & Watermarking
- C2PA/Content Credentials: inspect with compatible viewers; export the provenance JSON; verify signatures and edit history.
- SynthID/Watermarks: where tooling is available, check invisible watermarks in images/audio/text; document limitations.
F) Model‑Specific Forensics (AI‑vs‑AI)
- Apply forensic CNNs (e.g., XceptionNet/FaceForensics++/DFDC models) on images/frames; never as a sole indicator. Record model type, version, thresholds, and confusion risks.
G) PRNU / Camera Fingerprinting (expert option)
- Extract sensor noise residuals; compare to a reference set of images from the purported device.
- Caveats: recompression, denoise, and resizing degrade PRNU; treat as corroborative.
Outputs:
- Annotated frames/spectrograms; tool outputs (versions, parameters); OSINT corroboration; a reasoned conclusion with probability language (see 4.5).
Escalation triggers: conflicting signals; high impact (elections, criminal proceedings); legal request for expert affidavit.
4.4 Peer Review & Validation (Cross‑Check)
Objective: Reduce bias and ensure reproducibility.
Process:
- Prepare a neutral brief (facts, methods, outputs) avoiding leading language.
- A second analyst replicates key steps (reverse search, EXIF, pixel/audio/text analysis, context checks) independently.
- Compare findings; document agreements and discrepancies; if needed, seek a third expert or additional data (original file, higher resolution, longer cut).
Artifacts: replication log, checklist of reproduced results, change log of conclusions.
Outcome: consensus conclusion or documented divergence with rationale.
4.5 Decision & Reporting Framework
Probability bands (recommendation):
- Very Low (≤20%) — unlikely AI‑generated.
- Low (21–40%) — weak indicators; more data recommended.
- Indeterminate (41–59%) — conflicting signals; seek originals or expert tests.
- High (60–80%) — multiple independent indicators of AI/manipulation.
- Very High (>80%) — strong, corroborated evidence across domains.
Language examples: “High likelihood of AI generation based on [A, B, C], with no contradicting evidence. Limitations: [X, Y].”
Minimum evidence for publication (suggested): ≥2 independent indicators from different domains or 1 strong forensic indicator + context contradiction.
4.6 Automation Recipes (Optional)
- Batch EXIF export:
exiftool -csv -r -DateTimeOriginal -Make -Model -Software -GPS* <folder> > exif_report.csv
- Batch keyframes:
ffmpeg -i in.mp4 -vf fps=1 out/frame_%05d.jpg
- Scene change list:
ffmpeg -i in.mp4 -filter:v "select='gt(scene,0.4)',showinfo" -f null - 2> scenes.log
Tip: Log tool versions and parameters alongside outputs for reproducibility.
4.7 Case Log Template (suggested fields)
- Case ID, Analyst, Date/Time (TZ), Source URL/ID, Acquisition method, File hashes (SHA‑256), Media type, Claimed context, Tools & versions, Steps performed, Findings per step, Indicators (pro/contra), Probability band, Peer reviewer, Final conclusion, Evidence archive location.
5. Best Practices
- Two-signal principle: Never conclude based on one indicator.
- Documentation: Maintain chain of custody (hashes, metadata, tool versions).
- Probabilistic reporting: Use “high likelihood” instead of absolutes.
- Continuous adaptation: Update methods every 6–12 months.
- Automation: Integrate tools into scripted pipelines.
- Crowdsourced verification: Collaborate with OSINT/fact-checking communities.
6. Analyst Toolkit (2025)
- Images: Forensically, FotoForensics, ExifTool, Noiseprint.
- Video: InVID, ffmpeg, FakeCatcher.
- Audio: Praat, Audacity, Resemble Detect, Deepware Scanner.
- Text: GPTZero, DetectGPT, JStylo, HuggingFace classifiers.
- Provenance: C2PA tools, Adobe Content Credentials, SynthID.
- Context: Google Earth, Sentinel Hub, Meteostat, NASA Worldview.
- AI-forensics: XceptionNet, FaceForensics++, DFDC models.
7. Strategic Outlook
- Evolving AI: Generation models are rapidly improving, masking older flaws.
- Future of detection: Watermarking, provenance standards, and blockchain-based verification will be critical.
- Present reality: Only a hybrid approach (intuition + OSINT + forensics + AI detectors + provenance tools) can sustain investigative integrity.
Appendix A: Domain → Tools Matrix
| Detection Domain | Techniques | Tools (Open-Source / Free) |
|---|---|---|
| Visual Forensics | Identify anomalies: hands, eyes, teeth, reflections, shadows | Forensically, Deepware Scanner, GIMP |
| Metadata & File Integrity | Extract & analyze EXIF, XMP, hashes, signatures | ExifTool, Mat2, Hashdeep |
| Error Level & Compression Analysis | ELA, JPEG ghost detection, noiseprint mismatch | FotoForensics, Noiseprint |
| Reverse Image/Video Search | Reverse search images/videos for provenance | InVID-WeVerify, Yandex Images, TinEye |
| Audio Forensics | Spectrograms, waveform anomalies, deepfake audio classifiers | Sonic Visualiser, Praat, FakeCatcher (Intel) |
| Textual Stylometry | Stylometry, linguistic patterns, AI-text probability detectors | GLTR, DetectGPT, HuggingFace Transformers |
| Context & OSINT Cross-Verification | Cross-check geography, time, weather, events | OSINT Framework, Wayback Machine, Bellingcat Tools |
| Network & Source Traceability | Trace network origins, domains, C2PA provenance | WhoisXML API, Maltego CE, RiskIQ, C2PA |
| Cross-Modal Consistency | Check if narrative matches across modalities | CrossCheck, SensityAI, Reality Defender |
| Automation & Pipelines | Automate via pipelines, ML models, SIEM/XDR integrations | Apache Tika, HuggingFace pipelines, Python, MISP, Sigma rules |
Appendix B: Automation Snippets & Field Kit
Image & Metadata
- Extract metadata (all files in folder to CSV):
exiftool -csv -r folder/ > exif_report.csv - Strip metadata for sharing (privacy):
mat2 file.jpg
Video Analysis
- Extract 1 frame per second:
ffmpeg -i video.mp4 -vf fps=1 frames/out_%04d.jpg - Extract scene changes:
ffmpeg -i video.mp4 -filter:v "select='gt(scene,0.4)',showinfo" -vsync vfr scenes/out_%04d.jpg - Get video codec/container info:
mediainfo video.mp4
Audio Analysis
- Convert to WAV for spectrograms:
ffmpeg -i input.mp4 -vn -acodec pcm_s16le output.wav - Generate spectrogram (SoX):
sox output.wav -n spectrogram -o spectro.png
Text Analysis
- Detect AI-like text probability (DetectGPT):
from detectgpt import DetectGPT model = DetectGPT() score = model.score_text("sample text") print(score) - Check perplexity with GPT-2 LM (HuggingFace):
from transformers import GPT2LMHeadModel, GPT2TokenizerFast import torch model = GPT2LMHeadModel.from_pretrained("gpt2") tokenizer = GPT2TokenizerFast.from_pretrained("gpt2") text = "sample text" encodings = tokenizer(text, return_tensors="pt") max_length = model.config.n_positions stride = 512 nlls = [] for i in range(0, encodings.input_ids.size(1), stride): begin_loc = max(i + stride - max_length, 0) end_loc = i + stride trg_len = end_loc - i input_ids = encodings.input_ids[:, begin_loc:end_loc] target_ids = input_ids.clone() target_ids[:, :-trg_len] = -100 with torch.no_grad(): outputs = model(input_ids, labels=target_ids) nlls.append(outputs.loss * trg_len) ppl = torch.exp(torch.stack(nlls).sum() / end_loc) print(ppl.item())
Networking & Provenance
- WHOIS lookup (Linux):
whois example.com - Get SSL/TLS certificate info:
echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -dates -issuer -subject
Workflow Helpers
-
Batch hash files in folder:
sha256sum * > hashes.txt -
Create case log template (Markdown):
# Case Log - Case ID: - Analyst: - Date/Time (TZ): - Source URL/ID: - File Hashes (SHA-256): - Media Type: - Claimed Context: - Tools & Versions: - Findings: - Indicators: - Probability Band: - Peer Reviewer: - Final Conclusion:🔖 Credits
Maintained by Oryon +OSINT360 GPT.
This document is part of the Cyber Intelligence Toolkit project.