The word "deepfake" has become, in just a few years, one of the defining terms of the generative-AI era. Behind this portmanteau lies a precise technical reality and a set of stakes that touch privacy, democracy, finance and journalism alike. This pillar guide explains in depth what a deepfake actually is, how it is built, the many forms it takes, and above all how to detect one using reliable methods.
Deepfake definition: what exactly is it?
A deepfake (a blend of deep learning and fake) is synthetic content — an image, a video or an audio clip — generated or manipulated by deep-learning algorithms to make a person appear to say or do something they never said or did. What sets a deepfake apart is not the mere act of faking (photo manipulation predates digital photography), but the fact that it is produced automatically by a neural network trained on large volumes of data.
Put most precisely, a deepfake is a piece of media realistic enough to fool a human observer, whose fabrication relies on AI models capable of learning the appearance, voice or expressions of a target. This definition matters because it covers both a face swapped into a video and a cloned voice used during a fraudulent phone call.
Synthetic media: the broader term
In professional and academic contexts, deepfakes are increasingly grouped under the umbrella term synthetic media — any content created or substantially altered by generative AI. Understanding this broader category is useful, because regulation and platform policies tend to address synthetic media as a whole rather than deepfakes alone.
A brief history of deepfakes
The history of the deepfake is short but dense. The word itself surfaced in late 2017 on online forums, when a user posted manipulated videos built with publicly available deep-learning libraries. Initially confined to fringe and often malicious uses, the technology improved fast.
Several broad phases stand out:
- 2014-2017: the foundations. The invention of generative adversarial networks (GANs) in 2014 laid the theoretical groundwork. Realistic image synthesis became possible in the lab.
- 2017-2020: democratization. Consumer apps let anyone swap faces in a video. Quality was still flawed, but concern grew.
- 2021-2023: the quality leap. Diffusion models (behind several well-known image generators) transformed visual creation. The line between real and fake blurred.
- 2024 and beyond: industrial maturity. High-quality video generation, voice cloning from seconds of audio, real-time avatars usable in video calls. The deepfake became an operational fraud tool.
How does a deepfake work? The key technologies
To detect a deepfake well, you need to understand how it is built. Three major families of techniques dominate today.
Generative adversarial networks (GANs)
A GAN pits two neural networks against each other: a generator that produces fake images, and a discriminator that tries to tell real from fake. Each network's progress forces the other to improve; in the ideal case, training ends when the discriminator can do no better than chance on the generator's output. This adversarial logic explains the rising realism of synthetic faces. GANs were long the go-to tool for generating faces of people who do not exist.
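The adversarial logic can be made concrete with a little arithmetic. The sketch below is not a training loop: it only evaluates the value function from the original 2014 GAN formulation on hand-picked discriminator outputs. The function name `gan_value` and the sample scores are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Estimate V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs (probability of "real") on real samples.
    d_fake: discriminator outputs on generated samples.
    The discriminator tries to maximize this value; the generator tries to minimize it.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Illustrative scores only: a discriminator that still separates real from fake...
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# ...and one that has been fooled and answers 0.5 for everything.
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(f"confident discriminator: {confident:.3f}")
print(f"fooled discriminator:    {fooled:.3f}")
```

When the discriminator is reduced to guessing, the value falls to -2 log 2 (about -1.386), which is the theoretical equilibrium of this objective.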
Autoencoders (face-swap)
For face-swapping in video, autoencoders are common. The idea: one network learns to compress and reconstruct person A's face, another does the same for person B, while sharing part of the architecture (typically the encoder that compresses the face). By crossing the components, that is, compressing a frame of B with the shared part and reconstructing it with A's network, you can reconstruct A's face with B's expressions. This is the historical technique behind the first deepfake videos.
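The "crossing" step is easier to see as wiring. The sketch below uses toy stand-ins, not neural networks: a face is reduced to an identity label plus an expression label, and it assumes the shared part is the encoder. All names (`shared_encoder`, `make_decoder`, `frame_of_b`) are hypothetical; in a real system both parts are learned, and the latent code is not a clean "expression" label.

```python
from typing import Callable, Dict

Face = Dict[str, str]


def shared_encoder(face: Face) -> str:
    """Toy encoder: keep only what both identities share, here the expression."""
    return face["expression"]


def make_decoder(identity: str) -> Callable[[str], Face]:
    """Toy decoder: always renders its own identity from a latent code."""
    def decode(latent: str) -> Face:
        return {"identity": identity, "expression": latent}
    return decode


decoder_a = make_decoder("A")
decoder_b = make_decoder("B")

frame_of_b = {"identity": "B", "expression": "smiling"}

# Ordinary reconstruction: B's frame through B's decoder gives B back.
reconstructed = decoder_b(shared_encoder(frame_of_b))
# The swap: B's frame through A's decoder gives A's face with B's expression.
swapped = decoder_a(shared_encoder(frame_of_b))

print(reconstructed)
print(swapped)
```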
Diffusion models
Diffusion models, a more recent family, start from random noise that they progressively "denoise" until a coherent image forms, guided by a text description or a reference image. They now produce some of the most realistic AI-generated images and power a new generation of video tools. If you want to go deeper into synthetic image creation, our guide on how to detect an AI-generated image covers the artifacts these models leave behind.
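Denoising is learned by first doing the opposite: adding known amounts of noise to real images and training a model to undo it. The sketch below shows only that forward, noise-adding side, using the closed-form step and the linear schedule popularized by early diffusion papers. The four-value "image", the function names and the schedule defaults are assumptions for illustration, and no learned denoiser is included.

```python
import math
import random
from typing import List


def linear_alpha_bars(steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear noise schedule (steps >= 2)."""
    alpha_bars = []
    product = 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(pixels: List[float], alpha_bar: float, rng: random.Random) -> List[float]:
    """Closed-form forward step: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


alpha_bars = linear_alpha_bars()
rng = random.Random(0)
image = [0.8, -0.2, 0.5, 0.1]  # a tiny stand-in for normalized pixel values

slightly_noisy = add_noise(image, alpha_bars[9], rng)       # early step: mostly signal
almost_pure_noise = add_noise(image, alpha_bars[-1], rng)   # last step: mostly noise

print(f"signal kept after 10 steps:   {math.sqrt(alpha_bars[9]):.4f}")
print(f"signal kept after 1000 steps: {math.sqrt(alpha_bars[-1]):.4f}")
```

Generation runs this process in reverse: starting from the "almost pure noise" end, a trained network removes a little noise at each step.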
Types of deepfakes
The word deepfake covers very different realities depending on the target medium and the goal. Here are the main categories.
| Type | Medium | Principle | Main risk |
|---|---|---|---|
| Face-swap | Video/image | Replacing one face with another | Disinformation, non-consensual content |
| Lip-sync (reenactment) | Video | Syncing lips to new audio | Fake speeches, political manipulation |
| Voice cloning | Audio | Reproducing a target's voice | CEO fraud, family scams |
| Full-body / avatar | Video | Generating the full body and motion | Fake testimony, fraudulent video calls |
| Pure synthesis | Image | Creating a non-existent face | Fake profiles, social engineering |
Face-swap and facial reenactment
Face-swap replaces one person's face with another's in an existing video. Facial reenactment goes further: it animates a target face using a source actor's movements, including the lips. This is the technique that makes a public figure appear to "deliver" a speech they never gave.
Voice cloning and audio deepfakes
Voice cloning deserves special attention, because a few seconds of recording can sometimes be enough to reproduce a voice convincingly. These audio deepfakes sit at the heart of many fraud schemes. We dedicate a full guide to them: cloned voice and audio deepfakes, how to detect them.
Why are deepfakes dangerous?
The risks are not theoretical. They are already materializing across several domains.
- Financial fraud. Companies have wired large sums after a call or video conference with a fake executive. We cover this in detail in our article on deepfake video-conference fraud.
- Disinformation and political manipulation. A fake video of a public official can spread before any rebuttal.
- Reputation harm and non-consensual content. Individuals and public figures alike face defamatory or intimate fabrications.
- Social engineering. Fake profiles, fake recruiters and fake ID documents fuel online scams.
For an overview of scams and protective reflexes, see our feature on deepfakes, scams and how to protect yourself.
How to detect a deepfake: the methods
Detection rests on a combination of signals. No single method is foolproof; it is the multi-layer approach that makes the difference.
Detection with the naked eye: the warning signs
Some clues remain perceptible, especially on medium-quality deepfakes:
- Eye blinks that are absent, too rare or irregular.
- Imperfect lip-sync between audio and lip movement.
- Lighting and shadow inconsistencies between the face and the background.
- Blurry, shimmering or "bleeding" face edges during fast movement.
- Problem details: teeth, ears, eye reflections, hands and fingers.
- Temporal artifacts: a face that "jumps" from one frame to the next.
These signals are useful but insufficient: the best deepfakes erase them. For video in particular, rigorous analysis means examining the footage frame by frame, as detailed in our guide on how to detect a deepfake video.
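One of the signs above, a face that "jumps" between frames, lends itself to a simple frame-by-frame check. The sketch below assumes you already have one face-center estimate per frame from some face tracker (not shown); `find_jumps`, its `factor` threshold and the sample coordinates are illustrative choices, not an established detector. A scene cut or a genuinely fast head movement will trip it too, so a hit is a prompt for closer inspection, not a verdict.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def find_jumps(centers: Sequence[Point], factor: float = 5.0,
               min_step: float = 1.0) -> List[int]:
    """Return indices of frames whose face center moved far more than is typical.

    centers: one (x, y) face-center estimate per frame, in pixels.
    factor: how many times the median displacement counts as a jump (arbitrary).
    min_step: floor on the typical displacement, so a nearly static face
              does not make every tiny jitter look like a jump.
    """
    steps = [math.dist(a, b) for a, b in zip(centers, centers[1:])]
    if not steps:
        return []
    threshold = factor * max(median(steps), min_step)
    # steps[i] is the move from frame i to frame i + 1, so report frame i + 1.
    return [i + 1 for i, step in enumerate(steps) if step > threshold]


# Hypothetical tracker output: smooth motion with one abrupt shift at frame 3.
centers = [(100, 100), (101, 100), (102, 101), (130, 101), (131, 102), (132, 102)]
suspect_frames = find_jumps(centers)
print(suspect_frames)
```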
Technical detection: forensic analysis
Forensic analysis goes well beyond visual inspection. It cross-references several layers of evidence:
- EXIF metadata: technical information attached to the file (device, date, software).
- C2PA / Content Credentials: cryptographic provenance signatures, when present.
- ELA (Error Level Analysis): recompressing the image and highlighting regions whose compression error differs from the rest, which can reveal local edits.
- AI vision: models trained to recognize the statistical signatures of generated content.
- Watermark detection: spotting the invisible markers embedded by some generators.
- PRNU (Photo Response Non-Uniformity): analysis of sensor-specific noise, useful to confirm an image came from a real camera.
This is precisely the approach of TruthLens, which combines these layers into a single report. You can run an analysis directly from the file analysis page.
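Two of the layers listed above, EXIF metadata and Content Credentials, leave visible traces in a JPEG file's header segments. The sketch below is a minimal illustration, assuming a well-formed JPEG already loaded into memory; the function names `list_jpeg_segments` and `provenance_hints` are hypothetical and do not describe how any particular tool, TruthLens included, works internally. It only reports whether an EXIF block (APP1) or an APP11 segment (the place where the JUMBF boxes used by C2PA are stored in JPEG files) is present. It does not parse or validate either one, and absence proves nothing, since metadata is easily stripped.

```python
import struct
from typing import Dict, List, Tuple


def list_jpeg_segments(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (marker, payload) pairs for the header segments before image data."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("missing JPEG start-of-image marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:           # padding byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):   # start of scan or end of image: header is over
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at byte offset {pos}")
        # The length field counts itself (2 bytes) plus the payload.
        segments.append((marker, data[pos + 4:pos + 2 + length]))
        pos += 2 + length
    return segments


def provenance_hints(data: bytes) -> Dict[str, bool]:
    """Report which provenance-related segments are present (not whether they are valid)."""
    segments = list_jpeg_segments(data)
    return {
        "has_exif": any(m == 0xE1 and p.startswith(b"Exif\x00\x00") for m, p in segments),
        "has_app11": any(m == 0xEB for m, _ in segments),
    }
```

A typical use would be `provenance_hints(open(path, "rb").read())`, with the result treated as one input among the other layers.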
Audio detection
For audio, you examine spectral artifacts, prosody, breathing and phoneme transitions. Liveness tests and family "passwords" round out the human toolkit. The details are in our dedicated audio guide.
Authenticity and certification: proving the real
Detecting the fake is not always enough: you also need to prove the authentic. This is the other side of the problem. A timestamped, cryptographically signed analysis report that records a SHA-256 fingerprint of the file documents that a specific piece of content was verified at a given moment. This certification logic sits at the core of digital trust. We explore it in our article on authenticity of content in the AI era.
TruthLens produces this kind of certified PDF report, usable as evidence in professional, journalistic or legal contexts. Where the human eye reaches its limits, multi-layer analysis provides a body of objective evidence.
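The fingerprint part of such a report is simple to illustrate. The sketch below, using only Python's standard library, computes a SHA-256 digest and stores it with a timestamp; the record layout and the names `fingerprint_record` and `still_matches` are assumptions made for this example and do not describe the format of any real report. Note what it does and does not show: the hash lets anyone confirm later that the file is byte-for-byte unchanged, but the timestamp here is self-declared. Evidential weight requires the record itself to be signed and timestamped by a trusted party, which is outside this sketch.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint_record(content: bytes, label: str) -> dict:
    """Build a minimal verification record for a piece of content."""
    return {
        "label": label,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "checked_at": datetime.now(timezone.utc).isoformat(),  # self-declared, not a trusted timestamp
    }


def still_matches(content: bytes, record: dict) -> bool:
    """True if the content is byte-for-byte identical to what was fingerprinted."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


record = fingerprint_record(b"abc", label="example.bin")
print(json.dumps(record, indent=2))
print(still_matches(b"abc", record))   # unchanged content
print(still_matches(b"abd", record))   # a single changed byte
```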
The role of dedicated tools
Consumer detection tools give a first indication but often lack transparency about their method. A serious forensic tool should explain its evidence, combine several techniques, return a confidence level rather than a binary verdict, and allow a verifiable record to be kept. For video content, frame-by-frame analysis and cloned-voice detection (as an enhanced option) add decisive layers.
Why a layered verdict beats a binary one
The temptation is to want a single "real or fake" answer. But generators are designed precisely to defeat any single test, so a binary verdict is fragile by construction. A layered verdict instead reports which signals agree and which disagree: metadata may be clean while AI vision flags a high generation score, or a watermark may be present while ELA reveals a localized edit. This nuance is not a weakness — it is what makes the conclusion defensible, especially when the analysis must hold up in a professional or legal setting where overstating certainty is itself a risk.
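A layered verdict can be as simple as reporting each layer's result side by side and stating whether they agree. The sketch below assumes each layer has already produced a suspicion score between 0 and 1; the layer names, the thresholds and the function `layered_verdict` are hypothetical choices for illustration, not a description of any product's scoring.

```python
from typing import Dict, List, Union

Report = Dict[str, Union[str, List[str]]]


def layered_verdict(scores: Dict[str, float], flag_at: float = 0.7,
                    clear_at: float = 0.3) -> Report:
    """Summarize per-layer suspicion scores without collapsing them into real/fake.

    scores: layer name -> suspicion in [0, 1] (0 = nothing suspicious, 1 = strongly suspicious).
    """
    flagged = sorted(name for name, s in scores.items() if s >= flag_at)
    clear = sorted(name for name, s in scores.items() if s <= clear_at)
    inconclusive = sorted(name for name in scores if name not in flagged and name not in clear)

    if flagged and clear:
        summary = "layers disagree"
    elif flagged:
        summary = "suspicious, no layer contradicts"
    elif clear and not inconclusive:
        summary = "nothing suspicious found"
    else:
        summary = "inconclusive"
    return {"summary": summary, "flagged": flagged, "clear": clear, "inconclusive": inconclusive}


# Illustrative scores: clean metadata and ELA, but a high AI-vision score.
report = layered_verdict({"metadata": 0.1, "ai_vision": 0.92, "watermark": 0.5, "ela": 0.2})
print(report)
```

Keeping the disagreement visible, rather than averaging it away, is what lets a reader see why the conclusion is hedged.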
The legal and regulatory framework for deepfakes
The response to the phenomenon is not only technical: it is also legal. In Europe, the AI Act (the EU regulation on artificial intelligence) imposes transparency obligations on content generated or manipulated by AI, including the labeling of deepfakes so that the public is informed. In France, the law already penalizes identity theft, invasion of privacy and the distribution of doctored images or recordings made without the consent of the person depicted. Several recent texts specifically target non-consensual sexual content and the manipulation of another person's image or voice.
For organizations, these developments imply heightened responsibility: being able to prove that content has been verified, keeping a record of that verification, and documenting the chain of provenance. This is where provenance standards such as C2PA and timestamped analysis reports gain their full value, turning an intuition into a body of evidence that can be put forward.
Transparency and content labeling
One of the most promising avenues is to mark content at the source. The invisible watermarks embedded by some generators, along with provenance signatures, in theory allow the synthetic to be distinguished from the authentic without even resorting to after-the-fact detection. In practice, these markings remain unevenly adopted and can be stripped out, which maintains the need for independent forensic detection.
What to do when you suspect a deepfake
Adopt a methodical approach:
- Do not share the content until it is verified.
- Find the original source and cross-check with reliable sources.
- Analyze the file with a multi-layer forensic tool.
- Keep evidence (a timestamped report) if the content targets or affects you.
- Report to the platforms and, in case of harm, to the relevant authorities.
FAQ
What is the difference between a deepfake and a simple photo edit?
A classic photo edit is done manually with retouching software. A deepfake is generated or manipulated automatically by a neural network trained on data, which gives it a realism and an animation capability (video, voice) beyond traditional editing. Detection also differs: you look for the statistical signatures specific to AI models.
Can a deepfake be detected with certainty?
No method guarantees 100% certainty, because generators keep improving. Reliability comes from the multi-layer approach: combining metadata, error analysis, AI vision, watermark detection and visual signals. A good tool returns a reasoned confidence level rather than a misleading binary verdict.
Is voice cloning really accessible?
Yes. A few seconds of recording are now enough for some models to produce a convincing voice. This is what makes phone fraud particularly dangerous. Countermeasures exist: liveness tests, family passwords, and spectral analysis of suspicious recordings.
How can I check suspicious content myself?
You can submit an image, a video or an audio file for multi-layer forensic analysis. TruthLens lets you launch this analysis from its upload page and receive a detailed report. For sensitive cases, keep the timestamped report as evidence and always cross-check with the original source.