OSFFD: Open Set Face Forgery Detection via Dual-Level Evidence Collection

DLED estimates prediction uncertainty by collecting category-specific evidence at both the spatial and frequency levels, enabling reliable detection of forgeries from previously unseen fake categories. When identifying forgeries from novel fake categories, it surpasses existing baselines by a 20% margin on average.

Abstract

The surge in face forgeries has increasingly undermined confidence in the authenticity of online content. As generation algorithms rapidly evolve, new fake categories will constantly emerge, severely challenging existing face forgery detection methods. Although face forgery detection has recently improved, current techniques remain largely confined to binary Real-vs-Fake classification or the recognition of known fake categories. Moreover, they fail to identify the emergence of entirely new forgery methods.

In this work, we study the Open Set Face Forgery Detection (OSFFD) problem, which requires the detection model to identify novel fake categories. To enhance its real-world applicability, we reformulate the OSFFD problem and address it through uncertainty estimation. Specifically, we propose the Dual-Level Evidential face forgery Detection (DLED) approach, which estimates prediction uncertainty by extracting and integrating category-specific evidence on the spatial and frequency levels.

Comprehensive experiments across diverse settings demonstrate that our proposed DLED approach achieves state-of-the-art performance. Notably, it surpasses various existing baseline models by a 20% margin on average when identifying forgeries from novel fake categories. Concurrently, our DLED method yields competitive performance on the standard binary Real-versus-Fake face forgery detection task.

Open Set Face Forgery Detection — Setup

OSFFD presents two challenges: (1) reliable detection of unseen fake categories without access to their training data, and (2) robust real-vs-fake detection on known forgeries. For evaluation, we split the fake categories into seen categories, which are available for training, and unseen categories, which appear only at test time.
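
The hold-out protocol can be sketched as follows. This is a minimal illustration rather than the benchmark's actual data pipeline: the function names `leave_one_out_splits` and `split_fake_samples`, the `(path, category)` sample layout, and the file names are hypothetical, while the category labels FS, FR, and EFS are the ones used in the results table.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[str, str]  # (image path, fake category); hypothetical layout


def leave_one_out_splits(categories: Sequence[str]) -> List[Tuple[List[str], List[str]]]:
    """Return (seen, unseen) pairs, holding out one fake category at a time."""
    splits = []
    for held_out in categories:
        seen = [c for c in categories if c != held_out]
        splits.append((seen, [held_out]))
    return splits


def split_fake_samples(samples: Sequence[Sample], unseen: Sequence[str]):
    """Separate training-eligible fakes from fakes that may only appear at test time."""
    train = [s for s in samples if s[1] not in unseen]
    novel = [s for s in samples if s[1] in unseen]
    return train, novel


# Hypothetical file names, used only to show the shape of the data.
fakes = [("a.png", "FS"), ("b.png", "FR"), ("c.png", "EFS"), ("d.png", "FS")]

for seen, unseen in leave_one_out_splits(["FS", "FR", "EFS"]):
    train, novel = split_fake_samples(fakes, unseen)
    print(f"seen={seen} unseen={unseen} train={len(train)} novel={len(novel)}")
```

Real faces are not partitioned here; the sketch only covers how fake categories are divided.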

Illustration of fake categories in the OSFFD benchmark. At test time, the model must detect samples drawn from categories never seen during training.

Examples of diverse forgeries within seen fake categories. Even within a category, generation algorithms differ substantially in their visual artifacts.

Method — Dual-Level Evidential Detection (DLED)

DLED extracts category-specific evidence at two complementary levels — spatial (texture, boundary, semantic cues) and frequency (high-pass artifacts left by generative pipelines) — and integrates them into a single evidential posterior that yields well-calibrated uncertainty for open-set rejection.

1. Spatial Evidence

A spatial branch produces per-category evidence over the RGB feature space, capturing texture-level forgery cues.

2. Frequency Evidence

A frequency-domain branch exposes high-frequency artifacts of generative pipelines that are often invisible in pixel space.
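
To make the idea of a high-frequency view concrete, here is a minimal sketch of a high-pass filter built on the 2-D FFT. It is an assumption for illustration only: the page does not state which frequency transform or filter DLED uses, and the function name `high_pass` and the `cutoff` parameter are hypothetical.

```python
import numpy as np


def high_pass(gray: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Remove low frequencies from a 2-D grayscale image.

    `cutoff` is the radius of the discarded low-frequency disc,
    expressed as a fraction of the smaller image side.
    """
    h, w = gray.shape
    # Shift the spectrum so the zero-frequency term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    mask = dist > cutoff * min(h, w)  # keep only frequencies outside the disc
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real


# A flat image has no high-frequency content, so almost nothing survives.
flat = np.full((32, 32), 5.0)
print(np.abs(high_pass(flat)).max())

# A pixel-level checkerboard is purely high-frequency, so it passes through.
checker = (np.indices((32, 32)).sum(axis=0) % 2) * 2.0 - 1.0
print(np.allclose(high_pass(checker), checker))
```

Smooth image content is suppressed while pixel-scale patterns are preserved, which is the kind of signal a frequency branch can inspect for generation artifacts.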

3. Evidence Integration

A Dirichlet-based fusion combines both evidence streams, providing a calibrated uncertainty score for open-set rejection.
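
The sketch below shows how per-category evidence is commonly turned into a Dirichlet distribution and an uncertainty score in evidential deep learning: with K categories and non-negative evidence e, the Dirichlet parameters are alpha = e + 1, the strength is S = sum(alpha), the belief masses are e / S, and the uncertainty is K / S. The page does not spell out DLED's fusion rule, so combining the two streams by summing their evidence is an assumption made here for illustration; the function names are hypothetical.

```python
import numpy as np


def dirichlet_opinion(evidence):
    """Map non-negative evidence to (belief masses, uncertainty)."""
    evidence = np.asarray(evidence, dtype=float)
    num_classes = evidence.size
    alpha = evidence + 1.0           # Dirichlet parameters
    strength = alpha.sum()           # Dirichlet strength S
    belief = evidence / strength     # per-category belief mass
    uncertainty = num_classes / strength
    return belief, uncertainty


def fuse_by_sum(spatial_evidence, frequency_evidence):
    """Assumed fusion rule: add the evidence from both branches."""
    return np.asarray(spatial_evidence, dtype=float) + np.asarray(frequency_evidence, dtype=float)


# Both branches agree on the first category: fused uncertainty is low.
fused = fuse_by_sum([9.0, 0.0, 0.0], [6.0, 0.0, 0.0])
belief, u = dirichlet_opinion(fused)
print(belief, round(u, 4))

# No evidence from either branch: uncertainty is maximal.
_, u_empty = dirichlet_opinion(fuse_by_sum([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
print(u_empty)
```

Belief masses and uncertainty always sum to one under this formulation, so a sample that gathers little evidence for every known category is automatically assigned high uncertainty.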

Attention Visualization

Attention maps from DLED on the novel fake categories FS (a) and FE (b). DLED attends to category-specific semantic cues: for FS it highlights edge regions indicative of face transplantation, while for FE it focuses on manipulated areas such as sunglasses, hairbands, and hair.

Evidence Visualization

Distribution of category-specific evidence produced by DLED. We compare evidence collected under two settings: the challenging open-set face forgery detection task with novel fake categories, and the standard binary real-vs-fake detection task.

Evidence for the seen fake categories FR and EFS is concentrated in their corresponding corners with low uncertainty, while evidence for the novel fake categories FS and FE is sparse and exhibits higher uncertainty.
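
This pattern suggests a simple open-set decision rule: reject a sample as a novel fake when its uncertainty is high, and otherwise predict the known category with the most evidence. The sketch below is a hedged illustration, not DLED's published decision rule. The threshold value of 0.5, the function name `open_set_predict`, the class list (real, FR, EFS), and the evidence vectors are all hypothetical.

```python
import numpy as np

KNOWN_CLASSES = ["real", "FR", "EFS"]  # assumed known classes for this example


def open_set_predict(evidence, threshold=0.5):
    """Return a known class name, or 'novel fake' when uncertainty exceeds the threshold."""
    evidence = np.asarray(evidence, dtype=float)
    # Uncertainty K / S, where S = sum(evidence) + K for K known classes.
    uncertainty = evidence.size / (evidence.sum() + evidence.size)
    if uncertainty > threshold:
        return "novel fake", uncertainty
    return KNOWN_CLASSES[int(np.argmax(evidence))], uncertainty


# Concentrated evidence (seen category): confident prediction.
print(open_set_predict([1.0, 40.0, 0.5]))

# Sparse evidence (unseen category): rejected as novel.
print(open_set_predict([0.4, 0.3, 0.5]))
```

In practice the threshold would have to be chosen on held-out data; the value used here only serves the example.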

Evidence visualization on binary real-vs-fake

Evidence distribution of novel real and fake faces, with the corresponding DLED predictions. The model assigns high confidence to novel real faces, while novel fake faces yield low confidence and high uncertainty.

Quantitative Results

Open Set Detection on Novel Fake Categories

Comparisons against diverse baseline methods on the OSFFD problem. For FS, FR, and EFS, each fake category is held out as the unseen one while the remaining two serve as seen categories. For FE & SM, we use FS, FR, and EFS as seen categories and let FE and SM be the unseen categories. The best results are in bold.

Method	FS Acc ↑	FS DR ↑	FR Acc ↑	FR DR ↑	EFS Acc ↑	EFS DR ↑	FE & SM Acc ↑	FE & SM DR ↑	Avg Acc ↑	Avg DR ↑
Two-stage
OC-FakeDect	58.16	14.68	60.69	11.43	56.14	9.01	56.74	11.67	57.93	11.70
SBI	65.15	1.07	64.19	3.00	61.24	0.91	62.27	0.66	63.21	1.41
CNN-based + OSR
Xception	64.60	23.90	53.51	29.06	57.62	22.70	55.28	29.04	57.75	26.17
SPSL	65.07	16.71	54.10	18.93	59.67	18.12	60.02	25.98	59.71	19.93
SIA	62.09	13.59	54.62	13.36	56.85	10.99	56.29	22.53	57.46	15.12
UCF	65.08	0.30	50.98	0.20	52.95	1.28	52.69	1.80	55.42	0.89
NPR	75.37	17.37	64.63	6.75	70.43	4.36	71.45	29.20	70.47	14.42
CLIP-based + OSR
CLIP Closed Set Finetuning	67.24	–	65.19	–	64.53	–	66.24	–	65.80	–
CLIP Zero-Shot	52.30	0.81	50.36	0.26	46.01	0.38	47.62	0.25	49.07	0.43
UnivFD	68.81	3.88	64.00	2.48	63.21	0.73	66.34	8.22	65.59	3.83
CLIPing	66.44	14.38	62.41	6.09	61.29	4.92	66.26	19.27	64.10	11.16
D³	70.46	8.14	64.71	8.90	61.65	1.17	66.33	8.26	65.79	6.62
Ours (DLED)	71.37	33.61	66.83	34.92	75.52	34.71	74.48	82.18	72.05	46.35
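
As a quick arithmetic check on the table, the snippet below recomputes DLED's average DR from its four per-setting values and measures its gap to the strongest baseline in the Avg DR column. All numbers are copied from the table; the variable names are illustrative, and CLIP Closed Set Finetuning is left out because it reports no DR.

```python
# Avg DR (%) per baseline, copied from the comparison table.
avg_dr = {
    "OC-FakeDect": 11.70,
    "SBI": 1.41,
    "Xception": 26.17,
    "SPSL": 19.93,
    "SIA": 15.12,
    "UCF": 0.89,
    "NPR": 14.42,
    "CLIP Zero-Shot": 0.43,
    "UnivFD": 3.83,
    "CLIPing": 11.16,
    "D³": 6.62,
}
dled_avg_dr = 46.35
dled_dr_per_setting = [33.61, 34.92, 34.71, 82.18]  # FS, FR, EFS, FE & SM

# Recompute DLED's average DR from the four settings.
recomputed = sum(dled_dr_per_setting) / len(dled_dr_per_setting)

# Find the strongest baseline and the gap to DLED, in percentage points.
best_name = max(avg_dr, key=avg_dr.get)
margin = dled_avg_dr - avg_dr[best_name]

print(f"recomputed DLED Avg DR: {recomputed:.3f}")
print(f"best baseline: {best_name} ({avg_dr[best_name]:.2f})")
print(f"DLED margin in Avg DR: {margin:.2f} points")  # 20.18 points
```

The gap of about 20 points over the best baseline is consistent with the 20% average margin quoted in the abstract, assuming that figure refers to DR.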

BibTeX

@article{cai2025osffd,
  title={Open Set Face Forgery Detection via Dual-Level Evidence Collection},
  author={Cai, Zhongyi and Gernon, Bryce and Bao, Wentao and Li, Yifan and Wright, Matthew and Kong, Yu},
  journal={arXiv preprint arXiv:2512.04331},
  year={2025}
}