Real-Time Puppeteering Defense
First enrollment-free defense against identity puppeteering in AI-based videoconferencing. 97.7% AUC at 75 FPS. Published at NeurIPS 2025 in collaboration with NVIDIA Research.
Overview
AI-based videoconferencing systems reduce bandwidth by transmitting a compact pose-expression latent and re-synthesizing video at the receiver. This architecture is vulnerable to puppeteering attacks, where an adversary hijacks a victim’s likeness in real time by injecting a malicious latent stream.
Because every frame is synthetic, standard deepfake detectors fail outright — the video is always fake by design. We needed a fundamentally different approach.
Key Insight
The pose-expression latent inherently leaks biometric information about the driving identity — the person actually controlling the face. We exploit this leakage to detect identity mismatches without ever looking at the reconstructed RGB video.
Approach
We introduce a pose-conditioned large-margin contrastive loss (PC-LMCL) that trains an encoder to isolate persistent identity cues inside the transmitted latent while cancelling transient pose and expression variation. A simple cosine similarity test on this disentangled embedding flags illicit identity swaps in real time.
Results
- 97.7% AUC at 75 FPS — fast enough for real-time deployment
- 46% error reduction over prior state-of-the-art
- Strong generalization to unseen systems (AUC 0.925 cross-domain)
- No enrollment data required
Impact
- Published at NeurIPS 2025 (main conference)
- Collaboration with NVIDIA Research (Ekta Prashnani, Koki Nagano, Orazio Gallo)
- Secured $150K NVIDIA research gift and 1 year of dedicated compute