WACV 2026

HyperPose: Hyper-pose Embeddings
for 3D-Aware Generative Models
with Self-Supervised Disentangling of Pose and Scene

Mijeong Kim¹ · Namgi Kim² · Bohyung Han¹,²
Computer Vision Lab. · ¹ECE & ²IPAI, Seoul National University, Korea
Label-Free 3D-Aware GAN · Hyper-pose Embeddings · Pose Disentanglement · Soft Contrastive Learning

Figure 1. HyperPose learns 3D configurations from 2D image collections without camera pose labels, depth maps, or domain-specific 3D models.

01

Abstract

We propose a novel framework for training 3D-aware GANs from 2D image collections. It learns both the image distribution and 3D geometric configurations without strong 3D priors: no camera poses, no depth maps, and no target-specific 3D models. We introduce hyper-pose embeddings and a pose disentanglement technique that cleanly separates pose from scene information, resolving the inherent conflict between photo-realism and accurate 3D geometry. We further propose soft contrastive learning for the continuous pose space, and a non-match loss that strengthens disentanglement. Experiments on LSUN Bedroom, LSUN Church, AFHQ, and CUB demonstrate state-of-the-art performance, particularly for scenes with complex or diverse geometric structures.

02

Method


Figure 2. Overview of the HyperPose framework. All components are jointly optimised end-to-end.

1. Hyper-pose Embedding & Pose Disentanglement (Core Idea)
Instead of regressing a 2D (yaw, pitch) vector, the discriminator outputs a high-dimensional embedding v ∈ ℝ^m. An MLP g(·) extracts a scene embedding v_scene = g(w) from the generator's latent w, and the pure pose signal is recovered as v_pose = v − v_scene, which prevents pose and scene from becoming entangled.
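As a rough illustration of the subtraction above, the following sketch computes v_pose from a discriminator embedding v and a generator latent w. The two-layer MLP, its layer sizes, and the names `scene_embedding`, `pose_embedding`, and `params` are assumptions made for this example. The page does not specify the architecture of g(·), and plain Python lists stand in for tensors.

```python
def linear(x, weight, bias):
    """Affine map: one output value per weight row."""
    return [sum(w_k * x_k for w_k, x_k in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, x_k) for x_k in x]


def scene_embedding(w, params):
    """g(w): hypothetical two-layer MLP mapping latent w to an m-dim scene embedding."""
    hidden = relu(linear(w, params["W1"], params["b1"]))
    return linear(hidden, params["W2"], params["b2"])


def pose_embedding(v, v_scene):
    """v_pose = v - v_scene, computed element-wise."""
    return [a - b for a, b in zip(v, v_scene)]


# Toy values (assumed): latent dimension 2, embedding dimension m = 3.
params = {
    "W1": [[1.0, 0.0], [0.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]],
    "b2": [0.5, 0.0, -1.0],
}
w = [1.0, -2.0]        # generator latent
v = [3.0, 1.0, -1.0]   # discriminator embedding of the generated image

v_scene = scene_embedding(w, params)
v_pose = pose_embedding(v, v_scene)
```

In a real training setup g(·) would be learned jointly with the generator and discriminator; here its weights are fixed only so that the subtraction can be followed by hand.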
2. Soft Contrastive Loss L_SCL (Training)
Camera poses live on a continuous manifold, so hard binary positive/negative labels cause unstable training when many similar-pose pairs exist. We define a smooth positive mask via S(ξ₁, ξ₂) = exp(−d² / (2σ²)), where d is the distance between the poses ξ₁ and ξ₂. The mask assigns graded similarity weights, so near-identical poses are treated as soft positives rather than negatives.
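One way the smooth positive mask could enter a contrastive objective is sketched below. The Gaussian similarity follows the formula above, but the Euclidean pose distance, the cosine-similarity logits, the temperature `tau`, and the weighted log-softmax form are assumptions for illustration and not necessarily the exact form of L_SCL.

```python
import math


def pose_similarity(xi_a, xi_b, sigma):
    """S = exp(-d^2 / (2 sigma^2)); d is assumed to be the Euclidean pose distance."""
    d_sq = sum((a - b) ** 2 for a, b in zip(xi_a, xi_b))
    return math.exp(-d_sq / (2.0 * sigma ** 2))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def soft_contrastive_loss(embeddings, poses, sigma=0.5, tau=0.1):
    """Soft-label contrastive loss (assumed form): every other sample is a
    positive for the anchor, weighted by S instead of a hard 0/1 label."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = [cosine(embeddings[i], embeddings[j]) / tau for j in others]
        # Numerically stable log-sum-exp over the anchor's logits.
        max_logit = max(logits)
        log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        weights = [pose_similarity(poses[i], poses[j], sigma) for j in others]
        weighted_nll = -sum(s * (x - log_norm) for s, x in zip(weights, logits))
        total += weighted_nll / sum(weights)
    return total / n


# Toy batch (assumed values): two pairs of nearly identical (yaw, pitch) poses.
poses = [[0.0, 0.0], [0.05, 0.0], [1.5, 0.0], [1.55, 0.0]]
aligned = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]   # close poses, close embeddings
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.99, 0.1], [0.1, 0.99]]  # close poses, distant embeddings

loss_aligned = soft_contrastive_loss(aligned, poses)
loss_shuffled = soft_contrastive_loss(shuffled, poses)
```

With these toy values the loss is lower when embeddings of near-identical poses are close together, which is the behaviour the soft mask is meant to reward.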
3. Non-match Loss L_non-match (Regulariser)
Synthetic hard negatives are built by mis-pairing pose and scene embeddings from different images: v_i − v_scene,j with j ≠ i. Adding these mismatched embeddings to the contrastive denominator tightens disentanglement and yields more discriminative pose representations.
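The mis-pairing step can be sketched as follows. Only the construction v_i − v_scene,j (j ≠ i) comes from the description above; the InfoNCE-style loss, the cosine similarity, the temperature `tau`, and all function names are assumptions used to show how extra negatives enlarge the denominator.

```python
import math


def non_match_negatives(v, v_scene):
    """For each image i, return the list of v_i - v_scene_j over all j != i."""
    return [
        [[a - b for a, b in zip(v_i, s_j)]
         for j, s_j in enumerate(v_scene) if j != i]
        for i, v_i in enumerate(v)
    ]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss (assumed form): -log(pos / (pos + sum of negatives))."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy batch of two images (assumed values).
v = [[1.0, 2.0], [3.0, 4.0]]          # discriminator embeddings
v_scene = [[0.5, 0.5], [1.0, 1.0]]    # scene embeddings g(w)

mismatched = non_match_negatives(v, v_scene)

anchor = [a - b for a, b in zip(v[0], v_scene[0])]  # correctly paired v_pose for image 0
positive = [0.6, 1.4]                               # hypothetical similar-pose sample
regular_negatives = [[-1.0, 0.2]]                   # hypothetical different-pose sample

loss_plain = contrastive_loss(anchor, positive, regular_negatives)
loss_non_match = contrastive_loss(anchor, positive, regular_negatives + mismatched[0])
```

Because each mismatched embedding adds a positive term to the denominator, the loss can only grow unless the model pushes those embeddings away from the anchor, which is the pressure towards disentanglement described above.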
03

Experimental Results


Generated Samples

Multi-view videos generated by HyperPose across four datasets.

Figure 3. HyperPose generates high-fidelity, geometrically consistent multi-view videos — without any pose supervision.


Quantitative Results

Evaluated on four challenging benchmarks. Metrics: FID ↓, Recall/Precision ↑, NFS ↑ (3D geometry), Depth FID ↓ (Bedroom only).

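LSUN Bedroom, 128²

| Method | Depth FID ↓ | FID ↓ | Recall ↑ | NFS ↑ |
|---|---|---|---|---|
| GRAF | 97.4 | 70.7 | 0.00 | 19.4 |
| π-GAN | 124.1 | 56.3 | 0.11 | 9.7 |
| GIRAFFE | 145.6 | 42.8 | 0.02 | 16.9 |
| GIRAFFE-HD | - | 27.7 | 0.13 | - |
| HyperPose | 49.5 | 12.5 | 0.23 | 28.2 |

LSUN Church, 128²

| Method | FID ↓ | Recall ↑ | Precision ↑ | NFS ↑ |
|---|---|---|---|---|
| GRAF | 91.1 | 0.00 | 0.53 | 9.3 |
| π-GAN | 56.8 | 0.18 | 0.49 | 24.4 |
| GIRAFFE | 38.4 | 0.02 | 0.51 | 13.5 |
| GIRAFFE-HD | 10.3 | - | - | - |
| HyperPose | 5.8 | 0.37 | 0.60 | 29.9 |

Unified AFHQ, 256²

| Method | FID ↓ | Recall ↑ | Precision ↑ | NFS ↑ |
|---|---|---|---|---|
| GRAF | 107.0 | 0.00 | 0.35 | 8.5 |
| π-GAN | 48.4 | 0.12 | 0.41 | 21.4 |
| GIRAFFE | 31.3 | 0.04 | 0.51 | 14.2 |
| GIRAFFE-HD | 14.2 | 0.10 | 0.55 | - |
| StyleNeRF | 14.0 | - | - | - |
| HyperPose | 7.5 | 0.30 | 0.53 | 19.2 |

CUB, Large Pose Variation

| Method | FID ↓ | Recall ↑ | Precision ↑ | NFS ↑ |
|---|---|---|---|---|
| GRAF | 46.3 | 0.09 | 0.67 | 21.3 |
| π-GAN | 48.8 | 0.10 | 0.64 | 22.1 |
| GIRAFFE | 49.3 | 0.04 | 0.68 | 30.6 |
| GIRAFFE-HD | 24.3 | 0.17 | 0.67 | - |
| HyperPose | 10.8 | 0.39 | 0.62 | 44.5 |

(-: not reported)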

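HyperPose achieves the best FID and Recall on all four datasets, the best Depth FID on LSUN Bedroom, and the best NFS on every dataset except AFHQ, where π-GAN scores higher (21.4 vs. 19.2). Precision is best on LSUN Church but trails the strongest baseline on AFHQ (0.53 vs. 0.55 for GIRAFFE-HD) and CUB (0.62 vs. 0.68 for GIRAFFE). The CUB gain (FID 10.8 vs. 24.3 for GIRAFFE-HD) highlights the strength of continuous pose modeling under large geometric variation.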
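To put the FID gaps on a common scale, the short sketch below converts three of them into relative reductions against the lowest baseline FID listed for each dataset. The numbers are copied from the tables above; the helper `relative_reduction` and the choice of relative reduction as a summary statistic are additions for illustration, not part of the reported evaluation.

```python
def relative_reduction(baseline, ours):
    """Fraction by which `ours` lowers a lower-is-better metric relative to `baseline`."""
    return (baseline - ours) / baseline


# (best baseline FID, HyperPose FID), taken from the tables above.
fid_pairs = {
    "LSUN Church": (10.3, 5.8),    # GIRAFFE-HD vs. HyperPose
    "Unified AFHQ": (14.0, 7.5),   # StyleNeRF vs. HyperPose
    "CUB": (24.3, 10.8),           # GIRAFFE-HD vs. HyperPose
}

reductions = {
    name: round(100 * relative_reduction(baseline, ours), 1)
    for name, (baseline, ours) in fid_pairs.items()
}
# reductions == {"LSUN Church": 43.7, "Unified AFHQ": 46.4, "CUB": 55.6}  (percent)
```

On this scale the CUB improvement, roughly a 56% reduction in FID, is the largest of the three.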


Ablation Study

LSUN Bedroom 128². Each component is validated in isolation.

Component Contributions

| Pose Disentangle. | L_non-match | FID ↓ | Precision ↑ | NFS ↑ |
|---|---|---|---|---|
| - | - | 13.4 | 0.51 | 28.4 |
| ✓ | - | 12.6 | 0.54 | 28.0 |
| ✓ | ✓ | 12.5 | 0.56 | 28.2 |

Pose disentanglement improves FID and Precision; L_non-match adds further gain.

vs. Pose Regression Baseline

| Method | FID ↓ | Recall ↑ | Depth FID ↓ |
|---|---|---|---|
| w/ L_regression | 12.8 | 0.21 | 138.4 |
| HyperPose | 10.8 | 0.23 | 49.5 |

Our contrastive approach dramatically improves 3D geometry (Depth FID 49.5 vs. 138.4).

04

BibTeX

@inproceedings{kim2026hyperpose,
  author    = {Kim, Mijeong and Kim, Namgi and Han, Bohyung},
  title     = {HyperPose: Hyper-pose Embeddings for 3D-Aware Generative Models with Self-Supervised Disentangling of Pose and Scene},
  booktitle = {WACV},
  year      = {2026}
}