HiReFF

⭐ Feed-forward 2K-resolution 360° human video reconstruction from only four uncalibrated 90°-spaced views.
⭐ Streaming 3D Gaussian reconstruction at 3.01 FPS on a single RTX 4090 GPU.
⭐ High-resolution Side-tuning achieves 2K rendering with only 34% additional VRAM during training.
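
For a rough sense of scale behind these highlights, the short calculation below compares pixel counts at the two resolutions and converts the reported throughput into a per-frame time budget. The page does not state exact pixel dimensions, so the values 2048 and 512 (square frames) are assumptions chosen only for illustration.

```python
# Back-of-envelope numbers for the highlights above.
# Assumption: "2K" and "0.5K" are taken as 2048 and 512 pixels per side;
# the page does not give exact dimensions.
HIGH_RES = 2048
LOW_RES = 512


def pixel_ratio(high: int, low: int) -> float:
    """Ratio of pixel counts between a square high-res and low-res frame."""
    return (high * high) / (low * low)


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


ratio = pixel_ratio(HIGH_RES, LOW_RES)
budget = frame_budget_ms(3.01)  # 3.01 FPS is the figure reported above

print(f"pixel ratio: {ratio:.0f}x")
print(f"per-frame budget: {budget:.1f} ms")
```

Under these assumptions a 2K frame has 16 times the pixels of a 0.5K frame, which gives context for the reported 34% additional training VRAM, and 3.01 FPS corresponds to roughly 332 ms per reconstructed frame.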

Abstract

Uncalibrated volumetric video streaming for human reconstruction is essential for holographic communication and AR/VR, yet remains challenging due to the need for temporal consistency and computational efficiency from sparse-view inputs. Existing methods rely on per-scene optimization or calibrated cameras, while recent feed-forward models are limited to low-resolution (0.5K) single-frame synthesis. We present HiReFF, a feed-forward method for 2K-resolution 360° human video reconstruction from uncalibrated sparse-view videos. Our framework decomposes the problem into two key tasks: foreground 3D Gaussian reconstruction from sparse-view videos (four views separated by 90°) and computationally efficient high-resolution synthesis. To enable the former, we propose Scale-synchronized Camera Calibration to resolve scale ambiguity for multi-view supervision, and Gaussian-wise Foreground Masking to reconstruct clean foregrounds by modulating Gaussian parameters. For efficient high-resolution synthesis, our High-resolution Side-tuning achieves 2K rendering by augmenting the Gaussian head with supplementary features while keeping the backbone at 0.5K, drastically reducing computational overhead. Experiments demonstrate that HiReFF significantly outperforms existing methods in high-resolution streaming volumetric video reconstruction.
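
The side-tuning idea described in the abstract can be illustrated with a toy sketch: an expensive backbone sees only a downsampled frame, a cheap branch sees the full-resolution frame, and the head receives both. This is not the HiReFF architecture, whose details are not given on this page. The functions `toy_backbone`, `side_branch`, and `side_tuned_features`, the channel counts, and the 64-pixel toy frame are all hypothetical.

```python
import numpy as np


def upsample_nearest(feat: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def toy_backbone(image_lo: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the expensive backbone: (3, h, w) -> (8, h, w)."""
    gray = image_lo.mean(axis=0, keepdims=True)
    return np.repeat(gray, 8, axis=0)


def side_branch(image_hi: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical cheap branch: per-pixel linear map, (3, H, W) -> (C_side, H, W)."""
    return np.einsum("oc,chw->ohw", weight, image_hi)


def side_tuned_features(image_hi, backbone_fn, side_weight, factor=4):
    """Run the backbone at low resolution and add full-resolution side features."""
    image_lo = image_hi[:, ::factor, ::factor]         # cheap strided downsample
    coarse = backbone_fn(image_lo)                     # heavy compute at low resolution
    coarse_up = upsample_nearest(coarse, factor)       # back to full resolution
    fine = side_branch(image_hi, side_weight)          # light compute at full resolution
    return np.concatenate([coarse_up, fine], axis=0)   # input to a Gaussian head


# Toy sizes: a 64x64 "high-res" frame and a 16x16 backbone input.
rng = np.random.default_rng(0)
image_hi = rng.random((3, 64, 64))
side_weight = rng.standard_normal((4, 3))
features = side_tuned_features(image_hi, toy_backbone, side_weight, factor=4)
print(features.shape)  # (12, 64, 64)
```

The design point is that the backbone's cost scales with the low-resolution pixel count, while only the lightweight branch and the head operate on every high-resolution pixel.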

Method

Overview of HiReFF. Taking four-view uncalibrated videos as input, we first extract features with an Alternating-Attention (AA) Transformer, then decode them into Gaussian parameters, and supervise the result through rendered multi-view images. Specifically, HiReFF employs Scale-synchronized Camera Calibration to introduce supervision from additional viewpoints while indirectly supervising the Camera Head, uses Gaussian-wise Foreground Masking to remove the background while preserving camera parameter accuracy, and leverages High-resolution Side-tuning to keep high-resolution synthesis computationally efficient.
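
As a minimal illustration of masking at the Gaussian level rather than on the input images, the sketch below attaches a foreground score to each Gaussian, drops low-scoring Gaussians, and scales the opacity of the rest. The page says only that the masking modulates Gaussian parameters, so the choice of opacity, the `fg_prob` field, the threshold, and the `Gaussian` and `mask_foreground` names are all assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    position: tuple   # (x, y, z) centre
    opacity: float    # in [0, 1]
    fg_prob: float    # hypothetical per-Gaussian foreground score in [0, 1]


def mask_foreground(gaussians, threshold=0.5):
    """Keep likely-foreground Gaussians and scale their opacity by the score."""
    kept = []
    for g in gaussians:
        if g.fg_prob < threshold:
            continue  # treated as background: removed before rendering
        kept.append(replace(g, opacity=g.opacity * g.fg_prob))
    return kept


scene = [
    Gaussian(position=(0.0, 1.0, 0.2), opacity=0.9, fg_prob=1.0),  # on the subject
    Gaussian(position=(0.1, 1.2, 0.1), opacity=0.8, fg_prob=0.9),  # near the silhouette
    Gaussian(position=(3.0, 0.5, 4.0), opacity=0.7, fg_prob=0.1),  # background wall
]
foreground = mask_foreground(scene)
print(len(foreground), [round(g.opacity, 2) for g in foreground])
```

Because the filtering acts on the decoded Gaussians, the input frames are left untouched in this sketch, which is one plausible reading of how background removal can coexist with accurate camera estimation.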

Results

Visualization Results

Qualitative comparison of HiReFF on high-resolution novel view synthesis from uncalibrated sparse-view videos.

Temporal Consistency

HiReFF maintains strong temporal consistency across streaming frames, producing stable reconstructions.

BibTeX


@inproceedings{jiang2026hireff,
  title     = {HiReFF: High-Resolution Feedforward Human Reconstruction from Uncalibrated Sparse-View Video},
  author    = {Yiming Jiang and Hanzhang Tu and Wenfeng Song and Siyou Lin and Liang An and Shuai Li and Aimin Hao and Yebin Liu},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026},
}