HiReFF: High-Resolution Feedforward Human Reconstruction from Uncalibrated Sparse-View Video

¹State Key Laboratory of Virtual Reality Technology and Systems, Beihang University   ²Tsinghua University   ³Beijing Information Science and Technology University   ⁴Zhongguancun Laboratory
☆ Work done during an internship at Tsinghua University    ✉ Corresponding Author
ECCV 2026

⭐   Feed-forward 2K-resolution 360° human video reconstruction from only four uncalibrated 90°-spaced views.
⭐   Streaming 3D Gaussian reconstruction at 3.01 FPS on a single RTX 4090 GPU.
⭐   High-resolution Side-tuning achieves 2K rendering with only 34% additional VRAM during training.
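
As a rough reading aid, the figures in the highlights above translate into a per-frame time budget and a pixel-count ratio. The snippet below is only a back-of-the-envelope sketch: interpreting "2K" and "0.5K" as 2048 and 512 pixels per side, and reading the 34% as relative to a 0.5K training baseline, are assumptions that the page does not confirm.

```python
# Back-of-the-envelope numbers derived from the highlights above.
FPS = 3.01          # reported streaming rate on a single RTX 4090
EXTRA_VRAM = 0.34   # reported additional training VRAM for 2K side-tuning

# ASSUMPTION: "2K" and "0.5K" mean 2048 and 512 pixels per side.
HI_RES, LO_RES = 2048, 512

latency_ms = 1000.0 / FPS                 # time available per streamed frame
pixel_ratio = (HI_RES / LO_RES) ** 2      # how many more pixels 2K has than 0.5K
vram_factor = 1.0 + EXTRA_VRAM            # ASSUMPTION: relative to the 0.5K baseline

print(f"per-frame budget: {latency_ms:.1f} ms")
print(f"pixel ratio 2K / 0.5K: {pixel_ratio:.0f}x")
print(f"training VRAM factor: {vram_factor:.2f}x")
```

Under these assumptions, the output resolution grows by a much larger factor than the reported training memory, which is the efficiency argument the third highlight makes.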


Abstract


Uncalibrated volumetric video streaming for human reconstruction is essential for holographic communication and AR/VR, yet remains challenging due to the need for temporal consistency and computational efficiency from sparse-view inputs. Existing methods rely on per-scene optimization or calibrated cameras, while recent feed-forward models are limited to low-resolution (0.5K) single-frame synthesis. We present HiReFF, a feed-forward method for 2K-resolution 360° human video reconstruction from uncalibrated sparse-view videos. Our framework decomposes the problem into two key tasks: foreground 3D Gaussian reconstruction from sparse-view videos (four views separated by 90°) and computationally efficient high-resolution synthesis. To enable the former, we propose Scale-synchronized Camera Calibration to resolve scale ambiguity for multi-view supervision, and Gaussian-wise Foreground Masking to reconstruct clean foregrounds by modulating Gaussian parameters. For efficient high-resolution synthesis, our High-resolution Side-tuning achieves 2K rendering by augmenting the Gaussian head with supplementary features while keeping the backbone at 0.5K, drastically reducing computational overhead. Experiments demonstrate that HiReFF significantly outperforms existing methods in high-resolution streaming volumetric video reconstruction.
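
The data flow behind High-resolution Side-tuning, as described in the abstract, can be illustrated with a toy example: a heavy backbone runs only on a downsampled image, while a lightweight branch supplies full-resolution features to the Gaussian head. This is a minimal NumPy sketch, not the authors' implementation. The function names, the nearest-neighbour upsampling, the concatenation-based fusion, and the toy linear "networks" are all assumptions made for illustration.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of an (H, W, C) feature map."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

def side_tuned_head_input(image_hi, backbone, side_branch, factor=4):
    """Build full-resolution per-pixel features for a Gaussian head.

    Hypothetical sketch: `backbone` only sees the downsampled image,
    `side_branch` sees the full-resolution image, and the two feature
    maps are fused by channel-wise concatenation (an assumption).
    """
    image_lo = image_hi[::factor, ::factor]        # crude strided downsample
    coarse = backbone(image_lo)                    # (H/f, W/f, C_b), expensive path
    fine = side_branch(image_hi)                   # (H, W, C_s), cheap path
    coarse_up = upsample_nearest(coarse, factor)   # (H, W, C_b)
    return np.concatenate([coarse_up, fine], axis=-1)

# Toy stand-ins for the two networks: single linear layers with tanh.
rng = np.random.default_rng(0)
W_b = rng.standard_normal((3, 32))   # "backbone" weights (wide)
W_s = rng.standard_normal((3, 8))    # "side branch" weights (narrow)
backbone = lambda x: np.tanh(x @ W_b)
side_branch = lambda x: np.tanh(x @ W_s)

image = rng.random((64, 64, 3))      # stands in for a high-resolution frame
features = side_tuned_head_input(image, backbone, side_branch, factor=4)
print(features.shape)
```

The point of the design is that the expensive path processes 16 times fewer pixels at `factor=4`, while the head still receives a feature vector for every full-resolution pixel.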

Method

Pipeline Overview

Overview of HiReFF. Given four-view uncalibrated videos as input, we first extract features with an Alternating-Attention (AA) Transformer, then decode them into Gaussian parameters and supervise training through rendered multi-view images. HiReFF uses Scale-synchronized Camera Calibration to introduce supervision from additional viewpoints while indirectly supervising the Camera Head, Gaussian-wise Foreground Masking to remove the background while preserving camera parameter accuracy, and High-resolution Side-tuning to keep computation efficient at high resolution.
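
To make the idea of masking at the Gaussian level concrete, the sketch below modulates per-Gaussian opacity with a predicted foreground probability, so that background Gaussians fade out of the rendering without touching the input images. The page states only that the masking works by modulating Gaussian parameters; the choice of opacity as the modulated parameter, the sigmoid gating, and the names `mask_gaussians` and `fg_logit` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_gaussians(opacity, fg_logit, threshold=0.5):
    """Suppress background Gaussians by scaling their opacity.

    opacity:  (N,) per-Gaussian opacity in [0, 1]
    fg_logit: (N,) hypothetical per-Gaussian foreground logit
    Returns the modulated opacity and a boolean mask of Gaussians
    whose foreground probability exceeds `threshold`.
    """
    fg_prob = sigmoid(fg_logit)
    masked_opacity = opacity * fg_prob   # soft, differentiable gating
    keep = fg_prob > threshold           # optional hard pruning at inference
    return masked_opacity, keep

opacity = np.array([0.8, 0.8, 0.8])
fg_logit = np.array([10.0, -10.0, 0.0])  # confident fg, confident bg, undecided
masked_opacity, keep = mask_gaussians(opacity, fg_logit)
print(masked_opacity, keep)
```

Because the gating acts on the Gaussians rather than on the input pixels, the full images, background included, remain available to whichever component estimates the cameras, which is consistent with the stated goal of preserving camera parameter accuracy.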

Results

Visualization Results

Qualitative comparison of HiReFF on high-resolution novel view synthesis from uncalibrated sparse-view videos.

Visualization Results (Fig. 4)

Temporal Consistency

HiReFF maintains strong temporal consistency across streaming frames, producing stable reconstructions.

Temporal Consistency (Fig. 5)

BibTeX


@inproceedings{jiang2026hireff,
  title     = {HiReFF: High-Resolution Feedforward Human Reconstruction from Uncalibrated Sparse-View Video},
  author    = {Yiming Jiang and Hanzhang Tu and Wenfeng Song and Siyou Lin and Liang An and Shuai Li and Aimin Hao and Yebin Liu},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026},
}