MV-Fashion: Towards Enabling Virtual Try-On and Size Estimation with Multi-View Paired Data

Hunor Laczko, Libang Jia, Loc-Phat Truong, Diego Hernandez, Sergio Escalera, Jordi Gonzalez, Meysam Madadi

CVPR 2026
[Figure: MV-Fashion sample captures (left and right views)]
Subjects: 80
Garments: 754
Cameras: 68
Sequences: 3,273
Frames: 72.5M
Paired VTON Data: Yes

Abstract

Existing 4D human datasets fall short for fashion-specific research, lacking either realistic garment dynamics or task-specific annotations. Synthetic datasets suffer from a realism gap, whereas real-world captures lack the detailed annotations and paired data required for virtual try-on (VTON) and size estimation tasks. To bridge this gap, we introduce MV-Fashion, a large-scale multi-view video dataset engineered for domain-specific fashion analysis. MV-Fashion features 3,273 sequences (72.5 million frames) from 80 diverse subjects wearing 3-10 outfits each. It is designed to capture complex, real-world garment dynamics, including multiple layers and varied styling (e.g., rolled sleeves, a tucked shirt). A core contribution is a rich data representation that includes pixel-level semantic annotations, ground-truth material properties such as elasticity, and 3D point clouds. Crucially for VTON applications, MV-Fashion provides paired data: multi-view synchronized captures of worn garments alongside their corresponding flat catalogue images. We leverage this dataset to establish baselines for fashion-centric tasks, including virtual try-on, clothing size estimation, and novel view synthesis.

BibTeX

@misc{laczko2026mvfashion,
      title={MV-Fashion: Towards Enabling Virtual Try-On and Size Estimation with Multi-View Paired Data},
      author={Hunor Laczko and Libang Jia and Loc-Phat Truong and Diego Hernandez and Sergio Escalera and Jordi Gonzalez and Meysam Madadi},
      year={2026},
      eprint={2603.08147},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.08147}
}