MV-Fashion: Towards Enabling Virtual Try-On and Size Estimation with Multi-View Paired Data

Hunor Laczko, Libang Jia, Loc-Phat Truong, Diego Hernandez, Sergio Escalera, Jordi Gonzalez, Meysam Madadi

CVPR 2026
[Figure: MV-Fashion sample captures (left and right views)]
Subjects: 80
Garments: 754
Cameras: 68
Sequences: 3,273
Frames: 72.5M
Paired VTON Data: Yes

Abstract

Existing 4D human datasets fall short for fashion-specific research, lacking either realistic garment dynamics or task-specific annotations. Synthetic datasets suffer from a realism gap, whereas real-world captures lack the detailed annotations and paired data required for virtual try-on (VTON) and size estimation tasks. To bridge this gap, we introduce MV-Fashion, a large-scale multi-view video dataset engineered for domain-specific fashion analysis. MV-Fashion features 3,273 sequences (72.5 million frames) from 80 diverse subjects wearing 3-10 outfits each. It is designed to capture complex, real-world garment dynamics, including multiple layers and varied styling (e.g., rolled sleeves, a tucked shirt). A core contribution is a rich data representation that includes pixel-level semantic annotations, ground-truth material properties such as elasticity, and 3D point clouds. Crucially for VTON applications, MV-Fashion provides paired data: multi-view synchronized captures of worn garments alongside their corresponding flat catalogue images. We leverage this dataset to establish baselines for fashion-centric tasks, including virtual try-on, clothing size estimation, and novel view synthesis.

BibTeX

@misc{laczko2026mvfashion,
      title={MV-Fashion: Towards Enabling Virtual Try-On and Size Estimation with Multi-View Paired Data},
      author={Hunor Laczko and Libang Jia and Loc-Phat Truong and Diego Hernandez and Sergio Escalera and Jordi Gonzalez and Meysam Madadi},
      year={2026},
      eprint={2603.08147},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.08147}
}