Appearance decomposition is a highly under-constrained task. Optimization-based methods often bake shading details and rendering noise into the recovered materials. In contrast, monocular material priors yield clean, detailed maps, but they lack multi-view consistency and can be physically incorrect. Because of this ambiguity, naively aggregating their predictions leads to texture seams and artifacts. We propose to model the monocular prediction space with a parametric texture model and to aggregate only the base textures. Finally, we optimize the remaining low-dimensional texture parameters with inverse path tracing, producing physically grounded estimates.
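The idea can be illustrated with a minimal toy sketch (our own simplification, not the paper's implementation): per-view monocular predictions agree up to a per-view ambiguity, so we aggregate a normalized base texture and then fit only a few remaining parameters against a rendering-style reconstruction loss. Here the ambiguity is reduced to a per-view affine (scale, offset) transform, and the inverse-path-tracing stage is replaced by a least-squares fit, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a ground-truth 8x8 albedo texture observed in 3 views.
# Each monocular prediction is consistent only up to a per-view affine
# (scale, offset) ambiguity plus noise -- mimicking shading leakage.
gt_albedo = rng.uniform(0.2, 0.8, size=(8, 8))
views = []
for _ in range(3):
    scale, offset = rng.uniform(0.7, 1.3), rng.uniform(-0.1, 0.1)
    views.append(scale * gt_albedo + offset + rng.normal(0.0, 0.01, gt_albedo.shape))

# Step 1: aggregate only the "base texture" -- normalize each prediction
# to zero mean / unit std before averaging, so the per-view ambiguity
# cancels out and no seams are averaged into the result.
base = np.mean([(v - v.mean()) / v.std() for v in views], axis=0)

# Step 2: fit the few remaining low-dimensional parameters (here just
# scale and offset) against a reference observation -- a stand-in for
# optimizing texture parameters with an inverse path-tracing loss.
ref = views[0]
A = np.stack([base.ravel(), np.ones(base.size)], axis=1)
(scale_opt, offset_opt), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
albedo = scale_opt * base + offset_opt

err = np.abs(albedo - ref).mean()
print(f"mean abs error vs. reference view: {err:.4f}")
```

Because only two scalars are optimized per view, the fit stays robust even when individual predictions are noisy, which is the same reason the low-dimensional parameterization helps under imperfect geometry.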
We compare our method against recent inverse rendering methods. FIPT and NeILF++ are purely optimization-based. Due to the under-constrained setting and noisy estimates of the global illumination, these methods often bake shading details into the recovered materials, especially the albedo. IRIS constrains the optimization better by estimating a per-object single-color proxy albedo. However, this proxy is too coarse to rely on completely, and residual shading still leaks into the decomposition. In contrast, our method preserves sharp texture details and produces physically grounded material decompositions. We also present real-world comparisons on the ScanNet++ dataset. Imperfect geometry is a common challenge for appearance decomposition, often causing projection errors near object boundaries and missing regions. Our method optimizes only a small number of parameters with inverse path tracing, making it more robust to these geometric imperfections.
Our decomposition can be used in standard rendering engines for photorealistic relighting. We move an emissive sphere through the scenes to demonstrate the capabilities of our method.
@article{kocsis2025iif,
author = {Kocsis, Peter and H\"{o}llein, Lukas and Nie{\ss}ner, Matthias},
title = {Intrinsic Image Fusion for Multi-View 3D Material Reconstruction},
journal = {arXiv},
year = {2025}}