Appearance decomposition is a highly under-constrained task. Optimization-based methods often bake shading details and rendering noise into the recovered materials. In contrast, monocular material priors yield clean, detailed maps, but they lack multi-view consistency and can be physically incorrect. Because of this ambiguity, naively aggregating their predictions leads to texture seams and artifacts. We propose to model the monocular prediction space with a parametric texture model and to aggregate only the base textures. Finally, we optimize the remaining low-dimensional texture parameters with inverse path tracing, producing physically grounded estimates.
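The idea can be illustrated with a minimal toy sketch (our own simplification, not the paper's implementation): per-view monocular predictions agree up to a per-view ambiguity, so we aggregate a normalized base texture and then fit only a few remaining parameters against a rendering-style reconstruction loss. Here the ambiguity is reduced to a per-view affine (scale, offset) transform, and the inverse-path-tracing stage is replaced by a least-squares fit, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a ground-truth 8x8 albedo texture observed in 3 views.
# Each monocular prediction is consistent only up to a per-view affine
# (scale, offset) ambiguity plus noise -- mimicking shading leakage.
gt_albedo = rng.uniform(0.2, 0.8, size=(8, 8))
views = []
for _ in range(3):
    scale, offset = rng.uniform(0.7, 1.3), rng.uniform(-0.1, 0.1)
    views.append(scale * gt_albedo + offset + rng.normal(0.0, 0.01, gt_albedo.shape))

# Step 1: aggregate only the "base texture" -- normalize each prediction
# to zero mean / unit std before averaging, so the per-view ambiguity
# cancels out and no seams are averaged into the result.
base = np.mean([(v - v.mean()) / v.std() for v in views], axis=0)

# Step 2: fit the few remaining low-dimensional parameters (here just
# scale and offset) against a reference observation -- a stand-in for
# optimizing texture parameters with an inverse path-tracing loss.
ref = views[0]
A = np.stack([base.ravel(), np.ones(base.size)], axis=1)
(scale_opt, offset_opt), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
albedo = scale_opt * base + offset_opt

err = np.abs(albedo - ref).mean()
print(f"mean abs error vs. reference view: {err:.4f}")
```

Because only two scalars are optimized per view, the fit stays robust even when individual predictions are noisy, which is the same reason the low-dimensional parameterization helps under imperfect geometry.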
We compare our method against recent inverse rendering methods. FIPT and NeILF++ are purely optimization-based. Due to the under-constrained setting and noisy estimates of the global illumination, these methods often bake shading details into the recovered materials, especially the albedo. IRIS constrains the optimization better by estimating a per-object single-color proxy albedo. However, this proxy is too coarse to rely on completely, and residual shading still leaks into the decomposition. In contrast, our method preserves sharp texture details and produces physically grounded material decompositions. We also present real-world comparisons on the ScanNet++ dataset. Imperfect geometry is a common challenge for appearance decomposition, often causing projection errors near object boundaries and missing regions. Our method optimizes only a small number of parameters with inverse path tracing, making it more robust to these geometric imperfections.
Our decomposition can be used in standard rendering engines for photorealistic relighting. We move an emissive sphere through the scenes to demonstrate the capabilities of our method.
@article{kocsis2025iif,
author = {Kocsis, Peter and H\"{o}llein, Lukas and Nie{\ss}ner, Matthias},
title = {Intrinsic Image Fusion for Multi-View 3D Material Reconstruction},
journal = {arXiv},
year = {2025}}