LightIt: Illumination Modeling and Control for Diffusion Models

CVPR 2024

Peter Kocsis
TU Munich
Julien Philip
Adobe Research
Kalyan Sunkavalli
Adobe Research
Matthias Nießner
TU Munich
Yannick Hold-Geoffroy
Adobe Research

Abstract

We introduce LightIt, a method for explicit illumination control for image generation. Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation such as setting the overall mood or cinematic appearance. To overcome these limitations, we propose to condition the generation on shading and normal maps. We model the lighting with single-bounce shading, which includes cast shadows. We first train a shading estimation module to generate a dataset of real-world image and shading pairs. Then, we train a control network using the estimated shading and normals as input. Our method demonstrates high-quality image generation and lighting control in numerous scenes. Additionally, we use our generated dataset to train an identity-preserving relighting model, conditioned on an image and a target shading. Our method is the first to enable the generation of images with controllable, consistent lighting, and it performs on par with specialized state-of-the-art relighting methods.

Lighting-controllable Image Generation

Image Synthesis

Method

Dataset Generation

Dataset pipeline
We generate a dataset using the Outdoor Laval dataset. We randomly crop images from the panoramas and automatically predict normal maps, shading, and captions. For our relighting experiments, we extend the dataset with relit images using OutCast.
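The snippet below is a minimal NumPy sketch of the panorama-cropping step, i.e., sampling a perspective crop from an equirectangular panorama. The field of view, output resolution, rotation parametrization, and nearest-neighbor sampling are illustrative assumptions, not the paper's exact settings.

import numpy as np


def perspective_crop(pano, yaw, pitch, fov_deg=60.0, size=256):
    """pano: H x W x 3 equirectangular image; yaw/pitch in radians."""
    h, w = pano.shape[:2]
    f = 0.5 * size / np.tan(0.5 * np.radians(fov_deg))

    # Ray directions for every output pixel in camera space.
    xs, ys = np.meshgrid(np.arange(size) - size / 2 + 0.5,
                         np.arange(size) - size / 2 + 0.5)
    dirs = np.stack([xs, -ys, np.full_like(xs, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays by the random camera orientation (yaw around Y, pitch around X).
    ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    dirs = dirs @ (ry @ rx).T

    # Convert ray directions to panorama coordinates (longitude/latitude) and sample.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = ((0.5 - lat / np.pi) * h).astype(int).clip(0, h - 1)
    return pano[v, u]


rng = np.random.default_rng(0)
panorama = rng.random((512, 1024, 3)).astype(np.float32)   # stand-in for a Laval panorama
crop = perspective_crop(panorama, yaw=rng.uniform(0, 2 * np.pi), pitch=0.0)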

Shading Estimation

Shading estimation
We develop a shading estimation module to obtain direct shading from a single image. (i) We predict image features (FeatureNet) and unproject them into a 3D feature grid in NDC space. (ii) We predict a density field from these features (DensityNet). (iii) Given the sun's direction and solid angle, we trace rays toward the light source to obtain a coarse shadow map. (iv) Using the shadows and N-dot-L shading information, we predict a coarse shading map (ShadingNet). (v) We refine the shading map to obtain our direct shading (RefinementNet).
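Below is a minimal PyTorch sketch of the five-step pipeline above. The module names come from the description, but the layer choices, the way features are lifted to the 3D grid, and the simplified occlusion test are assumptions for illustration; the actual networks and ray tracer are more involved.

import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))


class ShadingEstimator(nn.Module):
    def __init__(self, feat_dim=16, grid_depth=32):
        super().__init__()
        self.feature_net = conv_block(3, feat_dim)       # (i) image features
        self.density_net = nn.Conv3d(feat_dim, 1, 1)     # (ii) density field
        self.shading_net = conv_block(2, 1)              # (iv) coarse shading
        self.refinement_net = conv_block(4, 1)           # (v) refined shading
        self.grid_depth = grid_depth

    def forward(self, image, normals, sun_dir):
        # (i) 2D features lifted to a 3D grid; here they are simply repeated along
        # depth, whereas the real method unprojects them into NDC space.
        feats = self.feature_net(image)                                   # B x C x H x W
        grid = feats.unsqueeze(2).repeat(1, 1, self.grid_depth, 1, 1)     # B x C x D x H x W

        # (ii) per-voxel density.
        density = torch.sigmoid(self.density_net(grid))

        # (iii) coarse shadow map: a crude stand-in that marks a pixel as shadowed
        # if any voxel in its depth column is dense; the real method traces rays
        # toward the sun within its solid angle.
        shadow = 1.0 - density.amax(dim=2)                                # B x 1 x H x W

        # N-dot-L shading from the normals and the sun direction.
        n_dot_l = (normals * sun_dir.view(1, 3, 1, 1)).sum(1, keepdim=True).clamp(min=0)

        # (iv) coarse shading from shadows + N-dot-L, then (v) refinement.
        coarse = self.shading_net(torch.cat([shadow, n_dot_l], dim=1))
        refined = self.refinement_net(torch.cat([image, coarse], dim=1))
        return torch.sigmoid(refined)


estimator = ShadingEstimator()
img = torch.rand(1, 3, 128, 128)
nrm = F.normalize(torch.randn(1, 3, 128, 128), dim=1)
sun = F.normalize(torch.tensor([0.3, 0.8, 0.5]), dim=0)
shading = estimator(img, nrm, sun)                                        # 1 x 1 x 128 x 128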

Model Overview

Model Overview
To generate lighting-controlled images, we train a light control module similar to ControlNet, conditioned on the normal and shading maps. We use a custom Residual Control Encoder to encode the control signal. Adding a Residual Control Decoder with a reconstruction loss ensures that the full control signal is preserved in the encoding, which also helps identity preservation, as shown below.
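A minimal PyTorch sketch of this control-signal encoding with a reconstruction loss follows. The layer sizes, residual block design, and the choice of an MSE reconstruction loss are illustrative assumptions; only the idea of encoding the normal and shading maps and decoding them back to keep the control signal lossless comes from the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class ResidualControlEncoder(nn.Module):
    """Encodes the stacked normal + shading maps into the control feature
    fed to the ControlNet-style branch."""
    def __init__(self, in_channels=6, hidden=64, out_channels=320):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, stride=2, padding=1), nn.SiLU(),
            ResidualBlock(hidden),
            nn.Conv2d(hidden, out_channels, 3, stride=2, padding=1),
        )

    def forward(self, control):
        return self.net(control)


class ResidualControlDecoder(nn.Module):
    """Decodes the control feature back to the input signal so a reconstruction
    loss can enforce that no lighting information is lost in the encoding."""
    def __init__(self, in_channels=320, hidden=64, out_channels=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(in_channels, hidden, 4, stride=2, padding=1), nn.SiLU(),
            ResidualBlock(hidden),
            nn.ConvTranspose2d(hidden, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, feat):
        return self.net(feat)


encoder, decoder = ResidualControlEncoder(), ResidualControlDecoder()
control = torch.rand(1, 6, 64, 64)      # normals (3 channels) stacked with shading (3 channels)
feat = encoder(control)                 # control feature passed to the ControlNet-style branch
recon_loss = F.mse_loss(decoder(feat), control)   # reconstruction term added to the training loss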
Identity Preservation

Citation


@inproceedings{kocsis2024lightit,
  author    = {Peter Kocsis and Julien Philip and Kalyan Sunkavalli and Matthias Nie{\ss}ner and Yannick Hold-Geoffroy},
  title     = {LightIt: Illumination Modeling and Control for Diffusion Models},
  booktitle = {CVPR},
  year      = {2024}
}