LightIt: Illumination Modeling and Control for Diffusion Models

CVPR 2024

Peter Kocsis
TU Munich
Julien Philip
Adobe Research
Kalyan Sunkavalli
Adobe Research
Matthias Nießner
TU Munich
Yannick Hold-Geoffroy
Adobe Research

Abstract

We introduce LightIt, a method for explicit illumination control for image generation. Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation such as setting the overall mood or cinematic appearance. To overcome these limitations, we propose to condition the generation on shading and normal maps. We model the lighting with single-bounce shading, which includes cast shadows. We first train a shading estimation module to generate a dataset of real-world image and shading pairs. Then, we train a control network using the estimated shading and normals as input. Our method demonstrates high-quality image generation and lighting control in numerous scenes. Additionally, we use our generated dataset to train an identity-preserving relighting model, conditioned on an image and a target shading. Our method is the first to enable the generation of images with controllable, consistent lighting, and it performs on par with specialized state-of-the-art relighting methods.

Lighting-controllable Image Generation

Image Synthesis

Method

Dataset Generation

Dataset pipeline
We generate a dataset using the Outdoor Laval dataset. We randomly crop images from the panoramas and automatically predict normal maps, shading, and captions. For our relighting experiments, we extend the dataset with relit images using OutCast.
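The snippet below is a minimal NumPy sketch of the panorama-cropping step, i.e., sampling a perspective crop from an equirectangular panorama. The field of view, output resolution, rotation parametrization, and nearest-neighbor sampling are illustrative assumptions, not the paper's exact settings.

import numpy as np


def perspective_crop(pano, yaw, pitch, fov_deg=60.0, size=256):
    """pano: H x W x 3 equirectangular image; yaw/pitch in radians."""
    h, w = pano.shape[:2]
    f = 0.5 * size / np.tan(0.5 * np.radians(fov_deg))

    # Ray directions for every output pixel in camera space.
    xs, ys = np.meshgrid(np.arange(size) - size / 2 + 0.5,
                         np.arange(size) - size / 2 + 0.5)
    dirs = np.stack([xs, -ys, np.full_like(xs, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays by the random camera orientation (yaw around Y, pitch around X).
    ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    dirs = dirs @ (ry @ rx).T

    # Convert ray directions to panorama coordinates (longitude/latitude) and sample.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = ((0.5 - lat / np.pi) * h).astype(int).clip(0, h - 1)
    return pano[v, u]


rng = np.random.default_rng(0)
panorama = rng.random((512, 1024, 3)).astype(np.float32)   # stand-in for a Laval panorama
crop = perspective_crop(panorama, yaw=rng.uniform(0, 2 * np.pi), pitch=0.0)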

Shading Estimation

Shading estimation
We develop a shading estimation module to obtain direct shading from a single image. (i) We predict image features (FeatureNet) and unproject them into a 3D feature grid in NDC space. (ii) We predict a density field from these features (DensityNet). (iii) Given the sun's direction and solid angle, we trace rays toward the light source to obtain a coarse shadow map. (iv) Using the shadows and N-dot-L shading information, we predict a coarse shading map (ShadingNet). (v) We refine the shading map to obtain our direct shading (RefinementNet).
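Below is a minimal PyTorch sketch of the five-step pipeline above. The module names come from the description, but the layer choices, the way features are lifted to the 3D grid, and the simplified occlusion test are assumptions for illustration; the actual networks and ray tracer are more involved.

import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))


class ShadingEstimator(nn.Module):
    def __init__(self, feat_dim=16, grid_depth=32):
        super().__init__()
        self.feature_net = conv_block(3, feat_dim)       # (i) image features
        self.density_net = nn.Conv3d(feat_dim, 1, 1)     # (ii) density field
        self.shading_net = conv_block(2, 1)              # (iv) coarse shading
        self.refinement_net = conv_block(4, 1)           # (v) refined shading
        self.grid_depth = grid_depth

    def forward(self, image, normals, sun_dir):
        # (i) 2D features lifted to a 3D grid; here they are simply repeated along
        # depth, whereas the real method unprojects them into NDC space.
        feats = self.feature_net(image)                                   # B x C x H x W
        grid = feats.unsqueeze(2).repeat(1, 1, self.grid_depth, 1, 1)     # B x C x D x H x W

        # (ii) per-voxel density.
        density = torch.sigmoid(self.density_net(grid))

        # (iii) coarse shadow map: a crude stand-in that marks a pixel as shadowed
        # if any voxel in its depth column is dense; the real method traces rays
        # toward the sun within its solid angle.
        shadow = 1.0 - density.amax(dim=2)                                # B x 1 x H x W

        # N-dot-L shading from the normals and the sun direction.
        n_dot_l = (normals * sun_dir.view(1, 3, 1, 1)).sum(1, keepdim=True).clamp(min=0)

        # (iv) coarse shading from shadows + N-dot-L, then (v) refinement.
        coarse = self.shading_net(torch.cat([shadow, n_dot_l], dim=1))
        refined = self.refinement_net(torch.cat([image, coarse], dim=1))
        return torch.sigmoid(refined)


estimator = ShadingEstimator()
img = torch.rand(1, 3, 128, 128)
nrm = F.normalize(torch.randn(1, 3, 128, 128), dim=1)
sun = F.normalize(torch.tensor([0.3, 0.8, 0.5]), dim=0)
shading = estimator(img, nrm, sun)                                        # 1 x 1 x 128 x 128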

Model Overview

Model Overview
To generate lighting-controlled images, we train a light control module similar to ControlNet, conditioned on the normal and shading maps. We use a custom Residual Control Encoder to encode the control signal. Adding a Residual Control Decoder with a reconstruction loss ensures that the full control signal is preserved in the encoding, which also helps identity preservation, as shown below.
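A minimal PyTorch sketch of this control-signal encoding with a reconstruction loss follows. The layer sizes, residual block design, and the choice of an MSE reconstruction loss are illustrative assumptions; only the idea of encoding the normal and shading maps and decoding them back to keep the control signal lossless comes from the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class ResidualControlEncoder(nn.Module):
    """Encodes the stacked normal + shading maps into the control feature
    fed to the ControlNet-style branch."""
    def __init__(self, in_channels=6, hidden=64, out_channels=320):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, stride=2, padding=1), nn.SiLU(),
            ResidualBlock(hidden),
            nn.Conv2d(hidden, out_channels, 3, stride=2, padding=1),
        )

    def forward(self, control):
        return self.net(control)


class ResidualControlDecoder(nn.Module):
    """Decodes the control feature back to the input signal so a reconstruction
    loss can enforce that no lighting information is lost in the encoding."""
    def __init__(self, in_channels=320, hidden=64, out_channels=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(in_channels, hidden, 4, stride=2, padding=1), nn.SiLU(),
            ResidualBlock(hidden),
            nn.ConvTranspose2d(hidden, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, feat):
        return self.net(feat)


encoder, decoder = ResidualControlEncoder(), ResidualControlDecoder()
control = torch.rand(1, 6, 64, 64)      # normals (3 channels) stacked with shading (3 channels)
feat = encoder(control)                 # control feature passed to the ControlNet-style branch
recon_loss = F.mse_loss(decoder(feat), control)   # reconstruction term added to the training loss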
Identity Preservation

Citation


@inproceedings{kocsis2024lightit,
  author    = {Peter Kocsis and Julien Philip and Kalyan Sunkavalli and Matthias Nie{\ss}ner and Yannick Hold-Geoffroy},
  title     = {LightIt: Illumination Modeling and Control for Diffusion Models},
  booktitle = {CVPR},
  year      = {2024}
}