Learning a large set of scenes. We learn a large set of scenes in the latent space of an autoencoder using a two-stage approach. Stage 1 jointly learns the encoder \(E_\phi\) and decoder \(D_\psi\) to optimally compress the training images \(x_{i,j}\), while learning a subset of scenes \(\mathcal{T}_1\). Stage 2 uses the components learned in the first stage to learn the remaining scenes \(\mathcal{T}_2\). We represent each scene with a Tri-Plane \(T_i\) obtained by concatenating, along the feature dimension, "micro" planes \(T_i^\mathrm{mic}\) encoding scene-specific information and "macro" planes \(T_i^\mathrm{mac}\) encompassing information shared across scenes. The micro planes \(T_i^\mathrm{mic}\) are learned independently for each scene. The macro planes \(T_i^\mathrm{mac}\) are computed from a set of shared base planes \(\mathcal{B}\) via a weighted sum with weights \(W_i\). The base planes \(\mathcal{B}\) are learned jointly across all scenes, while the weights \(W_i\) are learned specifically for each scene. We train a latent Tri-Plane \(T_i\) by matching its rendering \(\tilde{z}_{i,j}\) with the encoded image \(z_{i,j}\) via the reconstructive objective \(\mathcal{L}^\mathrm{(latent)}\). We also align the decoded scene renderings \(\tilde{x}_{i,j}\) with the ground-truth RGB images \(x_{i,j}\) via \(\mathcal{L}^\mathrm{(RGB)}\). \(\mathcal{L}^\mathrm{(ae)}\) is an auto-encoder reconstruction loss used in the first stage only.
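For concreteness, below is a minimal PyTorch-style sketch of the micro/macro Tri-Plane composition and the two rendering objectives described above. The tensor shapes, the helper names (`render_fn`, `E_phi`, `D_psi`), and the MSE form of the losses are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def compose_triplane(T_mic, base_planes, W_i):
    """Compose a latent Tri-Plane T_i from scene-specific micro planes and
    globally shared base planes (hypothetical shapes).

    T_mic:       (3, C_mic, H, W)     scene-specific micro planes T_i^mic
    base_planes: (K, 3, C_mac, H, W)  shared base planes B (K bases)
    W_i:         (K,)                 scene-specific mixing weights
    """
    # Macro planes: weighted sum of the shared base planes with weights W_i.
    T_mac = torch.einsum('k,kdchw->dchw', W_i, base_planes)
    # Concatenate micro and macro planes along the feature dimension.
    return torch.cat([T_mic, T_mac], dim=1)  # (3, C_mic + C_mac, H, W)

def training_losses(T_i, render_fn, E_phi, D_psi, x_ij, pose):
    """Per-view objectives sketched from the caption (MSE is an assumption)."""
    z_ij = E_phi(x_ij)              # encoded ground-truth image z_{i,j}
    z_tilde = render_fn(T_i, pose)  # latent Tri-Plane rendering  z~_{i,j}
    x_tilde = D_psi(z_tilde)        # decoded scene rendering     x~_{i,j}
    loss_latent = F.mse_loss(z_tilde, z_ij)  # L^(latent)
    loss_rgb = F.mse_loss(x_tilde, x_ij)     # L^(RGB)
    return loss_latent, loss_rgb
```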
Resource Costs. Comparison of resource costs and novel view synthesis (NVS) quality of recent works when naively scaling the inverse graphics problem (\(N = 2000\) scenes). Circle sizes represent the NVS quality of each method. Our method achieves NVS quality similar to Tri-Planes, our base representation, while exhibiting the lowest training time and memory footprint of all methods.
@article{scaled-ig,
  title={{Scaled Inverse Graphics: Efficiently Learning Large Sets of 3D Scenes}},
  author={Karim Kassab and Antoine Schnepf and Jean-Yves Franceschi and Laurent Caraffa and Flavian Vasile and Jeremie Mary and Andrew Comport and Valérie Gouet-Brunet},
  journal={arXiv preprint arXiv:2410.23742},
  year={2024}
}