NVIDIA Instant NeRF Turns 2D Photos into 3D Scenes in the Blink of an AI

When it comes to the elusive goal of creating 3D scenes from a handful of 2D images, it now appears you just need to ensure you don’t lose your NeRF. Actually, Instant NeRF. So writes NVIDIA’s Isha Salian in a just released post.

The NVIDIA Research team has developed an approach to turning a collection of still images into a digital 3D scene almost instantly — making it one of the first models of its kind to combine ultra-fast neural network training and rapid rendering. Known as inverse rendering, the process uses AI to approximate how light behaves in the real world, enabling researchers to reconstruct a 3D scene from a handful of 2D images taken at different angles.

NVIDIA applied this approach to a popular new technology called neural radiance fields, or NeRF. The result, dubbed Instant NeRF, is the fastest NeRF technique to date, achieving more than 1,000x speedups in some cases. The model requires just seconds to train on a few dozen still photos — plus data on the camera angles they were taken from — and can then render the resulting 3D scene within tens of milliseconds.

According to David Leubke, vice president for graphics research at NVIDIA, “If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene. In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography — vastly increasing the speed, ease and reach of 3D capture and sharing.”

Showcased in a session at NVIDIA GTC this week, Instant NeRF could be used to create avatars or scenes for virtual worlds, to capture video conference participants and their environments in 3D, or to reconstruct scenes for 3D digital maps.

In a tribute to the early days of Polaroid images, NVIDIA Research recreated an iconic photo of Andy Warhol taking an instant photo, turning it into a 3D scene using Instant NeRF.

Take a look!

[embedded content]

What Is a NeRF?

Far from a squishy ball you could safely hurl at grandma without getting into trouble (other than having to then listen to her explain what ailed her) NeRFs use neural networks to represent and render realistic 3D scenes based on an input collection of 2D images.

As Salian writes, collecting data to feed a NeRF is a bit like being a red carpet photographer trying to capture a celebrity’s outfit from every angle — the neural network requires a few dozen images taken from multiple positions around the scene, as well as the camera position of each of those shots.

In a scene that includes people or other moving elements, the quicker these shots are captured, the better. If there’s too much motion during the 2D image capture process, the AI-generated 3D scene will be blurry.

From there, a NeRF essentially fills in the blanks, training a small neural network to reconstruct the scene by predicting the color of light radiating in any direction, from any point in 3D space. The technique can even work around occlusions — when objects seen in some images are blocked by obstructions such as pillars in other images.

Instant NeRF Accelerates by 1,000x

While estimating the depth and appearance of an object based on a partial view is a natural skill for humans, it’s a demanding task for AI.

Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization. Bringing AI into the picture speeds things up. Early NeRF models rendered crisp scenes without artifacts in a few minutes, but still took hours to train.

Instant NeRF, however, cuts rendering time by several orders of magnitude. It relies on a technique developed by NVIDIA called multi-resolution hash grid encoding, which is optimized to run efficiently on NVIDIA GPUs. Using a new input encoding method, researchers can achieve high-quality results using a tiny neural network that runs rapidly.

The technology could be used to train robots and self-driving cars to understand the size and shape of real-world objects by capturing 2D images or video footage of them. It could also be used in architecture and entertainment to rapidly generate digital representations of real environments that creators can modify and build on.

Read Salian’s complete post here to learn more about Instant NeRF.

Dan Sarto's picture

Dan Sarto is Publisher and Editor-in-Chief of Animation World Network.