Neural Radiance Fields (NeRF) are one of those ideas that sound abstract until you actually build one. Then suddenly, everything clicks.

In this article, we’ll train a NeRF end-to-end, starting from raw images and camera poses, all the way to rendering RGB images and depth maps. The goal here is to use a minimal dataset to understand how NeRF works in practice.

What We’re Building

By the end of this walkthrough, we will:

- download the Tiny NeRF dataset (images, camera poses, and a focal length),
- turn those images and poses into rays and sample points,
- build and train a compact NeRF model,
- render RGB images and depth maps from the trained model.

All with a compact setup that runs locally.


Step 1: Installing the Toolkit

To avoid reimplementing NeRF from scratch, we’ll use deepvision-toolkit, which provides datasets, models, and rendering utilities.

!pip install deepvision-toolkit

This gives us:

- a loader for the Tiny NeRF dataset (load_tiny_nerf),
- ready-made NeRF models such as NeRFMedium,
- volumetric rendering utilities like nerf_render_image_and_depth_tf.

Step 2: Downloading the Tiny NeRF Dataset

We’ll use the Tiny NeRF dataset released by the original NeRF authors. It contains:

- 106 images of a single scene, each 100 × 100 pixels,
- a 4 × 4 camera pose matrix for every image,
- the camera focal length (≈ 138.89).

import requests
import numpy as np
import matplotlib.pyplot as plt

url = "https://people.eecs.berkeley.edu/~bmild/nerf/tiny_nerf_data.npz"
save_path = 'tiny_nerf.npz'

file_data = requests.get(url).content
with open(save_path, "wb") as file:
    file.write(file_data)

data = np.load(save_path)

images, poses, focal = data["images"], data["poses"], data["focal"]

print(images.shape)  # (106, 100, 100, 3)
print(poses.shape)   # (106, 4, 4)
print(focal)         # ~138.89

Let’s visualize a few images:

fig, ax = plt.subplots(1, 5, figsize=(20, 12))
for i in range(5):
    ax[i].imshow(images[i])
    ax[i].axis("off")

Figure: five sample views from the Tiny NeRF dataset, each showing the same scene from a different camera angle.

At this stage we only have 2D images and camera poses: no meshes, no depth, no explicit 3D model of the scene.
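The poses themselves are worth a quick look. Assuming they follow the original Tiny NeRF convention of 4 × 4 camera-to-world matrices, the last column of each pose is the camera position in world space, so plotting those positions shows where the cameras sit around the object:

cam_positions = poses[:, :3, -1]  # translation column = camera origin

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(projection="3d")
ax.scatter(cam_positions[:, 0], cam_positions[:, 1], cam_positions[:, 2])
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
ax.set_title("Camera positions around the scene")
plt.show()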

Step 3: Rays, Not Images

NeRF does not learn from images directly. Instead:

- every pixel becomes a ray, with an origin and direction derived from the camera pose and focal length (sketched below),
- a fixed number of points is sampled along each ray,
- the network is queried at those sample points, and its predictions are composited back into pixel colors.
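Under the hood, each ray is built from the camera pose and focal length using a pinhole camera model. The toolkit handles this for us, but it is worth seeing once; the sketch below assumes the usual Tiny NeRF conventions (camera-to-world poses, camera looking down its -z axis):

def get_rays(height, width, focal, pose):
    """Compute per-pixel ray origins and directions for one camera pose."""
    # Pixel grid in camera coordinates (pinhole model).
    i, j = np.meshgrid(
        np.arange(width, dtype=np.float32),
        np.arange(height, dtype=np.float32),
        indexing="xy",
    )
    dirs = np.stack(
        [(i - width * 0.5) / focal, -(j - height * 0.5) / focal, -np.ones_like(i)],
        axis=-1,
    )
    # Rotate directions into world space; every ray starts at the camera position.
    rays_d = dirs @ pose[:3, :3].T
    rays_o = np.broadcast_to(pose[:3, -1], rays_d.shape)
    return rays_o, rays_d

rays_o, rays_d = get_rays(100, 100, focal, poses[0])
print(rays_o.shape, rays_d.shape)  # (100, 100, 3) (100, 100, 3)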

We configure how this sampling works:

from deepvision.datasets import load_tiny_nerf
import tensorflow as tf

config = {
    'img_height': 100,
    'img_width': 100,
    'pos_embed': 32,
    'num_ray_samples': 64,
    'batch_size': 1
}

Why these parameters matter:

- img_height and img_width fix the number of rays per view: 100 × 100 = 10,000 rays,
- num_ray_samples is how many points are sampled along each ray, trading reconstruction quality against speed and memory,
- pos_embed is the number of positional-encoding frequencies; each 3D sample point is expanded into 6 × pos_embed + 3 = 195 input features,
- batch_size of 1 means each training step processes all the rays of a single view.
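To make pos_embed concrete: the standard NeRF positional encoding maps every coordinate to sines and cosines at increasing frequencies, which is what lets a plain MLP represent fine detail. The toolkit applies this internally; the function below is only an illustrative sketch of the idea:

def positional_encode(points, num_freqs):
    """Map (..., 3) coordinates to (..., 6 * num_freqs + 3) features."""
    features = [points]  # keep the raw coordinates
    for freq in range(num_freqs):
        features.append(tf.sin(2.0 ** freq * points))
        features.append(tf.cos(2.0 ** freq * points))
    return tf.concat(features, axis=-1)

sample_points = tf.random.uniform((4, 3))
encoded = positional_encode(sample_points, config['pos_embed'])
print(encoded.shape)  # (4, 195), i.e. 6 * 32 + 3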

Step 4: Preparing the Dataset

We now load the dataset in a form suitable for NeRF training.

train_ds, valid_ds = load_tiny_nerf(
    pos_embed=config['pos_embed'],
    num_ray_samples=config['num_ray_samples'],
    save_path='tiny_nerf.npz',
    validation_split=0.2,
    backend='tensorflow'
)

train_ds = train_ds.batch(config['batch_size']).prefetch(tf.data.AUTOTUNE)
valid_ds = valid_ds.batch(config['batch_size']).prefetch(tf.data.AUTOTUNE)

Each batch contains:

- the ground-truth RGB image for one camera view,
- rays_flat: the positionally encoded sample points along every ray of that view, flattened into one matrix,
- t_vals: the distances along each ray at which those points were sampled.

This is where NeRF diverges from traditional vision pipelines.
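A quick way to confirm this structure is to pull one batch and print its shapes. The exact shapes depend on how the toolkit packs the rays, so treat the commented values as expectations rather than guarantees:

for image_batch, (rays_flat, t_vals) in train_ds.take(1):
    print(image_batch.shape)  # expected: (1, 100, 100, 3)
    print(rays_flat.shape)    # expected: (1, 100 * 100 * 64, 195)
    print(t_vals.shape)       # expected: (1, ..., 64) sample distances per ray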

Step 5: Building the NeRF Model

The NeRF model itself is just a fully connected neural network.

No CNNs. No attention.

import deepvision

# One image's worth of ray samples is fed to the model at once:
# 100 x 100 rays, each with 64 sample points along its length.
num_pos = (
    config['img_height'] *
    config['img_width'] *
    config['num_ray_samples']
)

# Each sample point carries 2 (sin, cos) x 32 frequencies x 3 coordinates
# of positional encoding, plus the 3 raw coordinates = 195 features.
input_features = 6 * config['pos_embed'] + 3

model = deepvision.models.NeRFMedium(
    input_shape=(num_pos, input_features),
    backend='tensorflow'
)

We compile it using a simple MSE loss:

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.MeanSquaredError()
)

model.summary()

Despite its simplicity, this network will learn a continuous 3D scene representation.
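NeRFMedium comes from the toolkit, but conceptually it is close to the original NeRF MLP: a stack of dense layers that maps each encoded sample point to a color and a volume density. The sketch below is only an illustration of that idea, not the toolkit's actual architecture:

def build_nerf_style_mlp(num_pos, input_features, hidden_units=128, num_layers=6):
    """A bare-bones NeRF-style MLP: encoded sample points in, (r, g, b, sigma) out."""
    inputs = tf.keras.Input(shape=(num_pos, input_features))
    x = inputs
    for _ in range(num_layers):
        x = tf.keras.layers.Dense(hidden_units, activation="relu")(x)
    outputs = tf.keras.layers.Dense(4)(x)  # RGB color + density per sample point
    return tf.keras.Model(inputs, outputs)

The model we actually train remains the compiled NeRFMedium above.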

Step 6: Training the NeRF

Training NeRF feels slow—and that’s normal.

callbacks = [tf.keras.callbacks.ReduceLROnPlateau()]  # lower the learning rate when validation loss plateaus

history = model.fit(
    train_ds,
    epochs=50,
    validation_data=valid_ds,
    callbacks=callbacks
)

What you’ll observe:

- the loss drops quickly over the first few epochs, then improves slowly,
- early renders look like blurry blobs of color that gradually sharpen into the object.

NeRF is literally learning how matter occupies space.
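A simple way to watch the progress is to plot the Keras training history once fit returns:

plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()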

Step 7: Rendering RGB Images and Depth Maps

This is the most satisfying part.

We take trained NeRF predictions and render an image from rays:

from deepvision.models.volumetric.volumetric_utils import (
    nerf_render_image_and_depth_tf
)

for batch in train_ds.take(5):
    (images, rays) = batch        # ground-truth views and their precomputed rays
    (rays_flat, t_vals) = rays    # encoded sample points and their ray distances

    image_batch, depth_maps, _ = nerf_render_image_and_depth_tf(
        model=model,
        rays_flat=rays_flat,
        t_vals=t_vals,
        img_height=config['img_height'],
        img_width=config['img_width'],
        num_ray_samples=config['num_ray_samples']
    )

    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    ax[0].imshow(tf.squeeze(image_batch[0]))
    ax[0].set_title("Rendered RGB")
    ax[1].imshow(tf.squeeze(depth_maps[0]))
    ax[1].set_title("Depth Map")
    plt.show()

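Under the hood, nerf_render_image_and_depth_tf performs classic volume rendering: the model's per-sample densities become alpha values and compositing weights, the weights blend per-sample colors into a pixel color, and the same weights applied to t_vals give an expected depth. Here is a minimal sketch of that compositing step for a single ray, assuming the model predicts (r, g, b, sigma) per sample; it mirrors the standard NeRF equations rather than the toolkit's exact implementation:

def composite_ray(predictions, t_vals):
    """Blend per-sample (r, g, b, sigma) predictions of shape (num_samples, 4) along one ray."""
    rgb = tf.sigmoid(predictions[..., :3])   # colors squashed to [0, 1]
    sigma = tf.nn.relu(predictions[..., 3])  # non-negative volume densities
    deltas = t_vals[1:] - t_vals[:-1]
    deltas = tf.concat([deltas, deltas[-1:]], axis=0)  # pad the last interval

    alpha = 1.0 - tf.exp(-sigma * deltas)                            # opacity of each sample
    transmittance = tf.math.cumprod(1.0 - alpha + 1e-10, exclusive=True)
    weights = alpha * transmittance                                  # compositing weights

    rgb_pixel = tf.reduce_sum(weights[:, None] * rgb, axis=0)        # rendered pixel color
    depth = tf.reduce_sum(weights * t_vals, axis=0)                  # expected depth along the ray
    return rgb_pixel, depth
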
The depth map is the real proof: we never gave the network any depth supervision, only an MSE loss on RGB pixels, yet the rendered depth reflects the scene's geometry. The network has learned actual 3D structure, not just textures.

What You Learn by Building NeRF Once

After doing this end-to-end, a few things become obvious:

- the model is a plain MLP; the cleverness lives in ray casting, positional encoding, and volume rendering,
- most of the engineering effort goes into preparing rays and samples, not into the network,
- depth falls out of the same compositing weights used for color, which is why an RGB-only loss is enough.

This understanding transfers directly to:

- other volumetric and neural rendering methods,
- novel view synthesis and 3D reconstruction work in general,
- the computer vision, graphics, and multimodal AI settings mentioned below.


Final Thoughts

NeRF looks intimidating from the outside, but once you build it, the math becomes intuitive, the architecture feels elegant, and the results feel almost magical.

If you work in computer vision, graphics, or multimodal AI, NeRF isn’t optional knowledge anymore. And the fastest way to understand it is exactly this:

  1. Load real data.
  2. Cast real rays.
  3. Render a real scene.

That’s how NeRF stops being hype, and starts being engineering.