Neural Radiance Fields (NeRF) are one of those ideas that sound abstract until you actually build one. Then suddenly, everything clicks.
In this article, we’ll train a NeRF end-to-end, starting from raw images and camera poses, all the way to rendering RGB images and depth maps. The goal here is to use a minimal dataset to understand how NeRF works in practice.
What We’re Building
By the end of this walkthrough, we will:
- Load a real NeRF dataset
- Convert pixels into camera rays
- Train a NeRF model
- Render both novel views and depth maps
All with a compact setup that runs locally.
Step 1: Installing the Toolkit
To avoid reimplementing NeRF from scratch, we’ll use deepvision-toolkit, which provides datasets, models, and rendering utilities.
!pip install deepvision-toolkit
This gives us:
- A ready-to-use NeRF model
- Tiny NeRF dataset loader
- Volume rendering utilities
Step 2: Downloading the Tiny NeRF Dataset
We’ll use the Tiny NeRF dataset released by the original NeRF authors. It contains:
- 106 RGB images (100×100)
- Camera poses (4×4 matrices)
- A single focal length
import requests
import numpy as np
import matplotlib.pyplot as plt
url = "https://people.eecs.berkeley.edu/~bmild/nerf/tiny_nerf_data.npz"
save_path = 'tiny_nerf.npz'
file_data = requests.get(url).content
with open(save_path, "wb") as file:
    file.write(file_data)
data = np.load(save_path)
images, poses, focal = data["images"], data["poses"], data["focal"]
print(images.shape) # (106, 100, 100, 3)
print(poses.shape) # (106, 4, 4)
print(focal)         # ~138.89
Let’s visualize a few images:
fig, ax = plt.subplots(1, 5, figsize=(20, 12))
for i in range(5):
    ax[i].imshow(images[i])
    ax[i].axis("off")
At this stage, we only have 2D images and camera poses: no meshes, no depth maps, no explicit 3D geometry of the scene.
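We do, however, know where each photo was taken from. The poses here are camera-to-world transforms, so the last column of every 4×4 matrix is a camera position in world space. As a quick, purely illustrative sanity check using only the arrays loaded above, we can plot those origins and see the views orbiting the object:
# The translation column of each camera-to-world matrix is the camera origin
cam_origins = poses[:, :3, 3]
fig = plt.figure(figsize=(6, 6))
ax3d = fig.add_subplot(projection="3d")
ax3d.scatter(cam_origins[:, 0], cam_origins[:, 1], cam_origins[:, 2], s=10)
ax3d.set_title("Camera positions around the scene")
plt.show()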
Step 3: Rays, Not Images
NeRF does not learn from images directly. Instead:
- Each pixel becomes a ray
- Each ray is sampled at multiple 3D points
- Each point is passed through the NeRF network
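To make the first two bullets concrete, here is a minimal ray-generation sketch adapted from the logic in the original Tiny NeRF example. It is illustrative only: the loader in the next step builds (and positionally encodes) the rays for us.
def get_rays(height, width, focal, c2w):
    # Pixel grid over the image plane
    i, j = np.meshgrid(np.arange(width, dtype=np.float32),
                       np.arange(height, dtype=np.float32), indexing="xy")
    # Ray direction through each pixel for a pinhole camera centred on the image
    dirs = np.stack([(i - width * 0.5) / focal,
                     -(j - height * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)
    # Rotate directions into world space and attach the camera origin to every pixel
    rays_d = dirs @ c2w[:3, :3].T
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d

# One ray per pixel for the first training image
rays_o, rays_d = get_rays(100, 100, float(focal), poses[0])
print(rays_o.shape, rays_d.shape)  # (100, 100, 3) (100, 100, 3)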
We configure how this sampling works:
from deepvision.datasets import load_tiny_nerf
import tensorflow as tf
config = {
    'img_height': 100,
    'img_width': 100,
    'pos_embed': 32,
    'num_ray_samples': 64,
    'batch_size': 1
}
Why these parameters matter:
- pos_embed controls the positional encoding frequency
- num_ray_samples controls rendering quality vs. speed
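Positional encoding is what lets a small MLP represent fine detail: each of the three coordinates of a sample point is expanded into pos_embed pairs of sine and cosine features. That is also where the 6 * pos_embed + 3 input size used in Step 5 comes from: 3 raw coordinates plus 3 × 2 × pos_embed encoded ones. Here is a minimal sketch of such an encoder (illustrative only; the loader applies its own encoding internally):
def positional_encode(x, num_freqs):
    # Map each coordinate to [x, sin(2^k x), cos(2^k x)] for k = 0 .. num_freqs - 1
    features = [x]
    for k in range(num_freqs):
        features.append(np.sin(2.0 ** k * x))
        features.append(np.cos(2.0 ** k * x))
    return np.concatenate(features, axis=-1)

point = np.array([[0.1, -0.4, 0.7]], dtype=np.float32)  # a single 3D sample point
encoded = positional_encode(point, num_freqs=config['pos_embed'])
print(encoded.shape)  # (1, 195) = (1, 6 * 32 + 3)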
Step 4: Preparing the Dataset
We now load the dataset in a form suitable for NeRF training.
train_ds, valid_ds = load_tiny_nerf(
    pos_embed=config['pos_embed'],
    num_ray_samples=config['num_ray_samples'],
    save_path='tiny_nerf.npz',
    validation_split=0.2,
    backend='tensorflow'
)
train_ds = train_ds.batch(config['batch_size']).prefetch(tf.data.AUTOTUNE)
valid_ds = valid_ds.batch(config['batch_size']).prefetch(tf.data.AUTOTUNE)
Each batch contains:
- Ground truth pixel colors
- Flattened rays
- Sample depths along each ray
This is where NeRF diverges from traditional vision pipelines.
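It is worth peeking at a single batch to see this structure. The exact shapes depend on the loader, so treat the comments below as descriptive rather than definitive:
for target_pixels, rays in train_ds.take(1):
    rays_flat, t_vals = rays
    print("pixel colors:", target_pixels.shape)  # ground-truth RGB values
    print("flattened rays:", rays_flat.shape)    # encoded ray samples fed to the MLP
    print("sample depths:", t_vals.shape)        # distances of the samples along each ray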
Step 5: Building the NeRF Model
The NeRF model itself is just a fully connected neural network.
No CNNs. No attention.
import deepvision
num_pos = (
    config['img_height'] *
    config['img_width'] *
    config['num_ray_samples']
)
input_features = 6 * config['pos_embed'] + 3
model = deepvision.models.NeRFMedium(
    input_shape=(num_pos, input_features),
    backend='tensorflow'
)
We compile it using a simple MSE loss:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.MeanSquaredError()
)
model.summary()
Despite its simplicity, this network will learn a continuous 3D scene representation.
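NeRFMedium comes from the toolkit, but to make the "just a fully connected network" point concrete, here is roughly what a NeRF-style MLP looks like when written by hand in Keras. This is a simplified sketch, not the toolkit's actual architecture: a stack of Dense layers that maps one encoded sample to four numbers, RGB color plus volume density.
from tensorflow import keras

def nerf_style_mlp(num_features, width=128, depth=6):
    # Encoded sample point in, (r, g, b, sigma) out
    inputs = keras.Input(shape=(num_features,))
    x = inputs
    for _ in range(depth):
        x = keras.layers.Dense(width, activation="relu")(x)
    outputs = keras.layers.Dense(4)(x)  # 3 color channels + 1 density value
    return keras.Model(inputs, outputs)

nerf_style_mlp(input_features).summary()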
Step 6: Training the NeRF
Training NeRF feels slow—and that’s normal.
callbacks = [tf.keras.callbacks.ReduceLROnPlateau()]
history = model.fit(
    train_ds,
    epochs=50,
    validation_data=valid_ds,
    callbacks=callbacks
)
What you’ll observe:
- Early outputs look foggy
- Geometry emerges before color
- Fine details appear last
NeRF is literally learning how matter occupies space.
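A simple way to watch this happen is to plot the loss curves from the history object; they typically drop fast at first (the fog clearing) and then creep down slowly as fine detail gets resolved:
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()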
Step 7: Rendering RGB Images and Depth Maps
This is the most satisfying part.
We take trained NeRF predictions and render an image from rays:
from deepvision.models.volumetric.volumetric_utils import (
    nerf_render_image_and_depth_tf
)
for batch in train_ds.take(5):
    (images, rays) = batch
    (rays_flat, t_vals) = rays
    image_batch, depth_maps, _ = nerf_render_image_and_depth_tf(
        model=model,
        rays_flat=rays_flat,
        t_vals=t_vals,
        img_height=config['img_height'],
        img_width=config['img_width'],
        num_ray_samples=config['num_ray_samples']
    )
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    ax[0].imshow(tf.squeeze(image_batch[0]))
    ax[0].set_title("Rendered RGB")
    ax[1].imshow(tf.squeeze(depth_maps[0]))
    ax[1].set_title("Depth Map")
    plt.show()
The depth map is the real proof:
The network has learned actual 3D structure, not just textures.
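Both outputs fall out of the same volume rendering step: the predicted densities become per-sample weights, the weights blend the predicted colors into a pixel, and a common way to obtain the depth map is to blend the sample depths t_vals with those same weights. Here is a simplified NumPy sketch of that compositing logic for a single ray (not necessarily the toolkit's exact implementation):
def composite_ray(rgb, sigma, t_vals):
    # rgb: (N, 3) colors, sigma: (N,) densities, t_vals: (N,) sample depths
    deltas = np.diff(t_vals, append=1e10)                    # spacing between samples
    alpha = 1.0 - np.exp(-np.maximum(sigma, 0.0) * deltas)   # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = alpha * trans                                  # contribution of each sample
    color = (weights[:, None] * rgb).sum(axis=0)             # expected color along the ray
    depth = (weights * t_vals).sum()                         # expected termination depth
    return color, depth

# Toy example: one ray with 64 random "network outputs"
rng = np.random.default_rng(0)
print(composite_ray(rng.uniform(size=(64, 3)), rng.uniform(size=64), np.linspace(2.0, 6.0, 64)))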
What You Learn by Building NeRF Once
After doing this end-to-end, a few things become obvious:
- NeRF is neural graphics, not vision
- Camera geometry matters more than data size
- Rendering is the bottleneck
- MLPs are more powerful than they look
This understanding transfers directly to:
- Instant-NGP
- Gaussian Splatting
- Dynamic NeRFs
- Robotics and 3D perception systems
Final Thoughts
NeRF looks intimidating from the outside, but once you build it, the math becomes intuitive, the architecture feels elegant, and the results feel almost magical.
If you work in computer vision, graphics, or multimodal AI, NeRF isn’t optional knowledge anymore. And the fastest way to understand it is exactly this:
- Load real data.
- Cast real rays.
- Render a real scene.
That’s how NeRF stops being hype, and starts being engineering.