Neural Radiance Fields (NeRF) are one of those ideas that sound abstract until you actually build one. Then suddenly, everything clicks.
In this article, we’ll train a NeRF end-to-end, starting from raw images and camera poses, all the way to rendering RGB images and depth maps. The goal here is to use a minimal dataset to understand how NeRF works in practice.
What We’re Building
By the end of this walkthrough, we will:
- Load a real NeRF dataset
- Convert pixels into camera rays
- Train a NeRF model
- Render both novel views and depth maps
All with a compact setup that runs locally.
Step 1: Installing the Toolkit
To avoid reimplementing NeRF from scratch, we’ll use deepvision-toolkit, which provides datasets, models, and rendering utilities.
!pip install deepvision-toolkit
This gives us:
- A ready-to-use NeRF model
- Tiny NeRF dataset loader
- Volume rendering utilities
Step 2: Downloading the Tiny NeRF Dataset
We’ll use the Tiny NeRF dataset released by the original NeRF authors. It contains:
- 106 RGB images (100×100)
- Camera poses (4×4 matrices)
- A single focal length
import requests
import numpy as np
import matplotlib.pyplot as plt
url = "https://people.eecs.berkeley.edu/~bmild/nerf/tiny_nerf_data.npz"
save_path = 'tiny_nerf.npz'
file_data = requests.get(url).content
with open(save_path, "wb") as file:
    file.write(file_data)
data = np.load(save_path)
images, poses, focal = data["images"], data["poses"], data["focal"]
print(images.shape) # (106, 100, 100, 3)
print(poses.shape) # (106, 4, 4)
print(focal)         # ~138.89
Let’s visualize a few images:
fig, ax = plt.subplots(1, 5, figsize=(20, 12))
for i in range(5):
    ax[i].imshow(images[i])
    ax[i].axis("off")
At this stage, we only have 2D images and camera poses: no meshes, no depth maps, no explicit 3D geometry of the scene.
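We do, however, know where each photo was taken from. The poses here are camera-to-world transforms, so the last column of every 4×4 matrix is a camera position in world space. As a quick, purely illustrative sanity check using only the arrays loaded above, we can plot those origins and see the views orbiting the object:
# The translation column of each camera-to-world matrix is the camera origin
cam_origins = poses[:, :3, 3]
fig = plt.figure(figsize=(6, 6))
ax3d = fig.add_subplot(projection="3d")
ax3d.scatter(cam_origins[:, 0], cam_origins[:, 1], cam_origins[:, 2], s=10)
ax3d.set_title("Camera positions around the scene")
plt.show()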
Step 3: Rays, Not Images
NeRF does not learn from images directly. Instead:
- Each pixel becomes a ray
- Each ray is sampled at multiple 3D points
- Each point is passed through the NeRF network
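To make the first two bullets concrete, here is a minimal ray-generation sketch adapted from the logic in the original Tiny NeRF example. It is illustrative only: the loader in the next step builds (and positionally encodes) the rays for us.
def get_rays(height, width, focal, c2w):
    # Pixel grid over the image plane
    i, j = np.meshgrid(np.arange(width, dtype=np.float32),
                       np.arange(height, dtype=np.float32), indexing="xy")
    # Ray direction through each pixel for a pinhole camera centred on the image
    dirs = np.stack([(i - width * 0.5) / focal,
                     -(j - height * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)
    # Rotate directions into world space and attach the camera origin to every pixel
    rays_d = dirs @ c2w[:3, :3].T
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d

# One ray per pixel for the first training image
rays_o, rays_d = get_rays(100, 100, float(focal), poses[0])
print(rays_o.shape, rays_d.shape)  # (100, 100, 3) (100, 100, 3)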
We configure how this sampling works:
from deepvision.datasets import load_tiny_nerf
import tensorflow as tf
config = {
    'img_height': 100,
    'img_width': 100,
    'pos_embed': 32,
    'num_ray_samples': 64,
    'batch_size': 1
}
Why these parameters matter:
- pos_embed controls the positional encoding frequency
- num_ray_samples controls rendering quality vs. speed
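Positional encoding is what lets a small MLP represent fine detail: each of the three coordinates of a sample point is expanded into pos_embed pairs of sine and cosine features. That is also where the 6 * pos_embed + 3 input size used in Step 5 comes from: 3 raw coordinates plus 3 × 2 × pos_embed encoded ones. Here is a minimal sketch of such an encoder (illustrative only; the loader applies its own encoding internally):
def positional_encode(x, num_freqs):
    # Map each coordinate to [x, sin(2^k x), cos(2^k x)] for k = 0 .. num_freqs - 1
    features = [x]
    for k in range(num_freqs):
        features.append(np.sin(2.0 ** k * x))
        features.append(np.cos(2.0 ** k * x))
    return np.concatenate(features, axis=-1)

point = np.array([[0.1, -0.4, 0.7]], dtype=np.float32)  # a single 3D sample point
encoded = positional_encode(point, num_freqs=config['pos_embed'])
print(encoded.shape)  # (1, 195) = (1, 6 * 32 + 3)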
Step 4: Preparing the Dataset
We now load the dataset in a form suitable for NeRF training.
train_ds, valid_ds = load_tiny_nerf(
    pos_embed=config['pos_embed'],
    num_ray_samples=config['num_ray_samples'],
    save_path='tiny_nerf.npz',
    validation_split=0.2,
    backend='tensorflow'
)
train_ds = train_ds.batch(config['batch_size']).prefetch(tf.data.AUTOTUNE)
valid_ds = valid_ds.batch(config['batch_size']).prefetch(tf.data.AUTOTUNE)
Each batch contains:
- Ground truth pixel colors
- Flattened rays
- Sample depths along each ray
This is where NeRF diverges from traditional vision pipelines.
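It is worth peeking at a single batch to see this structure. The exact shapes depend on the loader, so treat the comments below as descriptive rather than definitive:
for target_pixels, rays in train_ds.take(1):
    rays_flat, t_vals = rays
    print("pixel colors:", target_pixels.shape)  # ground-truth RGB values
    print("flattened rays:", rays_flat.shape)    # encoded ray samples fed to the MLP
    print("sample depths:", t_vals.shape)        # distances of the samples along each ray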
Step 5: Building the NeRF Model
The NeRF model itself is just a fully connected neural network.
No CNNs. No attention.
import deepvision
num_pos = (
    config['img_height'] *
    config['img_width'] *
    config['num_ray_samples']
)
input_features = 6 * config['pos_embed'] + 3
model = deepvision.models.NeRFMedium(
    input_shape=(num_pos, input_features),
    backend='tensorflow'
)
We compile it using a simple MSE loss:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.MeanSquaredError()
)
model.summary()
Despite its simplicity, this network will learn a continuous 3D scene representation.
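NeRFMedium comes from the toolkit, but to make the "just a fully connected network" point concrete, here is roughly what a NeRF-style MLP looks like when written by hand in Keras. This is a simplified sketch, not the toolkit's actual architecture: a stack of Dense layers that maps one encoded sample to four numbers, RGB color plus volume density.
from tensorflow import keras

def nerf_style_mlp(num_features, width=128, depth=6):
    # Encoded sample point in, (r, g, b, sigma) out
    inputs = keras.Input(shape=(num_features,))
    x = inputs
    for _ in range(depth):
        x = keras.layers.Dense(width, activation="relu")(x)
    outputs = keras.layers.Dense(4)(x)  # 3 color channels + 1 density value
    return keras.Model(inputs, outputs)

nerf_style_mlp(input_features).summary()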
Step 6: Training the NeRF
Training NeRF feels slow—and that’s normal.
callbacks = [tf.keras.callbacks.ReduceLROnPlateau()]
history = model.fit(
    train_ds,
    epochs=50,
    validation_data=valid_ds,
    callbacks=callbacks
)
What you’ll observe:
- Early outputs look foggy
- Geometry emerges before color
- Fine details appear last
NeRF is literally learning how matter occupies space.
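A simple way to watch this happen is to plot the loss curves from the history object; they typically drop fast at first (the fog clearing) and then creep down slowly as fine detail gets resolved:
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()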
Step 7: Rendering RGB Images and Depth Maps
This is the most satisfying part.
We take trained NeRF predictions and render an image from rays:
from deepvision.models.volumetric.volumetric_utils import (
    nerf_render_image_and_depth_tf
)
for batch in train_ds.take(5):
    (images, rays) = batch
    (rays_flat, t_vals) = rays
    image_batch, depth_maps, _ = nerf_render_image_and_depth_tf(
        model=model,
        rays_flat=rays_flat,
        t_vals=t_vals,
        img_height=config['img_height'],
        img_width=config['img_width'],
        num_ray_samples=config['num_ray_samples']
    )
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    ax[0].imshow(tf.squeeze(image_batch[0]))
    ax[0].set_title("Rendered RGB")
    ax[1].imshow(tf.squeeze(depth_maps[0]))
    ax[1].set_title("Depth Map")
    plt.show()
The depth map is the real proof:
The network has learned actual 3D structure, not just textures.
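Both outputs fall out of the same volume rendering step: the predicted densities become per-sample weights, the weights blend the predicted colors into a pixel, and a common way to obtain the depth map is to blend the sample depths t_vals with those same weights. Here is a simplified NumPy sketch of that compositing logic for a single ray (not necessarily the toolkit's exact implementation):
def composite_ray(rgb, sigma, t_vals):
    # rgb: (N, 3) colors, sigma: (N,) densities, t_vals: (N,) sample depths
    deltas = np.diff(t_vals, append=1e10)                    # spacing between samples
    alpha = 1.0 - np.exp(-np.maximum(sigma, 0.0) * deltas)   # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = alpha * trans                                  # contribution of each sample
    color = (weights[:, None] * rgb).sum(axis=0)             # expected color along the ray
    depth = (weights * t_vals).sum()                         # expected termination depth
    return color, depth

# Toy example: one ray with 64 random "network outputs"
rng = np.random.default_rng(0)
print(composite_ray(rng.uniform(size=(64, 3)), rng.uniform(size=64), np.linspace(2.0, 6.0, 64)))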
What You Learn by Building NeRF Once
After doing this end-to-end, a few things become obvious:
- NeRF is neural graphics, not vision
- Camera geometry matters more than data size
- Rendering is the bottleneck
- MLPs are more powerful than they look
This understanding transfers directly to:
- Instant-NGP
- Gaussian Splatting
- Dynamic NeRFs
- Robotics and 3D perception systems
Final Thoughts
NeRF looks intimidating from the outside, but once you build it, the math becomes intuitive, the architecture feels elegant, and the results feel almost magical.
If you work in computer vision, graphics, or multimodal AI, NeRF isn’t optional knowledge anymore. And the fastest way to understand it is exactly this:
- Load real data.
- Cast real rays.
- Render a real scene.
That’s how NeRF stops being hype, and starts being engineering.