TIL: World Model

A world model in AI is an internal simulation or understanding of how the world works. In reinforcement learning or robotics, the “world” means:

  • The surroundings (Road, Car, Obstacles)

  • The rules of how things behave (physics, cause and effect)

  • The results of actions (push, pull, move)

A world model lets an AI agent predict what happens next, plan its actions, and learn efficiently without relying entirely on real-world interactions.

In traditional AI, an agent learns reactively: it observes the input, predicts the output, and performs the action. It doesn’t understand or imagine what might happen next.

With a world model, the agent learns how the environment behaves and simulates possible futures internally, so it can make decisions with far less real-world trial and error.
This is crucial where training in the real world is costly or dangerous, as in robotics, self-driving cars, and healthcare.
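
To make the idea concrete, here is a minimal sketch, not how any production system works. It assumes a toy one-dimensional world where position changes by `k * velocity` per step. The agent estimates `k` from a few logged transitions, then “imagines” a rollout without touching the real environment. The names `fit_dynamics` and `imagine`, and the logged data, are made up for illustration.

```python
# Toy illustration: learn a 1-D dynamics model from logged data, then
# roll it forward "in imagination". Everything here is a made-up example.

def fit_dynamics(transitions):
    """Least-squares estimate of k in: next_pos = pos + k * velocity.

    transitions: list of (pos, velocity, next_pos) tuples.
    """
    num = sum(v * (nxt - pos) for pos, v, nxt in transitions)
    den = sum(v * v for _, v, _ in transitions)
    if den == 0:
        raise ValueError("need at least one transition with non-zero velocity")
    return num / den


def imagine(pos, velocity, k, steps):
    """Predict future positions using only the learned model (no real steps)."""
    trajectory = []
    for _ in range(steps):
        pos = pos + k * velocity
        trajectory.append(pos)
    return trajectory


# Logged experience from a world whose true step size is 0.5 (assumed).
logged = [(0.0, 2.0, 1.0), (1.0, 4.0, 3.0), (3.0, -2.0, 2.0)]
k = fit_dynamics(logged)
future = imagine(pos=10.0, velocity=3.0, k=k, steps=3)
print(k, future)
```

The real interactions are only needed once, to fit the model. After that, the agent can try out as many futures as it likes for free.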

In an interview, Andrej Karpathy said that Tesla’s goal is to build a neural network world model of the driving environment. Let’s look at the world model in the context of Tesla’s Autopilot.

In Autopilot mode, the car doesn’t simply react to camera inputs. It tries to understand the environment around it and predict what will happen next.

World Model of Tesla Autopilot

A typical Tesla “world” includes:

  • Road signs, signals, lanes, road edges

  • Cars, pedestrians, trucks

  • Physical motion - how other vehicles are moving

  • Time - what will happen in the next few seconds

All of this together is the “world” that the Tesla needs to understand and simulate.

How Tesla builds its world model

Tesla captures video from all 8 of its cameras, processes each frame with neural networks, and builds an internal 3D model of the scene.

This is called a Vector Space World Model.

It’s not just “seeing pixels”. It converts the scene into a structured understanding like this:

Car 1: moving north, speed 20 km/h
Pedestrian: moving right, 8 m ahead
Signal: red
Lane edge: 1.2 m left
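
In code, a structured scene like this could be held in a few small data classes. This is only a sketch: the class and field names (`TrackedObject`, `Scene`, `lane_edge_left_m`, and so on) are hypothetical and are not Tesla’s actual representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackedObject:
    kind: str                            # e.g. "car", "pedestrian"
    heading: str                         # coarse direction of motion
    speed_kmph: Optional[float] = None   # unknown if not estimated
    distance_m: Optional[float] = None   # distance ahead of our car


@dataclass
class Scene:
    objects: List[TrackedObject]
    signal: str                          # "red", "yellow" or "green"
    lane_edge_left_m: float              # distance to the left lane edge


# The example scene from the text, expressed as structured data.
scene = Scene(
    objects=[
        TrackedObject(kind="car", heading="north", speed_kmph=20.0),
        TrackedObject(kind="pedestrian", heading="right", distance_m=8.0),
    ],
    signal="red",
    lane_edge_left_m=1.2,
)


def must_stop(s: Scene) -> bool:
    """A trivial rule that reads the structure instead of raw pixels."""
    return s.signal == "red"


print(must_stop(scene))
```

Once the scene is in this form, later stages can ask simple questions about it without ever looking at the camera frames again.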

This structured understanding helps Tesla predict future events:

  • Will the car slow down?

  • Will the pedestrian cross the road?

  • Will the signal turn green?
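
One very simple way to answer a question like “will the pedestrian cross?” is constant-velocity extrapolation. The sketch below assumes a coordinate frame centred on our lane (x is sideways, y is ahead) and invented numbers for the pedestrian’s position and walking speed. Real systems use learned predictors, so treat this only as an illustration of the idea.

```python
def predict_positions(x, y, vx, vy, horizon_s, dt=0.5):
    """Constant-velocity extrapolation: list of (t, x, y) over the horizon."""
    steps = int(round(horizon_s / dt))
    return [(i * dt, x + vx * i * dt, y + vy * i * dt)
            for i in range(1, steps + 1)]


def first_time_in_lane(track, lane_half_width_m):
    """Return the first predicted time the object is inside our lane, or None."""
    for t, x, _ in track:
        if abs(x) <= lane_half_width_m:
            return t
    return None


# Assumed scenario: pedestrian 3 m left of the lane centre, 8 m ahead,
# walking right at 1.5 m/s. The lane is assumed to be 3 m wide.
track = predict_positions(x=-3.0, y=8.0, vx=1.5, vy=0.0, horizon_s=3.0)
t_enter = first_time_in_lane(track, lane_half_width_m=1.5)
print(t_enter)
```

If `t_enter` is not `None`, the pedestrian is predicted to step into the lane within the horizon, and the planner should take that into account.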

The planning unit

Once the model understands the world, another layer (the planner) uses it to decide which action to take:

  • Should we brake?

  • Should we slow down?

  • Should we accelerate?

This planning happens inside the world model, before any real action is taken.
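
Here is a toy version of planning inside the model: each candidate action is rolled forward in a simple simulation, and the gentlest action that keeps a safe gap is chosen. The acceleration values, the 3 second horizon, the 2 m safety margin and the stationary obstacle are all assumptions for the sake of the example.

```python
# Candidate actions and their assumed accelerations in m/s^2.
ACTIONS = {"brake": -6.0, "slow_down": -2.0, "accelerate": 1.5}


def simulate_gap(gap_m, speed_mps, accel, horizon_s=3.0, dt=0.1):
    """Roll our car toward a stationary obstacle; return the smallest gap seen."""
    min_gap = gap_m
    for _ in range(int(round(horizon_s / dt))):
        speed_mps = max(0.0, speed_mps + accel * dt)  # the car never reverses
        gap_m -= speed_mps * dt
        min_gap = min(min_gap, gap_m)
    return min_gap


def plan(gap_m, speed_mps, safe_gap_m=2.0):
    """Pick the gentlest action whose imagined rollout keeps the safe gap."""
    for name in ("accelerate", "slow_down", "brake"):
        if simulate_gap(gap_m, speed_mps, ACTIONS[name]) >= safe_gap_m:
            return name
    return "brake"  # nothing is safe in imagination: brake anyway


print(plan(gap_m=100.0, speed_mps=10.0))  # open road
print(plan(gap_m=20.0, speed_mps=5.0))    # obstacle some way ahead
print(plan(gap_m=8.0, speed_mps=5.56))    # obstacle 8 m ahead at about 20 km/h
```

No real braking or accelerating happens while `plan` runs. The outcomes are compared in simulation first, and only the chosen action is sent to the car.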

Summary of Flow

Cameras → Neural Networks → World Model → Prediction → Planner → Control (steer/brake)

Real-World Example

When a Tesla approaches a curve:

  • It “knows” there is a curve from its world model, not just from pixel edges.

  • It predicts how the road continues, even if the camera view is partly blocked.

  • It adjusts speed before the turn - because it imagined the road ahead.
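
As a rough illustration of the last point, a planner that knows the radius of the upcoming curve can cap its speed using the basic physics relation that lateral acceleration equals v² / r. The 3 m/s² comfort limit and the function names below are assumptions, not Tesla’s values.

```python
import math


def max_curve_speed_kmph(radius_m, max_lateral_accel=3.0):
    """Speed at which lateral acceleration v^2 / r reaches the assumed limit."""
    return math.sqrt(max_lateral_accel * radius_m) * 3.6  # m/s to km/h


def target_speed_kmph(current_kmph, radius_m):
    """Slow down for the curve if needed, otherwise keep the current speed."""
    return min(current_kmph, max_curve_speed_kmph(radius_m))


# A curve with a 75 m radius, approached at 80 km/h and at 40 km/h.
print(target_speed_kmph(80.0, 75.0))
print(target_speed_kmph(40.0, 75.0))
```

The key input is the curve radius, which comes from the predicted road shape in the world model rather than from whatever part of the road the cameras can currently see.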


© 2025