TIL: World Model
A world model in AI is an internal simulation or understanding of how the world works. In reinforcement learning or robotics, the “world” means:
The surroundings (road, cars, obstacles)
The rules of how things behave (physics, traffic rules, cause and effect)
The results of actions (push, pull, move)
A world model lets an AI agent predict what happens next, plan actions, and learn efficiently without relying entirely on real-world interactions.
In traditional AI, an agent learns everything reactively: observe the input, predict the output, and perform the action. It doesn’t understand or imagine what might happen next.
With a world model, the agent learns how the environment behaves and simulates possible futures internally, which lets it make decisions without trial and error in the real world.
This is crucial wherever real-world training is expensive or dangerous, such as robotics, self-driving cars, and healthcare.
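To make that concrete, here is a minimal toy sketch (my own illustration, not any production system): the agent observes a handful of real transitions in a one-dimensional “push a block” world, fits a tiny model of how pushes change velocity, and then plans entirely inside that imagined model. All names and numbers are made up for illustration.

```python
import random

def real_env(pos, vel, push):
    """The true (hidden) dynamics that the agent never sees directly."""
    vel = vel + 0.5 * push
    pos = pos + vel
    return pos, vel

# 1. Collect a handful of real interactions.
data = []
pos, vel = 0.0, 0.0
for _ in range(50):
    push = random.uniform(-1, 1)
    new_pos, new_vel = real_env(pos, vel, push)
    data.append((vel, push, new_vel - vel))  # how velocity changed under this push
    pos, vel = new_pos, new_vel

# 2. "Learn" the world model: estimate how much one unit of push changes velocity.
push_gain = sum(dv * p for _, p, dv in data) / sum(p * p for _, p, _ in data)

def imagined_env(pos, vel, push):
    """The agent's internal world model, used only for simulation."""
    vel = vel + push_gain * push
    pos = pos + vel
    return pos, vel

# 3. Plan entirely in imagination: which constant push reaches pos >= 10 fastest?
best = None
for push in (-1.0, 0.0, 1.0):
    p, v, steps = 0.0, 0.0, 0
    while p < 10 and steps < 50:
        p, v = imagined_env(p, v, push)
        steps += 1
    if p >= 10 and (best is None or steps < best[1]):
        best = (push, steps)

print(f"learned push gain ~ {push_gain:.2f}, best imagined action: push = {best[0]}")
```

The key point of the sketch: once the dynamics are learned, all the “what if I push harder?” questions are answered in imagination, not by acting in the real environment.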
In an interview, Andrej Karpathy described Tesla’s goal as building a neural-network world model of the driving environment. Let’s explain the world model in the context of Tesla’s Autopilot.
In Tesla’s Autopilot mode, the car doesn’t just react to raw camera inputs. It tries to understand the environment around the car and predict what happens next.
World Model of Tesla Autopilot
A typical Tesla world includes:
Road signs, signals, lanes, road edges
Cars, pedestrians, trucks
Physical motion - how other vehicles are moving
Time - what will happen in the next few seconds
All of this together = the “world” that Tesla needs to understand and simulate.
How Tesla Builds Its World Model
Tesla captures video from all 8 of its cameras, processes each frame with neural networks, and builds an internal 3D model like the one below.
This is called a Vector Space World Model.
It’s not just “seeing pixels” - it’s converting the scene into a structured understanding like this (sketched in code right after this list):
Car 1: moving north at 20 km/h
Pedestrian: moving right, 8 m ahead
Signal: red
Lane edge: 1.2 m to the left
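Roughly, that structured state could be represented as plain data. Below is a small sketch with hypothetical field names (my illustration, not Tesla’s actual schema or vector space):

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    kind: str          # "car", "pedestrian", "truck", ...
    heading: str       # coarse direction of travel
    speed_kmph: float  # estimated speed
    distance_m: float  # distance from our car

@dataclass
class SceneState:
    objects: list[TrackedObject]   # everything the cameras currently track
    signal: str                    # "red", "yellow", "green"
    lane_edge_left_m: float        # distance to the left lane edge

scene = SceneState(
    objects=[
        TrackedObject("car", "north", 20.0, 15.0),
        TrackedObject("pedestrian", "right", 4.0, 8.0),
    ],
    signal="red",
    lane_edge_left_m=1.2,
)
print(scene.signal, len(scene.objects))  # red 2
```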
This structured understanding helps Tesla predict future events (a toy version of one such prediction is sketched after this list):
Will the car ahead slow down?
Will the pedestrian cross the road?
Will the signal turn green?
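As a toy illustration of this prediction step (a constant-velocity extrapolation with made-up numbers, not Tesla’s actual predictor), here is how the “will the pedestrian cross?” question might be answered from the structured state:

```python
def will_pedestrian_cross(lateral_offset_m, lateral_speed_mps, horizon_s=3.0):
    """True if a pedestrian walking toward our lane would reach it within the horizon."""
    if lateral_speed_mps <= 0:
        return False  # standing still or walking away
    time_to_path_s = lateral_offset_m / lateral_speed_mps
    return time_to_path_s <= horizon_s

# Pedestrian 2.5 m to the side of our lane, walking toward it at 1.2 m/s:
print(will_pedestrian_cross(2.5, 1.2))  # True -> plan as if they will cross
```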
The planning unit
Once the model understands the world, another layer (the planner) uses it to decide the next action:
Should we brake?
Should we slow down?
Should we accelerate?
This planning happens inside the world model, before any real action is taken.
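Here is a hedged sketch of that idea: each candidate action is rolled forward inside a simple world model for a few seconds, and the action is chosen before touching the real controls. The dynamics, numbers, and function names are all assumptions for illustration, not Tesla’s planner.

```python
ACTIONS = {"brake": -3.0, "slow down": -1.0, "accelerate": 1.5}  # accel in m/s^2

def imagine(own_speed, gap_m, lead_speed, accel, horizon_s=3.0, dt=0.5):
    """Roll the world model forward and return the smallest gap we imagine reaching."""
    min_gap = gap_m
    t = 0.0
    while t < horizon_s:
        own_speed = max(0.0, own_speed + accel * dt)  # our speed after dt
        gap_m += (lead_speed - own_speed) * dt        # gap shrinks if we are faster
        min_gap = min(min_gap, gap_m)
        t += dt
    return min_gap

def plan(own_speed, gap_m, lead_speed, safe_gap_m=5.0):
    """Pick the most 'eager' action whose imagined future keeps a safe gap."""
    safe = {name: accel for name, accel in ACTIONS.items()
            if imagine(own_speed, gap_m, lead_speed, accel) >= safe_gap_m}
    return max(safe, key=safe.get) if safe else "brake"

# Closing in on a slower car 20 m ahead:
print(plan(own_speed=15.0, gap_m=20.0, lead_speed=10.0))
# -> "slow down": the imagined future for "accelerate" closes the gap unsafely
```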
Summary of Flow
Cameras → Neural Networks → World Model → Prediction → Planner → Control (steer/brake)
Real-World Example
When a Tesla approaches a curve:
It “knows” there’s a curve from its world model, not just from pixel edges.
It predicts how the road continues, even if the camera view is partly blocked.
It adjusts speed before the turn - because it imagined the road ahead.