World Models: The Next AI Platform
Machines That Understand Reality
In a research lab in San Francisco, a robotic arm is learning to fold a t-shirt. It has never touched this particular shirt before. The fabric is unfamiliar, the lighting is different, the table height has changed. None of that matters. The robot folds the shirt on its first try.
This isn't narrow automation. It's not a pre-programmed sequence. The robot has learned something deeper: a model of how the physical world works. It understands that fabric drapes, that gravity pulls, that edges can be gripped and manipulated. It can generalize.
This is the promise of world models -- and it represents the most important shift in artificial intelligence since the transformer architecture gave us ChatGPT.
The Limits of Language
For the past five years, AI progress has been defined by a single paradigm: predict the next token. GPT-4, Claude, Gemini -- all of them are, at their core, extraordinarily sophisticated autocomplete engines. They predict what word comes next, and they do it so well that emergent capabilities -- reasoning, coding, analysis -- fall out almost as a side effect.
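For the mechanically curious, that objective fits in a few lines. The sketch below is a deliberately tiny stand-in (a small recurrent network and fake token ids, not a production transformer), but the loss is the same one GPT-class models optimize: look at a sequence, predict the next token, measure the miss.

```python
import torch
import torch.nn as nn

# Toy illustration of the next-token objective. The model sees tokens t_1..t_k and
# is trained to put high probability on t_{k+1}. The sizes and the GRU backbone are
# placeholders; real LLMs use transformers at vastly larger scale.
vocab_size, d_model = 1000, 64

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)               # logits for the next token at every position

model = TinyLM()
tokens = torch.randint(0, vocab_size, (8, 32))     # fake batch of token ids
logits = model(tokens[:, :-1])                     # predict from all but the last token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),                # (batch * (seq-1), vocab)
    tokens[:, 1:].reshape(-1),                     # the shifted ground truth
)
loss.backward()
```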
But language models have a ceiling. They understand descriptions of reality. They don't understand reality itself.
Ask an LLM to describe how to catch a ball, and it will give you a fluent, accurate answer. But it has no internal model of trajectories, no sense of timing, no understanding of what "catching" actually requires in physical space. It's pattern-matching on text written by humans who do understand these things.
World models flip this. Instead of learning from language about the world, they learn from the world directly -- from video, from physics simulations, from sensor data, from interaction. They build internal representations of how things move, change, and respond to actions.
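What does "an internal representation of how things move" look like in practice? One common recipe, sketched below with placeholder networks and dimensions (not any particular company's architecture), is to compress each observation into a compact state and train a second network to predict how that state changes when an action is applied.

```python
import torch
import torch.nn as nn

# Minimal sketch of a learned world model: encode a raw observation into a compact
# latent state, then predict how that state changes under an action. Dimensions and
# networks are illustrative placeholders.
obs_dim, act_dim, latent_dim = 128, 8, 32

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))

def world_model_loss(obs, action, next_obs):
    z = encoder(obs)                                    # current latent state
    z_pred = dynamics(torch.cat([z, action], dim=-1))   # imagined next state
    with torch.no_grad():
        z_next = encoder(next_obs)                      # what actually happened (held fixed here;
                                                        # real systems need extra tricks to avoid collapse)
    return nn.functional.mse_loss(z_pred, z_next)

# Fake batch of (observation, action, next observation) transitions gathered from interaction
loss = world_model_loss(torch.randn(16, obs_dim), torch.randn(16, act_dim), torch.randn(16, obs_dim))
loss.backward()
```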
Yann LeCun, Meta's chief AI scientist, has been banging this drum for years. He calls world models "the missing piece" of artificial intelligence -- the thing that will take us from systems that talk to systems that act. His proposed architecture, the Joint Embedding Predictive Architecture (JEPA), is designed specifically to learn these kinds of representations.
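The core trick of JEPA-style training, shown in rough outline below with stand-in encoders, is to hide part of an image or video clip and train the model to predict the hidden part's representation rather than its raw pixels; the idea is that predicting in representation space forces the model to capture structure instead of texture. The real I-JEPA and V-JEPA models use vision transformers and other refinements; this is only the shape of the objective.

```python
import torch
import torch.nn as nn

# Rough sketch of the JEPA idea: predict the *representation* of a hidden region
# from the representation of the visible context. Everything here is a stand-in.
patch_dim, embed_dim = 256, 64

context_encoder = nn.Linear(patch_dim, embed_dim)
target_encoder = nn.Linear(patch_dim, embed_dim)   # in practice a slow-moving (EMA) copy of the context encoder
predictor = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))

visible = torch.randn(32, patch_dim)               # the part of the image/clip the model can see
masked = torch.randn(32, patch_dim)                # the part it must "imagine"

pred = predictor(context_encoder(visible))         # guess the hidden part's embedding
with torch.no_grad():
    target = target_encoder(masked)                # no gradient through the target (helps avoid collapse)

loss = nn.functional.mse_loss(pred, target)
loss.backward()
```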
"A world model allows a system to predict the consequences of its actions," LeCun wrote. "This is essential for planning, for reasoning, and for any form of intelligent behavior in the real world."
The Companies Building the Future
The capital is moving fast. A new generation of startups is racing to build world models -- and the funding rounds reflect the stakes.
Physical Intelligence has raised over $400 million to build what they call a "foundation model for robotics." Their approach: train a single model that can control many different robots across many different tasks. The key insight is that physical intelligence is transferable. A model that understands how to manipulate objects in one context can generalize to new objects, new environments, new robots. Their backers include Jeff Bezos, OpenAI, Thrive Capital, and Khosla Ventures.
Skild AI has raised $300 million at a $1.5 billion valuation to build a "scalable robot brain." Their bet: simulation-first training. By learning in rich virtual environments, robots can acquire millions of hours of experience before touching the real world. The model learns physics, cause-and-effect, and spatial reasoning in simulation -- then transfers that knowledge to physical hardware.
World Labs, founded by Fei-Fei Li -- the Stanford professor who created ImageNet and catalyzed the deep learning revolution -- is explicitly building "large world models." Her thesis: just as large language models learned the structure of text, large world models will learn the structure of reality. The company has raised $230 million and is one of the clearest signals that the field's center of gravity is shifting.
Figure AI has raised over $750 million to build humanoid robots powered by foundation models. Their Figure 02 robot can watch a human perform a task, understand the underlying physics and goals, and replicate the behavior. It's not mimicry -- it's comprehension.
1X Technologies, backed by OpenAI, is deploying humanoid robots in real-world environments today. Their NEO robot is designed for home use -- a general-purpose physical assistant that can navigate, manipulate, and interact with the messiness of human spaces.
Wayve, based in London, has raised over $1 billion to build world models for autonomous driving. Their GAIA-1 model generates realistic driving videos by learning the structure of road environments. It doesn't just recognize objects -- it understands how traffic flows, how vehicles interact, how scenarios evolve. This is world modeling applied to one of the hardest real-world domains.
Why This Is a Platform Shift
The pattern should look familiar. In the 2010s, cloud infrastructure became a horizontal platform -- Amazon, Google, and Microsoft built the substrate, and thousands of companies built on top. In the 2020s, large language models became a horizontal platform -- OpenAI, Anthropic, and others built the substrate, and thousands of applications emerged.
World models are the next layer.
They won't live in one vertical. They'll sit underneath many:
- Robotics: Manipulation, locomotion, assembly, logistics
- Autonomous vehicles: Cars, trucks, drones, delivery robots
- Scientific discovery: Simulating experiments, predicting outcomes, accelerating research
- Gaming and simulation: NPCs that actually understand their world, not just scripted behaviors
- Industrial automation: Factories that adapt, reconfigure, and optimize in real-time
- Defense: Autonomous systems that plan, coordinate, and respond to dynamic environments
The Technical Frontier
What makes world models hard is also what makes them valuable: they require learning representations that capture the structure of reality, not just surface patterns.
Current approaches fall into a few camps:
Video prediction models learn by watching massive amounts of video and predicting what happens next. If you can predict the next frame of a video, you've implicitly learned something about physics, object permanence, and causality. Google's Genie and Meta's V-JEPA are examples.
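The objective itself is simple to state, as in the placeholder sketch below: take frame t, predict frame t+1, penalize the difference. (V-JEPA, notably, makes its predictions in representation space rather than pixel space; the sketch shows only the generic pixel-level version.)

```python
import torch
import torch.nn as nn

# Minimal sketch of the video-prediction objective: given frame t, predict frame t+1.
# A model that does this well has implicitly picked up motion, occlusion, and contact.
# The network and frame sizes are placeholders for illustration only.

predictor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
)

frames = torch.randn(4, 8, 3, 64, 64)               # fake clip: (batch, time, channels, H, W)
current = frames[:, :-1].reshape(-1, 3, 64, 64)     # frames 0..T-2
target = frames[:, 1:].reshape(-1, 3, 64, 64)       # frames 1..T-1

loss = nn.functional.mse_loss(predictor(current), target)
loss.backward()
```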
Simulation-trained models learn in physics engines -- virtual environments where they can interact, fail, and iterate millions of times. This is how Skild and Physical Intelligence train their systems. The challenge is transferring that knowledge to the real world (the "sim-to-real" gap).
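A standard way to narrow that gap is domain randomization: vary the simulator's physical parameters on every episode so the policy cannot overfit to any single virtual world. The sketch below uses a deliberately trivial stand-in simulator (a one-line friction model, not a physics engine) just to show the pattern.

```python
import random

# Domain randomization in miniature: every episode samples new physical parameters,
# so whatever the policy learns must hold across the whole range. The "simulator"
# here is a stand-in, not a real physics engine.

def sample_physics():
    return {
        "friction": random.uniform(0.4, 1.2),       # tabletop friction coefficient
        "object_mass": random.uniform(0.05, 0.5),   # kilograms
        "sensor_noise": random.uniform(0.0, 0.02),  # camera/proprioception noise
    }

def simulate_push(force, physics):
    """Stand-in dynamics: the push succeeds if it overcomes static friction."""
    required = physics["friction"] * physics["object_mass"] * 9.81
    observed = required + random.gauss(0, physics["sensor_noise"])
    return force >= observed

# "Training": find a push force that works across randomized worlds.
force = 0.1
for episode in range(10_000):                       # virtual trials are cheap
    physics = sample_physics()
    if not simulate_push(force, physics):
        force *= 1.01                               # naive update: push harder after each failure
print(f"Force that survives randomization: {force:.2f} N")
```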
Hybrid architectures combine language models with world models. The language model provides reasoning and planning; the world model provides grounding in physical reality. This is where many researchers believe the field is heading -- systems that can think and act.
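The division of labor is easy to see in miniature. In the sketch below, both components are hard-coded stand-ins: the "language model" is a fixed list of candidate plans and the "world model" is a lookup table of per-step success estimates, but the loop (propose, imagine, choose) is the shape of the architecture.

```python
# Hybrid pattern in miniature: a language model proposes candidate plans in plain
# language, and a world model "imagines" each one to estimate whether it would work.
# Both components here are hard-coded stand-ins, purely to show the division of labor.

candidate_plans = [                                  # in a real system, an LLM generates these
    ["grasp left sleeve", "fold sleeve to center",
     "grasp right sleeve", "fold sleeve to center", "fold bottom up"],
    ["grasp collar", "shake shirt", "drop shirt"],
]

step_success = {                                     # stand-in world model: per-step success estimates
    "grasp left sleeve": 0.9, "grasp right sleeve": 0.9, "grasp collar": 0.8,
    "fold sleeve to center": 0.85, "fold bottom up": 0.8,
    "shake shirt": 0.6, "drop shirt": 0.1,
}

def imagined_success(plan):
    """Probability the whole plan succeeds if each step must succeed in sequence."""
    p = 1.0
    for step in plan:
        p *= step_success.get(step, 0.5)
    return p

best = max(candidate_plans, key=imagined_success)
print(f"Chosen plan ({imagined_success(best):.2f} imagined success):", best)
```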
The Investment Thesis
Here's why this matters for investors:
Timing: The enabling technologies have converged. Compute is cheap enough. Simulation is rich enough. And large language models have solved the interface layer -- you can now talk to robots in natural language. World models are the missing piece that connects intelligence to action.
Market size: Every industry that involves physical systems -- manufacturing, logistics, transportation, agriculture, construction, healthcare -- is a potential market. We're not talking about a niche. We're talking about the physical economy.
Moat dynamics: World models are expensive to train and hard to build. The companies that get there first will have data flywheels, talent density, and deployment advantages that compound over time. This is a scale game.
Platform economics: If world models become the substrate for robotics and autonomy, the companies building them will capture value the way cloud providers and foundation model companies do today -- through infrastructure leverage.
The Bet
Large language models taught machines to speak.
World models will teach them to understand.
The companies building this technology today -- Physical Intelligence, Skild, World Labs, Figure, Wayve, 1X -- are constructing the foundation for the next era of AI. Not chatbots. Not copilots. Autonomous systems that operate in the real world.
In five years, "world model" will be as common a term as "foundation model" is today. The question is which companies will own the platform.