While everyone was busy arguing about GPT-5.4 benchmarks and context window sizes this month, a Turing Award winner quietly closed the largest seed round in European history — $1.03 billion — on the thesis that the entire LLM paradigm is fundamentally wrong for real intelligence. Yann LeCun's AMI Labs isn't trying to build a better chatbot. They're building something that actually understands how the physical world works.
The Core Argument: LLMs Don't Understand Anything
LeCun has been making this case for years, but now he has a billion dollars to prove it. His argument boils down to a deceptively simple observation: predicting the next token in a text sequence is a fundamentally different task from understanding cause and effect in physical reality.
Think about it from a robotics perspective. You can feed GPT-5.4 every robotics paper ever written, and it'll generate beautiful descriptions of how a robotic arm should pick up a glass. But it can't actually predict what happens when that arm nudges the glass sideways instead. It doesn't model gravity. It doesn't model friction. It doesn't know that glass is fragile. What it has is statistics about words that describe these things — and that gap matters enormously once you leave the text domain.
LeCun's counter-proposal is world models — systems trained on video, audio, and sensor data that learn to predict what happens next in physical environments. Not at the raw pixel level (way too noisy and computationally wasteful), but in compressed, abstract representation spaces. The architecture driving all of this is called JEPA: Joint Embedding Predictive Architecture, which LeCun first introduced back in 2022 and iterated on through V-JEPA and V-JEPA 2 during his time at Meta.
The claim is stark: hallucination, brittleness, and the inability to plan aren't bugs that more scale will fix. They're symptoms of optimizing for the wrong objective entirely.
JEPA in 60 Seconds
Traditional language models predict the next token autoregressively. Diffusion models reconstruct raw pixel data. JEPA does neither. It takes an input — a video frame, a sensor reading — encodes it into a compact latent representation, then predicts the representation of the next state rather than the raw sensory data itself.
Why bother? Because real-world sensor streams contain massive amounts of unpredictable noise. Leaves flutter randomly in wind. Shadows shift with clouds. Trying to predict every pixel of the next video frame forces a model to hallucinate details it fundamentally cannot know. JEPA sidesteps this by operating only in abstract space — predicting something like "the car moved two meters forward" without needing to reconstruct every reflection on its windshield.
This mirrors how human cognition works. You don't simulate every photon entering your retina. You maintain a compressed mental model of objects, physics, and spatial relationships, and you run predictions at that level. That's what hierarchical planning in latent space means in practice.
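The latent-prediction objective can be sketched in a few lines. This is a toy illustration only, not AMI's or Meta's implementation: the linear "encoder" and "predictor" stand in for deep networks, and the separate target encoder is one common anti-collapse trick assumed here for illustration. The point is structural — the loss compares predicted and actual *latents*, so raw pixels never appear in the objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: "frames" are flattened observations, latents are 16-d.
OBS_DIM, LATENT_DIM = 1024, 16

# Hypothetical linear stand-ins for deep networks.
W_enc = rng.normal(scale=0.02, size=(OBS_DIM, LATENT_DIM))    # online encoder
W_tgt = W_enc.copy()                                          # target encoder (EMA copy in practice)
W_pred = rng.normal(scale=0.02, size=(LATENT_DIM, LATENT_DIM))  # latent-space predictor

def encode(obs, weights):
    """Map a raw observation into the abstract representation space."""
    return obs @ weights

def jepa_loss(obs_t, obs_next):
    """Predict the latent of the next observation, never its pixels."""
    z_t = encode(obs_t, W_enc)               # current state, abstract space
    z_next_target = encode(obs_next, W_tgt)  # target latent (no gradient flows here in practice)
    z_next_pred = z_t @ W_pred               # prediction happens entirely in latent space
    return float(np.mean((z_next_pred - z_next_target) ** 2))

frame_t = rng.normal(size=OBS_DIM)
frame_next = frame_t + rng.normal(scale=0.1, size=OBS_DIM)  # next frame = small change plus noise
print(jepa_loss(frame_t, frame_next))
```

Real JEPA variants add machinery to prevent representational collapse (everything mapping to the same latent); what survives in this sketch is the key design choice that unpredictable pixel noise is simply never part of the training target.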
| | LLMs (Autoregressive) | World Models (JEPA) |
|---|---|---|
| Training data | Text tokens | Video, audio, sensor streams |
| Predicts | Next token | Next abstract state in latent space |
| Uncertainty | Hallucinates details | Ignores unpredictable noise by design |
| Physical reasoning | Emergent at best | Core training objective |
| Planning | Chain-of-thought, serial | Hierarchical, latent space |
| Sweet spot | Language, code, analysis | Robotics, industrial control, embodied AI |
The Money
$1.03B seed. $3.5B pre-money valuation. Europe's largest seed round ever. Backed by Bezos Expeditions, Nvidia, Samsung, Toyota Ventures, and Temasek, plus individual checks from Eric Schmidt, Mark Cuban, and Tim Berners-Lee. The CEO is Alexandre LeBrun, previously founder of the healthcare AI company Nabla. Offices in Paris, New York, Montreal, and Singapore.
What This Means If You Build Things
Should you drop your current stack and start learning JEPA? No. AMI has been explicit that year one is pure research, with product timelines "measured in years rather than quarters." No API to call. No weights on Hugging Face. No repo to clone.
But if you've spent any time building LLM-powered systems that need to reason about physical space, you already know the frustration. Chain-of-thought prompting works shockingly well for text-native problems. It's a crude hack when the task requires spatial reasoning, physics prediction, or understanding what happens when a robot arm collides with an obstacle.
The signal here isn't AMI the company — it's the investor thesis. The world's largest robotics and manufacturing companies have concluded that scaling language models further won't deliver embodied intelligence. Toyota didn't write a nine-figure check because they want a smarter chatbot. They want AI that operates on factory floors. That's a data point worth sitting with, even if you're building text-centric products today.
And this isn't happening in isolation. Fei-Fei Li's World Labs raised a billion at a similar time, pursuing spatial intelligence from a different angle. The pattern is clear: serious hardware money is migrating away from pure language model plays toward systems that grapple with physical reality.
The Skeptic's Take
Fair pushback: can world models produce economically useful systems before LLM agents eat the entire market? Tool-using agents ship today and improve monthly. Meanwhile, world models need video and sensor data at internet scale, and that corpus is fragmented across private robotics labs and industrial companies with zero incentive to share.
Where I'd Actually Bet
Both paradigms survive. Language models won't disappear — they're genuinely excellent at language, code, and everything text-shaped. But they'll plateau hard for embodied tasks, and something JEPA-derived will fill that vacuum.
The most interesting near-term outcome is probably hybrid architectures: a language model handling communication and high-level planning, with a world model underneath running physics and spatial prediction. If AMI publishes open research — and LeCun's track record at Meta suggests they will — JEPA modules could start plugging into existing agent frameworks within a couple of years.
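A minimal sketch of what that hybrid division of labor could look like, with everything hypothetical: neither AMI nor anyone else has published such an API, and both functions below are stubs. The shape of the idea is that the language layer proposes symbolic steps while the world-model layer vets each one against predicted physical outcomes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    action: str
    target: str

def llm_plan(goal: str) -> List[Step]:
    """Stand-in for an LLM call decomposing a goal into high-level steps."""
    return [Step("grasp", "glass"), Step("move_to", "shelf"), Step("release", "glass")]

def world_model_vets(step: Step) -> bool:
    """Stand-in for a latent-space rollout scoring a step's physical outcome.

    A real world model would predict the next abstract state and reject steps
    whose predicted outcome violates constraints (drops, collisions, breakage).
    Here one hard-coded veto marks where that check plugs in.
    """
    return not (step.action == "throw" and step.target == "glass")

def hybrid_execute(goal: str) -> List[Step]:
    # Language model handles decomposition; world model gates each step.
    return [step for step in llm_plan(goal) if world_model_vets(step)]

print(hybrid_execute("put the glass on the shelf"))
```

The design choice worth noticing: the two components never share weights or objectives. They communicate through a narrow interface of proposed steps and feasibility verdicts, which is exactly why a JEPA-derived module could slot into existing agent frameworks without retraining the language side.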
LeBrun himself joked that "world models will be the next buzzword" and predicted "every company will call itself a world model to raise funding in six months." He's probably right. That's exactly what makes understanding the actual architecture worth your time now, while you can still tell the real thing from the inevitable rebrands.