Foundation models have made robotics demos dramatically more impressive. A robot can now watch a video, interpret a natural-language command, and attempt a complex manipulation task. The progress is real.
But there's a gap — a large, persistent, and underappreciated gap — between what works in a demo and what works in deployment.
The demo-to-deployment gap
In a research lab, you control the environment. The lighting is consistent. The objects are known. The table height is fixed. The camera angle is calibrated. Failure is acceptable — you just reset and try again.
In the real world, none of this holds. Lighting changes. Objects are unfamiliar. Surfaces vary. The robot has to work the first time, every time, or the customer loses confidence.
What still breaks
In conversations with robotics founders and operators, the same failure modes keep coming up:
- Perception reliability: Foundation models can identify objects in open-vocabulary settings, but they still struggle with reflective surfaces, transparent objects, and cluttered environments.
- Manipulation precision: Language-conditioned policies can produce impressive results, but fine-grained manipulation (inserting a cable, tightening a screw, handling flexible materials) remains hard.
- Error recovery: When something goes wrong, most systems don't recover gracefully. They either stop or, worse, continue in a degraded state.
- Edge cases at scale: The long tail of real-world situations is enormous. Every deployment environment introduces new edge cases.
The infrastructure gap
Beyond the models themselves, there's a deeper infrastructure problem. Most robotics companies lack:
- Reliable data pipelines for continuous improvement
- Standardized evaluation frameworks for real-world performance
- Hardware procurement processes that can scale
- Deployment playbooks that work across different customer sites
This infrastructure gap is, in many ways, harder to close than the model gap. And it's where I think the biggest opportunities lie.
This is a working note. I'll continue updating it as I learn more.