Robots That Learn on the Job: The Rise of Self-Training AI

Gábor Bíró August 12, 2024
5 min read

Imagine robots that don't just follow pre-programmed instructions but actually learn and adapt while performing tasks in our unpredictable world. Researchers at MIT have recently developed a novel algorithm called "Estimate, Extrapolate, and Situate" (EES), marking a significant step in this direction. This innovation promises to enhance robotics by enabling machines to train themselves effectively, reducing the need for constant human intervention and potentially revolutionizing their capabilities across numerous domains.


The Challenge: Bridging the Gap Between Code and Reality

Traditionally, programming robots for complex, real-world tasks has been laborious. Robots often operate based on rigid code or on models trained extensively in simulation. The real world, however, is messy and unpredictable: a pre-programmed robot might fail if an object isn't exactly where expected, or if an unforeseen obstacle appears. While methods like reinforcement learning (RL) allow robots to learn through trial and error, doing this directly in the physical world can be slow, unsafe, and data-intensive. Training purely in simulation often suffers from the "sim-to-real" gap, where strategies learned in virtual environments don't transfer perfectly to physical reality. This is where algorithms like EES come into play, aiming to give robots more robust, adaptive learning capabilities directly within their working environment.

EES Explained: Estimate, Extrapolate, Situate

The EES algorithm integrates the strengths of Large Language Models (LLMs) – known for their text-based reasoning and vast world knowledge – with real-time robot motion data. This fusion allows household robots, for example, to adapt more effectively to new tasks and environments. But how does it work? Let's break down the name:

  • Estimate: The robot constantly assesses its current physical state and its relationship to the ongoing task. For instance, if tasked with clearing a table, it estimates "I am holding a cup" or "My hand is empty and near the table."
  • Extrapolate: Using its understanding of the task (often derived from LLM-based planning) and its current state, the robot predicts potential next steps and their outcomes. "If I move forward, I can place the cup in the sink," or "If I encounter an obstacle, I need to find an alternative path."
  • Situate: This crucial step involves the robot contextualizing its current state and actions within the broader goal of the task. It links its physical state ("holding cup") to a natural language label or sub-goal provided by the LLM ("transporting cup to sink"). If interrupted (e.g., needing to put the cup down temporarily to open a door), the robot understands it hasn't failed the overall task but merely paused a sub-task. It can then resume logically, knowing *why* it paused and what the next step towards the ultimate goal ("clear the table") should be.

This cycle allows robots to break down complex chores into logical sub-tasks. Crucially, if interrupted or facing an unexpected situation, they don't necessarily need to restart the entire process. Instead, they can re-estimate, re-extrapolate, and re-situate themselves within the task flow, significantly boosting efficiency and resilience, especially for complex household duties.
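To make the cycle concrete, here is a minimal Python sketch of a single estimate, extrapolate, and situate pass. Everything here is illustrative: `WorldState`, `PLAN`, and the three functions are assumed names for the purposes of the example, not MIT's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorldState:
    """Hypothetical snapshot of what the robot currently senses."""
    holding: Optional[str]  # e.g. "cup", or None if the gripper is empty
    location: str           # e.g. "near_table"

# Hypothetical sub-goals a planner (such as an LLM) might emit for "clear the table".
PLAN = ["pick_up_cup", "transport_cup_to_sink", "place_cup_in_sink"]

def estimate(sensors: dict) -> WorldState:
    """Estimate: assess the current physical state from sensor readings."""
    return WorldState(holding=sensors.get("gripper"), location=sensors["pose"])

def extrapolate(state: WorldState, plan: list) -> str:
    """Extrapolate: predict which remaining sub-goal is reachable from here."""
    if state.holding == "cup":
        return "transport_cup_to_sink"
    return plan[0]  # gripper empty: start from the first sub-goal

def situate(state: WorldState, step: str, goal: str) -> str:
    """Situate: place the current action in the context of the overall goal,
    so an interruption reads as a paused sub-task, not a failed task."""
    return f"Doing '{step}' as part of '{goal}' (holding: {state.holding})"

# One pass of the cycle; a real controller would repeat this until the goal is met.
state = estimate({"gripper": "cup", "pose": "near_table"})
step = extrapolate(state, PLAN)
print(situate(state, step, goal="clear the table"))
```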

The Power of LLMs: Injecting "Common Sense" into Motion

Large Language Models play a pivotal role in enhancing the capabilities of these self-training robots. It's not just about understanding language commands; LLMs provide a form of "common sense" reasoning. By connecting the robot's motion data and sensor readings with the LLM's knowledge base, the system enables robots to:

  • Logically decompose tasks: An LLM can break down a high-level command like "clean the kitchen counter" into a sequence of actionable steps (e.g., identify clutter, pick up items, wipe surface); a sketch of this step follows the list.
  • Reason about objects and environments: The LLM understands that a glass is fragile, a sponge is for wiping, and putting electronics in the sink is a bad idea.
  • Handle ambiguity and interruptions gracefully: If a robot encounters an unknown object while cleaning, the LLM can help infer its likely properties or suggest asking a human. If a sub-task fails, the LLM helps the robot understand the context and attempt corrective actions relevant to the overall goal.
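The sketch below shows one way the decomposition call might look. It is a hypothetical example: `query_llm` is a stub standing in for whatever LLM backend the system uses, and the prompt and returned sub-tasks are invented for illustration.

```python
import json

def query_llm(prompt: str) -> str:
    """Placeholder for any LLM backend (local or hosted); returns raw text.
    Stubbed here with a canned answer so the sketch stays self-contained."""
    return '["identify clutter", "pick up items", "wipe surface"]'

def decompose(command: str) -> list:
    """Ask the LLM to break a high-level command into ordered sub-tasks."""
    prompt = (
        f"Decompose the household task '{command}' into a short, ordered "
        "list of robot sub-tasks. Answer as a JSON array of strings."
    )
    return json.loads(query_llm(prompt))

print(decompose("clean the kitchen counter"))
# -> ['identify clutter', 'pick up items', 'wipe surface']
```

Asking for structured output (JSON here) is what lets the planner's sub-tasks feed directly into the robot's control loop rather than remaining free-form text.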

This integration automates the identification and sequencing of sub-tasks, simplifying the process of teaching robots complex behaviors. It moves beyond simple pattern matching on motion data, adding a layer of semantic understanding that makes the robot's behavior more flexible and adaptive, and paves the way for more versatile, intelligent household robots that require minimal human guidance.

Self-Training and Adaptation in Action

The EES algorithm empowers robots to autonomously refine their skills and continuously improve performance. By constantly estimating, extrapolating, and situating, they build a better understanding of how their actions affect the environment. This allows them to make more informed decisions over time. This capability is particularly valuable for household robots encountering unfamiliar objects or layouts in users' homes. With EES, a robot tasked with setting the table in a new house could adapt its grasping strategy for unfamiliar plates or navigate around unexpected furniture, modifying its behavior to successfully complete tasks even in previously unseen environments.
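The self-assessment idea behind this can be caricatured in a few lines of Python. In this toy sketch (not the EES algorithm itself, and with all names invented), the robot keeps a running estimate of how reliably a skill succeeds and updates it after every autonomous practice attempt:

```python
import random

TRUE_SUCCESS = {"grasp_plate": 0.6}   # hidden property of the environment
belief = {"grasp_plate": 0.5}         # the robot's running competence estimate

def attempt(skill: str) -> bool:
    """Stand-in for physically executing a skill in the real world."""
    return random.random() < TRUE_SUCCESS[skill]

def practice(skill: str, trials: int = 50, lr: float = 0.1) -> None:
    """Autonomous practice: each trial nudges the competence estimate
    toward the observed outcome (an exponential moving average)."""
    for _ in range(trials):
        outcome = 1.0 if attempt(skill) else 0.0
        belief[skill] += lr * (outcome - belief[skill])

practice("grasp_plate")
print(f"competence estimate after practice: {belief['grasp_plate']:.2f}")
```

A real system would also adjust *how* the skill is executed rather than merely scoring it, but this estimate-and-update loop is the core of letting a robot judge its own performance without a human in the loop.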

Broader Implications for the Robotics Industry

Self-training algorithms like EES have far-reaching consequences. By enabling robots to adapt to new environments and tasks without extensive reprogramming, this technology can significantly reduce deployment costs and increase the versatility of robotic systems across various sectors. Healthcare could benefit from assistants adapting to patient needs in homes, manufacturing from robots quickly learning new assembly variations, and logistics from machines handling diverse and unexpected packages or warehouse layouts. Furthermore, this fusion of AI and robotics could accelerate the development of truly helpful home assistant robots, potentially revolutionizing elder care and rehabilitation services by providing adaptable, multi-functional support within domestic settings.

Challenges and the Road Ahead

While promising, challenges remain. Ensuring safety, especially as robots learn autonomously in human environments, is paramount. Bridging the subtle differences between simulation and reality (the sim-to-real gap) continues to be an area of active research. The computational resources required for running sophisticated models like LLMs onboard robots and the need for diverse real-world training data also present hurdles. However, the progress is undeniable.

Conclusion

Overall, the EES algorithm and similar approaches represent a new frontier in robotics. By enabling robots to learn and adapt on the job, leveraging the reasoning power of LLMs, we are moving closer to creating machines that are not just tools, but truly intelligent partners. This advancement holds the potential to significantly impact not only household robotics but also a wide array of industries in the near future, making robots more capable, versatile, and integrated into our lives.
