LLMs in Robotics: Why Human-Level AI is Still a Challenge
Recent experiments by Andon Labs have highlighted the limitations of Large Language Models (LLMs) in robotics. Despite their impressive ability to generate human-like text, LLMs face significant hurdles in translating that intelligence into a physical robot.
Andon Labs tested several LLMs, including Gemini 2.5 Pro, Claude Opus 4.1, and GPT-5, by embedding them in a basic vacuum robot. The goal was to isolate the models’ decision-making from the robot’s low-level control functions in order to assess their readiness for physical embodiment.
The results were less than ideal. Rather than demonstrating human-level intelligence, the LLMs at times descended into a ‘doom spiral,’ producing unintentionally humorous output. This highlights the gap between an LLM’s ability to understand and generate text and its ability to translate that intelligence into reliable, safe physical actions.
Furthermore, Andon Labs’ research revealed that LLMs struggle with introspective awareness: they have difficulty accurately describing their own internal processes. The models’ messy internal logs contrasted with their cleaner external communication, suggesting a lack of clarity in their ‘thoughts.’
Overall, the Andon Labs experiment serves as a reality check for LLMs in robotics: while these models offer unprecedented language-processing capabilities, achieving human-level embodied AI remains a significant challenge.