In an innovative fusion of robotics and artificial intelligence, Boston Dynamics has reimagined its four-legged mechanical marvel, Spot, as a charismatic tour guide. Infused with the power of OpenAI’s ChatGPT and other large language models (LLMs), Spot has been transformed from an inspection assistant into an interactive robot that can converse, answer questions, and offer tours with a touch of entertainment and nuance. This evolution in Spot’s capabilities is a result of Boston Dynamics’ exploration of the vast potential of foundation models: large AI systems trained on extensive datasets that can exhibit emergent behaviors.
From Inspection to Interaction
Spot, previously recognized for its inspection prowess, now takes on new capabilities as it roams the halls of Boston Dynamics. Equipped with an array of sensors and AI-driven speech and text recognition tools, Spot demonstrates a remarkable ability to interact with humans in real time. This interaction is not just about providing dry facts; it’s about creating an engaging, informative experience that might include a bit of improvised role-playing or even humor.
The Technical Ensemble
The transformation required outfitting Spot with a vibration-resistant mount for speakers, enabling it to project its newfound voice. An offboard computer controls the robot through the Spot SDK and integrates OpenAI’s ChatGPT API, upgraded to GPT-4, along with various open-source LLMs. Spot’s tour guide persona is also enhanced by visual question-answering models that allow it to describe objects it “sees” with its cameras and answer questions about them.
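The architecture described above, an offboard computer relaying visitor questions to a chat model and speaking the replies, can be sketched as a simple dialogue loop. Everything here is illustrative: the function names (`llm_reply`, `tour_step`), the prompt text, and the canned response are assumptions, not Boston Dynamics’ actual implementation, and the stub stands in for a real chat-completion API call so the sketch runs offline.

```python
"""Minimal sketch of an LLM-driven tour-guide dialogue loop (hypothetical)."""

SYSTEM_PROMPT = (
    "You are a friendly robot tour guide. Answer visitor questions "
    "briefly and describe the current tour stop."
)

def llm_reply(history):
    """Stand-in for a chat-completion API call.

    A real system would send `history` to a hosted model endpoint;
    here we return a canned answer so the sketch is self-contained.
    """
    last_user = next(m["content"] for m in reversed(history)
                     if m["role"] == "user")
    return f"Great question about '{last_user}'. Let me show you."

def tour_step(history, visitor_utterance):
    """One conversational turn: record the visitor's words, query the
    model, record its answer, and return the text to be spoken aloud."""
    history.append({"role": "user", "content": visitor_utterance})
    answer = llm_reply(history)
    history.append({"role": "assistant", "content": answer})
    return answer

# One turn of the loop: a visitor asks a question, the robot answers.
history = [{"role": "system", "content": SYSTEM_PROMPT}]
speech = tour_step(history, "What robot is that?")
```

Keeping the full message history in the prompt is what lets the guide answer follow-up questions in context rather than treating each utterance in isolation.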
Spot’s interactions during the tours revealed unexpected behaviors, such as independently seeking help or identifying its ‘parents’ among older robot models. These actions highlight the AI’s capacity to draw statistical associations and adapt to new contexts, although the Boston Dynamics team is quick to clarify that this doesn’t imply the LLMs are conscious or intelligent in a human-like way.
The Human Touch
To make Spot’s interactions more lifelike, the team utilized text-to-speech services and programmed body language into the robot, allowing it to turn toward people and ‘speak’ to them, its robotic arm mimicking the movements of a human mouth.
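One plausible way to drive that mouth-like arm motion is to map the loudness of the synthesized speech to how far the gripper opens, so louder syllables open the "mouth" wider. The sketch below is a guess at such a mapping; the function and parameter names are invented for illustration and are not part of the Spot SDK.

```python
def gripper_opening(amplitude, max_open=1.0, threshold=0.05):
    """Map a speech-audio amplitude sample (0..1) to a gripper opening
    fraction, so the arm's 'mouth' opens wider on louder syllables.

    `threshold` suppresses jitter during near-silence. Both parameters
    are illustrative, not real Spot SDK settings.
    """
    if amplitude < threshold:
        return 0.0
    # Clamp so the command never exceeds the joint's allowed range.
    return min(max_open, amplitude * max_open)

# A short amplitude envelope (one value per audio frame) mapped to
# per-frame gripper commands.
envelope = [0.0, 0.2, 0.8, 0.4, 0.03]
commands = [gripper_opening(a) for a in envelope]
```

In practice a system like this would compute the amplitude envelope from the text-to-speech audio stream and send the resulting commands to the arm at the audio frame rate.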
Challenges and Prospects
Despite the successes, the team acknowledges the limitations, such as the LLM’s propensity to confidently fabricate responses and the awkwardness of its delayed replies. Nonetheless, the team is optimistic about the future, envisioning a world where robots understand and act upon verbal instructions, reducing the learning curve for human users and enhancing the robots’ utility across various domains.
Spot’s new role as a tour guide represents a significant stride in the ongoing convergence of AI and robotics. It underscores the potential for these technologies to provide not only functional benefits but also cultural context and a touch of whimsy to our interactions with machines. The experience gleaned from this proof-of-concept promises to pave the way for even more sophisticated and seamless human-robot collaborations in the future.