In a groundbreaking approach, researchers from Stanford University, MIT, and Harvey Mudd have developed a method that enables language models (LMs) to learn complex problem-solving through search. By teaching LMs to search and backtrack within a serialized string format called the Stream of Search (SoS), they have significantly improved the models' decision-making and strategic capabilities. Initial trials on the game of Countdown showed a 25% increase in accuracy after pretraining, and further finetuning let the models solve 36% of previously unsolved problems. This innovation represents a leap forward in LMs' ability to self-improve and devise new problem-solving strategies on their own.
The development of language models has historically revolved around improving their predictive accuracy and generative capabilities. Prior research has shown that incorporating symbolic search methods into LMs can improve their performance, particularly for tasks that require planning or advanced reasoning. However, these methods have largely been applied during the inference stage, rather than during training. This new approach of integrating search into the training process represents a notable shift towards more dynamic and adaptable language models.
What Is the Stream of Search?
The Stream of Search is a novel representation of an entire search process, including exploration, dead ends, and backtracking, as a single string sequence that language models can read and learn from. Models pretrained on these SoS sequences learn not only to execute searches but also to improve their strategies over time. Earlier work did not train LMs to search in this way, a limitation the SoS method aims to overcome.
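To make the idea concrete, here is a minimal sketch of how a Countdown search could be serialized into one string, with every attempted operation, dead end, and backtrack recorded. The logging format and function names are illustrative only, not the paper's actual SoS format, and division is omitted for brevity:

```python
from itertools import combinations

# Operations tried at each search step (division omitted for brevity).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def countdown_sos(numbers, target):
    """Depth-first Countdown search that serializes every step,
    including dead ends and backtracking, into a single string."""
    log = [f"Start: numbers={sorted(numbers)}, target={target}"]

    def dfs(nums):
        if target in nums:
            log.append(f"Goal reached: {target}")
            return True
        if len(nums) == 1:
            log.append(f"Dead end at {nums}, backtracking")
            return False
        for a, b in combinations(nums, 2):
            rest = list(nums)
            rest.remove(a)
            rest.remove(b)
            for sym, op in OPS.items():
                val = op(a, b)
                log.append(f"Try {a}{sym}{b}={val}, left with {sorted(rest + [val])}")
                if dfs(rest + [val]):
                    return True
        log.append(f"Exhausted {sorted(nums)}, backtracking")
        return False

    dfs(list(numbers))
    return "\n".join(log)
```

A call like `countdown_sos([4, 9, 11], 24)` produces a trace containing the successful path (4+9=13, then 13+11=24); on harder instances the same string would also record failed branches and backtracking, which is exactly the kind of trajectory an LM can be trained on.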
How Does Training on Search Affect LMs?
Recent studies suggest that exposing LMs to suboptimal search trajectories alongside optimal solutions can significantly enhance their problem-solving capabilities. Models trained on datasets that include these messy trajectories begin to outperform models trained solely on optimal solutions. Furthermore, self-improvement strategies borrowed from reinforcement learning show that such models can keep refining how efficiently they navigate complex search spaces.
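As a rough illustration of this training setup, the sketch below mixes clean optimal solutions with full search trajectories into a single corpus. The mixing fraction and all names are hypothetical, not the authors' actual pipeline:

```python
import random

def mix_corpus(optimal, trajectories, trajectory_fraction=0.8, size=1000, seed=0):
    """Sample a training corpus in which roughly `trajectory_fraction`
    of the examples are full search trajectories (dead ends, backtracking
    and all) and the rest are clean optimal solutions."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(size):
        pool = trajectories if rng.random() < trajectory_fraction else optimal
        corpus.append(rng.choice(pool))
    return corpus
```

Training on such a mixed corpus exposes the model to recovery from mistakes, something a corpus of only optimal solutions never demonstrates.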
What Makes SoS Significant for Problem-Solving?
The SoS framework’s primary innovation is teaching language models to simulate the search process itself within language. This contrasts with previous approaches that relied on external symbolic search algorithms operating separately from the LMs’ internal computation. A paper published in the Journal of Artificial Intelligence Research, “Teaching Language Models to Plan,” highlights the importance of integrating planning capabilities directly into language models. It supports the notion that LMs trained with the SoS framework can learn to plan and solve problems more effectively, a significant step toward more intelligent and autonomous systems.
Useful Information for the Reader
- Stream of Search enables self-improvement in LMs.
- Training on search trajectories improves problem-solving.
- SoS can potentially enhance real-world task management.
The SoS framework marks a paradigm shift by integrating search, including backtracking and exploration, directly into the LM training phase rather than bolting it on at inference time. Exposure to full search traces may also help models build more robust internal “world models” that generalize across a variety of tasks. While the framework has so far been tested only on the Countdown game, its implications for real-world problem-solving are broad. Future research may refine SoS operations and explore its applicability across different domains. The Stream of Search not only strengthens the problem-solving prowess of LMs but also paves the way for more autonomous and strategic artificial intelligence systems.