4 July, 2025
Sakana AI's TreeQuest Revolutionizes AI Collaboration with Multi-Model Teams

In a groundbreaking development for artificial intelligence, Japanese AI lab Sakana AI has unveiled a novel technique that allows multiple large language models (LLMs) to collaborate on tasks, creating what they describe as an AI “dream team.” This innovative method, known as Multi-LLM AB-MCTS, enables models to engage in trial-and-error processes, leveraging their individual strengths to tackle complex problems that are beyond the reach of any single model.

The introduction of this technique represents a significant advancement for enterprises seeking to develop more robust AI systems. By not being restricted to a single provider or model, businesses can dynamically utilize the best features of various frontier models, assigning the most suitable AI for each segment of a task to achieve superior outcomes.

The Power of Collective Intelligence

As frontier AI models continue to evolve, each exhibits unique strengths and weaknesses based on its training data and architecture. Some models excel in coding, while others shine in creative writing. Sakana AI’s researchers argue that these differences should be seen as assets rather than limitations.

“We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,” the researchers state in their blog post. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”

This philosophy mirrors the human experience, where diverse teams often achieve greater results than individuals working alone. By fostering collaboration among AI models, Sakana AI aims to unlock new levels of problem-solving capabilities.

Thinking Longer at Inference Time

Sakana AI’s algorithm is an “inference-time scaling” technique, a research area that has gained popularity over the past year. While much of AI research has focused on “training-time scaling”—increasing model size and training data—inference-time scaling enhances performance by optimizing computational resources post-training.

Common methods include using reinforcement learning to generate longer, more detailed chain-of-thought sequences, as seen in models like OpenAI’s o3 and DeepSeek-R1. Another approach is repeated sampling, akin to brainstorming, where the model receives the same prompt multiple times to produce various solutions. Sakana AI’s framework advances these ideas by offering a more strategic version of Best-of-N (repeated sampling).
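Plain Best-of-N is simple to sketch: sample the same prompt several times and keep the highest-scoring candidate. The snippet below is a minimal illustration, assuming hypothetical `call_llm` and `score` functions standing in for a real model API and a real verifier:

```python
import random

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (hypothetical)."""
    return f"candidate answer {random.randint(0, 9)}"

def score(answer: str) -> float:
    """Stand-in scorer, e.g. unit tests or a verifier model (hypothetical)."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample the same prompt n times and keep the highest-scoring answer."""
    candidates = [call_llm(prompt) for _ in range(n)]
    return max(candidates, key=score)
```

Each sample here is independent; Sakana AI's contribution, described below, is to make the allocation of those N calls adaptive rather than flat.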

“Our framework offers a smarter, more strategic version of Best-of-N,” Takuya Akiba, research scientist at Sakana AI, told VentureBeat. “It complements reasoning techniques like long CoT through RL, maximizing performance within a limited number of LLM calls.”

How Adaptive Branching Search Works

The core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). This enables an LLM to conduct trial-and-error by balancing two search strategies: “searching deeper” and “searching wider.” The former involves refining promising answers, while the latter generates new solutions from scratch.

AB-MCTS combines these approaches, allowing the system to improve a good idea or pivot to a new direction if necessary. This is achieved using Monte Carlo Tree Search (MCTS), a decision-making algorithm used by DeepMind’s AlphaGo. AB-MCTS uses probability models to decide whether to refine an existing solution or generate a new one.
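The deeper-versus-wider decision can be sketched as a bandit over tree nodes, where a "generate new" pseudo-arm competes with refining existing answers. This is a simplified illustration, not Sakana AI's implementation: the Beta posteriors and Thompson sampling stand in for the probability models the paper describes, and `generate`/`refine` are hypothetical placeholders for real LLM calls:

```python
import random

class Node:
    """A candidate answer in the search tree, with a Beta posterior."""
    def __init__(self, answer, score):
        self.answer, self.score = answer, score
        self.wins, self.losses = 1.0, 1.0  # Beta(1, 1) prior

def thompson(node):
    # Sample the node's success probability from its Beta posterior.
    return random.betavariate(node.wins, node.losses)

def generate(prompt):
    """Hypothetical LLM call producing a fresh answer ('search wider')."""
    return Node("new answer", random.random())

def refine(prompt, node):
    """Hypothetical LLM call improving an answer ('search deeper')."""
    return Node(node.answer + " (refined)", min(1.0, node.score + 0.1 * random.random()))

def ab_mcts(prompt, budget=16):
    nodes = [generate(prompt)]
    gen_arm = Node(None, 0.0)  # pseudo-arm representing "generate new"
    for _ in range(budget - 1):
        pick = max(nodes + [gen_arm], key=thompson)  # Thompson sampling
        child = generate(prompt) if pick is gen_arm else refine(prompt, pick)
        nodes.append(child)
        # Update the chosen arm's posterior with the child's score.
        pick.wins += child.score
        pick.losses += 1.0 - child.score
    return max(nodes, key=lambda n: n.score)
```

The key property this captures is that the branching factor is not fixed in advance: if refinements keep scoring well the search goes deeper, and if they stall the "generate new" arm wins the sample and the search widens.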

Multi-LLM AB-MCTS takes this further by determining not only “what” to do (refine vs. generate) but also “which” LLM should perform the task. Initially, the system uses a balanced mix of available LLMs, learning which are most effective as it progresses, and reallocating workload accordingly.
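The "which LLM" decision can likewise be framed as a bandit over models. The sketch below is illustrative only: the model names and simulated success rates are made up, and a real system would score answers with a task-specific verifier rather than a coin flip:

```python
import random

class ModelArm:
    """One available LLM, tracked with a Beta posterior over its success rate."""
    def __init__(self, name):
        self.name = name
        self.wins, self.losses = 1.0, 1.0  # Beta(1, 1) prior: start balanced

    def sample(self):
        return random.betavariate(self.wins, self.losses)

    def update(self, reward):
        self.wins += reward
        self.losses += 1.0 - reward

def pick_model(arms):
    """Thompson sampling: choose the model with the highest sampled rate."""
    return max(arms, key=lambda a: a.sample())

# Simulated run with made-up per-model success rates on some task.
true_rate = {"model-a": 0.7, "model-b": 0.4, "model-c": 0.5}
arms = [ModelArm(name) for name in true_rate]
counts = {name: 0 for name in true_rate}
for _ in range(200):
    arm = pick_model(arms)
    counts[arm.name] += 1
    arm.update(1.0 if random.random() < true_rate[arm.name] else 0.0)
```

Early calls are spread roughly evenly across the three models; as evidence accumulates, the allocation shifts toward whichever model performs best on the task at hand, mirroring the reallocation behavior the researchers describe.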

Putting the AI ‘Dream Team’ to the Test

The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark, designed to assess human-like visual reasoning capabilities. Using a combination of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1, the team achieved a 30% success rate on 120 test problems, outperforming any single model.

The system demonstrated its ability to dynamically assign the best model for each problem. In one instance, an incorrect solution by the o4-mini model was corrected by DeepSeek-R1 and Gemini 2.5 Pro, showcasing the power of collective intelligence.

“This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems,” the researchers write. “By creating an ensemble with a model less likely to hallucinate, it could achieve powerful logical capabilities and strong groundedness.”

From Research to Real-World Applications

To facilitate the application of this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under an Apache 2.0 license. This provides a flexible API for implementing Multi-LLM AB-MCTS with custom scoring and logic.

While the application of AB-MCTS to business problems is still at an early stage, the research points to significant potential in areas such as complex algorithmic coding and improving the accuracy of machine learning models. AB-MCTS could also optimize performance metrics of existing software, such as reducing web service response latency.

The release of TreeQuest as an open-source tool could pave the way for a new class of powerful and reliable enterprise AI applications, offering businesses a strategic advantage in leveraging AI for complex problem-solving.