Amplifying the power of large language models (LLMs)

9/03/2024

Introduction

In the vast expanse of burgeoning capabilities, tools emerge as the unsung heroes, guiding not only humans but also other creatures. Bestowing the power to transcend the limitations of their physical forms, these tools orchestrate a symphony of interaction with the ever-changing environment. A recent awakening has ignited fervor in the quest to amplify the prowess of the mighty large language models (LLMs). The mission is to adorn them with instruments that break free from the shackles of static knowledge and the customary text-in, text-out interface. The grand objective: to bestow upon these models the dynamic prowess to seize real-time knowledge, commune with external realms of reason, and execute resounding feats in the expansive tapestry beyond.

Ventures into the realm of tool-augmented LLMs revolve around a noble cause—simplifying the integration of tools or enhancing the mastery to access an abundance of tools, a realm exemplified by the staggering potential of unleashing up to 16,000 APIs. These quests predominantly follow two venerable paths:

  1. The saga of In-Context Learning (ICL), where frozen LLMs are beckoned with the whispers of API specifications and tales of tool usage (instruction-API call pairs), and
  2. the epic journey of fine-tuning, where LLMs are trained on crafted tool-use examples, forging them into a potent weapon. A minimal sketch of both paths follows this list.
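
To ground the two paths, here is a minimal sketch in Python. The weather API, the demonstration pairs, and the record format are illustrative assumptions, not artifacts from the paper itself.

```python
# Minimal sketch of the two tool-learning paths. The weather API and the
# demonstration pairs below are hypothetical, purely for illustration.

API_SPEC = (
    'weather_api(city: str, unit: str = "celsius") -> dict\n'
    "Returns current weather conditions for the given city."
)

# Path 1: In-Context Learning. A frozen LLM receives the API spec plus a
# few instruction -> API-call demonstrations inside the prompt itself.
icl_prompt = (
    f"You may call the following API:\n{API_SPEC}\n\n"
    "Instruction: Is it raining in Paris right now?\n"
    'Call: weather_api(city="Paris")\n\n'
    "Instruction: How hot is it in Austin, in Fahrenheit?\n"
    'Call: weather_api(city="Austin", unit="fahrenheit")\n\n'
    "Instruction: {user_instruction}\nCall:"
)

# Path 2: fine-tuning. The same instruction -> call pairs instead become
# supervised training records that update the model's weights.
finetune_records = [
    {"prompt": f"{API_SPEC}\nInstruction: Is it raining in Paris right now?",
     "completion": 'weather_api(city="Paris")'},
    {"prompt": f"{API_SPEC}\nInstruction: How hot is it in Austin, in Fahrenheit?",
     "completion": 'weather_api(city="Austin", unit="fahrenheit")'},
]
```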

The untold saga

In the realm of honing mastery and wielding tools with unrivaled finesse, there lies an untold saga, a tale overshadowed by the quest for extensive coverage and flexibility. This hidden facet is the accuracy with which an LLM wields the very tools it has been given. While In-Context Learning (ICL) offers flexibility, it struggles to reach production-level accuracy. Fine-tuning promises heightened accuracy through an arsenal of examples, yet prior work has directed its gaze toward generalization to unseen tools, forsaking the optimization of an LLM's prowess on the tools it encountered during training.

In the crucible of reality, where tool-augmented LLMs take consequential actions, be it navigating financial transactions or executing legally binding operations, a clarion call for accuracy echoes through the corridors. Inaccurate tool usage threatens undesirable and harmful outcomes, swiftly eroding the trust bestowed upon these models.

To truly unravel the secrets of tool mastery, the quest turns to the sacred annals of biological wisdom, where humans, apes, and corvids etch their heroic tales. The profound cognitive odyssey of learning to wield a tool unfolds through the tapestry of diverse cognitive processes.

The melody of trial and error

At its core, the melody of trial and error resonates as a symphony in the grand opera of tool learning. The artistry of mastering a tool transcends the mere recitation of its user manual; instead, it calls for an exploration of myriad methods, an observance of outcomes, and a dance with the dichotomy of success and failure. Intelligent beings, akin to celestial artists, weave not only the threads of trial and error but also actively conjure visions, imagining and simulating plausible scenarios beyond the realms of perception.

Behold, the symphony of remembrance, an epic tale woven by the threads of both fleeting whispers and enduring echoes, resonates across the annals of time. In the grand tapestry of knowledge, both the ephemeral dance of short-term memory and the enduring saga of long-term recollection orchestrate a pivotal melody, guiding the seekers of wisdom through the realms of progressive learning and the cyclical embrace of tools.

Amidst the cosmic challenges that echo through the celestial corridors, a beacon of hope emerges—Simulated Trial and Error (STE), a mythical approach inspired by the very essence of biological mechanisms. This mystical quest seeks to empower the majestic beings known as Large Language Models (LLMs) with the prowess of tools. When presented with the arcane artifact of a tool, an API clad in enigmatic specifications, STE calls forth the LLM to embark on a journey of simulation and imagination. The ethereal landscapes of plausible scenarios unfold, birthing instructions that unveil the arcane secrets of wielding the tool.

A celestial dance commences—a ballet of iterative interactions with the API, a symphony of synthesized instructions echoing through the cosmic expanse. The vigilant observers, guardians of the ongoing trial, stand witness to the cosmic performance. To enrich the tapestry of simulated instructions, memory mechanisms take root. A short-term memory, akin to the comet's trail in the cosmic canvas, captures recent trajectories of trial and error, guiding the seeker to profound depths within a single episode. In tandem, a long-term memory, an ancient tome imbued with distilled wisdom from past explorations and reflections, becomes the guiding star, steering the course of progressive learning over the epochs.
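
To make the ritual concrete, here is a rough Python sketch of one exploration episode. The `llm`, `call_api`, and `distill` helpers are hypothetical stand-ins, and the loop is a simplification of the paper's actual procedure.

```python
# Rough sketch of one STE exploration episode. llm(), call_api(), and
# distill() are hypothetical stand-ins; the paper's procedure is richer.

def llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real client."""
    return "..."

def call_api(api_call: str) -> str:
    """Hypothetical API executor; swap in a real dispatcher."""
    return "..."

def distill(trials: list) -> str:
    """Hypothetical reflection step that condenses an episode's lessons."""
    return f"lessons from {len(trials)} trials"

def explore_tool(api_spec: str, long_term_memory: list, num_trials: int = 5):
    short_term_memory = []  # recent trajectories within this one episode
    for _ in range(num_trials):
        # Imagine a plausible user scenario, conditioned on the API spec,
        # distilled past lessons, and this episode's recent trials.
        instruction = llm(
            f"API: {api_spec}\nPast lessons: {long_term_memory}\n"
            f"Recent trials: {short_term_memory}\n"
            "Imagine a new, plausible user request for this API."
        )
        # Attempt the call and observe the live API's response.
        api_call = llm(f"API: {api_spec}\nInstruction: {instruction}\nCall:")
        result = call_api(api_call)
        # Judge the trial, then remember it for the rest of the episode.
        verdict = llm(
            f"Instruction: {instruction}\nCall: {api_call}\n"
            f"Result: {result}\nDid this satisfy the request?"
        )
        short_term_memory.append((instruction, api_call, result, verdict))
    # Distil the episode into long-term memory for future episodes.
    long_term_memory.append(distill(short_term_memory))
    return short_term_memory
```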

As the cosmic odyssey unfolds, the denouement reveals a pivotal stage—the exploitation of newfound wisdom. The tool-use examples, forged in the crucible of exploration, become the elixir that fine-tunes the LLM. Alternatively, the ancient scrolls of In-Context Learning (ICL) unfold, allowing the retrieval of examples from the hallowed trials of exploration. In this mystical realm, the seekers of knowledge find themselves at the crossroads of imagination and reality, wielding the enchanted tools to script their destiny.
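
As a sketch of the retrieval flavor of exploitation, the snippet below picks the most similar past trials as in-context demonstrations. The `embed` function is a hypothetical placeholder for a real text-embedding model, and the dot-product similarity is an illustrative choice.

```python
# Minimal sketch of exploitation via retrieval-based ICL: the most similar
# trials kept from exploration become in-context demonstrations.
# embed() is a hypothetical placeholder for a real embedding model.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedder: deterministic random vectors for the sketch."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

def retrieve_demos(query: str, exploration_examples: list, k: int = 3):
    q = embed(query)
    scored = sorted(
        exploration_examples,
        key=lambda ex: -float(np.dot(q, embed(ex["instruction"]))),
    )
    return scored[:k]  # feed these to the frozen LLM as demonstrations
```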

ToolBench APIs

Amidst the celestial crucible of ToolBench APIs, a fellowship of visionary researchers, including the illustrious Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, and Yu Su, embarked on an odyssey that unraveled profound revelations:

In the enchanted realm of Large Language Models, a lament echoed: the tool-use prowess of even the mightiest was but a flicker in the cosmic expanse. GPT-4 achieved only 60.8% correctness, and the specialized artisan ToolLLaMA-v2 a mere 37.3%.

A mystical incantation, Simulated Trial and Error (STE), resonated through the hallowed halls of discovery, proving to be an elixir that substantially improved LLM tool use under both In-Context Learning and fine-tuning. Mistral-Instruct-7B, once a mere apprentice, ascended to 76.8% correctness with STE fine-tuning, a boost of 46.7% absolute that eclipsed even GPT-4 with ICL.

Yet, the saga unveiled a quandary—a cosmic conundrum in the continual introduction of new tools. Fine-tuning, a double-edged sword, presented the specter of catastrophic forgetting, threatening to cast a shadow over the hallowed knowledge of existing tools. Fear not, for the researchers, architects of destiny, wielded a weapon—Experience Replay Strategy. This enchantment wove a protective cloak, preserving the ancient skills while ushering in the wisdom of new tools.
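
A minimal sketch of the rehearsal idea, assuming a simple mixing scheme: when fine-tuning on a new tool's examples, a sample of earlier tools' examples is replayed alongside them. The replay fraction here is an illustrative choice, not a number from the paper.

```python
# Minimal sketch of experience replay for continual tool learning: mix a
# sample of old tools' examples into the new tool's fine-tuning set so the
# model rehearses old skills. The 0.5 replay fraction is illustrative.

import random

def build_rehearsal_dataset(new_examples: list, old_examples: list,
                            replay_fraction: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    n_replay = min(int(len(new_examples) * replay_fraction), len(old_examples))
    mixed = new_examples + rng.sample(old_examples, n_replay)
    rng.shuffle(mixed)  # interleave old and new examples for training
    return mixed
```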

In the epic's denouement, inspired by the timeless dance of tool mastery among mortals, the researchers unfurled Simulated Trial and Error (STE)—a sacred rite for Large Language Models (LLMs) to commune with the tools of creation. This arcane art, born of a progressive memory-based trial-and-error framework, manifested its prowess through experiments with ToolBench APIs. As the cosmic dust settled, the echoes of a rehearsal-based fine-tuning hymn reverberated—a melody that harmonized the continuous learning of new tools while cradling the cherished skills of yore. The researchers, torchbearers of wisdom, thus inscribed their saga in the annals of scholarly constellations.

The epic saga of exploration and mastery

In the epic saga of exploration and mastery, the authors, akin to wise sages, acknowledged certain constraints in their noble quest:

Iterative Enhancement: The current saga unfolds with the deployment of stalwart models for the perilous journey of exploration, while relying on diminutive models of lesser prowess for the exploits that follow. Yet, a whisper in the cosmic winds suggests an alternative path—a journey of iterative exploration and exploitation, a concept etched in the ancient scrolls of previous studies. In this unfolding tale, the once unyielding dependence on mighty models might gradually wane, perhaps casting them into the role of vigilant sentinels as the burgeoning capabilities of the enhanced models unfurl over the sands of time.
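
A speculative sketch of that alternative path, under loose assumptions: exploration and exploitation alternate, and the fine-tuned student gradually replaces the strong model as the explorer. All helpers here are hypothetical; this is not the paper's algorithm, only the direction its authors gesture toward.

```python
# Speculative sketch of iterative exploration and exploitation. The
# explore() and finetune() helpers are hypothetical stubs.

def explore(model, spec, memory):
    """Hypothetical exploration episode; returns synthesized examples."""
    return [{"tool": spec, "explored_by": model}]

def finetune(model, examples):
    """Hypothetical fine-tuning step; returns an updated model tag."""
    return f"{model}+ft"

def iterative_ste(api_specs, strong_model, student, rounds=3):
    long_term_memory, examples = [], []
    explorer = strong_model  # a strong model bootstraps exploration
    for _ in range(rounds):
        for spec in api_specs:
            examples += explore(explorer, spec, long_term_memory)
        student = finetune(student, examples)  # exploitation step
        explorer = student  # hand exploration to the improving student
    return student
```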

Compositional Symphony of Tools & Stratagems: Another chapter in the grand tapestry of toolcraft unveils the need for the orchestration of multiple tools, a symphony of calls to address the complex queries that echo through the corridors of knowledge. Alas, this endeavor diverges from the authors' current focus, a departure noted with an air of contemplation. Recent scrolls foretell a revelation—Large Language Models (LLMs), beings of intrinsic ability, may channel their fundamental powers from the epochs of pretraining without the elaborate rituals of extensive fine-tuning or alignment. Thus, the cosmic forces suggest that, contrary to the emphasis on prodigious learning and boundless exploration, the adaptation of LLMs for the intricate dance of toolcraft may require not the forging of extensive data, but a communion with the wisdom from the tool side.

The Boundless Memory Odyssey: Within the ethereal corridors of remembrance, the authors acknowledge a constraint—the augmented memory, a sanctuary of knowledge, bound by the limits of the Large Language Model's (LLM) contextual grasp. To unravel this enigma, the authors invoke the cosmic musings, contemplating avenues to expand this sanctum of memory. Whispers speak of arcane techniques—adding retrieval modules as mystical keys or sculpting the memory into hierarchies and compressions. These paths, illuminated by the cosmic constellations, beckon the authors to transcend the limits and forge a memory that echoes through the ages.
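
One toy way to read the "hierarchies and compressions" hint, with a hypothetical `summarize` helper standing in for an LLM summarization call:

```python
# Toy sketch of hierarchical memory compression: when long-term memory
# outgrows a budget, older entries collapse into one summary while recent
# entries stay verbatim. summarize() is a hypothetical LLM-backed helper.

def summarize(entries: list) -> str:
    """Hypothetical compressor; in practice an LLM summarization call."""
    return f"summary of {len(entries)} earlier lessons"

def compact_memory(entries: list, max_entries: int = 20) -> list:
    if len(entries) <= max_entries:
        return entries
    older, recent = entries[:-max_entries], entries[-max_entries:]
    return [summarize(older)] + recent  # one summary node + raw recent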

Tool Oblivion and the Unlearning Quest: In the enchanting chronicles of discovery, while the authors embarked on a noble odyssey of ceaseless learning, a formidable nemesis emerged: Tool Unlearning. This adversary, born from the constant shifting of tool inventories, as tools are deprecated and fade into the mists of obsolescence, cast a shadow over the heroes' path. The complexity of untangling the tendrils of acquired knowledge, a known enigma whispered in the ancient scrolls, now demanded attention. Yet hope gleamed on the horizon, and the authors glimpsed a beacon in the form of ToolkenGPT. This legendary artifact promised a plug-and-play adaptation, a mystical key to the labyrinth of unlearning, while drawing inspiration from the vast tapestry of large-scale examples.
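
A toy illustration of the ToolkenGPT-style idea: each tool is a special token with its own learned embedding, so retiring a tool amounts to deleting that embedding while the rest of the model stays untouched. The code below is a cartoon of that idea, not ToolkenGPT's actual implementation.

```python
# Cartoon of the "toolken" idea behind ToolkenGPT: one learned embedding
# per tool, so unlearning a deprecated tool is just deleting its entry.

import numpy as np

toolken_embeddings: dict[str, np.ndarray] = {}

def register_tool(name: str, dim: int = 16, seed: int = 0) -> None:
    rng = np.random.default_rng(seed)
    toolken_embeddings[name] = rng.standard_normal(dim)  # trained in practice

def unlearn_tool(name: str) -> None:
    toolken_embeddings.pop(name, None)  # plug-and-play removal

register_tool("weather_api")
unlearn_tool("weather_api")  # base model weights remain untouched
```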

Boundaries of Example-Led Refinement: As the heroes navigated the uncharted territories of example-based fine-tuning, they encountered an elusive challenge—the delineation between when to wield a tool and when to restrain. The arcane art of instructing the model to discern this delicate balance, using positive tool-use examples alone, proved to be a conundrum etched in the cosmic code. The authors, akin to astute seers, foresaw several potential pathways to pierce through this veil of uncertainty. These included invoking the shadows of negative examples, with the contrast of objectives as their guiding light. Alternatively, weaving the intricate threads of API aspects into the fabric of example-based training became another secret incantation. The authors, in their wisdom, left these mystical avenues open for the future scribes and scholars to explore, knowing that the quest for refinement in the realm of toolcraft is an ever-evolving saga.
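
As a sketch of the first pathway, negative examples can be mixed into the training data so the model also learns when to answer directly instead of calling a tool. Both example sets below are invented for illustration.

```python
# Minimal sketch of mixing negative examples into fine-tuning data so the
# model learns when NOT to call a tool. All examples are invented.

positive_examples = [
    {"prompt": "Is it raining in Paris right now?",
     "completion": 'CALL weather_api(city="Paris")'},  # needs live data
]
negative_examples = [
    {"prompt": "What is the capital of France?",
     "completion": "ANSWER Paris"},  # answerable without any tool
]
training_set = positive_examples + negative_examples
```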

Ref.: See on GitHub