Tool Use
The ability of a language model to call external tools --- APIs, code interpreters, databases, file systems --- rather than only generate text, turning it from a text predictor into a system that can act on the world.
What is it?
A language model on its own can only do one thing: predict the next token in a sequence of text. It cannot check the weather, query a database, send an email, or run a calculation it is unsure about. It can only write about doing those things. Tool use (also called function calling) is the capability that bridges this gap --- it lets a model recognise when it needs external help, call the right tool with the right inputs, and incorporate the result back into its reasoning.1
Tool use is what separates a chatbot from an agent.2 A chatbot answers your question from its training data. A tool-using agent can look up today’s stock price, execute a Python snippet to verify its arithmetic, search the web for recent events, or create a file on disk. The model itself does not execute the tool --- it generates a structured request (a function call), the host system executes it, and the result is fed back to the model for the next step.
The progression is clear: a text-only LLM can only generate language; an LLM with function calling can invoke pre-defined tools the developer has wired up; an LLM with MCP server connections can discover and use tools dynamically across standardised servers.3 Each step increases what the model can do in the world, moving it along the autonomy spectrum described in the agentic-systems parent card.
In plain terms
A language model without tools is like a brilliant advisor locked in a room with no phone, no computer, and no door. They can give you impressive advice based on what they already know, but they cannot check a fact, look something up, or do anything for you. Tool use gives them a phone, a laptop, and a set of keys --- now they can call people, run searches, and take action on your behalf.
At a glance
The tool-call loop (click to expand)
```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant T as Tool
    U->>L: Task or question
    L->>L: Reason about what to do
    L->>T: Tool call with parameters
    T-->>L: Tool result
    L->>L: Incorporate result and reason again
    L-->>U: Final answer or next tool call
```

Key: The user sends a task. The LLM reasons about which tool to call, generates a structured tool call, receives the result, and either answers the user or calls another tool. This loop can repeat multiple times before the final response.
How does it work?
1. Tool definitions as typed contracts
Before a model can use a tool, it needs to know what tools are available. Each tool is described to the model as a typed contract: a name, a natural-language description of what it does, a set of parameters with types, and a return type.1
For example, a weather tool might be defined as:
```json
{
  "name": "get_weather",
  "description": "Returns the current weather for a given city",
  "parameters": {
    "city": { "type": "string", "description": "City name" },
    "units": { "type": "string", "enum": ["celsius", "fahrenheit"] }
  },
  "returns": { "type": "object" }
}
```

The quality of this definition directly affects how reliably the model uses the tool. Vague descriptions lead to wrong tool selections. Missing parameter descriptions lead to incorrect inputs. Tool design is API design --- the same principles of clarity and completeness apply.4
Think of it like...
A tool definition is a menu item in a restaurant. The name tells the LLM what dish is available, the description explains what it contains, and the parameters are the customisation options (size, spice level, side dish). If the menu is poorly written, the customer orders the wrong thing.
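On the host side, a definition like the one above is usually paired with the function that implements it in a registry, so that model-generated calls can be dispatched by name. A minimal sketch in Python --- the `get_weather` implementation here is a hypothetical stub, not a real weather API:

```python
import json

# Hypothetical implementation backing the get_weather definition above.
def get_weather(city: str, units: str = "celsius") -> dict:
    # A real tool would call a weather service; this stub returns fixed data.
    return {"city": city, "temp": 14, "units": units, "rain_chance": 0.8}

# Registry: tool name -> (definition shown to the model, callable that runs it)
TOOLS = {
    "get_weather": (
        {
            "name": "get_weather",
            "description": "Returns the current weather for a given city",
            "parameters": {
                "city": {"type": "string", "description": "City name"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
        },
        get_weather,
    )
}

def execute_tool_call(call: dict) -> str:
    """Dispatch a model-generated tool call to the registered function."""
    definition, fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    return json.dumps(result)  # serialised result is fed back to the model

print(execute_tool_call({"name": "get_weather",
                         "arguments": {"city": "Lausanne"}}))
```

The registry makes the separation of roles concrete: the model only ever produces the `call` dict; the host owns the lookup and execution.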
2. The tool-call loop
Tool use follows a repeating cycle known as the ReAct pattern (Reasoning + Acting): the model reasons about what to do, acts by calling a tool, observes the result, and reasons again.5
A single user request can trigger multiple loops. Consider an agent asked “What is the weather in Lausanne and should I bring an umbrella?” The loop might look like:
- Reason: I need the current weather for Lausanne
- Act: Call `get_weather(city="Lausanne", units="celsius")`
- Observe: Result says 14C, 80% chance of rain
- Reason: High rain probability means the user should bring an umbrella
- Respond: “It is 14C in Lausanne with an 80% chance of rain --- bring an umbrella.”
The model does not execute the tool itself. The host application intercepts the tool call, executes it, and feeds the result back. This separation is important for security --- the model proposes actions, but the system controls what actually happens.2
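The loop itself is a small piece of host code. A sketch of the cycle, assuming a hypothetical `model.generate` interface that returns either a tool call or a final answer (the stub model and tools here are illustrative, not a real provider API):

```python
def run_agent(model, tools, user_message, max_steps=5):
    """Minimal tool-call loop: the model proposes, the host executes."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = model.generate(messages, tools)  # hypothetical model API
        if reply["type"] == "final_answer":
            return reply["content"]
        # The model only *requested* a tool call; the host runs it
        # and feeds the result back as a new message.
        result = tools[reply["name"]](**reply["arguments"])
        messages.append({"role": "tool", "name": reply["name"],
                         "content": result})
    return "Step limit reached"

class StubModel:
    """Stand-in model: requests get_weather once, then answers."""
    def __init__(self):
        self.called = False
    def generate(self, messages, tools):
        if not self.called:
            self.called = True
            return {"type": "tool_call", "name": "get_weather",
                    "arguments": {"city": "Lausanne"}}
        return {"type": "final_answer",
                "content": f"Weather looked up: {messages[-1]['content']}"}

tools = {"get_weather": lambda city: f"14C, 80% rain in {city}"}
print(run_agent(StubModel(), tools, "Weather in Lausanne?"))
```

Note that `max_steps` caps the loop --- a common safeguard against an agent that keeps calling tools without converging on an answer.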
Think of it like...
A doctor (the LLM) orders a blood test (the tool call). The doctor does not draw the blood or run the analysis --- the lab does. The lab sends results back to the doctor, who interprets them and decides the next step. The doctor’s skill is in knowing which test to order and how to interpret the results, not in operating the lab equipment.
Example: multi-step tool use (click to expand)
A user asks a coding agent: “Find all Python files in the project that import pandas and count them.”
Loop 1:
- Reason: I need to search for Python files importing pandas
- Act: Call `grep_files(pattern="import pandas", glob="**/*.py")`
- Observe: Returns 7 matching files with paths
Loop 2:
- Reason: I have the list, now I need to count them
- Act: No tool needed --- I can count from the result
- Respond: “There are 7 Python files that import pandas” followed by the file list
The agent decided it needed one tool call and could handle the second step (counting) internally. Good tool use means calling tools only when necessary.
3. What makes tool use reliable
Not all tool-using agents are equally reliable. Several design factors determine how well an agent uses its tools:4
- Tool granularity: Tools that do one thing well are easier for models to use correctly than Swiss-army-knife tools that do many things. A `search_web` tool is clearer than a `do_research` tool that searches, summarises, and formats.
- Parameter validation: Checking that the model’s tool call has valid parameters before execution prevents cascading errors. If the model passes an invalid date format, catch it early.
- Error handling: Tools should return clear error messages the model can reason about. “404: City not found” is actionable. A raw stack trace is not.
- Tool count: Models perform better with fewer, well-described tools than with dozens of overlapping options. Anthropic’s research suggests that tool selection accuracy decreases as the number of available tools grows.1
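The parameter-validation point can be made concrete: before dispatching a call, check it against the tool's declared schema and hand any problems back to the model as an actionable error. A minimal hand-rolled sketch (production systems would typically use a JSON Schema validator instead):

```python
def validate_call(definition: dict, arguments: dict) -> list[str]:
    """Return a list of problems with a model-generated tool call."""
    errors = []
    params = definition["parameters"]
    for name, value in arguments.items():
        if name not in params:
            errors.append(f"unknown parameter: {name}")
            continue
        spec = params[name]
        # Type check: only strings are handled in this sketch.
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")
        # Enum check: value must be one of the allowed options.
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors

weather_def = {
    "name": "get_weather",
    "parameters": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
}
print(validate_call(weather_def, {"city": "Lausanne", "units": "kelvin"}))
```

The returned error strings are deliberately plain language --- like the “404: City not found” example above, they are messages the model can reason about and correct on its next attempt.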
Key distinction
The model does not run tools --- it requests tool calls. The host system decides whether to execute them. This is where guardrails come in: the system can refuse, modify, or require approval for any tool call before execution.
4. The MCP pattern
The Model Context Protocol (MCP) is an open standard developed by Anthropic that standardises how AI models connect to external tools and data sources.3 Before MCP, every tool integration required custom code --- each model provider had its own function-calling format, and each tool had its own API. MCP provides a universal interface.
The analogy Anthropic uses is “USB-C for AI”: just as USB-C provides a single connector standard for charging, data transfer, and display across devices, MCP provides a single protocol for tool discovery, invocation, and result handling across AI models and tool providers.3
In practice, an MCP server exposes a set of tools (with typed definitions), and an MCP client (the AI application) can discover and call those tools using a standardised JSON-RPC protocol. This means:
- A tool built once as an MCP server works with any MCP-compatible model
- An AI application can discover new tools at runtime without code changes
- Tool definitions are standardised, reducing integration errors
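On the wire, this exchange is ordinary JSON-RPC 2.0. A sketch of the two core messages as Python dicts --- `tools/list` and `tools/call` are the method names used by the MCP specification, while the tool name and arguments here are illustrative:

```python
import json

# Client -> server: discover which tools the server exposes.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Client -> server: invoke a discovered tool by name.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather",
               "arguments": {"city": "Lausanne"}},
}

# Messages are serialised as JSON and sent over the transport
# (stdio or HTTP, depending on how the server is run).
print(json.dumps(call_request))
```

Because the envelope is standard JSON-RPC, any MCP client can talk to any MCP server without knowing in advance which tools it will find there.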
Concept to explore
MCP is evolving rapidly. As of early 2026, it is supported by Claude, Cursor, Windsurf, and a growing ecosystem of tool servers. See the MCP documentation for the current specification.
Yiuno example (click to expand)
This knowledge system uses MCP servers during card creation. When Claude Code writes a concept card, it connects to an Exa MCP server for web search and a file system MCP server for reading and writing files. The agent does not need custom code for each --- it discovers the available tools through the MCP protocol and calls them using standardised tool definitions. The same agent could connect to different MCP servers (a database, a calendar, a code execution environment) without any changes to its core logic.
Why do we use it?
Key reasons
1. Grounding in reality. A text-only model can only draw on its training data, which has a knowledge cutoff and may contain errors. Tool use lets the model check facts, access live data, and verify its own outputs against real-world sources.2
2. Extending capabilities. Language models are poor at precise arithmetic, cannot access private databases, and do not know what happened after their training cutoff. Tools fill these gaps --- a calculator for maths, a database connector for private data, a web search for current events.1
3. Taking action. Without tools, a model can only describe what should be done. With tools, it can do it --- send the email, create the file, deploy the code, book the appointment.4
4. Composability. Tools can be combined in sequences to accomplish complex tasks that no single tool could handle. Search the web, extract key facts, write a summary, save it to a file --- each step uses a different tool, orchestrated by the model’s reasoning.5
When do we use it?
- When the task requires information the model does not have (live data, private databases, recent events)
- When the task involves computation the model should not attempt in its head (complex maths, data analysis, code execution)
- When the task requires side effects in the real world (sending messages, creating files, calling APIs, modifying databases)
- When accuracy matters more than speed and the model should verify rather than guess
- When building agentic systems that need to operate autonomously across multiple steps
Rule of thumb
If the task requires the model to know something it was not trained on, calculate something precisely, or change something in the real world, it needs tool use. If the task is pure language generation from general knowledge, tools are unnecessary overhead.
How can I think about it?
The Swiss Army knife
Imagine a person stranded on a hiking trail with a broken backpack strap. They are intelligent and resourceful, but without tools they can only describe how to fix it. Hand them a Swiss Army knife and suddenly they can cut cord, punch holes, tighten screws, and actually repair the strap.
- The hiker = the language model (reasoning capability)
- The Swiss Army knife = the set of available tools
- Each blade/tool = a specific function (web search, code execution, file write)
- Choosing which blade to open = tool selection (the model picks the right function)
- The fixed strap = the real-world outcome (not just a description of how to fix it)
- The knife’s instruction manual = the tool definitions (what each blade does and how to use it)
The hiker’s intelligence determines when and how to use each tool. The tools determine what is actually possible. Neither is useful without the other.
The call centre agent with a computer
A call centre agent answering phones from memory can handle simple, common questions. Give them a computer with access to the customer database, order system, and knowledge base, and they can handle almost anything.
- The call centre agent = the LLM (conversational ability)
- The computer and its applications = the tools
- Looking up a customer record = a database query tool call
- Processing a refund = an API tool call with side effects
- Checking the knowledge base = a search tool call
- The agent deciding what to look up = the reasoning step in the tool-call loop
- Company policy on what actions require supervisor approval = guardrails
The agent’s value comes from combining conversational skill with system access. Without the computer, they are limited to memorised answers. Without conversational skill, the computer sits unused.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| llm-pipelines | How language models are connected to tools and data sources in production | stub |
| orchestration | Coordinating multiple tool-using agents in complex workflows | stub |
| guardrails | Constraints that control what tools an agent can use and when | stub |
| apis | The interface pattern that most tools expose for integration | complete |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself (click to expand)
- Explain why a language model without tool use is fundamentally limited, even if it is highly capable at generating text.
- Name the four components of a tool definition and describe why each matters for reliable tool use.
- Distinguish between tool use and the ReAct pattern. How does the ReAct pattern build on basic tool calling?
- Interpret this scenario: an agent is given 50 tools but consistently calls the wrong one for database queries. What design factor is most likely at fault, and how would you fix it?
- Connect tool use to the Model Context Protocol: what problem does MCP solve that basic function calling does not?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AIML[AI and Machine Learning] --> AS[Agentic Systems]
    AS --> TU[Tool Use]
    AS --> LLM[LLM Pipelines]
    AS --> ORCH[Orchestration]
    AS --> GR[Guardrails]
    TU -.->|calls| API[APIs]
    style TU fill:#4a9ede,color:#fff
```

Related concepts:
- apis --- tools typically expose their functionality through APIs; tool use is the agent-side pattern for calling them
- llm-pipelines --- tool use is a core component of how language models are wired into production systems
- orchestration --- when multiple agents use tools, orchestration determines which agent calls which tool and when
- guardrails --- constraints that control which tools an agent can call and what approval is needed
Sources
Further reading
Resources
- Function Calling (Agent Wiki) --- Comprehensive reference covering function calling mechanics, model support, and integration patterns
- Tool Use: How AI Agents Interact with the Real World (Hopx AI) --- Practical guide with the “brain without a body” framing and real-world examples
- The Anatomy of Tool Calling in LLMs (martinuke0) --- Deep technical dive into how tool calling works inside language models
- Model Context Protocol Documentation --- Official MCP specification and guides for building MCP servers and clients
- The ReAct Pattern Explained (Cowork Ink) --- Complete guide to the Reasoning + Acting loop with benchmarks and failure modes
Footnotes
1. Agent Wiki. (2026). Function Calling. Agent Wiki.
2. Hopx AI. (2025). Tool Use: How AI Agents Interact with the Real World. Hopx AI.
3. Web4Agents. (2026). Model Context Protocol (MCP) --- Docs. Web4Agents.
4. AgenticCareers. (2026). Tool Use and Function Calling: The Practical Developer’s Guide for 2026. AgenticCareers.
5. Cowork Ink. (2026). The ReAct Pattern Explained: AI Agent Reasoning in 2026. Cowork Ink.
