Agent Tools

How agents interact with the world.

How tool calling works

When you send a message, the agent decides which tools (if any) to use to complete the task. The LLM provider returns structured tool calls with a name, arguments, and a unique ID. The agent executes each tool, collects the results, and feeds them back to the LLM for the next step. This loop continues until the agent has a final text response.

Tool specs are sent to the LLM as part of the API request. Providers that support native function calling (Anthropic, OpenAI, OpenRouter) use structured tool schemas. For other providers, tool specs are injected into the system prompt as markdown and the agent parses XML tool calls from the response.
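As a concrete illustration, here is what the two delivery paths might look like. This is a minimal sketch: the field names follow common Anthropic/OpenAI function-calling conventions, and `render_spec_as_markdown` is a hypothetical helper showing the prompt-injection fallback, not the actual implementation.

```python
import json

# A tool spec in the structured shape used by native function-calling
# providers (field names follow common provider conventions; the exact
# schema is provider-specific).
shell_spec = {
    "name": "shell",
    "description": "Execute a shell command in the workspace directory.",
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Command to run."},
        },
        "required": ["command"],
    },
}

# A structured tool call as returned by the provider: name, arguments,
# and a unique ID used to pair the result with the call.
tool_call = {"id": "call_01", "name": "shell", "arguments": {"command": "ls"}}

def render_spec_as_markdown(spec: dict) -> str:
    """Fallback for providers without native function calling:
    inject the spec into the system prompt as markdown text."""
    return (
        f"## {spec['name']}\n{spec['description']}\nParameters:\n"
        + json.dumps(spec["input_schema"]["properties"], indent=2)
    )

print(render_spec_as_markdown(shell_spec))
```

Either way, the LLM sees the same information: tool name, purpose, and argument schema; only the transport differs.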

Available tools

Every agent has access to a core set of tools. The browser tool is conditionally enabled based on agent config.

shell

Always available

Execute shell commands in the workspace directory. This is the agent's most versatile tool - it can run any command the workspace user has access to: Python scripts, Node.js, curl, git, package managers, compilers, and anything else installed in the container.

  • 60-second timeout per command.
  • Output capped at 1 MB to prevent context overflow.
  • Environment is sanitized - only safe variables (PATH, HOME, TERM, LANG) are passed through.
  • All commands except a short blocklist (shutdown, reboot, supervisorctl, kill) are permitted.
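The four constraints above can be sketched in a few lines. This is an illustrative implementation, assuming `subprocess.run` semantics; the real executor may differ in how it streams output or detects blocked commands.

```python
import os
import subprocess

SAFE_ENV = {"PATH", "HOME", "TERM", "LANG"}           # variables passed through
BLOCKLIST = {"shutdown", "reboot", "supervisorctl", "kill"}
TIMEOUT_S = 60                                        # per-command timeout
MAX_OUTPUT = 1_000_000                                # 1 MB output cap

def run_shell(command: str, workspace: str = ".") -> str:
    # Reject commands whose first word is on the blocklist.
    first = command.strip().split()[0] if command.strip() else ""
    if first in BLOCKLIST:
        return f"error: '{first}' is blocked"
    # Sanitize the environment: pass through only the safe variables.
    env = {k: v for k, v in os.environ.items() if k in SAFE_ENV}
    try:
        proc = subprocess.run(
            command, shell=True, cwd=workspace, env=env,
            capture_output=True, text=True, timeout=TIMEOUT_S,
        )
    except subprocess.TimeoutExpired:
        return f"error: command timed out after {TIMEOUT_S}s"
    # Cap combined output to avoid overflowing the LLM context.
    return (proc.stdout + proc.stderr)[:MAX_OUTPUT]
```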

browser

Enabled when browser config is on

Control the real Chrome browser visible via VNC. The agent connects to Chrome through the DevTools Protocol (CDP) and can perform 17 distinct actions:

navigate, click, type_text, press_key, screenshot, get_text, get_html, get_url, get_title, find_elements, evaluate_js, wait, scroll, hover, set_cookie, get_cookies, close_tab

The browser is the same one you see in the VNC viewer. When the agent navigates to a page, you watch it happen live. Domain restrictions can be configured per agent. In container mode, all domains are allowed by default.
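To make the CDP relationship concrete, here is a hypothetical sketch of how some of these actions could map onto DevTools Protocol commands. The CDP method names on the right are real protocol methods, but the dispatch layer itself is illustrative, not the actual implementation.

```python
# Partial mapping from agent browser actions to Chrome DevTools
# Protocol (CDP) methods. Lower-level actions such as click, type_text,
# and press_key build on Input.dispatchMouseEvent / Input.dispatchKeyEvent;
# higher-level ones like get_text or find_elements are typically composed
# from Runtime.evaluate calls.
ACTION_TO_CDP = {
    "navigate": "Page.navigate",
    "screenshot": "Page.captureScreenshot",
    "evaluate_js": "Runtime.evaluate",
    "set_cookie": "Network.setCookie",
    "get_cookies": "Network.getCookies",
}

def to_cdp_message(action: str, params: dict, msg_id: int) -> dict:
    """Wrap an agent action as a CDP JSON-RPC message (sketch)."""
    if action not in ACTION_TO_CDP:
        raise ValueError(f"no direct CDP mapping for {action!r}")
    return {"id": msg_id, "method": ACTION_TO_CDP[action], "params": params}

msg = to_cdp_message("navigate", {"url": "https://example.com"}, 1)
```

Messages in this shape are sent over Chrome's DevTools WebSocket, which is why the agent drives the very same browser instance you watch in the VNC viewer.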

memory_store / memory_recall / memory_forget

Always available

Long-term memory that persists across conversations. The agent can store facts, preferences, and notes under categorized keys, then recall them by semantic search later. Three categories:

  • core - permanent facts and preferences.
  • daily - session-scoped notes.
  • conversation - context from the current thread.

Recall returns scored results ranked by relevance, so the agent can find the most pertinent memories even with vague queries. Forget removes a memory by its exact key.
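The three memory tools can be sketched as follows. This is a minimal stand-in: real recall uses semantic (embedding-based) search, while this sketch substitutes simple keyword overlap to show the scored-and-ranked shape of the results.

```python
# In-memory store keyed by (category, key). The real store persists
# across conversations; this sketch only illustrates the interface.
MEMORY: dict[tuple[str, str], str] = {}

def memory_store(category: str, key: str, value: str) -> None:
    assert category in {"core", "daily", "conversation"}
    MEMORY[(category, key)] = value

def memory_recall(query: str, top_k: int = 3) -> list[tuple[float, str, str]]:
    """Return (score, key, value) triples ranked by relevance.
    Keyword overlap stands in for semantic search here."""
    q = set(query.lower().split())
    scored = []
    for (cat, key), value in MEMORY.items():
        words = set(f"{key} {value}".lower().split())
        score = len(q & words) / (len(q) or 1)
        if score > 0:
            scored.append((score, key, value))
    return sorted(scored, reverse=True)[:top_k]

def memory_forget(category: str, key: str) -> bool:
    """Remove a memory by its exact key; True if something was removed."""
    return MEMORY.pop((category, key), None) is not None
```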

Security model

The Docker container is the security boundary. The agent can run almost any command and browse any domain. Security relies on container isolation: dropped capabilities, no-new-privileges, forbidden system paths, and a short blocklist of destructive commands.
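For illustration, the isolation properties above correspond to standard Docker hardening flags. The flags shown are real Docker options, but the image name and mounts here are placeholders, not the project's actual configuration:

```shell
# Sketch of a hardened container launch (image name and mounts are
# illustrative). Capabilities are dropped and privilege escalation
# is disabled; the command blocklist is enforced inside the agent.
docker run \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --read-only \
  --tmpfs /tmp \
  agent-image:latest
```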

The tool loop

A single message can trigger multiple rounds of tool use. Here's the flow:

  1. Your message is sent to the LLM with the conversation history and tool specs.
  2. The LLM returns text, tool calls, or both.
  3. If there are tool calls, the agent executes each one and collects results.
  4. Tool results are appended to the conversation and sent back to the LLM.
  5. Steps 2–4 repeat until the LLM returns a response with no tool calls.
  6. The final text response is streamed to your chat.
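The six steps above can be sketched as a single loop. `call_llm` and `execute_tool` are hypothetical stand-ins for the provider client and tool dispatcher, and the message roles follow common chat-API conventions rather than any specific provider's schema.

```python
def run_agent(user_message: str, call_llm, execute_tool, tool_specs: list) -> str:
    # Step 1: send the message with history and tool specs.
    messages = [{"role": "user", "content": user_message}]
    while True:
        # Step 2: the LLM returns text, tool calls, or both.
        reply = call_llm(messages, tools=tool_specs)
        messages.append({"role": "assistant", "content": reply})
        if not reply.get("tool_calls"):
            # Steps 5-6: no tool calls means we have the final response.
            return reply.get("text", "")
        # Step 3: execute each tool call and collect results.
        for call in reply["tool_calls"]:
            result = execute_tool(call["name"], call["arguments"])
            # Step 4: append the result, paired to the call by its ID.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
```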

During streaming, you see the agent's text as it's generated. Tool executions happen between streaming rounds - you'll see tool use indicators in the chat while the agent works.

For provider-specific inbound/outbound channel flow, see the Channels docs.