The agent loop

AgentLoop is the commodity while loop at the center of every agent. It composes a Provider, a ToolRegistry, and a ContextStrategy into a loop that calls the LLM, executes tools, manages context, and repeats until the model returns a final text response or a limit is reached.

Quick example

use neuron_context::SlidingWindowStrategy;
use neuron_loop::AgentLoop;
use neuron_tool::ToolRegistry;
use neuron_types::ToolContext;

// `Anthropic` comes from your provider crate; its import is omitted here
let provider = Anthropic::from_env()?;
let context = SlidingWindowStrategy::new(20, 100_000);

let mut tools = ToolRegistry::new();
tools.register(MySearchTool);
tools.register(MyCalculateTool);

let mut agent = AgentLoop::builder(provider, context)
    .tools(tools)
    .system_prompt("You are a helpful research assistant.")
    .max_turns(15)
    .parallel_tool_execution(true)
    .build();

let ctx = ToolContext::default();
let result = agent.run_text("Find the population of Tokyo", &ctx).await?;
println!("Response: {}", result.response);
println!("Turns: {}, Tokens: {} in / {} out",
    result.turns, result.usage.input_tokens, result.usage.output_tokens);

Building an AgentLoop

The builder pattern

AgentLoop::builder(provider, context) returns an AgentLoopBuilder with sensible defaults. Only the provider and context strategy are required.

let agent = AgentLoop::builder(provider, context)
    .tools(registry)                    // ToolRegistry (default: empty)
    .system_prompt("You are helpful.")  // SystemPrompt (default: empty)
    .max_turns(10)                      // Option<usize> (default: None = unlimited)
    .parallel_tool_execution(true)      // bool (default: false)
    .usage_limits(limits)               // UsageLimits (default: no limits)
    .hook(my_logging_hook)              // ObservabilityHook (can add multiple)
    .durability(my_durable_ctx)         // DurableContext (optional)
    .build();

Direct construction

You can also construct directly when you need to set the full LoopConfig:

use neuron_loop::{AgentLoop, LoopConfig};
use neuron_types::SystemPrompt;

let config = LoopConfig {
    system_prompt: SystemPrompt::Text("You are a code reviewer.".into()),
    max_turns: Some(20),
    parallel_tool_execution: true,
    ..Default::default()
};

let agent = AgentLoop::new(provider, tools, context, config);

Running the loop

run() – drive to completion

Appends the user message, then loops until the model returns a text-only response or the turn limit is reached.

let result = agent.run(Message::user("Hello!"), &tool_ctx).await?;
// result: AgentResult { response, messages, usage, turns }

run_text() – convenience for text input

Wraps a &str into a Message::user() and calls run():

let result = agent.run_text("What is 2 + 2?", &tool_ctx).await?;

run_stream() – streaming output

Uses provider.complete_stream() for real-time token output. Returns a channel receiver that yields StreamEvents:

let mut rx = agent.run_stream(Message::user("Explain Rust ownership"), &tool_ctx).await;

while let Some(event) = rx.recv().await {
    match event {
        StreamEvent::TextDelta(text) => print!("{text}"),
        StreamEvent::ToolUse { name, .. } => println!("\n[calling {name}...]"),
        StreamEvent::Usage(usage) => println!("\n[{} tokens]", usage.output_tokens),
        StreamEvent::MessageComplete(_) => println!("\n[done]"),
        StreamEvent::Error(err) => eprintln!("Error: {err}"),
        _ => {}
    }
}

Tool execution is handled between streaming turns. The loop streams the LLM response, executes any tool calls, appends results, and streams the next turn.
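
Because tool rounds happen transparently between streamed turns, a consumer that only wants the text can simply accumulate deltas across the whole run. A minimal sketch using the events above:

let mut rx = agent.run_stream(Message::user("Compare Tokyo and Osaka"), &tool_ctx).await;

let mut answer = String::new();
while let Some(event) = rx.recv().await {
    match event {
        // Deltas from every streamed turn land here, including the
        // turns that follow tool execution
        StreamEvent::TextDelta(text) => answer.push_str(&text),
        StreamEvent::Error(err) => eprintln!("stream error: {err}"),
        _ => {}
    }
}
println!("{answer}");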

run_step() – one turn at a time

Returns a StepIterator that lets you advance the loop manually. Between turns you can inspect messages, inject new ones, and modify the tool registry.

let mut steps = agent.run_step(Message::user("Plan a trip"), &tool_ctx);

while let Some(turn) = steps.next().await {
    match turn {
        TurnResult::ToolsExecuted { calls, results } => {
            println!("Executed {} tools", calls.len());
            // Optionally inject guidance between turns
            steps.inject_message(Message::user("Focus on budget options."));
        }
        TurnResult::FinalResponse(result) => {
            println!("Final: {}", result.response);
        }
        TurnResult::CompactionOccurred { old_tokens, new_tokens } => {
            println!("Compacted: {old_tokens} -> {new_tokens} tokens");
        }
        TurnResult::MaxTurnsReached => {
            println!("Hit turn limit");
        }
        TurnResult::Error(e) => {
            eprintln!("Error: {e}");
        }
    }
}

StepIterator exposes:

  • next() – advance one turn
  • messages() – view current conversation
  • inject_message(msg) – add a message between turns
  • tools_mut() – modify the tool registry between turns
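
A sketch combining these methods, reusing MyCalculateTool from the quick example:

let mut steps = agent.run_step(Message::user("Research Tokyo housing costs"), &tool_ctx);

while let Some(turn) = steps.next().await {
    // Inspect the conversation between turns
    println!("history so far: {} messages", steps.messages().len());

    if let TurnResult::ToolsExecuted { calls, .. } = &turn {
        // Make an extra tool available after the first tool round
        if calls.len() == 1 {
            steps.tools_mut().register(MyCalculateTool);
        }
        steps.inject_message(Message::user("Prefer official statistics."));
    }

    if matches!(turn, TurnResult::FinalResponse(_)) {
        break;
    }
}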

Distinguishing text responses from tool calls

TurnResult is the key abstraction for telling apart a direct LLM message from a tool-call round trip. When the model returns plain text and no tool calls, the iterator yields TurnResult::FinalResponse containing the finished AgentResult. When the model requests one or more tool calls, the loop executes them and yields TurnResult::ToolsExecuted with the calls and their results. The loop handles dispatch automatically — you just match on the variant.

let mut steps = agent.run_step(Message::user("What's 2 + 2?"), &tool_ctx);

while let Some(turn) = steps.next().await {
    match turn {
        TurnResult::ToolsExecuted { calls, results } => {
            // The model requested tool calls — they've been executed
            for (_call_id, tool_name, input) in &calls {
                println!("Model called tool '{tool_name}' with {input}");
            }
            // results contains the ContentBlock::ToolResult for each call
            // The loop automatically sends these back to the model
        }
        TurnResult::FinalResponse(result) => {
            // The model returned a text response — no more tool calls
            println!("Final answer: {}", result.response);
            println!("Total turns: {}", result.turns);
        }
        TurnResult::CompactionOccurred { old_tokens, new_tokens } => {
            println!("Context compacted: {old_tokens} → {new_tokens} tokens");
            // Loop continues automatically
        }
        TurnResult::MaxTurnsReached => {
            println!("Turn limit reached without a final response");
        }
        TurnResult::Error(e) => {
            eprintln!("Loop error: {e}");
        }
    }
}

If you only need the final result and don’t need turn-by-turn control, use run() or run_text() instead — they drive the loop to completion and return AgentResult directly.

AgentResult

Returned by run(), run_text(), and TurnResult::FinalResponse:

pub struct AgentResult {
    pub response: String,       // Final text response from the model
    pub messages: Vec<Message>, // Full conversation history
    pub usage: TokenUsage,      // Cumulative token usage across all turns
    pub turns: usize,           // Number of turns completed
}
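
The messages field carries the whole history, including the assistant's tool-use messages and the tool-result messages the loop appended. A sketch of walking it (this assumes Role implements Debug and content is a Vec of blocks, as the examples later in this chapter suggest):

let result = agent.run_text("Find the population of Tokyo", &ctx).await?;

for msg in &result.messages {
    // One line per message: the role plus how many content blocks it holds
    println!("{:?}: {} block(s)", msg.role, msg.content.len());
}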

Loop lifecycle

Each iteration of the loop follows this sequence:

  1. Check cancellation – if tool_ctx.cancellation_token is cancelled, return LoopError::Cancelled
  2. Check max turns – if the turn limit is reached, return LoopError::MaxTurns
  3. Check usage limits – if any token, request, or tool call limit is exceeded, return LoopError::UsageLimitExceeded
  4. Fire LoopIteration hooks
  5. Check context compaction – call context.should_compact() and context.compact() if needed
  6. Build CompletionRequest from current messages, system prompt, and tool definitions
  7. Fire PreLlmCall hooks
  8. Call the provider (or durable context if set)
  9. Fire PostLlmCall hooks
  10. Accumulate token usage
  11. Check stop reason:
    • StopReason::Compaction – append message and continue the loop
    • StopReason::EndTurn or no tool calls – extract text and return AgentResult
    • StopReason::ToolUse – proceed to tool execution
  12. Check cancellation again before tool execution
  13. Execute tool calls (parallel or sequential), firing PreToolExecution and PostToolExecution hooks for each
  14. Check usage limits – verify tool call count against limit
  15. Append tool results as a user message and loop back to step 1
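
Step 11 is the branch that decides whether the loop ends. A condensed sketch of that decision using the variants named above (simplified, not the actual source):

// Inside the loop body, after accumulating usage (step 10):
match response.stop_reason {
    // Server-side compaction: keep the message and take another turn
    StopReason::Compaction => {
        self.messages.push(response.message);
        continue;
    }
    // The model requested tools: proceed to steps 12-15
    StopReason::ToolUse => { /* execute tool calls */ }
    // EndTurn (or no tool calls): extract the text and return
    _ => { /* build and return AgentResult */ }
}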

How tool call processing works

When the LLM decides to use a tool, the loop handles the entire dispatch cycle automatically. Here is exactly what happens at the code level:

Step 1: LLM returns tool calls. The provider responds with StopReason::ToolUse and one or more ContentBlock::ToolUse blocks in the assistant message. Each block contains a name, input (JSON arguments), and a unique id.

// Inside the loop — the LLM response contains tool calls:
// response.stop_reason == StopReason::ToolUse
// response.message.content == [
//     ContentBlock::Text("Let me look that up."),
//     ContentBlock::ToolUse { id: "call_1", name: "get_weather", input: {"city": "Tokyo"} },
// ]

Step 2: Extract tool calls. The loop filters the assistant message for ContentBlock::ToolUse blocks and collects them as (id, name, input) tuples. The full assistant message (including any text) is appended to the conversation.

let tool_calls: Vec<_> = response.message.content.iter()
    .filter_map(|block| {
        if let ContentBlock::ToolUse { id, name, input } = block {
            Some((id.clone(), name.clone(), input.clone()))
        } else {
            None
        }
    })
    .collect();

// Append the assistant message (with both text and tool use blocks)
self.messages.push(response.message.clone());

Step 3: Execute tools via the registry. Each tool call is dispatched to the ToolRegistry, which finds the matching tool by name, deserializes the JSON input, and calls the tool’s call() method. Pre- and post-execution hooks fire around each call.

// For each tool call, the loop calls execute_single_tool:
// 1. Fire PreToolExecution hooks (can Skip or Terminate)
// 2. Call self.tools.execute(tool_name, input, tool_ctx)
// 3. Fire PostToolExecution hooks
// 4. Wrap the ToolOutput into a ContentBlock::ToolResult

If parallel_tool_execution is true and there are multiple tool calls, all calls run concurrently via futures::future::join_all. Otherwise they execute sequentially.
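
A sketch of that dispatch choice, with names taken from the comments above:

use futures::future::join_all;

let results = if self.config.parallel_tool_execution && tool_calls.len() > 1 {
    // All calls run concurrently; results come back in call order
    join_all(tool_calls.iter().map(|(id, name, input)| {
        self.execute_single_tool(id, name, input, tool_ctx)
    }))
    .await
} else {
    // Sequential execution, in the order the model requested
    let mut results = Vec::with_capacity(tool_calls.len());
    for (id, name, input) in &tool_calls {
        results.push(self.execute_single_tool(id, name, input, tool_ctx).await);
    }
    results
};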

Step 4: Append results and continue. The tool results are collected into ContentBlock::ToolResult blocks (each linked back to the original call by tool_use_id) and appended as a User message. The loop then continues — the LLM sees the tool results and can respond with text or call more tools.

// Each tool result looks like:
// ContentBlock::ToolResult {
//     tool_use_id: "call_1",
//     content: [ContentItem::Text("{\"temp\": 22, \"conditions\": \"sunny\"}")],
//     is_error: false,
// }

// All results are appended as a single user message
self.messages.push(Message {
    role: Role::User,
    content: tool_result_blocks,
});
// Loop continues → LLM sees results → responds or calls more tools

Special case — ToolError::ModelRetry. If a tool returns Err(ToolError::ModelRetry(hint)), the loop does not propagate an error. Instead, it converts the hint into a ToolResult with is_error: true. The model receives the hint and can retry with corrected arguments:

// Tool returns: Err(ToolError::ModelRetry("city must be a valid name, got '123'"))
// Loop converts to: ContentBlock::ToolResult {
//     tool_use_id: "call_1",
//     content: [ContentItem::Text("city must be a valid name, got '123'")],
//     is_error: true,
// }
// Model sees the error and retries: get_weather({"city": "Tokyo"})

Complete flow diagram

User: "What's the weather in Tokyo?"
    │
    ▼
┌─────────────────────────────────────────────┐
│ Turn 1: LLM call                            │
│   Request: [User: "What's the weather..."]  │
│   Response: ToolUse(get_weather,            │
│             {city: "Tokyo"})                │
│   StopReason: ToolUse                       │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│ Tool execution                              │
│   Registry dispatches get_weather           │
│   Tool returns:                             │
│     {temp: 22, conditions: "sunny"}         │
│   Result appended as ToolResult message     │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│ Turn 2: LLM call                            │
│   Request: [User, Assistant(ToolUse),       │
│             User(ToolResult)]               │
│   Response: "It's 22°C and sunny in Tokyo." │
│   StopReason: EndTurn                       │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
              AgentResult {
                  response: "It's 22°C and sunny in Tokyo.",
                  turns: 2,
                  ...
              }

Cancellation

The loop checks ToolContext.cancellation_token at two points:

  1. Top of each iteration (before the max turns check)
  2. Before tool execution (after the LLM returns tool calls)

use std::time::Duration;
use tokio_util::sync::CancellationToken;

let token = CancellationToken::new();
let ctx = ToolContext {
    cancellation_token: token.clone(),
    ..Default::default()
};

// Cancel from another task
tokio::spawn(async move {
    tokio::time::sleep(Duration::from_secs(30)).await;
    token.cancel();
});

match agent.run_text("Long task...", &ctx).await {
    Err(LoopError::Cancelled) => println!("Cancelled!"),
    Ok(result) => println!("{}", result.response),
    Err(e) => eprintln!("{e}"),
}

Parallel tool execution

When LoopConfig.parallel_tool_execution is true and the LLM returns multiple tool calls in a single response, all calls execute concurrently via futures::future::join_all. When false (the default), tools execute sequentially in order.

let agent = AgentLoop::builder(provider, context)
    .parallel_tool_execution(true)
    .tools(registry)
    .build();

Parallel execution applies to run() and run_step(). Streaming (run_stream()) always executes tools sequentially.

Usage limits

UsageLimits enforces token, request, and tool call budgets on the agent loop. When any limit is exceeded, the loop returns LoopError::UsageLimitExceeded with a message describing which limit was hit.

use neuron_loop::AgentLoop;
use neuron_types::UsageLimits;

let limits = UsageLimits::default()
    .with_input_tokens_limit(500_000)
    .with_output_tokens_limit(50_000)
    .with_total_tokens_limit(600_000)
    .with_request_limit(25)
    .with_tool_calls_limit(100);

let agent = AgentLoop::builder(provider, context)
    .tools(registry)
    .usage_limits(limits)
    .build();

Each field is optional – set only the limits you care about. Unset limits are not enforced.

Limit                 Checked against
input_tokens_limit    Cumulative TokenUsage.input_tokens across all turns
output_tokens_limit   Cumulative TokenUsage.output_tokens across all turns
total_tokens_limit    Sum of cumulative input + output tokens
request_limit         Number of LLM calls made (incremented each turn)
tool_calls_limit      Number of tool executions (incremented per tool call)

The loop checks limits at two points:

  1. Before each LLM call – checks token and request limits against accumulated usage
  2. After tool execution – checks the tool call count against the limit

When a limit is exceeded, the loop stops immediately and returns LoopError::UsageLimitExceeded with a descriptive message (e.g., "output token limit exceeded: 50123 > 50000").
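
Handling the limit error looks like any other loop error:

use neuron_loop::LoopError;

match agent.run_text("Summarize these 400 documents", &ctx).await {
    Ok(result) => println!("{}", result.response),
    // The message names the specific limit that was hit
    Err(LoopError::UsageLimitExceeded(msg)) => eprintln!("Budget exhausted: {msg}"),
    Err(e) => eprintln!("{e}"),
}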

You can also construct UsageLimits directly:

use neuron_types::UsageLimits;

let limits = UsageLimits {
    input_tokens_limit: Some(500_000),
    output_tokens_limit: Some(50_000),
    total_tokens_limit: None,
    request_limit: Some(25),
    tool_calls_limit: None,
};

Or use LoopConfig directly:

use neuron_loop::LoopConfig;
use neuron_types::UsageLimits;

let config = LoopConfig {
    usage_limits: Some(UsageLimits::default()
        .with_total_tokens_limit(1_000_000)
        .with_request_limit(50)),
    ..Default::default()
};

Context compaction

The loop supports two independent compaction mechanisms:

Client-side compaction

Uses the ContextStrategy you provide. Between turns, the loop calls should_compact() and compact() to reduce message history when tokens exceed the configured threshold.

// SlidingWindow compacts by dropping old messages
let agent = AgentLoop::builder(provider, SlidingWindowStrategy::new(20, 100_000))
    .build();

Server-side compaction

When the provider returns StopReason::Compaction, the loop automatically continues without treating it as a final response. The compacted content arrives in ContentBlock::Compaction within the assistant’s message.

No configuration is needed in the loop – it handles this transparently. Set CompletionRequest.context_management on the provider side to enable it.

ToolError::ModelRetry

When a tool returns Err(ToolError::ModelRetry(hint)), the loop converts it to a ToolOutput with is_error: true and the hint as content. The model receives the hint and can retry with corrected arguments.

This does not propagate as LoopError::Tool. The loop continues normally, giving the model a chance to self-correct.
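
From the tool author's side it looks like the sketch below. The exact Tool trait signature and the WeatherInput type are assumptions for illustration; only ToolError::ModelRetry itself is part of the documented API:

// Hypothetical tool body: the trait signature and input type are assumed
async fn call(&self, input: WeatherInput, _ctx: &ToolContext) -> Result<ToolOutput, ToolError> {
    // Validate before doing any work; hand bad arguments back to the model
    if input.city.chars().all(|c| c.is_ascii_digit()) {
        return Err(ToolError::ModelRetry(format!(
            "city must be a valid name, got '{}'",
            input.city
        )));
    }
    // ... fetch the weather and return Ok(ToolOutput)
    todo!()
}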

Observability hooks

Add hooks to observe or control loop behavior. Hooks receive events at each step and return HookAction::Continue, HookAction::Skip, or HookAction::Terminate.

use neuron_types::{ObservabilityHook, HookEvent, HookAction, HookError};

struct TokenBudgetHook { max_tokens: usize }

impl ObservabilityHook for TokenBudgetHook {
    async fn on_event(&self, event: HookEvent<'_>) -> Result<HookAction, HookError> {
        match event {
            HookEvent::PostLlmCall { response } => {
                if response.usage.output_tokens > self.max_tokens {
                    return Ok(HookAction::Terminate {
                        reason: "token budget exceeded".into(),
                    });
                }
            }
            _ => {}
        }
        Ok(HookAction::Continue)
    }
}

let agent = AgentLoop::builder(provider, context)
    .hook(TokenBudgetHook { max_tokens: 10_000 })
    .build();

Hook events

Event                                          Fired when                      Skip/Terminate behavior
LoopIteration { turn }                         Start of each turn              Terminate stops the loop
PreLlmCall { request }                         Before calling the provider     Terminate stops the loop
PostLlmCall { response }                       After receiving the response    Terminate stops the loop
PreToolExecution { tool_name, input }          Before each tool call           Skip returns rejection as tool result
PostToolExecution { tool_name, output }        After each tool call            Terminate stops the loop
ContextCompaction { old_tokens, new_tokens }   After context is compacted      Terminate stops the loop
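
For example, a hook that blocks a specific tool before it runs. This sketch assumes HookAction::Skip is a unit variant, which this chapter does not show:

use neuron_types::{HookAction, HookError, HookEvent, ObservabilityHook};

struct BlockDangerousTools;

impl ObservabilityHook for BlockDangerousTools {
    async fn on_event(&self, event: HookEvent<'_>) -> Result<HookAction, HookError> {
        // Reject the tool before execution; per the table above, the
        // rejection is returned to the model as that call's tool result
        if let HookEvent::PreToolExecution { tool_name, .. } = event {
            if tool_name == "delete_file" {
                return Ok(HookAction::Skip);
            }
        }
        Ok(HookAction::Continue)
    }
}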

Durable execution

For crash-recoverable agents, set a DurableContext on the loop. When present, LLM calls go through DurableContext::execute_llm_call and tool calls go through DurableContext::execute_tool, enabling journaling and replay by engines like Temporal, Restate, or Inngest.

let agent = AgentLoop::builder(provider, context)
    .durability(my_temporal_context)
    .build();

The loop handles the durable/non-durable split transparently. All other behavior (hooks, compaction, cancellation) works the same way.
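
Conceptually, the split is a single branch around each call. A sketch (execute_llm_call is named above; the non-durable complete() call is an assumption):

// Sketch of the durable/non-durable split for an LLM call
let response = match &self.durability {
    Some(durable) => durable.execute_llm_call(&request).await?,
    None => self.provider.complete(&request).await?,
};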

Error handling

run() and run_text() return Result<AgentResult, LoopError>:

Variant                              Cause
LoopError::Provider(e)               LLM call failed
LoopError::Tool(e)                   Tool execution failed (except ModelRetry)
LoopError::Context(e)                Context compaction failed
LoopError::MaxTurns(n)               Turn limit reached
LoopError::UsageLimitExceeded(msg)   Token, request, or tool call budget exceeded
LoopError::HookTerminated(reason)    A hook returned Terminate
LoopError::Cancelled                 Cancellation token was triggered

run_stream() sends errors as StreamEvent::Error on the channel instead of returning them as Result.

API reference