# The agent loop

`AgentLoop` is the commodity while loop at the center of every agent. It
composes a `Provider`, a `ToolRegistry`, and a `ContextStrategy` into a loop
that calls the LLM, executes tools, manages context, and repeats until the
model returns a final text response or a limit is reached.
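Stripped of hooks, limits, and compaction, the shape of that loop can be sketched in plain Rust. This is a hypothetical simplification with stand-in types (`Reply`, `run_loop`), not the real `neuron_loop` API:

```rust
// A scripted reply from the "model": stand-in for a real Provider response.
enum Reply {
    Text(String),              // final text response: the loop ends
    ToolCall { name: String }, // tool request: execute it and loop again
}

// Hypothetical skeleton of the loop: consume one reply per turn until the
// model answers with text or the turn limit is hit.
fn run_loop(mut replies: Vec<Reply>, max_turns: usize) -> Result<String, &'static str> {
    let mut messages: Vec<String> = vec!["user: hello".into()];
    for _turn in 0..max_turns {
        let reply = replies.remove(0); // stand-in for provider.complete(...)
        match reply {
            // Text-only response: extract it and return (cf. AgentResult).
            Reply::Text(text) => return Ok(text),
            // Tool call: dispatch it, append the result, and continue.
            Reply::ToolCall { name } => {
                messages.push(format!("tool_result: {name} -> ok"));
            }
        }
    }
    Err("max turns reached") // stand-in for LoopError::MaxTurns
}

fn main() {
    let replies = vec![
        Reply::ToolCall { name: "search".into() },
        Reply::Text("done".into()),
    ];
    // One tool round trip, then a final text answer on turn two.
    assert_eq!(run_loop(replies, 10), Ok("done".to_string()));
}
```

Everything in the rest of this page (limits, hooks, compaction, durability) is a refinement of this skeleton.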
## Quick example

```rust
use neuron_context::SlidingWindowStrategy;
use neuron_loop::AgentLoop;
use neuron_tool::ToolRegistry;
use neuron_types::ToolContext;

let provider = Anthropic::from_env()?;
let context = SlidingWindowStrategy::new(20, 100_000);

let mut tools = ToolRegistry::new();
tools.register(MySearchTool);
tools.register(MyCalculateTool);

let mut agent = AgentLoop::builder(provider, context)
    .tools(tools)
    .system_prompt("You are a helpful research assistant.")
    .max_turns(15)
    .parallel_tool_execution(true)
    .build();

let ctx = ToolContext::default();
let result = agent.run_text("Find the population of Tokyo", &ctx).await?;

println!("Response: {}", result.response);
println!("Turns: {}, Tokens: {} in / {} out",
    result.turns, result.usage.input_tokens, result.usage.output_tokens);
```
## Building an `AgentLoop`

### The builder pattern

`AgentLoop::builder(provider, context)` returns an `AgentLoopBuilder` with
sensible defaults. Only the provider and context strategy are required.

```rust
let agent = AgentLoop::builder(provider, context)
    .tools(registry)                    // ToolRegistry (default: empty)
    .system_prompt("You are helpful.")  // SystemPrompt (default: empty)
    .max_turns(10)                      // Option<usize> (default: None = unlimited)
    .parallel_tool_execution(true)      // bool (default: false)
    .usage_limits(limits)               // UsageLimits (default: no limits)
    .hook(my_logging_hook)              // ObservabilityHook (can add multiple)
    .durability(my_durable_ctx)         // DurableContext (optional)
    .build();
```
### Direct construction

You can also construct the loop directly when you need to set the full `LoopConfig`:

```rust
use neuron_loop::{AgentLoop, LoopConfig};
use neuron_types::SystemPrompt;

let config = LoopConfig {
    system_prompt: SystemPrompt::Text("You are a code reviewer.".into()),
    max_turns: Some(20),
    parallel_tool_execution: true,
    ..Default::default()
};

let agent = AgentLoop::new(provider, tools, context, config);
```
## Running the loop

### `run()` – drive to completion

Appends the user message, then loops until the model returns a text-only
response or the turn limit is reached.

```rust
let result = agent.run(Message::user("Hello!"), &tool_ctx).await?;
// result: AgentResult { response, messages, usage, turns }
```

### `run_text()` – convenience for text input

Wraps a `&str` into a `Message::user()` and calls `run()`:

```rust
let result = agent.run_text("What is 2 + 2?", &tool_ctx).await?;
```
### `run_stream()` – streaming output

Uses `provider.complete_stream()` for real-time token output. Returns a
channel receiver that yields `StreamEvent`s:

```rust
let mut rx = agent.run_stream(Message::user("Explain Rust ownership"), &tool_ctx).await;

while let Some(event) = rx.recv().await {
    match event {
        StreamEvent::TextDelta(text) => print!("{text}"),
        StreamEvent::ToolUse { name, .. } => println!("\n[calling {name}...]"),
        StreamEvent::Usage(usage) => println!("\n[{} tokens]", usage.output_tokens),
        StreamEvent::MessageComplete(_) => println!("\n[done]"),
        StreamEvent::Error(err) => eprintln!("Error: {err}"),
        _ => {}
    }
}
```
Tool execution is handled between streaming turns. The loop streams the LLM response, executes any tool calls, appends results, and streams the next turn.
### `run_step()` – one turn at a time

Returns a `StepIterator` that lets you advance the loop manually. Between turns
you can inspect messages, inject new ones, and modify the tool registry.

```rust
let mut steps = agent.run_step(Message::user("Plan a trip"), &tool_ctx);

while let Some(turn) = steps.next().await {
    match turn {
        TurnResult::ToolsExecuted { calls, results } => {
            println!("Executed {} tools", calls.len());
            // Optionally inject guidance between turns
            steps.inject_message(Message::user("Focus on budget options."));
        }
        TurnResult::FinalResponse(result) => {
            println!("Final: {}", result.response);
        }
        TurnResult::CompactionOccurred { old_tokens, new_tokens } => {
            println!("Compacted: {old_tokens} -> {new_tokens} tokens");
        }
        TurnResult::MaxTurnsReached => {
            println!("Hit turn limit");
        }
        TurnResult::Error(e) => {
            eprintln!("Error: {e}");
        }
    }
}
```
`StepIterator` exposes:

- `next()` – advance one turn
- `messages()` – view the current conversation
- `inject_message(msg)` – add a message between turns
- `tools_mut()` – modify the tool registry between turns
### Distinguishing text responses from tool calls

`TurnResult` is the key abstraction for telling apart a direct LLM message from
a tool-call round trip. When the model returns plain text and no tool calls, the
iterator yields `TurnResult::FinalResponse` containing the finished
`AgentResult`. When the model requests one or more tool calls, the loop executes
them and yields `TurnResult::ToolsExecuted` with the calls and their results.
The loop handles dispatch automatically; you just match on the variant.
```rust
let mut steps = agent.run_step(Message::user("What's 2 + 2?"), &tool_ctx);

while let Some(turn) = steps.next().await {
    match turn {
        TurnResult::ToolsExecuted { calls, results } => {
            // The model requested tool calls; they've been executed.
            for (call_id, tool_name, input) in &calls {
                println!("Model called tool '{tool_name}' ({call_id}) with {input}");
            }
            // `results` contains the ContentBlock::ToolResult for each call.
            // The loop automatically sends these back to the model.
        }
        TurnResult::FinalResponse(result) => {
            // The model returned a text response; no more tool calls.
            println!("Final answer: {}", result.response);
            println!("Total turns: {}", result.turns);
        }
        TurnResult::CompactionOccurred { old_tokens, new_tokens } => {
            println!("Context compacted: {old_tokens} -> {new_tokens} tokens");
            // The loop continues automatically.
        }
        TurnResult::MaxTurnsReached => {
            println!("Turn limit reached without a final response");
        }
        TurnResult::Error(e) => {
            eprintln!("Loop error: {e}");
        }
    }
}
```
If you only need the final result and don't need turn-by-turn control, use
`run()` or `run_text()` instead; they drive the loop to completion and return
`AgentResult` directly.
## `AgentResult`

Returned by `run()`, `run_text()`, and `TurnResult::FinalResponse`:

```rust
pub struct AgentResult {
    pub response: String,       // Final text response from the model
    pub messages: Vec<Message>, // Full conversation history
    pub usage: TokenUsage,      // Cumulative token usage across all turns
    pub turns: usize,           // Number of turns completed
}
```
## Loop lifecycle

Each iteration of the loop follows this sequence:

1. Check cancellation – if `tool_ctx.cancellation_token` is cancelled, return `LoopError::Cancelled`
2. Check max turns – if the turn limit is reached, return `LoopError::MaxTurns`
3. Check usage limits – if any token, request, or tool call limit is exceeded, return `LoopError::UsageLimitExceeded`
4. Fire `LoopIteration` hooks
5. Check context compaction – call `context.should_compact()` and `context.compact()` if needed
6. Build the `CompletionRequest` from the current messages, system prompt, and tool definitions
7. Fire `PreLlmCall` hooks
8. Call the provider (or the durable context, if set)
9. Fire `PostLlmCall` hooks
10. Accumulate token usage
11. Check the stop reason:
    - `StopReason::Compaction` – append the message and continue the loop
    - `StopReason::EndTurn` or no tool calls – extract the text and return `AgentResult`
    - `StopReason::ToolUse` – proceed to tool execution
12. Check cancellation again before tool execution
13. Execute the tool calls (parallel or sequential), firing `PreToolExecution` and `PostToolExecution` hooks for each
14. Check usage limits – verify the tool call count against its limit
15. Append tool results as a user message and loop back to step 1
## How tool call processing works

When the LLM decides to use a tool, the loop handles the entire dispatch cycle
automatically. Here is exactly what happens at the code level.

**Step 1: LLM returns tool calls.** The provider responds with
`StopReason::ToolUse` and one or more `ContentBlock::ToolUse` blocks in the
assistant message. Each block contains a `name`, an `input` (JSON arguments), and
a unique `id`.

```rust
// Inside the loop, the LLM response contains tool calls:
// response.stop_reason == StopReason::ToolUse
// response.message.content == [
//     ContentBlock::Text("Let me look that up."),
//     ContentBlock::ToolUse { id: "call_1", name: "get_weather", input: {"city": "Tokyo"} },
// ]
```
**Step 2: Extract tool calls.** The loop filters the assistant message for
`ContentBlock::ToolUse` blocks and collects them as `(id, name, input)` tuples.
The full assistant message (including any text) is appended to the conversation.

```rust
let tool_calls: Vec<_> = response.message.content.iter()
    .filter_map(|block| {
        if let ContentBlock::ToolUse { id, name, input } = block {
            Some((id.clone(), name.clone(), input.clone()))
        } else {
            None
        }
    })
    .collect();

// Append the assistant message (with both text and tool use blocks)
self.messages.push(response.message.clone());
```
**Step 3: Execute tools via the registry.** Each tool call is dispatched to the
`ToolRegistry`, which finds the matching tool by name, deserializes the JSON
input, and calls the tool's `call()` method. Pre- and post-execution hooks fire
around each call.

```rust
// For each tool call, the loop calls execute_single_tool:
// 1. Fire PreToolExecution hooks (can Skip or Terminate)
// 2. Call self.tools.execute(tool_name, input, tool_ctx)
// 3. Fire PostToolExecution hooks
// 4. Wrap the ToolOutput into a ContentBlock::ToolResult
```

If `parallel_tool_execution` is true and there are multiple tool calls, all
calls run concurrently via `futures::future::join_all`. Otherwise they execute
sequentially.
**Step 4: Append results and continue.** The tool results are collected into
`ContentBlock::ToolResult` blocks (each linked back to the original call by
`tool_use_id`) and appended as a `User` message. The loop then continues: the
LLM sees the tool results and can respond with text or call more tools.

```rust
// Each tool result looks like:
// ContentBlock::ToolResult {
//     tool_use_id: "call_1",
//     content: [ContentItem::Text("{\"temp\": 22, \"conditions\": \"sunny\"}")],
//     is_error: false,
// }

// All results are appended as a single user message
self.messages.push(Message {
    role: Role::User,
    content: tool_result_blocks,
});
// Loop continues -> LLM sees results -> responds or calls more tools
```
**Special case: `ToolError::ModelRetry`.** If a tool returns
`Err(ToolError::ModelRetry(hint))`, the loop does not propagate an error.
Instead, it converts the hint into a `ToolResult` with `is_error: true`. The
model receives the hint and can retry with corrected arguments:

```rust
// Tool returns: Err(ToolError::ModelRetry("city must be a valid name, got '123'"))
// Loop converts to: ContentBlock::ToolResult {
//     tool_use_id: "call_1",
//     content: [ContentItem::Text("city must be a valid name, got '123'")],
//     is_error: true,
// }
// Model sees the error and retries: get_weather({"city": "Tokyo"})
```
### Complete flow diagram

```
User: "What's the weather in Tokyo?"
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ Turn 1: LLM call                            │
│   Request: [User: "What's the weather..."]  │
│   Response: ToolUse(get_weather,            │
│     {city: "Tokyo"})                        │
│   StopReason: ToolUse                       │
└─────────────────────┬───────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ Tool execution                              │
│   Registry dispatches get_weather           │
│   Tool returns: {temp: 22,                  │
│     conditions: "sunny"}                    │
│   Result appended as ToolResult message     │
└─────────────────────┬───────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ Turn 2: LLM call                            │
│   Request: [User, Assistant(ToolUse),       │
│     User(ToolResult)]                       │
│   Response: "It's 22°C and sunny in Tokyo." │
│   StopReason: EndTurn                       │
└─────────────────────┬───────────────────────┘
                      │
                      ▼
AgentResult {
    response: "It's 22°C and sunny in Tokyo.",
    turns: 2,
    ...
}
```
## Cancellation

The loop checks `ToolContext.cancellation_token` at two points:

- Top of each iteration (before the max turns check)
- Before tool execution (after the LLM returns tool calls)

```rust
use std::time::Duration;
use tokio_util::sync::CancellationToken;

let token = CancellationToken::new();
let ctx = ToolContext {
    cancellation_token: token.clone(),
    ..Default::default()
};

// Cancel from another task
tokio::spawn(async move {
    tokio::time::sleep(Duration::from_secs(30)).await;
    token.cancel();
});

match agent.run_text("Long task...", &ctx).await {
    Err(LoopError::Cancelled) => println!("Cancelled!"),
    Ok(result) => println!("{}", result.response),
    Err(e) => eprintln!("{e}"),
}
```
## Parallel tool execution

When `LoopConfig.parallel_tool_execution` is true and the LLM returns multiple
tool calls in a single response, all calls execute concurrently via
`futures::future::join_all`. When false (the default), tools execute
sequentially in order.

```rust
let agent = AgentLoop::builder(provider, context)
    .parallel_tool_execution(true)
    .tools(registry)
    .build();
```

Parallel execution applies to `run()` and `run_step()`. Streaming
(`run_stream()`) always executes tools sequentially.
## Usage limits

`UsageLimits` enforces token and request budgets on the agent loop. When any
limit is exceeded, the loop returns `LoopError::UsageLimitExceeded` with a
message describing which limit was hit.

```rust
use neuron_loop::AgentLoop;
use neuron_types::UsageLimits;

let limits = UsageLimits::default()
    .with_input_tokens_limit(500_000)
    .with_output_tokens_limit(50_000)
    .with_total_tokens_limit(600_000)
    .with_request_limit(25)
    .with_tool_calls_limit(100);

let agent = AgentLoop::builder(provider, context)
    .tools(registry)
    .usage_limits(limits)
    .build();
```
Each field is optional – set only the limits you care about. Unset limits are not enforced.
| Limit | Checked against |
|---|---|
| `input_tokens_limit` | Cumulative `TokenUsage.input_tokens` across all turns |
| `output_tokens_limit` | Cumulative `TokenUsage.output_tokens` across all turns |
| `total_tokens_limit` | Sum of cumulative input + output tokens |
| `request_limit` | Number of LLM calls made (incremented each turn) |
| `tool_calls_limit` | Number of tool executions (incremented per tool call) |
The loop checks limits at two points:
- Before each LLM call – checks token and request limits against accumulated usage
- After tool execution – checks the tool call count against the limit
When a limit is exceeded, the loop stops immediately and returns
`LoopError::UsageLimitExceeded` with a descriptive message (e.g.,
`"output token limit exceeded: 50123 > 50000"`).
You can also construct `UsageLimits` directly:

```rust
use neuron_types::UsageLimits;

let limits = UsageLimits {
    input_tokens_limit: Some(500_000),
    output_tokens_limit: Some(50_000),
    total_tokens_limit: None,
    request_limit: Some(25),
    tool_calls_limit: None,
};
```

Or use `LoopConfig` directly:

```rust
use neuron_loop::LoopConfig;
use neuron_types::UsageLimits;

let config = LoopConfig {
    usage_limits: Some(UsageLimits::default()
        .with_total_tokens_limit(1_000_000)
        .with_request_limit(50)),
    ..Default::default()
};
```
## Context compaction

The loop supports two independent compaction mechanisms.

### Client-side compaction

Uses the `ContextStrategy` you provide. Between turns, the loop calls
`should_compact()` and `compact()` to reduce the message history when tokens
exceed the configured threshold.

```rust
// SlidingWindow compacts by dropping old messages
let agent = AgentLoop::builder(provider, SlidingWindowStrategy::new(20, 100_000))
    .build();
```
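A strategy of this shape can be sketched over a plain `Vec`. This is a hypothetical simplification (string messages, naive token estimate), not the real `SlidingWindowStrategy`:

```rust
// Hypothetical simplification: messages are plain strings and tokens are
// estimated crudely; the real strategy works on Message values.
struct SlidingWindow {
    max_messages: usize,
    max_tokens: usize,
}

impl SlidingWindow {
    // Crude token estimate: roughly one token per four characters.
    fn token_count(&self, messages: &[String]) -> usize {
        messages.iter().map(|m| m.len() / 4 + 1).sum()
    }

    fn should_compact(&self, messages: &[String]) -> bool {
        messages.len() > self.max_messages || self.token_count(messages) > self.max_tokens
    }

    // Compact by dropping the oldest messages until the window fits.
    fn compact(&self, messages: &mut Vec<String>) {
        let excess = messages.len().saturating_sub(self.max_messages);
        messages.drain(..excess);
    }
}

fn main() {
    let strategy = SlidingWindow { max_messages: 2, max_tokens: 100_000 };
    let mut history: Vec<String> = vec!["oldest".into(), "middle".into(), "newest".into()];

    assert!(strategy.should_compact(&history)); // 3 messages > window of 2
    strategy.compact(&mut history);
    assert_eq!(history, vec!["middle".to_string(), "newest".to_string()]);
}
```

Dropping from the front keeps the most recent context, which is what the model needs for the next turn.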
### Server-side compaction

When the provider returns `StopReason::Compaction`, the loop automatically
continues without treating it as a final response. The compacted content
arrives in `ContentBlock::Compaction` within the assistant's message.

No configuration is needed in the loop; it handles this transparently. Set
`CompletionRequest.context_management` on the provider side to enable it.
## `ToolError::ModelRetry`

When a tool returns `Err(ToolError::ModelRetry(hint))`, the loop converts it
to a `ToolOutput` with `is_error: true` and the hint as content. The model
receives the hint and can retry with corrected arguments.

This does not propagate as `LoopError::Tool`; the loop continues normally,
giving the model a chance to self-correct.
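The conversion can be pictured as a small match on the tool's result. Stand-in types below (`ToolError`, `ToolResult`, `to_tool_result`); the real `ToolError` and result blocks live in `neuron_types`:

```rust
// Stand-in types for illustration only.
enum ToolError {
    ModelRetry(String), // hint for the model: "try again like this"
    Other(String),      // any other failure
}

#[derive(Debug, PartialEq)]
struct ToolResult {
    content: String,
    is_error: bool,
}

// ModelRetry becomes an error-flagged tool result that flows back to the
// model; any other error aborts the loop (stand-in for LoopError::Tool).
fn to_tool_result(outcome: Result<String, ToolError>) -> Result<ToolResult, String> {
    match outcome {
        Ok(content) => Ok(ToolResult { content, is_error: false }),
        Err(ToolError::ModelRetry(hint)) => Ok(ToolResult { content: hint, is_error: true }),
        Err(ToolError::Other(e)) => Err(e),
    }
}

fn main() {
    let retry = to_tool_result(Err(ToolError::ModelRetry(
        "city must be a valid name, got '123'".into(),
    )));
    // The hint is delivered to the model as an error-flagged result.
    assert_eq!(
        retry,
        Ok(ToolResult {
            content: "city must be a valid name, got '123'".into(),
            is_error: true,
        })
    );

    // Other failures still surface as loop errors.
    let fatal = to_tool_result(Err(ToolError::Other("network down".into())));
    assert_eq!(fatal, Err("network down".to_string()));
}
```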
## Observability hooks

Add hooks to observe or control loop behavior. Hooks receive events at each step
and return `HookAction::Continue`, `HookAction::Skip`, or `HookAction::Terminate`.

```rust
use neuron_types::{ObservabilityHook, HookEvent, HookAction, HookError};

struct TokenBudgetHook { max_tokens: usize }

impl ObservabilityHook for TokenBudgetHook {
    async fn on_event(&self, event: HookEvent<'_>) -> Result<HookAction, HookError> {
        match event {
            HookEvent::PostLlmCall { response } => {
                if response.usage.output_tokens > self.max_tokens {
                    return Ok(HookAction::Terminate {
                        reason: "token budget exceeded".into(),
                    });
                }
            }
            _ => {}
        }
        Ok(HookAction::Continue)
    }
}

let agent = AgentLoop::builder(provider, context)
    .hook(TokenBudgetHook { max_tokens: 10_000 })
    .build();
```
### Hook events

| Event | Fired when | Skip/Terminate behavior |
|---|---|---|
| `LoopIteration { turn }` | Start of each turn | `Terminate` stops the loop |
| `PreLlmCall { request }` | Before calling the provider | `Terminate` stops the loop |
| `PostLlmCall { response }` | After receiving the response | `Terminate` stops the loop |
| `PreToolExecution { tool_name, input }` | Before each tool call | `Skip` returns the rejection as the tool result |
| `PostToolExecution { tool_name, output }` | After each tool call | `Terminate` stops the loop |
| `ContextCompaction { old_tokens, new_tokens }` | After context is compacted | `Terminate` stops the loop |
## Durable execution

For crash-recoverable agents, set a `DurableContext` on the loop. When present,
LLM calls go through `DurableContext::execute_llm_call` and tool calls go
through `DurableContext::execute_tool`, enabling journaling and replay by
engines like Temporal, Restate, or Inngest.

```rust
let agent = AgentLoop::builder(provider, context)
    .durability(my_temporal_context)
    .build();
```

The loop handles the durable/non-durable split transparently. All other
behavior (hooks, compaction, cancellation) works the same way.
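The journaling idea behind this can be sketched as a memoizing wrapper: record each call's result under a stable step key, and on replay return the journaled value instead of re-executing the side effect. A toy sketch (`Journal` is hypothetical, not any engine's real API):

```rust
use std::collections::HashMap;

// Toy journal: a stable step key maps to the recorded result. A real engine
// (Temporal, Restate, Inngest) persists this and replays it after a crash.
struct Journal {
    entries: HashMap<String, String>,
}

impl Journal {
    fn new() -> Self {
        Journal { entries: HashMap::new() }
    }

    // Run the side effect once per key; replays return the journaled value
    // without re-executing (cf. execute_llm_call / execute_tool).
    fn execute(&mut self, key: &str, effect: impl FnOnce() -> String) -> String {
        if let Some(recorded) = self.entries.get(key) {
            return recorded.clone(); // replay path: skip the effect
        }
        let result = effect();
        self.entries.insert(key.to_string(), result.clone());
        result
    }
}

fn main() {
    let mut journal = Journal::new();
    let mut llm_calls = 0;

    let first = journal.execute("turn-1/llm", || {
        llm_calls += 1;
        "It's sunny.".to_string()
    });
    // Simulated replay after a crash: same key, so the effect is skipped.
    let replayed = journal.execute("turn-1/llm", || {
        llm_calls += 1;
        "different answer".to_string()
    });

    assert_eq!(first, "It's sunny.");
    assert_eq!(replayed, "It's sunny.");
    assert_eq!(llm_calls, 1); // the underlying call ran only once
}
```

This is why non-deterministic work (LLM calls, tool calls) must go through the durable context: replay only works when every side effect is keyed and journaled.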
## Error handling

`run()` and `run_text()` return `Result<AgentResult, LoopError>`:

| Variant | Cause |
|---|---|
| `LoopError::Provider(e)` | LLM call failed |
| `LoopError::Tool(e)` | Tool execution failed (except `ModelRetry`) |
| `LoopError::Context(e)` | Context compaction failed |
| `LoopError::MaxTurns(n)` | Turn limit reached |
| `LoopError::UsageLimitExceeded(msg)` | Token, request, or tool call budget exceeded |
| `LoopError::HookTerminated(reason)` | A hook returned `Terminate` |
| `LoopError::Cancelled` | Cancellation token was triggered |

`run_stream()` sends errors as `StreamEvent::Error` on the channel instead of
returning them as a `Result`.