
Introduction

neuron provides composable building blocks for AI agents in Rust. Each block is an independent crate, versioned and published separately. Pull one block without buying the whole stack.

Philosophy: serde, not serde_json

neuron is to agent frameworks what serde is to serde_json. It defines traits (Provider, Tool, ContextStrategy) and provides foundational implementations. An SDK layer composes these blocks into opinionated workflows.

Every Rust and Python agent framework converges on the same ~300-line while loop. The differentiation is never the loop — it’s the blocks around it: context management, tool pipelines, durability, runtime. Nobody ships those blocks independently. That’s the gap neuron fills.
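
That convergent loop is small enough to sketch. The stub below is illustrative only (ModelReply, fake_model, and run_loop are invented for this sketch, not neuron's API): drive the model, execute any tool call it makes, feed the result back, and stop when it answers in plain text or hits the turn cap.

```rust
// Illustrative stubs only — not neuron's API.
enum ModelReply {
    Text(String),      // final answer: stop the loop
    ToolCall(String),  // model wants a tool executed
}

// Stand-in for an LLM call: asks for a tool once, then answers.
fn fake_model(history: &[String]) -> ModelReply {
    if history.iter().any(|m| m.starts_with("tool:")) {
        ModelReply::Text("done".into())
    } else {
        ModelReply::ToolCall("get_weather".into())
    }
}

// The canonical agent loop: send, maybe execute tools, repeat.
fn run_loop(user_msg: &str, max_turns: usize) -> String {
    let mut history = vec![format!("user: {user_msg}")];
    for _ in 0..max_turns {
        match fake_model(&history) {
            ModelReply::Text(answer) => return answer,
            ModelReply::ToolCall(name) => {
                // Execute the tool and append its result for the next turn.
                history.push(format!("tool:{name} -> result"));
            }
        }
    }
    "max turns reached".into()
}
```

Everything interesting lives in the blocks this loop calls into, which is exactly what the crates below provide.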

What’s Included

neuron ships the following crates:

Crate                      Purpose
neuron-types               Core traits and types — Provider, Tool, ContextStrategy, Message
neuron-provider-anthropic  Anthropic Messages API (streaming, tool use, server-side compaction)
neuron-provider-openai     OpenAI Chat Completions + Embeddings API
neuron-provider-ollama     Ollama local inference API
neuron-tool                ToolRegistry with composable middleware pipeline
neuron-tool-macros         #[neuron_tool] attribute macro
neuron-context             Compaction strategies, token counting, system prompt injection
neuron-loop                Configurable AgentLoop with streaming, cancellation, parallel tools
neuron-mcp                 Model Context Protocol client and server (stdio + Streamable HTTP)
neuron-runtime             Sessions, guardrails, TracingHook, GuardrailHook, DurableContext
neuron-otel                OpenTelemetry instrumentation with GenAI semantic conventions (gen_ai.* spans)
neuron                     Umbrella crate with feature flags for all of the above

Who Is This For?

  • Rust developers building AI-powered applications who want control over each layer of the stack
  • Framework authors who need well-tested building blocks to compose into higher-level abstractions
  • AI agents (like Claude Code) that need to understand, evaluate, and work with the codebase

What neuron Is NOT

neuron is the layer below frameworks. It does not provide:

  • CLI, TUI, or GUI applications
  • Opinionated agent framework (compose one from the blocks)
  • RAG pipeline (use the EmbeddingProvider trait with your own retrieval)
  • Workflow engine (integrate with Temporal/Restate via DurableContext)
  • Retry middleware (use tower or your durable engine’s retry policy)

Next Steps

Installation

Using the Umbrella Crate

The fastest way to get started is the neuron umbrella crate with feature flags:

[dependencies]
neuron = { version = "*", features = ["anthropic"] }

Or install via cargo:

cargo add neuron --features anthropic

Feature Flags

Feature    Enables                                      Default
anthropic  neuron-provider-anthropic                    Yes
openai     neuron-provider-openai                       No
ollama     neuron-provider-ollama                       No
mcp        neuron-mcp (Model Context Protocol)          No
runtime    neuron-runtime (sessions, guardrails)        No
otel       neuron-otel (OpenTelemetry instrumentation)  No
full       All of the above                             No

Using Individual Crates

Each neuron crate is independently published. Use them directly for finer control over dependencies:

[dependencies]
neuron-types = "*"
neuron-provider-openai = "*"
neuron-tool = "*"
neuron-loop = "*"

This pulls in only what you need — no transitive dependency on providers you don’t use.

Minimum Supported Rust Version

neuron requires Rust 1.90+ (edition 2024). It uses native async traits (RPITIT) and requires no #[async_trait] macro.

Environment Variables

Each provider loads credentials from environment variables via from_env():

Provider   Environment Variable
Anthropic  ANTHROPIC_API_KEY
OpenAI     OPENAI_API_KEY
Ollama     OLLAMA_HOST (default: http://localhost:11434)
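
The Ollama default above implies a simple fallback pattern. A minimal sketch of what a from_env-style lookup might do (host_or_default and ollama_host are hypothetical helpers, not neuron's API):

```rust
use std::env;

// Hypothetical helper mirroring the documented fallback: prefer the
// OLLAMA_HOST variable when present, otherwise the default endpoint.
fn host_or_default(var: Result<String, env::VarError>) -> String {
    var.unwrap_or_else(|_| "http://localhost:11434".to_string())
}

fn ollama_host() -> String {
    host_or_default(env::var("OLLAMA_HOST"))
}
```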

Quickstart

Build a working AI agent in ~50 lines of Rust.

Prerequisites

  • Rust 1.90+
  • An API key for Anthropic or OpenAI (set as ANTHROPIC_API_KEY or OPENAI_API_KEY)

The Agent

use neuron::prelude::*;
use neuron_provider_anthropic::Anthropic;
use neuron_tool::ToolRegistry;
use neuron_loop::AgentLoop;
use neuron_context::SlidingWindowStrategy;
use neuron_types::*;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

// 1. Define a tool
struct GetWeather;

impl Tool for GetWeather {
    const NAME: &'static str = "get_weather";
    type Args = WeatherArgs;
    type Output = String;
    type Error = std::io::Error;

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".to_string(),
            title: None,
            description: "Get the current weather for a city".to_string(),
            input_schema: schemars::schema_for!(WeatherArgs).into(),
            output_schema: None,
            annotations: None,
            cache_control: None,
        }
    }

    async fn call(&self, args: WeatherArgs, _ctx: &ToolContext) -> Result<String, std::io::Error> {
        Ok(format!("Weather in {}: 72°F, sunny", args.city))
    }
}

#[derive(Debug, Deserialize, JsonSchema)]
struct WeatherArgs {
    /// The city to get weather for
    city: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 2. Set up a provider
    let provider = Anthropic::from_env()?;

    // 3. Register tools
    let mut tools = ToolRegistry::new();
    tools.register(GetWeather);

    // 4. Create the context strategy
    let context = SlidingWindowStrategy::new(10, 100_000);

    // 5. Build and run the agent loop
    let mut agent = AgentLoop::builder(provider, context)
        .tools(tools)
        .system_prompt("You are a helpful weather assistant.")
        .max_turns(5)
        .build();

    let ctx = ToolContext::default();
    let result = agent.run(Message::user("What's the weather in San Francisco?"), &ctx).await?;
    println!("{}", result.response);
    Ok(())
}

What Just Happened?

  1. Provider – Anthropic::from_env() creates an API client from ANTHROPIC_API_KEY
  2. Tool – GetWeather implements the Tool trait with typed args and output
  3. Registry – ToolRegistry stores tools and handles JSON deserialization
  4. Context – SlidingWindowStrategy keeps the conversation within token limits
  5. Loop – AgentLoop drives the conversation: send message, get response, execute tools, repeat

The agent loop handles multi-turn tool use automatically. When Claude calls get_weather, the loop executes the tool and sends the result back. The loop continues until Claude responds without tool calls or hits max_turns.

Next Steps

  • Core Concepts — understand Provider, Tool, ContextStrategy, and more
  • Tools Guide — the #[neuron_tool] macro, middleware, and advanced patterns
  • Providers Guide — switching between Anthropic, OpenAI, and Ollama

Core Concepts

neuron is built around five core abstractions. Each is a trait defined in neuron-types with one or more implementations in satellite crates.

Provider

The Provider trait abstracts LLM API calls. Each provider is its own crate.

pub trait Provider: Send + Sync {
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, ProviderError>;
    async fn complete_stream(&self, request: CompletionRequest) -> Result<StreamHandle, ProviderError>;
}

Implementations: Anthropic, OpenAi, Ollama. All support from_env() for credential loading.

Providers Guide

Tool

The Tool trait defines a function the model can call. Tools have typed arguments (via schemars for JSON Schema) and typed outputs.

pub trait Tool: Send + Sync {
    const NAME: &'static str;
    type Args: DeserializeOwned + JsonSchema;
    type Output: Serialize;
    type Error: std::error::Error;

    fn definition(&self) -> ToolDefinition;
    async fn call(&self, args: Self::Args, ctx: &ToolContext) -> Result<Self::Output, Self::Error>;
}

The ToolRegistry stores tools with type erasure (ToolDyn) and runs them through a composable middleware pipeline.

Tools Guide

ContextStrategy

The ContextStrategy trait manages conversation history to stay within token limits.

pub trait ContextStrategy: Send + Sync {
    fn should_compact(&self, messages: &[Message], token_count: usize) -> bool;
    async fn compact(&self, messages: Vec<Message>) -> Result<Vec<Message>, ContextError>;
    fn token_estimate(&self, messages: &[Message]) -> usize;
}

Implementations: SlidingWindowStrategy (drop oldest messages), ToolResultClearingStrategy (clear tool outputs), CompositeStrategy (chain multiple strategies).

Context Management Guide

ObservabilityHook

Hooks observe the agent loop lifecycle without altering it (unless they terminate).

pub trait ObservabilityHook: Send + Sync {
    async fn on_event(&self, event: HookEvent<'_>) -> HookAction;
}

HookAction is Continue, Skip, or Terminate(String). Implementations: TracingHook (structured tracing spans), GuardrailHook (input/output guardrails as hooks).

Runtime Guide

DurableContext

Wraps side effects (LLM calls, tool execution) for durable engines like Temporal or Restate.

pub trait DurableContext: Send + Sync {
    fn execute_llm_call(&self, request: CompletionRequest, options: ActivityOptions)
        -> impl Future<Output = Result<CompletionResponse, DurableError>> + Send;

    fn execute_tool(&self, tool_name: &str, input: Value, ctx: &ToolContext, options: ActivityOptions)
        -> impl Future<Output = Result<ToolOutput, DurableError>> + Send;
}

This enables journal-based replay and recovery for long-running agents.

Runtime Guide

Tools

The tool system in neuron lets you give LLMs the ability to call into your Rust code. You define strongly-typed tools, register them in a ToolRegistry, and optionally wrap execution with middleware for logging, validation, or permissions.

Quick example

use neuron_tool::{neuron_tool, ToolRegistry};
use neuron_types::ToolContext;

#[neuron_tool(name = "lookup", description = "Look up a value by key")]
async fn lookup(
    /// The key to look up
    key: String,
    _ctx: &ToolContext,
) -> Result<String, std::io::Error> {
    Ok(format!("value for {key}"))
}

#[tokio::main]
async fn main() {
    let mut registry = ToolRegistry::new();
    registry.register(LookupTool);

    let ctx = ToolContext::default();
    let output = registry
        .execute("lookup", serde_json::json!({"key": "foo"}), &ctx)
        .await
        .unwrap();
    println!("{:?}", output.content);
}

Core traits

Tool – strongly typed

The Tool trait is the primary way to define a tool. It uses Rust’s type system to enforce correct input/output handling at compile time.

pub trait Tool: Send + Sync {
    const NAME: &'static str;
    type Args: DeserializeOwned + JsonSchema + Send;
    type Output: Serialize;
    type Error: std::error::Error + Send + 'static;

    fn definition(&self) -> ToolDefinition;
    fn call(&self, args: Self::Args, ctx: &ToolContext) -> impl Future<Output = Result<Self::Output, Self::Error>> + Send;
}

Key points:

  • NAME – a unique identifier the LLM uses to invoke the tool.
  • Args – must derive Deserialize and schemars::JsonSchema so the registry can generate a JSON Schema for the LLM and deserialize its input.
  • Output – must implement Serialize; the blanket ToolDyn impl serializes it to JSON automatically.
  • definition() – returns a ToolDefinition containing the name, description, and JSON Schema. The LLM sees this to decide when to call the tool.

ToolDyn – type-erased

Every Tool automatically implements ToolDyn via a blanket impl. ToolDyn is the dyn-compatible version that the ToolRegistry stores internally:

pub trait ToolDyn: Send + Sync {
    fn name(&self) -> &str;
    fn definition(&self) -> ToolDefinition;
    fn call_dyn(&self, input: serde_json::Value, ctx: &ToolContext) -> WasmBoxedFuture<'_, Result<ToolOutput, ToolError>>;
}

The blanket impl handles JSON deserialization of Args, calling Tool::call, serializing the Output into ToolOutput, and mapping errors to ToolError.

The #[neuron_tool] macro

For simple tools, the neuron_tool attribute macro reduces boilerplate. It generates the Args struct, Tool struct, and Tool impl from a single annotated async function:

use neuron_tool::neuron_tool;
use neuron_types::ToolContext;

#[derive(Debug, serde::Serialize)]
struct WeatherOutput { temperature: f64, conditions: String }

#[derive(Debug, thiserror::Error)]
#[error("weather error: {0}")]
struct WeatherError(String);

#[neuron_tool(name = "get_weather", description = "Get current weather for a city")]
async fn get_weather(
    /// City name (e.g. "San Francisco")
    city: String,
    _ctx: &ToolContext,
) -> Result<WeatherOutput, WeatherError> {
    // The macro generates GetWeatherTool and GetWeatherArgs automatically
    Ok(WeatherOutput { temperature: 72.0, conditions: "sunny".into() })
}

The macro generates:

  • GetWeatherArgs – a struct with #[derive(Deserialize, JsonSchema)]
  • GetWeatherTool – a unit struct implementing Tool
  • Doc comments on function parameters become JSON Schema descriptions

Register the generated struct: registry.register(GetWeatherTool).

ToolRegistry

The registry stores tools and executes them through an optional middleware chain.

use neuron_tool::ToolRegistry;

let mut registry = ToolRegistry::new();

// Register a strongly-typed tool (auto-erased to ToolDyn)
registry.register(MyTool);

// Register a pre-erased tool (e.g. from MCP bridge)
registry.register_dyn(arc_tool_dyn);

// Get all definitions to send to the LLM
let defs: Vec<ToolDefinition> = registry.definitions();

// Execute a tool by name with JSON input
let output = registry.execute("my_tool", json_input, &tool_ctx).await?;

// Look up a specific tool
let tool: Option<Arc<dyn ToolDyn>> = registry.get("my_tool");

ToolContext

Every tool call receives a ToolContext providing runtime information:

Field               Type                               Description
cwd                 PathBuf                            Current working directory
session_id          String                             Session identifier
environment         HashMap<String, String>            Key-value environment
cancellation_token  CancellationToken                  Cooperative cancellation
progress_reporter   Option<Arc<dyn ProgressReporter>>  Progress feedback for long-running tools

ToolContext implements Default with the current directory, an empty session ID, an empty environment, and a fresh cancellation token.

Middleware

Middleware wraps tool execution with cross-cutting concerns. The pattern is identical to axum’s from_fn – each middleware receives a Next that it can call to continue the chain, or skip to short-circuit.

Writing middleware with closures

use neuron_tool::{tool_middleware_fn, ToolRegistry};

let logging = tool_middleware_fn(|call, ctx, next| {
    Box::pin(async move {
        println!("calling tool: {}", call.name);
        let result = next.run(call, ctx).await;
        println!("tool completed: is_error={}", result.as_ref().map(|o| o.is_error).unwrap_or(true));
        result
    })
});

let mut registry = ToolRegistry::new();
registry.add_middleware(logging);

Writing middleware as a struct

use neuron_tool::middleware::{ToolMiddleware, ToolCall, Next};
use neuron_types::{ToolContext, ToolError, ToolOutput, WasmBoxedFuture};

struct RateLimiter { /* ... */ }

impl ToolMiddleware for RateLimiter {
    fn process<'a>(
        &'a self,
        call: &'a ToolCall,
        ctx: &'a ToolContext,
        next: Next<'a>,
    ) -> WasmBoxedFuture<'a, Result<ToolOutput, ToolError>> {
        Box::pin(async move {
            // Check rate limit, then proceed
            next.run(call, ctx).await
        })
    }
}

Input validation middleware

A common use case is intercepting tool calls to validate input arguments before the tool executes. When validation fails, returning ToolError::ModelRetry gives the model a hint so it can self-correct rather than crashing the loop.

Here is a closure-based validation middleware that checks URL and numeric range arguments:

use neuron_tool::{tool_middleware_fn, ToolRegistry};
use neuron_types::ToolError;

let mut registry = ToolRegistry::new();

// Input validation middleware — rejects invalid arguments with a hint
// so the model can self-correct
registry.add_middleware(tool_middleware_fn(|call, ctx, next| {
    Box::pin(async move {
        // Validate URL arguments
        if let Some(url) = call.input.get("url").and_then(|v| v.as_str()) {
            if !url.starts_with("https://") {
                return Err(ToolError::ModelRetry(
                    format!("url must start with https://, got '{url}'")
                ));
            }
        }

        // Validate numeric ranges
        if let Some(count) = call.input.get("count").and_then(|v| v.as_u64()) {
            if count == 0 || count > 100 {
                return Err(ToolError::ModelRetry(
                    format!("count must be 1-100, got {count}")
                ));
            }
        }

        // Input is valid — proceed to the tool
        next.run(call, ctx).await
    })
}));

The middleware reads fields from call.input (a serde_json::Value) and returns early with a validation hint when constraints are violated. Because it uses ToolError::ModelRetry, the agentic loop converts the message into an error tool result that the model sees as feedback – it can then retry the call with corrected arguments.

For the struct-based approach, implement ToolMiddleware the same way as the RateLimiter example above, placing validation logic inside the process method and returning Err(ToolError::ModelRetry(hint)) on failure.

ToolError variants for validation

Choose the right error variant depending on whether the model can recover:

  • ModelRetry(hint) – the loop sends the hint back to the model as an error tool result, and the model retries with corrected arguments. Use for validation errors the model can fix: bad format, out-of-range values, missing optional fields.
  • InvalidInput(msg) – propagates as LoopError::Tool and stops the loop. Use for unrecoverable issues: impossible argument combinations, security violations, malformed JSON.

Use ModelRetry as the default for input validation. Reserve InvalidInput for cases where no amount of retrying will produce valid input.

Scoping validation to specific tools

Use per-tool middleware to apply validation only where it is needed:

// This validation runs only when the "fetch_page" tool is called
registry.add_tool_middleware("fetch_page", tool_middleware_fn(|call, ctx, next| {
    Box::pin(async move {
        if let Some(url) = call.input.get("url").and_then(|v| v.as_str()) {
            if !url.starts_with("https://") {
                return Err(ToolError::ModelRetry(
                    format!("fetch_page requires an https:// URL, got '{url}'")
                ));
            }
        }
        next.run(call, ctx).await
    })
}));

Middleware execution order

Middleware executes in registration order, wrapping tool calls from outside in:

  1. Global middleware (registered with add_middleware) runs first
  2. Per-tool middleware (registered with add_tool_middleware) runs next
  3. The actual tool executes last

registry.add_middleware(logging_middleware);       // Runs first for ALL tools
registry.add_tool_middleware("search", auth_mw);  // Runs second, only for "search"
// The tool itself runs last

Built-in middleware

neuron-tool ships built-in middleware implementations:

  • PermissionChecker – checks a PermissionPolicy before each tool call. Returns ToolError::PermissionDenied on Deny or Ask decisions.
  • OutputFormatter – truncates tool output exceeding a character limit. Useful to prevent large tool results from consuming the context window.
  • SchemaValidator – validates tool call inputs against their JSON Schema before execution. Catches missing required fields and type mismatches.
  • TimeoutMiddleware – wraps tool calls with tokio::time::timeout. Configurable default timeout and per-tool overrides for tools with different latency characteristics.
  • StructuredOutputValidator – validates tool input against the tool’s JSON Schema and returns ToolError::ModelRetry on failure, giving the model a chance to self-correct with the validation error as a hint.
  • RetryLimitedValidator – wraps StructuredOutputValidator with a maximum retry count. After the retry limit is exhausted, converts ModelRetry to ToolError::InvalidInput to stop the loop rather than retrying indefinitely.

use neuron_tool::builtin::{PermissionChecker, OutputFormatter, SchemaValidator};

// Truncate outputs longer than 10,000 characters
registry.add_middleware(OutputFormatter::new(10_000));

// Validate inputs before execution
let validator = SchemaValidator::new(&registry);
registry.add_middleware(validator);

TimeoutMiddleware

Wraps each tool call with tokio::time::timeout. If a tool exceeds the configured duration, the call is cancelled and returns ToolError::ExecutionFailed with a timeout message.

use std::time::Duration;
use neuron_tool::builtin::TimeoutMiddleware;

// Default timeout of 30 seconds for all tools
let timeout = TimeoutMiddleware::new(Duration::from_secs(30))
    // Override for specific tools that need more time
    .with_tool_timeout("slow_search", Duration::from_secs(120))
    .with_tool_timeout("code_execution", Duration::from_secs(300));

registry.add_middleware(timeout);

The middleware checks the tool name from the ToolCall and uses the per-tool timeout if one was configured, otherwise the default. This is useful when most tools are fast but a few (external API calls, code execution) need longer deadlines.
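
The name-based lookup described above amounts to a map check with a fallback. A minimal sketch (timeout_for is a hypothetical helper, not the middleware's real internals):

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical helper: per-tool override if configured, else the default.
fn timeout_for(
    tool: &str,
    default: Duration,
    overrides: &HashMap<String, Duration>,
) -> Duration {
    overrides.get(tool).copied().unwrap_or(default)
}
```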

StructuredOutputValidator

Validates tool input JSON against the tool’s JSON Schema before execution. When validation fails, it returns ToolError::ModelRetry with the validation errors as a hint, giving the model a chance to fix its arguments and retry.

use neuron_tool::builtin::StructuredOutputValidator;

// Validate tool input against a JSON Schema, with up to 3 retries
let schema = serde_json::json!({
    "type": "object",
    "required": ["result"],
    "properties": { "result": { "type": "string" } }
});
let validator = StructuredOutputValidator::new(schema, 3);
registry.add_middleware(validator);

Unlike SchemaValidator (which returns ToolError::InvalidInput on failure), StructuredOutputValidator uses the ModelRetry self-correction pattern. The model receives the validation errors as feedback and can retry with corrected arguments. This is directly inspired by Pydantic AI’s validation-retry loop.

RetryLimitedValidator

Wraps StructuredOutputValidator with a maximum retry count. After the model has retried a specified number of times, the validator converts ModelRetry to ToolError::InvalidInput, stopping the self-correction loop and propagating the error.

use neuron_tool::builtin::{RetryLimitedValidator, StructuredOutputValidator};

// Create a structured validator, then wrap it with a retry limit
let schema = serde_json::json!({
    "type": "object",
    "required": ["result"],
    "properties": { "result": { "type": "string" } }
});
let inner = StructuredOutputValidator::new(schema, 3);
let validator = RetryLimitedValidator::new(inner);
registry.add_middleware(validator);

This prevents infinite retry loops when the model consistently produces invalid input. After the inner validator’s retry limit is exhausted, RetryLimitedValidator converts ModelRetry to ToolError::InvalidInput.

ToolError::ModelRetry

The ModelRetry variant enables self-correction. When a tool returns Err(ToolError::ModelRetry(hint)), the agentic loop converts the hint into an error tool result and sends it back to the model. The model sees the hint and can retry with corrected arguments.

use neuron_types::ToolError;

// Inside a tool's call() method:
if !is_valid_query(&args.query) {
    return Err(ToolError::ModelRetry(
        "Query must be a valid SQL SELECT statement. \
         You provided a DELETE statement.".to_string()
    ));
}

This does not propagate as a LoopError – the loop continues with the model receiving the hint as feedback.

Implementing Tool manually

When you need full control (custom schemas, complex error types), implement Tool directly instead of using the macro:

use neuron_types::{Tool, ToolContext, ToolDefinition};
use serde::Deserialize;

#[derive(Debug, Deserialize, schemars::JsonSchema)]
struct SearchArgs {
    query: String,
    max_results: Option<usize>,
}

struct SearchTool { api_key: String }

impl Tool for SearchTool {
    const NAME: &'static str = "search";
    type Args = SearchArgs;
    type Output = Vec<String>;
    type Error = std::io::Error;

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "search".into(),
            title: Some("Web Search".into()),
            description: "Search the web for information".into(),
            input_schema: serde_json::to_value(
                schemars::schema_for!(SearchArgs)
            ).unwrap(),
            output_schema: None,
            annotations: None,
            cache_control: None,
        }
    }

    async fn call(&self, args: SearchArgs, _ctx: &ToolContext) -> Result<Vec<String>, std::io::Error> {
        let max = args.max_results.unwrap_or(5);
        Ok(vec![format!("Result for '{}' (max {})", args.query, max)])
    }
}

API reference

Context management

neuron-context provides strategies for keeping conversation history within token limits. When context grows too large, a ContextStrategy compacts messages – dropping old ones, clearing tool results, or summarizing via an LLM. The crate also includes token estimation, system prompt injection, and persistent context sections.

Quick example

use neuron_context::{SlidingWindowStrategy, TokenCounter};
use neuron_types::{ContextStrategy, Message};

let strategy = SlidingWindowStrategy::new(
    10,       // keep the last 10 non-system messages
    100_000,  // compact when tokens exceed 100k
);

let messages = vec![
    Message::system("You are a helpful assistant."),
    Message::user("Hello"),
    Message::assistant("Hi there!"),
    // ... many more messages ...
];

let token_count = strategy.token_estimate(&messages);
if strategy.should_compact(&messages, token_count) {
    let compacted = strategy.compact(messages).await?;
    // compacted retains system messages + the last 10 non-system messages
}

The ContextStrategy trait

All strategies implement this trait from neuron-types:

pub trait ContextStrategy: Send + Sync {
    /// Whether compaction should be triggered.
    fn should_compact(&self, messages: &[Message], token_count: usize) -> bool;

    /// Compact the message list to reduce token usage.
    fn compact(&self, messages: Vec<Message>) -> impl Future<Output = Result<Vec<Message>, ContextError>> + Send;

    /// Estimate the token count for a list of messages.
    fn token_estimate(&self, messages: &[Message]) -> usize;
}

The agentic loop (AgentLoop) calls these methods between turns:

  1. token_estimate() to get the current count
  2. should_compact() to decide if action is needed
  3. compact() to reduce the message list

Built-in strategies

SlidingWindowStrategy

Keeps system messages plus the most recent N non-system messages. Simple and predictable – older messages are dropped entirely.

use neuron_context::{SlidingWindowStrategy, TokenCounter};

// Keep last 20 non-system messages, trigger at 100k tokens
let strategy = SlidingWindowStrategy::new(20, 100_000);

// With a custom token counter (e.g. different chars-per-token ratio)
let counter = TokenCounter::with_ratio(3.5);
let strategy = SlidingWindowStrategy::with_counter(20, 100_000, counter);

What compaction actually does. SlidingWindowStrategy partitions messages by role: system messages are always preserved regardless of the window size, and the window count applies only to non-system messages. Here is a concrete before/after showing a compaction with SlidingWindowStrategy::new(2, 500):

Before compaction (7 messages, ~800 tokens):
  [system] "You are a helpful assistant."
  [user]   "What is Rust?"
  [asst]   "Rust is a systems programming language..."
  [user]   "How about memory safety?"
  [asst]   "Rust uses ownership and borrowing..."
  [user]   "What about async?"
  [asst]   "Rust supports async/await via futures..."

After compaction with SlidingWindowStrategy::new(2, 500):
  [system] "You are a helpful assistant."    <- always preserved
  [user]   "What about async?"               <- last 2 non-system messages
  [asst]   "Rust supports async/await..."    <- last 2 non-system messages

The first four non-system messages are dropped entirely. The system message survives because the implementation unconditionally retains all system messages before applying the sliding window to the remaining conversation. See neuron-context/examples/compaction.rs for a runnable demo.
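
The partition-then-window behavior can be sketched in a few lines (Msg is an illustrative stub, not neuron's Message type):

```rust
// Illustrative stub — not neuron's Message type.
#[derive(Clone, Debug, PartialEq)]
struct Msg {
    system: bool,
    text: String,
}

// Keep all system messages, plus the last `window` non-system messages.
fn sliding_window(messages: Vec<Msg>, window: usize) -> Vec<Msg> {
    let (system, rest): (Vec<Msg>, Vec<Msg>) =
        messages.into_iter().partition(|m| m.system);
    let skip = rest.len().saturating_sub(window);
    system.into_iter().chain(rest.into_iter().skip(skip)).collect()
}
```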

ToolResultClearingStrategy

Replaces old tool result content with "[tool result cleared]" while preserving the tool_use_id so the conversation still makes semantic sense. Keeps the most recent N tool results intact.

This is effective when tool outputs are large (file contents, API responses) but the model only needs the recent ones to stay coherent.

use neuron_context::ToolResultClearingStrategy;

// Keep the 2 most recent tool results intact, clear older ones
let strategy = ToolResultClearingStrategy::new(2, 100_000);
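
The clearing transformation itself is straightforward. A sketch with a stub type (ToolRes is illustrative, not neuron's tool-result representation):

```rust
// Illustrative stub — not neuron's tool-result representation.
struct ToolRes {
    tool_use_id: String,
    content: String,
}

// Replace the content of all but the most recent `keep_recent` results,
// preserving tool_use_id so the conversation stays coherent.
fn clear_old(results: &mut [ToolRes], keep_recent: usize) {
    let total = results.len();
    for (i, r) in results.iter_mut().enumerate() {
        if i + keep_recent < total {
            r.content = "[tool result cleared]".to_string();
        }
    }
}
```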

SummarizationStrategy

Uses an LLM provider to summarize old messages, replacing them with a single summary message. Preserves the most recent N messages verbatim.

This produces the highest-quality compaction but costs an additional LLM call.

use neuron_context::SummarizationStrategy;

// Summarize old messages, keep the 5 most recent verbatim
let strategy = SummarizationStrategy::new(provider, 5, 100_000);

The summarization prompt asks the LLM to summarize concisely, focusing on key information, decisions made, and tool call results. The summary is wrapped in a [Summary of earlier conversation] prefix.

CompositeStrategy

Chains multiple strategies in order, applying each one until the token budget is met. After each strategy runs, the token count is re-estimated; iteration stops early if below the threshold.

Because ContextStrategy uses RPITIT (not dyn-compatible), strategies must be wrapped in BoxedStrategy before composing:

use neuron_context::{
    CompositeStrategy, SlidingWindowStrategy, ToolResultClearingStrategy,
    strategies::BoxedStrategy,
};

let strategy = CompositeStrategy::new(vec![
    // First: clear old tool results (cheap, often sufficient)
    BoxedStrategy::new(ToolResultClearingStrategy::new(2, 100_000)),
    // Second: drop old messages if still over budget
    BoxedStrategy::new(SlidingWindowStrategy::new(10, 100_000)),
], 100_000);

This ordering is a best practice: try cheaper strategies first (clearing tool results), then progressively more aggressive ones (dropping messages, summarizing).

TokenCounter

A heuristic token estimator using a configurable characters-per-token ratio. The default ratio of 4.0 characters per token approximates GPT-family and Claude models.

use neuron_context::TokenCounter;

let counter = TokenCounter::new();          // 4.0 chars/token (default)
let counter = TokenCounter::with_ratio(3.5); // Custom ratio

// Estimate tokens for plain text
let tokens = counter.estimate_text("Hello, world!");

// Estimate tokens for a message list
let tokens = counter.estimate_messages(&messages);

// Estimate tokens for tool definitions
let tokens = counter.estimate_tools(&tool_definitions);

The counter estimates different content block types:

Content type  Estimation method
Text          len / chars_per_token
Thinking      Thinking text length
ToolUse       Name + serialized input
ToolResult    Sum of content items
Image         Fixed 300 tokens
Document      Fixed 500 tokens
Compaction    Content text length

Each message adds a fixed 4-token overhead for role markers.
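
The heuristic reduces to a division plus the per-message overhead. A sketch of the arithmetic (function names are illustrative, not TokenCounter's real implementation):

```rust
// Illustrative arithmetic only — not TokenCounter's implementation.
fn estimate_text(text: &str, chars_per_token: f64) -> usize {
    (text.len() as f64 / chars_per_token).ceil() as usize
}

// Each message adds a fixed 4-token overhead for role markers.
fn estimate_messages(messages: &[&str], chars_per_token: f64) -> usize {
    messages
        .iter()
        .map(|m| estimate_text(m, chars_per_token) + 4)
        .sum()
}
```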

SystemInjector

Injects additional system prompt content based on turn count or token thresholds. Useful for reminders (“be concise”) or context-aware instructions that only apply under certain conditions.

use neuron_context::{SystemInjector, InjectionTrigger};

let mut injector = SystemInjector::new();

// Remind the model to be concise every 5 turns
injector.add_rule(
    InjectionTrigger::EveryNTurns(5),
    "Reminder: keep responses concise.".into(),
);

// Warn when context is getting large
injector.add_rule(
    InjectionTrigger::OnTokenThreshold(50_000),
    "Context is getting long. Summarize when possible.".into(),
);

// Check each turn -- returns a Vec of triggered content strings
let injections: Vec<String> = injector.check(turn_number, token_count);
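The trigger logic can be modeled in a few lines — a sketch with stand-in types, under the assumption that EveryNTurns fires on multiples of n and OnTokenThreshold fires once the count reaches the threshold:

```rust
// Stand-in for InjectionTrigger; the real rule evaluation may differ.
enum Trigger {
    EveryNTurns(usize),
    OnTokenThreshold(usize),
}

fn check(rules: &[(Trigger, &str)], turn: usize, tokens: usize) -> Vec<String> {
    rules
        .iter()
        .filter(|(trigger, _)| match trigger {
            // Assumption: fires on every multiple of n (turn 5, 10, 15, ...)
            Trigger::EveryNTurns(n) => *n > 0 && turn % n == 0,
            Trigger::OnTokenThreshold(t) => tokens >= *t,
        })
        .map(|(_, content)| content.to_string())
        .collect()
}
```

With the two rules from the example above, turn 5 at 10k tokens would yield only the conciseness reminder, while turn 7 at 60k tokens would yield only the context warning.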

PersistentContext

Aggregates named context sections and renders them into a single structured string. Use this to build system prompts from multiple independent sources (role definition, rules, domain knowledge) with explicit ordering.

use neuron_context::{PersistentContext, ContextSection};

let mut ctx = PersistentContext::new();
ctx.add_section(ContextSection {
    label: "Role".into(),
    content: "You are a senior Rust engineer.".into(),
    priority: 0,  // lower = rendered first
});
ctx.add_section(ContextSection {
    label: "Output rules".into(),
    content: "Always include code examples.".into(),
    priority: 10,
});

let system_prompt = ctx.render();
// ## Role
// You are a senior Rust engineer.
//
// ## Output rules
// Always include code examples.
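The rendering behavior can be reproduced with a sort and a join — a sketch with stand-in types, assuming sections are ordered by ascending priority and separated by blank lines as in the output above:

```rust
// Stand-in for ContextSection.
struct Section {
    label: String,
    content: String,
    priority: i32,
}

fn render(mut sections: Vec<Section>) -> String {
    // Lower priority renders first (assumption based on the example above).
    sections.sort_by_key(|s| s.priority);
    sections
        .iter()
        .map(|s| format!("## {}\n{}", s.label, s.content))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

The stable sort means sections with equal priority keep their insertion order, which is usually what you want when composing prompts from multiple sources.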

Server-side context management

Some providers (Anthropic) support server-side context compaction. Instead of the client compacting messages, the server pauses generation, compacts context internally, and resumes.

neuron supports this via three types in neuron-types:

  • ContextManagement – configuration sent in CompletionRequest to enable server-side compaction.
  • ContentBlock::Compaction – a content block containing the compacted summary, emitted by the server.
  • StopReason::Compaction – signals that the server paused to compact. The agentic loop automatically continues when it sees this stop reason.

use neuron_types::{CompletionRequest, ContextManagement, ContextEdit};

let request = CompletionRequest {
    context_management: Some(ContextManagement {
        edits: vec![ContextEdit::Compact {
            strategy: "compact_20260112".into(),
        }],
    }),
    ..Default::default()
};

When AgentLoop receives StopReason::Compaction, it appends the assistant’s message (which may contain ContentBlock::Compaction) and loops again without treating it as a final response.

Choosing a strategy

| Strategy | Token cost | Quality | Best for |
|---|---|---|---|
| SlidingWindowStrategy | None | Low (drops context) | Short conversations, prototyping |
| ToolResultClearingStrategy | None | Medium (preserves flow) | Tool-heavy agents with large outputs |
| SummarizationStrategy | 1 LLM call | High (semantic summary) | Long conversations needing continuity |
| CompositeStrategy | Varies | High (layered) | Production agents with mixed workloads |
| Server-side compaction | Provider-managed | Provider-dependent | Anthropic users who prefer server management |

API reference

Providers

Provider crates implement the Provider trait from neuron-types, giving you a uniform interface to call any LLM. neuron ships three providers – Anthropic, OpenAI, and Ollama – each in its own crate, following the serde pattern: trait in core, implementation in a satellite.

Quick example

use neuron_provider_anthropic::Anthropic;
use neuron_types::{CompletionRequest, Message, Provider};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = Anthropic::from_env()?;

    let request = CompletionRequest {
        messages: vec![Message::user("What is Rust?")],
        max_tokens: Some(256),
        ..Default::default()
    };

    let response = provider.complete(request).await?;
    println!("{}", response.message.content[0]); // ContentBlock::Text(...)
    println!("Tokens: {} in, {} out", response.usage.input_tokens, response.usage.output_tokens);
    Ok(())
}

The Provider trait

pub trait Provider: Send + Sync {
    fn complete(&self, request: CompletionRequest)
        -> impl Future<Output = Result<CompletionResponse, ProviderError>> + Send;

    fn complete_stream(&self, request: CompletionRequest)
        -> impl Future<Output = Result<StreamHandle, ProviderError>> + Send;
}

Key design points:

  • Uses RPITIT (return position impl trait in trait) – Rust 2024 native async. No #[async_trait] needed.
  • Not object-safe by design. Use generics <P: Provider> to compose. This avoids the overhead of boxing futures while keeping the API clean.
  • complete() returns a full CompletionResponse with the message, token usage, and stop reason.
  • complete_stream() returns a StreamHandle whose receiver field is a tokio::sync::mpsc::Receiver<StreamEvent> that yields text deltas, tool use blocks, usage stats, and a final MessageComplete event.

Anthropic (neuron-provider-anthropic)

Client for the Anthropic Messages API.

Construction

use neuron_provider_anthropic::Anthropic;

// From environment variable (ANTHROPIC_API_KEY)
let provider = Anthropic::from_env()?;

// With explicit API key
let provider = Anthropic::new("sk-ant-...");

// Builder-style configuration
let provider = Anthropic::new("sk-ant-...")
    .model("claude-opus-4-5")
    .base_url("https://api.anthropic.com");

Configuration

| Method | Default | Description |
|---|---|---|
| new(api_key) | — | Create with explicit key |
| from_env() | — | Read ANTHROPIC_API_KEY from environment |
| .model(name) | claude-sonnet-4-20250514 | Default model when request has empty model field |
| .base_url(url) | https://api.anthropic.com | Override for proxies or testing |

Features

  • Full content block mapping: text, thinking, tool use/result, images, documents, compaction
  • Server-side context management via ContextManagement request field
  • SSE streaming with manual parser
  • Cache control on system prompts and tool definitions
  • ToolChoice::Required maps to Anthropic’s {"type": "any"}

OpenAI (neuron-provider-openai)

Client for the OpenAI Chat Completions API. Also implements EmbeddingProvider for the Embeddings API.

Construction

use neuron_provider_openai::OpenAi;

// From environment variable (OPENAI_API_KEY, optional OPENAI_ORG_ID)
let provider = OpenAi::from_env()?;

// With explicit API key
let provider = OpenAi::new("sk-...");

// Builder-style configuration
let provider = OpenAi::new("sk-...")
    .model("gpt-4o")
    .base_url("https://api.openai.com")
    .organization("org-...");

Configuration

| Method | Default | Description |
|---|---|---|
| new(api_key) | — | Create with explicit key |
| from_env() | — | Read OPENAI_API_KEY (and optional OPENAI_ORG_ID) |
| .model(name) | gpt-4o | Default model |
| .base_url(url) | https://api.openai.com | Override for Azure, proxies, or testing |
| .organization(org) | None | Sent as OpenAI-Organization header |

Embeddings

OpenAi also implements the EmbeddingProvider trait:

use neuron_types::{EmbeddingProvider, EmbeddingRequest};
use neuron_provider_openai::OpenAi;

let provider = OpenAi::from_env()?;

let request = EmbeddingRequest {
    model: "text-embedding-3-small".into(),
    input: vec!["Hello world".into(), "Goodbye world".into()],
    dimensions: Some(256),  // optional dimension reduction
    ..Default::default()
};

let response = provider.embed(request).await?;
// response.embeddings: Vec<Vec<f32>> -- one vector per input
// response.usage: EmbeddingUsage { prompt_tokens, total_tokens }

The EmbeddingProvider trait is separate from Provider because not all embedding models support chat completion and vice versa. The OpenAi struct implements both.

Features

  • SSE streaming with data: [DONE] sentinel
  • System prompts mapped to role: "developer" (OpenAI convention)
  • Tool calls in choices[0].message.tool_calls array format
  • ToolChoice::Required maps to OpenAI’s "required"
  • Stream options include include_usage: true for token stats

Ollama (neuron-provider-ollama)

Client for the Ollama Chat API. Designed for local models with no authentication required by default.

Construction

use neuron_provider_ollama::Ollama;

// Default: localhost:11434, no auth
let provider = Ollama::new();

// From environment (reads OLLAMA_HOST if set)
let provider = Ollama::from_env()?;

// Builder-style configuration
let provider = Ollama::new()
    .model("llama3.2")
    .base_url("http://remote-host:11434")
    .keep_alive("5m");

Configuration

| Method | Default | Description |
|---|---|---|
| new() | — | Create with defaults (no auth needed) |
| from_env() | — | Read OLLAMA_HOST for base URL |
| .model(name) | llama3.2 | Default model |
| .base_url(url) | http://localhost:11434 | Override for remote instances |
| .keep_alive(duration) | None (server default) | Model memory residency ("5m", "0" to unload) |

Features

  • NDJSON streaming (newline-delimited JSON, not SSE)
  • No authentication by default (Ollama runs locally)
  • Synthesizes tool call IDs with UUID (Ollama does not provide them natively)
  • keep_alive controls how long the model stays in GPU memory
  • Tool definitions use the same format as OpenAI (adopted by Ollama)

Provider + AgentLoop integration

The most common use of a provider is plugging it into an AgentLoop – the commodity agentic while-loop that handles tool dispatch, context management, and multi-turn conversation. Here is a complete, self-contained example using OpenAI:

use neuron_context::SlidingWindowStrategy;
use neuron_loop::AgentLoop;
use neuron_provider_openai::OpenAi;
use neuron_tool::ToolRegistry;
use neuron_types::ToolContext;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Build the provider
    let provider = OpenAi::from_env()?.model("gpt-4o");

    // 2. Choose a context strategy (keep last 20 messages, up to 100k tokens)
    let context = SlidingWindowStrategy::new(20, 100_000);

    // 3. Create a tool registry (empty here -- add tools as needed)
    let tools = ToolRegistry::new();

    // 4. Assemble the agent loop
    let mut agent = AgentLoop::builder(provider, context)
        .system_prompt("You are a helpful assistant.")
        .max_turns(10)
        .tools(tools)
        .build();

    // 5. Run with a plain text message
    let ctx = ToolContext::default();
    let result = agent.run_text("Hello!", &ctx).await?;
    println!("{}", result.response);
    println!("Turns: {}, Tokens: {} in / {} out",
        result.turns, result.usage.input_tokens, result.usage.output_tokens);

    Ok(())
}

This pattern is identical for every Provider implementation. Replace OpenAi::from_env()? with Anthropic::from_env()? or Ollama::from_env()? and nothing else changes – the builder, context strategy, tool registry, and run call all stay the same.

Implementing a custom provider

If none of the built-in provider crates fit your needs – for example, you want to integrate a proprietary LLM service or a local inference engine – you can implement the Provider trait directly:

use std::future::Future;
use neuron_types::{
    CompletionRequest, CompletionResponse, ContentBlock, Message,
    Provider, ProviderError, Role, StopReason, StreamHandle, TokenUsage,
};

/// A minimal provider that calls a hypothetical LLM API.
pub struct MyProvider {
    api_key: String,
}

impl MyProvider {
    pub fn new(api_key: impl Into<String>) -> Self {
        Self {
            api_key: api_key.into(),
        }
    }
}

impl Provider for MyProvider {
    fn complete(
        &self,
        request: CompletionRequest,
    ) -> impl Future<Output = Result<CompletionResponse, ProviderError>> + Send {
        let api_key = self.api_key.clone();
        async move {
            // In a real implementation, serialize `request` and send it
            // to your LLM API using reqwest, hyper, etc.
            let response_text = format!(
                "Echo: {}",
                request.messages.last().map(|m| m.content[0].to_string()).unwrap_or_default()
            );

            Ok(CompletionResponse {
                id: "resp-001".to_string(),
                model: "my-model-v1".to_string(),
                message: Message::assistant(response_text),
                usage: TokenUsage {
                    input_tokens: 10,
                    output_tokens: 20,
                    ..Default::default()
                },
                stop_reason: StopReason::EndTurn,
            })
        }
    }

    fn complete_stream(
        &self,
        _request: CompletionRequest,
    ) -> impl Future<Output = Result<StreamHandle, ProviderError>> + Send {
        async {
            Err(ProviderError::InvalidRequest(
                "streaming not supported".to_string(),
            ))
        }
    }
}

CompletionResponse fields

| Field | Type | Meaning |
|---|---|---|
| id | String | Unique identifier from the LLM API (e.g., "msg_01XFDUDYJgAACzvnptvVoYEL"). Used for logging and deduplication. |
| model | String | The model name that processed the request (e.g., "gpt-4o", "claude-sonnet-4-20250514"). |
| message | Message | The assistant response. Construct with Message::assistant("text") or manually as Message { role: Role::Assistant, content: vec![ContentBlock::Text(...)] }. |
| usage | TokenUsage | Token counts for the request. Use ..Default::default() for optional fields (cache_read_tokens, cache_creation_tokens, reasoning_tokens, iterations) when your API does not report them. |
| stop_reason | StopReason | Why generation stopped. EndTurn for normal completion, ToolUse when the model wants to call tools, MaxTokens if the response was truncated by the token limit. |

The Provider trait requires WasmCompatSend + WasmCompatSync, which are equivalent to Send + Sync on native targets. On WASM, these bounds are automatically satisfied so your provider can compile for both environments.

Error handling

All providers map errors to ProviderError, which classifies errors as retryable or terminal:

use neuron_types::ProviderError;

match provider.complete(request).await {
    Ok(response) => { /* ... */ }
    Err(e) if e.is_retryable() => {
        // Network, RateLimit, ModelLoading, Timeout, ServiceUnavailable
        // Safe to retry with backoff
    }
    Err(e) => {
        // Authentication, InvalidRequest, ModelNotFound, InsufficientResources
        // Do not retry -- fix the root cause
    }
}

ProviderError variants

| Variant | Retryable | Description |
|---|---|---|
| Network(source) | Yes | Connection reset, DNS failure |
| RateLimit { retry_after } | Yes | Provider rate limit hit |
| ModelLoading(msg) | Yes | Cold start, model still loading |
| Timeout(duration) | Yes | Request timed out |
| ServiceUnavailable(msg) | Yes | Temporary provider outage |
| Authentication(msg) | No | Bad API key or permissions |
| InvalidRequest(msg) | No | Malformed request |
| ModelNotFound(msg) | No | Requested model does not exist |
| InsufficientResources(msg) | No | Quota or limit exceeded |
| StreamError(msg) | No | Error during streaming |
| Other(source) | No | Catch-all |

neuron does not include built-in retry logic. Use is_retryable() with your own retry strategy, tower middleware, or a durable execution engine.
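A minimal retry wrapper in the spirit of is_retryable() might look like this — a synchronous sketch with a stand-in error type; real code would use the async Provider API and ProviderError::is_retryable():

```rust
use std::{thread::sleep, time::Duration};

// Hypothetical error with a retryable flag, standing in for ProviderError.
#[derive(Debug)]
struct CallError {
    retryable: bool,
    message: String,
}

fn retry_with_backoff<T>(
    mut attempt: impl FnMut() -> Result<T, CallError>,
    max_attempts: u32,
) -> Result<T, CallError> {
    let mut delay = Duration::from_millis(100);
    for n in 1..=max_attempts {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(e) if e.retryable && n < max_attempts => {
                sleep(delay); // exponential backoff between retries
                delay *= 2;
            }
            Err(e) => return Err(e), // terminal error, or attempts exhausted
        }
    }
    Err(CallError { retryable: false, message: "no attempts made".to_string() })
}
```

Terminal errors (bad API key, malformed request) return immediately; only retryable ones trigger the backoff loop. The same shape works with tower retry middleware or a durable execution engine.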

Streaming

All providers support streaming via complete_stream(), which returns a StreamHandle:

use futures::StreamExt;
use neuron_types::StreamEvent;

let handle = provider.complete_stream(request).await?;
let mut stream = handle.receiver;

while let Some(event) = stream.recv().await {
    match event {
        StreamEvent::TextDelta(text) => print!("{text}"),
        StreamEvent::ToolUse { id, name, input } => { /* tool call */ }
        StreamEvent::Usage(usage) => { /* token stats */ }
        StreamEvent::MessageComplete(message) => { /* final assembled message */ }
        StreamEvent::Error(err) => { /* stream error */ }
        _ => {}
    }
}

The transport differs by provider:

| Provider | Transport | Format |
|---|---|---|
| Anthropic | Server-Sent Events (SSE) | event: + data: lines |
| OpenAI | Server-Sent Events (SSE) | data: lines, data: [DONE] sentinel |
| Ollama | NDJSON | One JSON object per line |

Swapping providers

Because all providers implement the same Provider trait, swapping is a one-line change:

use neuron_context::SlidingWindowStrategy;
use neuron_loop::AgentLoop;

// Switch from Anthropic...
// let provider = Anthropic::from_env()?;

// ...to OpenAI:
let provider = OpenAi::from_env()?;

// Everything else stays the same
let agent = AgentLoop::builder(provider, SlidingWindowStrategy::new(20, 100_000))
    .system_prompt("You are a helpful assistant.")
    .build();

The model field in CompletionRequest defaults to empty, which makes the provider use its configured default model. Set it explicitly when you need a specific model within a run.

API reference

The agent loop

AgentLoop is the commodity while loop at the center of every agent. It composes a Provider, a ToolRegistry, and a ContextStrategy into a loop that calls the LLM, executes tools, manages context, and repeats until the model returns a final text response or a limit is reached.

Quick example

use neuron_context::SlidingWindowStrategy;
use neuron_loop::AgentLoop;
use neuron_tool::ToolRegistry;
use neuron_types::ToolContext;

let provider = Anthropic::from_env()?;
let context = SlidingWindowStrategy::new(20, 100_000);

let mut tools = ToolRegistry::new();
tools.register(MySearchTool);
tools.register(MyCalculateTool);

let mut agent = AgentLoop::builder(provider, context)
    .tools(tools)
    .system_prompt("You are a helpful research assistant.")
    .max_turns(15)
    .parallel_tool_execution(true)
    .build();

let ctx = ToolContext::default();
let result = agent.run_text("Find the population of Tokyo", &ctx).await?;
println!("Response: {}", result.response);
println!("Turns: {}, Tokens: {} in / {} out",
    result.turns, result.usage.input_tokens, result.usage.output_tokens);

Building an AgentLoop

The builder pattern

AgentLoop::builder(provider, context) returns an AgentLoopBuilder with sensible defaults. Only the provider and context strategy are required.

let agent = AgentLoop::builder(provider, context)
    .tools(registry)                    // ToolRegistry (default: empty)
    .system_prompt("You are helpful.")  // SystemPrompt (default: empty)
    .max_turns(10)                      // Option<usize> (default: None = unlimited)
    .parallel_tool_execution(true)      // bool (default: false)
    .usage_limits(limits)               // UsageLimits (default: no limits)
    .hook(my_logging_hook)              // ObservabilityHook (can add multiple)
    .durability(my_durable_ctx)         // DurableContext (optional)
    .build();

Direct construction

You can also construct directly when you need to set the full LoopConfig:

use neuron_loop::{AgentLoop, LoopConfig};
use neuron_types::SystemPrompt;

let config = LoopConfig {
    system_prompt: SystemPrompt::Text("You are a code reviewer.".into()),
    max_turns: Some(20),
    parallel_tool_execution: true,
    ..Default::default()
};

let agent = AgentLoop::new(provider, tools, context, config);

Running the loop

run() – drive to completion

Appends the user message, then loops until the model returns a text-only response or the turn limit is reached.

let result = agent.run(Message::user("Hello!"), &tool_ctx).await?;
// result: AgentResult { response, messages, usage, turns }

run_text() – convenience for text input

Wraps a &str into a Message::user() and calls run():

let result = agent.run_text("What is 2 + 2?", &tool_ctx).await?;

run_stream() – streaming output

Uses provider.complete_stream() for real-time token output. Returns a channel receiver that yields StreamEvents:

let mut rx = agent.run_stream(Message::user("Explain Rust ownership"), &tool_ctx).await;

while let Some(event) = rx.recv().await {
    match event {
        StreamEvent::TextDelta(text) => print!("{text}"),
        StreamEvent::ToolUse { name, .. } => println!("\n[calling {name}...]"),
        StreamEvent::Usage(usage) => println!("\n[{} tokens]", usage.output_tokens),
        StreamEvent::MessageComplete(_) => println!("\n[done]"),
        StreamEvent::Error(err) => eprintln!("Error: {err}"),
        _ => {}
    }
}

Tool execution is handled between streaming turns. The loop streams the LLM response, executes any tool calls, appends results, and streams the next turn.

run_step() – one turn at a time

Returns a StepIterator that lets you advance the loop manually. Between turns you can inspect messages, inject new ones, and modify the tool registry.

let mut steps = agent.run_step(Message::user("Plan a trip"), &tool_ctx);

while let Some(turn) = steps.next().await {
    match turn {
        TurnResult::ToolsExecuted { calls, results } => {
            println!("Executed {} tools", calls.len());
            // Optionally inject guidance between turns
            steps.inject_message(Message::user("Focus on budget options."));
        }
        TurnResult::FinalResponse(result) => {
            println!("Final: {}", result.response);
        }
        TurnResult::CompactionOccurred { old_tokens, new_tokens } => {
            println!("Compacted: {old_tokens} -> {new_tokens} tokens");
        }
        TurnResult::MaxTurnsReached => {
            println!("Hit turn limit");
        }
        TurnResult::Error(e) => {
            eprintln!("Error: {e}");
        }
    }
}

StepIterator exposes:

  • next() – advance one turn
  • messages() – view current conversation
  • inject_message(msg) – add a message between turns
  • tools_mut() – modify the tool registry between turns

Distinguishing text responses from tool calls

TurnResult is the key abstraction for telling apart a direct LLM message from a tool-call round trip. When the model returns plain text and no tool calls, the iterator yields TurnResult::FinalResponse containing the finished AgentResult. When the model requests one or more tool calls, the loop executes them and yields TurnResult::ToolsExecuted with the calls and their results. The loop handles dispatch automatically — you just match on the variant.

let mut steps = agent.run_step(Message::user("What's 2 + 2?"), &tool_ctx);

while let Some(turn) = steps.next().await {
    match turn {
        TurnResult::ToolsExecuted { calls, results } => {
            // The model requested tool calls — they've been executed
            for (call_id, tool_name, input) in &calls {
                println!("Model called tool '{tool_name}' ({call_id}) with {input}");
            }
            // results contains the ContentBlock::ToolResult for each call
            // The loop automatically sends these back to the model
        }
        TurnResult::FinalResponse(result) => {
            // The model returned a text response — no more tool calls
            println!("Final answer: {}", result.response);
            println!("Total turns: {}", result.turns);
        }
        TurnResult::CompactionOccurred { old_tokens, new_tokens } => {
            println!("Context compacted: {old_tokens} → {new_tokens} tokens");
            // Loop continues automatically
        }
        TurnResult::MaxTurnsReached => {
            println!("Turn limit reached without a final response");
        }
        TurnResult::Error(e) => {
            eprintln!("Loop error: {e}");
        }
    }
}

If you only need the final result and don’t need turn-by-turn control, use run() or run_text() instead — they drive the loop to completion and return AgentResult directly.

AgentResult

Returned by run(), run_text(), and TurnResult::FinalResponse:

pub struct AgentResult {
    pub response: String,       // Final text response from the model
    pub messages: Vec<Message>, // Full conversation history
    pub usage: TokenUsage,      // Cumulative token usage across all turns
    pub turns: usize,           // Number of turns completed
}

Loop lifecycle

Each iteration of the loop follows this sequence:

  1. Check cancellation – if tool_ctx.cancellation_token is cancelled, return LoopError::Cancelled
  2. Check max turns – if the turn limit is reached, return LoopError::MaxTurns
  3. Check usage limits – if any token, request, or tool call limit is exceeded, return LoopError::UsageLimitExceeded
  4. Fire LoopIteration hooks
  5. Check context compaction – call context.should_compact() and context.compact() if needed
  6. Build CompletionRequest from current messages, system prompt, and tool definitions
  7. Fire PreLlmCall hooks
  8. Call the provider (or durable context if set)
  9. Fire PostLlmCall hooks
  10. Accumulate token usage
  11. Check stop reason:
    • StopReason::Compaction – append message and continue the loop
    • StopReason::EndTurn or no tool calls – extract text and return AgentResult
    • StopReason::ToolUse – proceed to tool execution
  12. Check cancellation again before tool execution
  13. Execute tool calls (parallel or sequential), firing PreToolExecution and PostToolExecution hooks for each
  14. Check usage limits – verify tool call count against limit
  15. Append tool results as a user message and loop back to step 1
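The steps above can be condensed into a skeleton — pseudocode-level Rust with stand-in types, not the actual neuron-loop source:

```rust
// Condensed sketch of the lifecycle above. A scripted list of replies
// stands in for the provider; each remove(0) is one LLM call (step 8).
enum Stop { EndTurn, ToolUse, Compaction }

struct Reply { stop: Stop, text: String }

fn run(mut replies: Vec<Reply>, max_turns: usize) -> Result<String, String> {
    let mut turns = 0;
    loop {
        // Step 2: check the turn limit before calling the LLM.
        if turns >= max_turns {
            return Err("max turns reached".to_string());
        }
        turns += 1;
        let reply = replies.remove(0); // stand-in for provider.complete()
        match reply.stop {
            // Step 11: server paused to compact -- append and continue.
            Stop::Compaction => continue,
            // Step 11: final text response -- return it.
            Stop::EndTurn => return Ok(reply.text),
            // Steps 12-15: execute tools, append results, loop again.
            Stop::ToolUse => continue,
        }
    }
}
```

Everything else in the full lifecycle — hooks, usage limits, cancellation, compaction checks — hangs off this basic loop shape.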

How tool call processing works

When the LLM decides to use a tool, the loop handles the entire dispatch cycle automatically. Here is exactly what happens at the code level:

Step 1: LLM returns tool calls. The provider responds with StopReason::ToolUse and one or more ContentBlock::ToolUse blocks in the assistant message. Each block contains a name, input (JSON arguments), and a unique id.

// Inside the loop — the LLM response contains tool calls:
// response.stop_reason == StopReason::ToolUse
// response.message.content == [
//     ContentBlock::Text("Let me look that up."),
//     ContentBlock::ToolUse { id: "call_1", name: "get_weather", input: {"city": "Tokyo"} },
// ]

Step 2: Extract tool calls. The loop filters the assistant message for ContentBlock::ToolUse blocks and collects them as (id, name, input) tuples. The full assistant message (including any text) is appended to the conversation.

let tool_calls: Vec<_> = response.message.content.iter()
    .filter_map(|block| {
        if let ContentBlock::ToolUse { id, name, input } = block {
            Some((id.clone(), name.clone(), input.clone()))
        } else {
            None
        }
    })
    .collect();

// Append the assistant message (with both text and tool use blocks)
self.messages.push(response.message.clone());

Step 3: Execute tools via the registry. Each tool call is dispatched to the ToolRegistry, which finds the matching tool by name, deserializes the JSON input, and calls the tool’s call() method. Pre- and post-execution hooks fire around each call.

// For each tool call, the loop calls execute_single_tool:
// 1. Fire PreToolExecution hooks (can Skip or Terminate)
// 2. Call self.tools.execute(tool_name, input, tool_ctx)
// 3. Fire PostToolExecution hooks
// 4. Wrap the ToolOutput into a ContentBlock::ToolResult

If parallel_tool_execution is true and there are multiple tool calls, all calls run concurrently via futures::future::join_all. Otherwise they execute sequentially.

Step 4: Append results and continue. The tool results are collected into ContentBlock::ToolResult blocks (each linked back to the original call by tool_use_id) and appended as a User message. The loop then continues — the LLM sees the tool results and can respond with text or call more tools.

// Each tool result looks like:
// ContentBlock::ToolResult {
//     tool_use_id: "call_1",
//     content: [ContentItem::Text("{\"temp\": 22, \"conditions\": \"sunny\"}")],
//     is_error: false,
// }

// All results are appended as a single user message
self.messages.push(Message {
    role: Role::User,
    content: tool_result_blocks,
});
// Loop continues → LLM sees results → responds or calls more tools

Special case — ToolError::ModelRetry. If a tool returns Err(ToolError::ModelRetry(hint)), the loop does not propagate an error. Instead, it converts the hint into a ToolResult with is_error: true. The model receives the hint and can retry with corrected arguments:

// Tool returns: Err(ToolError::ModelRetry("city must be a valid name, got '123'"))
// Loop converts to: ContentBlock::ToolResult {
//     tool_use_id: "call_1",
//     content: [ContentItem::Text("city must be a valid name, got '123'")],
//     is_error: true,
// }
// Model sees the error and retries: get_weather({"city": "Tokyo"})

Complete flow diagram

User: "What's the weather in Tokyo?"
    │
    ▼
┌─────────────────────────────────────────────┐
│ Turn 1: LLM call                            │
│   Request: [User: "What's the weather..."]  │
│   Response: ToolUse(get_weather, {city:     │
│             "Tokyo"})                       │
│   StopReason: ToolUse                       │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│ Tool execution                              │
│   Registry dispatches get_weather           │
│   Tool returns: {temp: 22, conditions:      │
│                  "sunny"}                   │
│   Result appended as ToolResult message     │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│ Turn 2: LLM call                            │
│   Request: [User, Assistant(ToolUse),       │
│             User(ToolResult)]               │
│   Response: "It's 22°C and sunny in Tokyo." │
│   StopReason: EndTurn                       │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
              AgentResult {
                  response: "It's 22°C and sunny in Tokyo.",
                  turns: 2,
                  ...
              }

Cancellation

The loop checks ToolContext.cancellation_token at two points:

  1. Top of each iteration (before the max turns check)
  2. Before tool execution (after the LLM returns tool calls)

use tokio_util::sync::CancellationToken;

let token = CancellationToken::new();
let ctx = ToolContext {
    cancellation_token: token.clone(),
    ..Default::default()
};

// Cancel from another task
tokio::spawn(async move {
    tokio::time::sleep(Duration::from_secs(30)).await;
    token.cancel();
});

match agent.run_text("Long task...", &ctx).await {
    Err(LoopError::Cancelled) => println!("Cancelled!"),
    Ok(result) => println!("{}", result.response),
    Err(e) => eprintln!("{e}"),
}

Parallel tool execution

When LoopConfig.parallel_tool_execution is true and the LLM returns multiple tool calls in a single response, all calls execute concurrently via futures::future::join_all. When false (the default), tools execute sequentially in order.

let agent = AgentLoop::builder(provider, context)
    .parallel_tool_execution(true)
    .tools(registry)
    .build();

Parallel execution applies to run() and run_step(). Streaming (run_stream()) always executes tools sequentially.
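The concurrency model can be illustrated with plain threads — a sketch; the real loop uses async futures and join_all, but the ordering guarantee is the same: results come back in the order the calls were made, regardless of completion order.

```rust
use std::thread;

// Dispatch each "tool call" on its own thread and collect results in call
// order -- a thread-based stand-in for futures::future::join_all.
fn execute_parallel(calls: Vec<(String, fn(&str) -> String)>) -> Vec<String> {
    let handles: Vec<_> = calls
        .into_iter()
        .map(|(input, tool)| thread::spawn(move || tool(&input)))
        .collect();
    // Joining in spawn order preserves call order in the results.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because results are matched back to calls by position, the ToolResult blocks can be linked to their tool_use_ids no matter which call finishes first.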

Usage limits

UsageLimits enforces token and request budgets on the agent loop. When any limit is exceeded, the loop returns LoopError::UsageLimitExceeded with a message describing which limit was hit.

use neuron_loop::AgentLoop;
use neuron_types::UsageLimits;

let limits = UsageLimits::default()
    .with_input_tokens_limit(500_000)
    .with_output_tokens_limit(50_000)
    .with_total_tokens_limit(600_000)
    .with_request_limit(25)
    .with_tool_calls_limit(100);

let agent = AgentLoop::builder(provider, context)
    .tools(registry)
    .usage_limits(limits)
    .build();

Each field is optional – set only the limits you care about. Unset limits are not enforced.

| Limit | Checked against |
|---|---|
| input_tokens_limit | Cumulative TokenUsage.input_tokens across all turns |
| output_tokens_limit | Cumulative TokenUsage.output_tokens across all turns |
| total_tokens_limit | Sum of cumulative input + output tokens |
| request_limit | Number of LLM calls made (incremented each turn) |
| tool_calls_limit | Number of tool executions (incremented per tool call) |

The loop checks limits at two points:

  1. Before each LLM call – checks token and request limits against accumulated usage
  2. After tool execution – checks the tool call count against the limit

When a limit is exceeded, the loop stops immediately and returns LoopError::UsageLimitExceeded with a descriptive message (e.g., "output token limit exceeded: 50123 > 50000").
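The check itself is a straightforward comparison against accumulated counters — a sketch with stand-in types, not the neuron-types implementation:

```rust
// Stand-in for UsageLimits, showing two of the five limits.
#[derive(Default)]
struct Limits {
    output_tokens: Option<usize>,
    requests: Option<usize>,
}

fn check_limits(limits: &Limits, output_tokens: usize, requests: usize) -> Result<(), String> {
    if let Some(max) = limits.output_tokens {
        if output_tokens > max {
            return Err(format!("output token limit exceeded: {output_tokens} > {max}"));
        }
    }
    if let Some(max) = limits.requests {
        if requests > max {
            return Err(format!("request limit exceeded: {requests} > {max}"));
        }
    }
    Ok(()) // unset limits are never enforced
}
```

The Option fields make the "set only the limits you care about" behavior fall out naturally: a None limit is simply skipped.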

You can also construct UsageLimits directly:

use neuron_types::UsageLimits;

let limits = UsageLimits {
    input_tokens_limit: Some(500_000),
    output_tokens_limit: Some(50_000),
    total_tokens_limit: None,
    request_limit: Some(25),
    tool_calls_limit: None,
};

Or use LoopConfig directly:

use neuron_loop::LoopConfig;
use neuron_types::UsageLimits;

let config = LoopConfig {
    usage_limits: Some(UsageLimits::default()
        .with_total_tokens_limit(1_000_000)
        .with_request_limit(50)),
    ..Default::default()
};

Context compaction

The loop supports two independent compaction mechanisms:

Client-side compaction

Uses the ContextStrategy you provide. Between turns, the loop calls should_compact() and compact() to reduce message history when tokens exceed the configured threshold.

// SlidingWindow compacts by dropping old messages
let agent = AgentLoop::builder(provider, SlidingWindowStrategy::new(20, 100_000))
    .build();

Server-side compaction

When the provider returns StopReason::Compaction, the loop automatically continues without treating it as a final response. The compacted content arrives in ContentBlock::Compaction within the assistant’s message.

No configuration is needed in the loop – it handles this transparently. Set CompletionRequest.context_management on the provider side to enable it.

ToolError::ModelRetry

When a tool returns Err(ToolError::ModelRetry(hint)), the loop converts it to a ToolOutput with is_error: true and the hint as content. The model receives the hint and can retry with corrected arguments.

This does not propagate as LoopError::Tool. The loop continues normally, giving the model a chance to self-correct.
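The conversion can be pictured with simplified stand-in types (hypothetical; the real ToolError and ToolOutput live in neuron-types):

```rust
// Hypothetical stand-ins illustrating how the loop treats ModelRetry.
#[derive(Debug)]
enum ToolError {
    ModelRetry(String),
    // Any other variant would surface as LoopError::Tool instead.
    Other(String),
}

#[derive(Debug, PartialEq)]
struct ToolOutput {
    content: String,
    is_error: bool,
}

/// ModelRetry becomes an error-flagged tool result fed back to the model;
/// other errors propagate unchanged.
fn to_tool_output(result: Result<ToolOutput, ToolError>) -> Result<ToolOutput, ToolError> {
    match result {
        Err(ToolError::ModelRetry(hint)) => Ok(ToolOutput { content: hint, is_error: true }),
        other => other,
    }
}

fn main() {
    let retried = to_tool_output(Err(ToolError::ModelRetry("city must not be empty".into())));
    // The hint reaches the model as an error-flagged result, not a loop error.
    assert_eq!(
        retried.unwrap(),
        ToolOutput { content: "city must not be empty".into(), is_error: true }
    );
    // A non-retry error stays an error.
    assert!(to_tool_output(Err(ToolError::Other("boom".into()))).is_err());
}
```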

Observability hooks

Add hooks to observe or control loop behavior. Hooks receive events at each step and return HookAction::Continue, HookAction::Skip, or HookAction::Terminate.

use neuron_types::{ObservabilityHook, HookEvent, HookAction, HookError};

struct TokenBudgetHook { max_tokens: usize }

impl ObservabilityHook for TokenBudgetHook {
    async fn on_event(&self, event: HookEvent<'_>) -> Result<HookAction, HookError> {
        match event {
            HookEvent::PostLlmCall { response } => {
                if response.usage.output_tokens > self.max_tokens {
                    return Ok(HookAction::Terminate {
                        reason: "token budget exceeded".into(),
                    });
                }
            }
            _ => {}
        }
        Ok(HookAction::Continue)
    }
}

let agent = AgentLoop::builder(provider, context)
    .hook(TokenBudgetHook { max_tokens: 10_000 })
    .build();

Hook events

| Event | Fired when | Skip/Terminate behavior |
|---|---|---|
| LoopIteration { turn } | Start of each turn | Terminate stops the loop |
| PreLlmCall { request } | Before calling the provider | Terminate stops the loop |
| PostLlmCall { response } | After receiving the response | Terminate stops the loop |
| PreToolExecution { tool_name, input } | Before each tool call | Skip returns rejection as tool result |
| PostToolExecution { tool_name, output } | After each tool call | Terminate stops the loop |
| ContextCompaction { old_tokens, new_tokens } | After context is compacted | Terminate stops the loop |

Durable execution

For crash-recoverable agents, set a DurableContext on the loop. When present, LLM calls go through DurableContext::execute_llm_call and tool calls go through DurableContext::execute_tool, enabling journaling and replay by engines like Temporal, Restate, or Inngest.

let agent = AgentLoop::builder(provider, context)
    .durability(my_temporal_context)
    .build();

The loop handles the durable/non-durable split transparently. All other behavior (hooks, compaction, cancellation) works the same way.

Error handling

run() and run_text() return Result<AgentResult, LoopError>:

| Variant | Cause |
|---|---|
| LoopError::Provider(e) | LLM call failed |
| LoopError::Tool(e) | Tool execution failed (except ModelRetry) |
| LoopError::Context(e) | Context compaction failed |
| LoopError::MaxTurns(n) | Turn limit reached |
| LoopError::UsageLimitExceeded(msg) | Token, request, or tool call budget exceeded |
| LoopError::HookTerminated(reason) | A hook returned Terminate |
| LoopError::Cancelled | Cancellation token was triggered |

run_stream() sends errors as StreamEvent::Error on the channel instead of returning them as Result.

API reference

MCP Integration

neuron-mcp connects your agent to external tool servers using the Model Context Protocol (MCP). It wraps the rmcp crate (the official Rust MCP SDK) and bridges MCP tools into neuron’s ToolRegistry so they appear like any other tool to the agent loop.

Quick Example

use std::sync::Arc;
use neuron_mcp::{McpClient, McpToolBridge, StdioConfig};
use neuron_tool::ToolRegistry;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to an MCP server via stdio
    let client = Arc::new(McpClient::connect_stdio(StdioConfig {
        command: "npx".to_string(),
        args: vec!["-y".to_string(), "@modelcontextprotocol/server-filesystem".to_string(), "/tmp".to_string()],
        env: vec![],
    }).await?);

    // Discover tools and register them
    let tools = McpToolBridge::discover(&client).await?;
    let mut registry = ToolRegistry::new();
    for tool in tools {
        registry.register_dyn(tool);
    }

    Ok(())
}

API Walkthrough

McpClient

McpClient manages the connection to an MCP server. Two transports are supported:

Stdio – spawns a child process and communicates over stdin/stdout:

use neuron_mcp::{McpClient, StdioConfig};

let client = McpClient::connect_stdio(StdioConfig {
    command: "npx".to_string(),
    args: vec!["-y".to_string(), "@modelcontextprotocol/server-everything".to_string()],
    env: vec![("NODE_ENV".to_string(), "production".to_string())],
}).await?;

Streamable HTTP – connects to a remote MCP server over HTTP with SSE:

use neuron_mcp::{McpClient, HttpConfig};

let client = McpClient::connect_http(HttpConfig {
    url: "http://localhost:8080/mcp".to_string(),
    auth_header: Some("Bearer my-token".to_string()),
    headers: vec![],
}).await?;

Once connected, McpClient provides methods for all MCP operations:

| Method | Description |
|---|---|
| list_tools(cursor) | List available tools (paginated) |
| list_all_tools() | List all tools (fetches every page) |
| call_tool(name, arguments) | Call a tool with a JSON argument map |
| call_tool_json(name, value) | Convenience: accepts serde_json::Value |
| list_resources(cursor) | List available resources |
| read_resource(uri) | Read a resource by URI |
| list_prompts(cursor) | List available prompt templates |
| get_prompt(name, arguments) | Retrieve an expanded prompt |
| is_closed() | Check if the transport is closed |
| peer() | Access the underlying rmcp peer for advanced use |

McpToolBridge

McpToolBridge bridges a single MCP tool into neuron’s ToolDyn trait. When the agent loop calls a bridged tool, the call is forwarded to the MCP server via McpClient::call_tool.

The typical workflow uses McpToolBridge::discover(), which lists all tools from the server and returns them as Arc<dyn ToolDyn> ready for registration:

use std::sync::Arc;
use neuron_mcp::{McpClient, McpToolBridge};
use neuron_tool::ToolRegistry;

let client = Arc::new(McpClient::connect_stdio(config).await?);

// Discover returns Vec<Arc<dyn ToolDyn>>
let bridges = McpToolBridge::discover(&client).await?;

let mut registry = ToolRegistry::new();
for bridge in bridges {
    registry.register_dyn(bridge);
}

An equivalent convenience method exists on McpClient itself:

let tools = McpClient::discover_tools(&client).await?;

You can also bridge a single known tool manually:

use neuron_mcp::McpToolBridge;

let bridge = McpToolBridge::new(Arc::clone(&client), tool_definition);
registry.register_dyn(Arc::new(bridge));

McpServer

McpServer does the reverse: it exposes a neuron ToolRegistry as an MCP server, making your tools available to any MCP client.

use neuron_mcp::McpServer;
use neuron_tool::ToolRegistry;

let mut registry = ToolRegistry::new();
// ... register your tools ...

let server = McpServer::new(registry)
    .with_name("my-agent-tools")
    .with_version("1.0.0")
    .with_instructions("Tools for file manipulation");

// Serve over stdio (blocks until client disconnects)
server.serve_stdio().await?;

The server handles tools/list and tools/call MCP requests by delegating to the underlying ToolRegistry.

Configuration Types

  • StdioConfig – command, args, env for spawning a child process
  • HttpConfig – url, auth_header, headers for HTTP connections
  • PaginatedList<T> – generic wrapper with items and next_cursor

MCP-Specific Types

These types represent MCP protocol objects:

  • McpResource – uri, name, title, description, mime_type
  • McpResourceContents – uri, mime_type, text or blob
  • McpPrompt – name, title, description, arguments
  • McpPromptArgument – name, description, required

Error Handling

All MCP operations return Result<_, McpError>. The variants are:

  • McpError::Connection – failed to connect (process spawn or HTTP)
  • McpError::Initialization – MCP handshake failed
  • McpError::ToolCall – a tool call returned an error
  • McpError::Transport – transport-level communication error

Advanced Usage

Mixing MCP and Native Tools

MCP tools and native tools live side by side in the same ToolRegistry. The agent loop cannot tell the difference:

use neuron_mcp::{McpClient, McpToolBridge, StdioConfig};
use neuron_tool::ToolRegistry;

let mut registry = ToolRegistry::new();

// Register a native tool
registry.register(MyNativeTool);

// Register MCP tools from a filesystem server
let fs_client = Arc::new(McpClient::connect_stdio(fs_config).await?);
for tool in McpToolBridge::discover(&fs_client).await? {
    registry.register_dyn(tool);
}

// Register MCP tools from a different server
let db_client = Arc::new(McpClient::connect_http(db_config).await?);
for tool in McpToolBridge::discover(&db_client).await? {
    registry.register_dyn(tool);
}

// All tools are now available to the agent loop

Accessing the Raw rmcp Peer

For operations not covered by McpClient’s methods, access the underlying rmcp::Peer directly:

let peer = client.peer();
// Use any rmcp method directly

Tool Annotations

MCP tools can carry behavioral annotations (read-only, destructive, idempotent, open-world). These are preserved during bridging and available on the ToolDefinition:

let tools = client.list_all_tools().await?;
for tool in &tools {
    if let Some(ann) = &tool.annotations {
        println!("{}: read_only={:?}", tool.name, ann.read_only_hint);
    }
}

API Docs

Full API documentation: neuron-mcp on docs.rs

Runtime

neuron-runtime provides production infrastructure for agents: session persistence, input/output guardrails, structured observability, durable execution, and sandboxed tool execution.

Quick Example

use std::path::PathBuf;
use neuron_runtime::*;
use neuron_types::Message;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Persist sessions to disk
    let storage = FileSessionStorage::new(PathBuf::from("./sessions"));
    let mut session = Session::new("s-1", PathBuf::from("."));
    session.messages.push(Message::user("Hello"));
    storage.save(&session).await?;

    // Load it back later
    let loaded = storage.load("s-1").await?;
    println!("{} messages", loaded.messages.len());
    Ok(())
}

Sessions

Sessions store conversation message history along with metadata (timestamps, token usage, custom state). The SessionStorage trait defines how sessions are persisted.

Session Type

use neuron_runtime::Session;

let mut session = Session::new("chat-42", "/home/user/project".into());
session.messages.push(Message::user("What is Rust?"));
session.state.custom.insert("theme".to_string(), serde_json::json!("dark"));

A Session contains:

| Field | Type | Description |
|---|---|---|
| id | String | Unique session identifier |
| messages | Vec<Message> | Conversation history |
| state | SessionState | Working directory, token usage, event count, custom metadata |
| created_at | DateTime<Utc> | Creation timestamp |
| updated_at | DateTime<Utc> | Last update timestamp |

SessionState holds mutable runtime data: cwd, token_usage, event_count, and a custom map for arbitrary key-value metadata.

SessionStorage Trait

pub trait SessionStorage: Send + Sync {
    async fn save(&self, session: &Session) -> Result<(), StorageError>;
    async fn load(&self, id: &str) -> Result<Session, StorageError>;
    async fn list(&self) -> Result<Vec<SessionSummary>, StorageError>;
    async fn delete(&self, id: &str) -> Result<(), StorageError>;
}

Two implementations ship with the crate:

InMemorySessionStorage – backed by Arc<RwLock<HashMap>>, suitable for testing and short-lived processes:

let storage = InMemorySessionStorage::new();
storage.save(&session).await?;

FileSessionStorage – one JSON file per session at {directory}/{session_id}.json. Creates the directory on first save:

let storage = FileSessionStorage::new(PathBuf::from("./sessions"));
storage.save(&session).await?;
// Creates ./sessions/chat-42.json

Session Summaries

Session::summary() returns a lightweight SessionSummary without the full message history – useful for listing sessions:

let summaries = storage.list().await?;
for s in &summaries {
    println!("{}: {} messages, created {}", s.id, s.message_count, s.created_at);
}

Session persistence with AgentLoop

Session persistence and durable execution are two complementary layers. SessionStorage saves conversation state between runs — when the process exits, the session is written to disk (or another backend), and a new process can load it later to resume. DurableContext protects individual operations during a run — if the process crashes mid-tool-call, the durable engine journals and replays to recover. They compose naturally: DurableContext protects during a run, SessionStorage saves between runs.

use neuron_runtime::{Session, FileSessionStorage, SessionStorage};
use neuron_loop::AgentLoop;
use neuron_types::Message;

// --- Save after a conversation ---
let result = agent.run_text("Hello!", &ctx).await?;

let mut session = Session::new("session-123", std::env::current_dir()?);
session.messages = result.messages.clone();
session.state.token_usage = result.usage.clone();

let storage = FileSessionStorage::new("./sessions".into());
storage.save(&session).await?;

// --- Resume later (new process) ---
let storage = FileSessionStorage::new("./sessions".into());
let loaded = storage.load("session-123").await?;

// Build a new agent and continue the conversation
let mut agent = AgentLoop::builder(provider, context)
    .tools(tools)
    .system_prompt("You are a helpful assistant.")
    .build();

// Feed the loaded history back by running with the conversation context
// The previous messages provide continuity
let resume_msg = Message::user("Continue where we left off.");
let result = agent.run(resume_msg, &ctx).await?;

AgentResult.messages contains the full conversation history including tool calls and results, so saving it preserves the complete context. When you load and resume, the model sees the entire prior exchange — tool invocations, tool outputs, assistant reasoning — giving it full continuity without re-executing any previous steps.

Guardrails

Guardrails are safety checks that run on input (before it reaches the LLM) or output (before it reaches the user).

GuardrailResult

Every guardrail check returns one of three outcomes:

  • Pass – input/output is acceptable
  • Tripwire(reason) – immediately halt execution
  • Warn(reason) – allow execution but log a warning

InputGuardrail and OutputGuardrail

use std::future::Future;
use neuron_runtime::{InputGuardrail, GuardrailResult};

struct NoSecrets;
impl InputGuardrail for NoSecrets {
    fn check(&self, input: &str) -> impl Future<Output = GuardrailResult> + Send {
        async move {
            if input.contains("API_KEY") || input.contains("sk-") {
                GuardrailResult::Tripwire("Input contains a secret".to_string())
            } else {
                GuardrailResult::Pass
            }
        }
    }
}

Output guardrails use the same pattern via the OutputGuardrail trait.

Running Multiple Guardrails

Use run_input_guardrails and run_output_guardrails to evaluate a sequence. They return the first non-Pass result, or Pass if all checks pass:

use neuron_runtime::{run_input_guardrails, ErasedInputGuardrail};

let no_secrets = NoSecrets;
let no_sql = NoSqlInjection;
let guardrails: Vec<&dyn ErasedInputGuardrail> = vec![&no_secrets, &no_sql];

let result = run_input_guardrails(&guardrails, user_input).await;
if result.is_tripwire() {
    // Reject the input
}
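The "first non-Pass wins" semantics can be sketched in isolation (hypothetical stand-in enum; the real GuardrailResult lives in neuron-runtime and the real runners are async):

```rust
// Hypothetical stand-in illustrating run_input_guardrails' aggregation rule.
#[derive(Debug, Clone, PartialEq)]
enum GuardrailResult {
    Pass,
    Tripwire(String),
    Warn(String),
}

/// Return the first non-Pass result, or Pass if every check passed.
fn run_guardrails(results: &[GuardrailResult]) -> GuardrailResult {
    results
        .iter()
        .find(|r| !matches!(r, GuardrailResult::Pass))
        .cloned()
        .unwrap_or(GuardrailResult::Pass)
}

fn main() {
    use GuardrailResult::*;
    // All checks pass: overall Pass.
    assert_eq!(run_guardrails(&[Pass, Pass]), Pass);
    // The first non-Pass result is returned, even if a later check is stricter.
    assert_eq!(
        run_guardrails(&[Pass, Warn("odd input".into()), Tripwire("secret".into())]),
        Warn("odd input".into())
    );
}
```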

GuardrailHook

GuardrailHook wraps guardrails as an ObservabilityHook, integrating them directly into the agent loop lifecycle:

  • Input guardrails fire on HookEvent::PreLlmCall
  • Output guardrails fire on HookEvent::PostLlmCall
  • Tripwire maps to HookAction::Terminate
  • Warn logs via tracing::warn! and returns HookAction::Continue
  • Pass returns HookAction::Continue

use neuron_runtime::GuardrailHook;
use neuron_loop::AgentLoop;

let hook = GuardrailHook::new()
    .input_guardrail(NoSecrets)
    .output_guardrail(NoProfanity);

let mut agent = AgentLoop::builder(provider, context)
    .tools(registry)
    .build();
agent.add_hook(hook);

Complete guardrail integration with AgentLoop

The InputGuardrail example above shows how to check user input. Output guardrails follow the same trait pattern via OutputGuardrail. Here is a complete output guardrail that detects PII (email addresses and phone numbers) in the model’s response, wired into AgentLoop end-to-end.

Implement the guardrail:

use std::future::Future;
use neuron_runtime::{OutputGuardrail, GuardrailResult};

struct NoPiiOutput;

impl OutputGuardrail for NoPiiOutput {
    fn check(&self, output: &str) -> impl Future<Output = GuardrailResult> + Send {
        async move {
            // Check for email addresses
            if output.contains('@') && output.contains('.') {
                return GuardrailResult::Tripwire(
                    "Response contains a potential email address".to_string(),
                );
            }
            // Check for phone number patterns (sequences of 10+ digits)
            let digit_count = output.chars().filter(|c| c.is_ascii_digit()).count();
            if digit_count >= 10 {
                return GuardrailResult::Tripwire(
                    "Response contains a potential phone number".to_string(),
                );
            }
            GuardrailResult::Pass
        }
    }
}

Wire it into AgentLoop:

use neuron_runtime::GuardrailHook;
use neuron_loop::{AgentLoop, LoopError};

let guardrail_hook = GuardrailHook::builder()
    .output_guardrail(NoPiiOutput)
    .build();

let mut agent = AgentLoop::builder(provider, context)
    .tools(tools)
    .hook(guardrail_hook)
    .build();

// Handle guardrail rejection
match agent.run_text("What's John's email?", &ctx).await {
    Ok(result) => println!("Response: {}", result.response),
    Err(LoopError::HookTerminated(reason)) => {
        println!("Guardrail blocked: {reason}");
        // Present safe fallback to user
    }
    Err(e) => eprintln!("Other error: {e}"),
}

Guardrails are gates, not transformers — they accept (Pass), reject (Tripwire), or flag (Warn), but do not modify content. To transform output, post-process the AgentResult after run() returns.

TracingHook

TracingHook is a concrete ObservabilityHook that emits structured tracing events for every stage of the agent loop. Wire it to any tracing-compatible subscriber for stdout logging, OpenTelemetry export, or custom collectors.

use neuron_runtime::TracingHook;

let hook = TracingHook::new();
// Add to agent loop: agent.add_hook(hook);

TracingHook always returns HookAction::Continue – it observes but never controls execution. It maps 8 hook events to structured spans:

| Event | Level | Span name |
|---|---|---|
| LoopIteration | DEBUG | neuron.loop.iteration |
| PreLlmCall | DEBUG | neuron.llm.pre_call |
| PostLlmCall | DEBUG | neuron.llm.post_call |
| PreToolExecution | DEBUG | neuron.tool.pre_execution |
| PostToolExecution | DEBUG | neuron.tool.post_execution |
| ContextCompaction | INFO | neuron.context.compaction |
| SessionStart | INFO | neuron.session.start |
| SessionEnd | INFO | neuron.session.end |

Set RUST_LOG=debug to see all events:

RUST_LOG=debug cargo run --example tracing_hook -p neuron-runtime

PermissionPolicy

The PermissionPolicy trait approves or denies tool calls before execution. It returns a PermissionDecision:

  • Allow – proceed with the tool call
  • Deny(reason) – reject the call
  • Ask(prompt) – ask the user for confirmation

use neuron_types::{PermissionPolicy, PermissionDecision};

struct ReadOnlyPolicy;
impl PermissionPolicy for ReadOnlyPolicy {
    fn check(&self, tool_name: &str, _input: &serde_json::Value) -> PermissionDecision {
        match tool_name {
            "read_file" | "list_dir" => PermissionDecision::Allow,
            _ => PermissionDecision::Deny(format!("{tool_name} is not allowed in read-only mode")),
        }
    }
}

DurableContext

DurableContext wraps LLM calls and tool execution so durable engines (Temporal, Restate, Inngest) can journal, replay, and recover from crashes.

The Trait

pub trait DurableContext: Send + Sync {
    async fn execute_llm_call(&self, request: CompletionRequest, options: ActivityOptions) -> Result<CompletionResponse, DurableError>;
    async fn execute_tool(&self, tool_name: &str, input: Value, ctx: &ToolContext, options: ActivityOptions) -> Result<ToolOutput, DurableError>;
    async fn wait_for_signal<T: DeserializeOwned>(&self, signal_name: &str, timeout: Duration) -> Result<Option<T>, DurableError>;
    fn should_continue_as_new(&self) -> bool;
    async fn continue_as_new(&self, state: Value) -> Result<(), DurableError>;
    async fn sleep(&self, duration: Duration);
    fn now(&self) -> DateTime<Utc>;
}

LocalDurableContext

For local development and testing, LocalDurableContext passes through to the provider and tools directly – no journaling, no replay:

use std::sync::Arc;
use neuron_runtime::LocalDurableContext;
use neuron_tool::ToolRegistry;

let provider = Arc::new(my_provider);
let tools = Arc::new(ToolRegistry::new());
let durable = LocalDurableContext::new(provider, tools);

// Use in the agent loop
agent.set_durability(durable);

In production, swap LocalDurableContext for a Temporal or Restate implementation. The calling code stays the same.

ActivityOptions

Controls timeout and retry behavior for durable activities:

use neuron_types::{ActivityOptions, RetryPolicy};
use std::time::Duration;

let options = ActivityOptions {
    start_to_close_timeout: Duration::from_secs(30),
    heartbeat_timeout: Some(Duration::from_secs(10)),
    retry_policy: Some(RetryPolicy {
        initial_interval: Duration::from_secs(1),
        backoff_coefficient: 2.0,
        maximum_attempts: 3,
        maximum_interval: Duration::from_secs(30),
        non_retryable_errors: vec!["Authentication".to_string()],
    }),
};
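The retry schedule implied by these fields can be computed directly. A sketch (hypothetical helper; the actual scheduling is done by the durable engine, and engines may jitter or round the intervals):

```rust
use std::time::Duration;

/// Hypothetical helper: wait n = initial * coefficient^n between attempts,
/// capped at maximum. With k attempts there are k-1 waits.
fn backoff_schedule(
    initial: Duration,
    coefficient: f64,
    maximum: Duration,
    attempts: u32,
) -> Vec<Duration> {
    (0..attempts.saturating_sub(1))
        .map(|n| {
            let secs = initial.as_secs_f64() * coefficient.powi(n as i32);
            maximum.min(Duration::from_secs_f64(secs))
        })
        .collect()
}

fn main() {
    // initial 1s, coefficient 2.0, max 30s, 3 attempts: waits of 1s then 2s.
    let schedule = backoff_schedule(Duration::from_secs(1), 2.0, Duration::from_secs(30), 3);
    assert_eq!(schedule, vec![Duration::from_secs(1), Duration::from_secs(2)]);
    // The cap kicks in once the exponential exceeds maximum_interval.
    let capped = backoff_schedule(Duration::from_secs(10), 2.0, Duration::from_secs(30), 4);
    assert_eq!(capped.last(), Some(&Duration::from_secs(30)));
}
```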

Sandbox

The Sandbox trait wraps tool execution with isolation – filesystem restrictions, network limits, or container boundaries:

use neuron_runtime::{Sandbox, NoOpSandbox};

// NoOpSandbox passes through directly (no isolation)
let sandbox = NoOpSandbox;
let output = sandbox.execute_tool(&*tool, input, &ctx).await?;

Implement Sandbox for your own isolation strategy:

use neuron_runtime::Sandbox;
use neuron_types::{ToolDyn, ToolContext, ToolOutput, SandboxError};

struct DockerSandbox { image: String }

impl Sandbox for DockerSandbox {
    async fn execute_tool(
        &self,
        tool: &dyn ToolDyn,
        input: serde_json::Value,
        ctx: &ToolContext,
    ) -> Result<ToolOutput, SandboxError> {
        // Spawn a container, execute tool inside, return output
        todo!()
    }
}

API Docs

Full API documentation: neuron-runtime on docs.rs

Embeddings

neuron provides a provider-agnostic EmbeddingProvider trait for generating text embeddings, with an OpenAI implementation in neuron-provider-openai.

Quick Example

use neuron_provider_openai::OpenAi;
use neuron_types::{EmbeddingProvider, EmbeddingRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = OpenAi::from_env()?;

    let response = client.embed(EmbeddingRequest {
        model: "text-embedding-3-small".to_string(),
        input: vec![
            "Rust is a systems programming language.".to_string(),
            "Python is great for scripting.".to_string(),
        ],
        dimensions: None,
        ..Default::default()
    }).await?;

    println!("Got {} embeddings", response.embeddings.len());
    println!("First vector has {} dimensions", response.embeddings[0].len());
    println!("Tokens used: {}", response.usage.total_tokens);
    Ok(())
}

API Walkthrough

EmbeddingProvider Trait

The trait is defined in neuron-types and kept separate from Provider because not all embedding models support chat completions and not all chat providers support embeddings. Implement both on a single struct when a provider supports both capabilities.

pub trait EmbeddingProvider: Send + Sync {
    async fn embed(
        &self,
        request: EmbeddingRequest,
    ) -> Result<EmbeddingResponse, EmbeddingError>;
}

The trait uses RPITIT (return position impl trait in trait) and is not object-safe. Use generics <E: EmbeddingProvider> for composition.

EmbeddingRequest

pub struct EmbeddingRequest {
    /// The embedding model (e.g. "text-embedding-3-small").
    pub model: String,
    /// Text inputs to embed. Multiple strings are batched into one API call.
    pub input: Vec<String>,
    /// Optional output dimensionality (not all models support this).
    pub dimensions: Option<usize>,
    /// Provider-specific extra fields forwarded verbatim.
    pub extra: HashMap<String, serde_json::Value>,
}

  • model – the model identifier. The OpenAI implementation defaults to text-embedding-3-small when this is empty.
  • input – a batch of strings. Each string produces one embedding vector in the response. Batching multiple inputs in a single request is more efficient than making separate calls.
  • dimensions – reduces the output dimensionality when supported by the model (e.g., OpenAI’s text-embedding-3-small supports 256, 512, or 1536).
  • extra – a map of provider-specific fields merged directly into the request body. Useful for options not covered by the common fields.

EmbeddingResponse

pub struct EmbeddingResponse {
    /// One embedding vector per input string, in the same order.
    pub embeddings: Vec<Vec<f32>>,
    /// The model that generated the embeddings.
    pub model: String,
    /// Token usage statistics.
    pub usage: EmbeddingUsage,
}

The embeddings vector is always in the same order as the input vector in the request. Each inner Vec<f32> is a dense floating-point embedding.
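These vectors are typically compared with cosine similarity. A self-contained sketch (standard formula, independent of any neuron API):

```rust
/// Cosine similarity between two dense vectors: dot(a, b) / (|a| * |b|).
/// Returns 1.0 for identical directions, 0.0 for orthogonal ones.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0, 0.0, 1.0];
    let b = [1.0, 0.0, 1.0];
    let c = [0.0, 1.0, 0.0];
    // Identical vectors: similarity 1.
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6);
    // Orthogonal vectors: similarity 0.
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);
}
```

In practice you would apply this to pairs drawn from `response.embeddings`, e.g. to rank documents against a query embedding.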

EmbeddingUsage

pub struct EmbeddingUsage {
    /// Number of tokens in the input.
    pub prompt_tokens: usize,
    /// Total tokens consumed.
    pub total_tokens: usize,
}

EmbeddingError

All embedding operations return Result<_, EmbeddingError>. The variants are:

VariantDescriptionRetryable?
Authentication(String)Invalid API key or forbiddenNo
RateLimit { retry_after }Provider rate limit hitYes
InvalidRequest(String)Bad model name, empty input, etc.No
Network(source)Connection failure, DNS errorYes
Other(source)Catch-all for unexpected errorsDepends

Use error.is_retryable() to decide whether to retry:

match client.embed(request).await {
    Ok(response) => { /* use embeddings */ }
    Err(e) if e.is_retryable() => { /* back off and retry */ }
    Err(e) => { /* terminal error, report to user */ }
}

OpenAI Implementation

neuron-provider-openai implements EmbeddingProvider on the same OpenAi struct that implements Provider. No additional setup is needed – the embedding calls reuse the same API key, base URL, and HTTP client.

use neuron_provider_openai::OpenAi;
use neuron_types::{EmbeddingProvider, EmbeddingRequest};

// Same client for both chat completions and embeddings
let client = OpenAi::new("sk-...")
    .base_url("https://api.openai.com");

// Chat completion
let chat_response = client.complete(completion_request).await?;

// Embedding
let embed_response = client.embed(EmbeddingRequest {
    model: "text-embedding-3-small".to_string(),
    input: vec!["Hello world".to_string()],
    ..Default::default()
}).await?;

Default Model

When EmbeddingRequest.model is empty, the OpenAI implementation defaults to text-embedding-3-small.

Controlling Dimensions

Use the dimensions field to reduce output size. Smaller embeddings use less storage and are faster to compare, at the cost of some accuracy:

let response = client.embed(EmbeddingRequest {
    model: "text-embedding-3-small".to_string(),
    input: vec!["hello".to_string()],
    dimensions: Some(256), // Default is 1536 for this model
    ..Default::default()
}).await?;

assert_eq!(response.embeddings[0].len(), 256);

Provider-Specific Options

Pass extra fields that the OpenAI API supports but neuron does not model explicitly:

use std::collections::HashMap;

let mut extra = HashMap::new();
extra.insert("user".to_string(), serde_json::json!("user-123"));

let response = client.embed(EmbeddingRequest {
    model: "text-embedding-3-large".to_string(),
    input: vec!["text to embed".to_string()],
    extra,
    ..Default::default()
}).await?;

Implementing a Custom EmbeddingProvider

To add embedding support for a new provider, implement the trait in your provider crate:

use std::future::Future;
use neuron_types::{EmbeddingProvider, EmbeddingRequest, EmbeddingResponse, EmbeddingError};

struct MyEmbeddingProvider { /* ... */ }

impl EmbeddingProvider for MyEmbeddingProvider {
    fn embed(
        &self,
        request: EmbeddingRequest,
    ) -> impl Future<Output = Result<EmbeddingResponse, EmbeddingError>> + Send {
        async move {
            // Call your embedding API
            let vectors = call_my_api(&request.input).await?;

            Ok(EmbeddingResponse {
                embeddings: vectors,
                model: request.model,
                usage: EmbeddingUsage {
                    prompt_tokens: 0,
                    total_tokens: 0,
                },
            })
        }
    }
}

API Docs

Full API documentation:

Testing Agents

neuron is designed for testability. Every block – providers, tools, context strategies, guardrails – can be tested independently without real API calls.

Quick Example

use std::sync::Mutex;
use neuron_types::*;

struct MockProvider {
    responses: Mutex<Vec<CompletionResponse>>,
}

impl Provider for MockProvider {
    async fn complete(&self, _req: CompletionRequest) -> Result<CompletionResponse, ProviderError> {
        let mut responses = self.responses.lock().unwrap();
        Ok(responses.remove(0))
    }
    async fn complete_stream(&self, _req: CompletionRequest) -> Result<StreamHandle, ProviderError> {
        Err(ProviderError::InvalidRequest("mock does not stream".into()))
    }
}

Testing Strategies

1. Mock Providers

A mock provider returns fixed CompletionResponse values in sequence. This lets you test agent behavior without network calls or API keys.

Single-turn response (model ends the conversation):

fn end_turn_response(text: &str) -> CompletionResponse {
    CompletionResponse {
        id: "mock-1".to_string(),
        model: "mock".to_string(),
        message: Message::assistant(text),
        usage: TokenUsage::default(),
        stop_reason: StopReason::EndTurn,
    }
}

Tool-calling response (model requests a tool call):

fn tool_call_response(tool_name: &str, tool_id: &str, args: serde_json::Value) -> CompletionResponse {
    CompletionResponse {
        id: "mock-2".to_string(),
        model: "mock".to_string(),
        message: Message {
            role: Role::Assistant,
            content: vec![ContentBlock::ToolUse {
                id: tool_id.to_string(),
                name: tool_name.to_string(),
                input: args,
            }],
        },
        usage: TokenUsage::default(),
        stop_reason: StopReason::ToolUse,
    }
}

Multi-turn mock – queue responses to simulate a full conversation:

let provider = MockProvider {
    responses: Mutex::new(vec![
        // Turn 1: model calls a tool
        tool_call_response("get_weather", "call-1", serde_json::json!({"city": "Tokyo"})),
        // Turn 2: model responds with the final answer
        end_turn_response("The weather in Tokyo is 72F and sunny."),
    ]),
};

2. Testing Tools Independently

Tools implement a trait with typed arguments and outputs. Test them directly without involving a provider or loop:

use neuron_types::{Tool, ToolContext};

#[tokio::test]
async fn test_weather_tool() {
    let tool = GetWeather;
    let ctx = ToolContext::default();

    let result = tool.call(WeatherArgs { city: "Tokyo".to_string() }, &ctx).await;
    assert!(result.is_ok());
    assert!(result.unwrap().contains("Tokyo"));
}

ToolContext::default() provides sensible defaults (cwd from the environment, empty session ID, fresh cancellation token). Override fields when your tool depends on them:

let ctx = ToolContext {
    session_id: "test-session".to_string(),
    cwd: PathBuf::from("/tmp/test"),
    ..Default::default()
};

3. Testing Tools via the Registry

To test the full JSON serialization/deserialization path through the ToolRegistry:

use neuron_tool::ToolRegistry;
use neuron_types::ToolContext;

#[tokio::test]
async fn test_tool_via_registry() {
    let mut registry = ToolRegistry::new();
    registry.register(GetWeather);

    let ctx = ToolContext::default();
    let input = serde_json::json!({"city": "London"});

    let output = registry.execute("get_weather", input, &ctx).await.unwrap();
    assert!(!output.is_error);

    // Check structured output
    let text = &output.content[0];
    match text {
        neuron_types::ContentItem::Text(t) => assert!(t.contains("London")),
        _ => panic!("expected text content"),
    }
}

4. Testing Context Strategies

Context strategies operate on plain message lists; simple strategies like the sliding window need no provider at all. Test them with synthetic data:

use neuron_context::SlidingWindowStrategy;
use neuron_types::{ContextStrategy, Message};

#[tokio::test]
async fn test_sliding_window() {
    let strategy = SlidingWindowStrategy::new(3, 100_000);

    // Create a long conversation
    let messages: Vec<Message> = (0..10)
        .map(|i| Message::user(format!("Message {i}")))
        .collect();

    assert!(strategy.should_compact(&messages, 150_000));

    let compacted = strategy.compact(messages).await.unwrap();
    assert!(compacted.len() <= 3);
}

5. Testing Guardrails

Guardrails are async functions on strings – no provider needed:

use neuron_runtime::{InputGuardrail, GuardrailResult};

#[tokio::test]
async fn test_no_secrets_guardrail() {
    let guardrail = NoSecrets;

    let result = guardrail.check("What is Rust?").await;
    assert!(result.is_pass());

    let result = guardrail.check("My API_KEY is abc123").await;
    assert!(result.is_tripwire());
}

6. Testing the Full Agent Loop

Combine a mock provider with real tools to test the complete agent loop:

use neuron_loop::AgentLoop;
use neuron_tool::ToolRegistry;
use neuron_context::SlidingWindowStrategy;
use neuron_types::*;

#[tokio::test]
async fn test_agent_loop_with_tool_call() {
    // Set up mock provider with two responses:
    // 1. Model calls the echo tool
    // 2. Model produces a final answer
    let provider = MockProvider {
        responses: Mutex::new(vec![
            tool_call_response("echo", "call-1", serde_json::json!({"text": "hello"})),
            end_turn_response("The echo tool returned: hello"),
        ]),
    };

    let mut tools = ToolRegistry::new();
    tools.register(EchoTool);

    let context = SlidingWindowStrategy::new(10, 100_000);

    let mut agent = AgentLoop::builder(provider, context)
        .tools(tools)
        .system_prompt("You are a test agent.")
        .max_turns(5)
        .build();

    let ctx = ToolContext::default();
    let result = agent.run(Message::user("Echo hello"), &ctx).await.unwrap();

    assert_eq!(result.turns, 2);
    assert!(result.response.contains("hello"));
}

7. HTTP-Level Integration Tests with wiremock

For testing actual HTTP request/response mapping without calling the real API, use wiremock to stand up a local mock server:

use wiremock::{Mock, MockServer, ResponseTemplate};
use wiremock::matchers::{method, path};
use neuron_provider_openai::OpenAi;
use neuron_types::*;

#[tokio::test]
async fn test_openai_provider_http() {
    let server = MockServer::start().await;

    // Mock the OpenAI completions endpoint
    Mock::given(method("POST"))
        .and(path("/v1/chat/completions"))
        .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
            "id": "chatcmpl-123",
            "model": "gpt-4o",
            "choices": [{
                "index": 0,
                "message": { "role": "assistant", "content": "Hello!" },
                "finish_reason": "stop"
            }],
            "usage": { "prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15 }
        })))
        .mount(&server)
        .await;

    let client = OpenAi::new("test-key").base_url(server.uri());

    let response = client.complete(CompletionRequest {
        model: "gpt-4o".to_string(),
        messages: vec![Message::user("Hi")],
        ..Default::default()
    }).await.unwrap();

    assert_eq!(response.stop_reason, StopReason::EndTurn);
}

This tests the full serialization/deserialization path through the provider implementation without any network calls to OpenAI.

Testing Patterns Summary

| What to test | Approach | Needs API key? |
|---|---|---|
| Individual tools | Call tool.call(args, ctx) directly | No |
| Tool JSON path | Use ToolRegistry::execute() | No |
| Context strategy | Call should_compact() / compact() with synthetic messages | No |
| Guardrails | Call guardrail.check(text) | No |
| Single-turn agent | Mock provider + AgentLoop::run() | No |
| Multi-turn agent | Mock provider with queued responses | No |
| Provider HTTP mapping | wiremock + real provider | No |
| End-to-end integration | Real provider + real tools | Yes |

Tips

  • Use ..Default::default() on CompletionRequest, TokenUsage, and ToolContext to avoid breaking tests when new fields are added.
  • Keep mock providers simple: Mutex<Vec<CompletionResponse>> covers most patterns.
  • Test ToolError::ModelRetry by returning it from a mock tool – verify the loop converts it to an error tool result and the model gets another chance.
  • Use StopReason::EndTurn for final responses and StopReason::ToolUse for tool-calling turns in your mock data.

API Docs

Full API documentation:

Observability

neuron provides observability through the ObservabilityHook trait and the neuron-otel crate, which implements OpenTelemetry instrumentation following the GenAI semantic conventions.

The ObservabilityHook trait

The ObservabilityHook trait (defined in neuron-types) is the extension point for logging, metrics, and telemetry. Hooks receive events at each step of the agent loop and can observe or control execution.

pub trait ObservabilityHook: Send + Sync {
    fn on_event(&self, event: HookEvent<'_>) -> impl Future<Output = Result<HookAction, HookError>> + Send;
}

See the agent loop guide for details on hook events and the HookAction enum.

neuron-otel – OpenTelemetry instrumentation

neuron-otel provides OtelHook, an ObservabilityHook implementation that emits structured tracing spans using the OpenTelemetry GenAI semantic conventions (gen_ai.* attributes).

Quick example

use neuron_otel::OtelHook;
use neuron_loop::AgentLoop;

let agent = AgentLoop::builder(provider, context)
    .tools(registry)
    .hook(OtelHook::default())
    .build();

That’s it. OtelHook emits spans for every LLM call, tool execution, and loop iteration, with attributes following the gen_ai.* namespace.

What gets traced

OtelHook emits spans at each hook event:

| Event | Span name | Key attributes |
|---|---|---|
| LoopIteration | gen_ai.loop.iteration | gen_ai.loop.turn |
| PreLlmCall / PostLlmCall | gen_ai.chat | gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.stop_reason |
| PreToolExecution / PostToolExecution | gen_ai.execute_tool | gen_ai.tool.name, gen_ai.tool.is_error |
| ContextCompaction | gen_ai.context.compaction | gen_ai.context.old_tokens, gen_ai.context.new_tokens |

Configuration

OtelHook uses the standard tracing crate subscriber model. Configure your tracing pipeline as usual with tracing-opentelemetry and the OpenTelemetry SDK:

use neuron_otel::OtelHook;
use opentelemetry::trace::TracerProvider;
use tracing_subscriber::prelude::*;

// Set up your OpenTelemetry pipeline (exporter, batch processor, etc.)
let tracer_provider = /* your OTel setup */;

// Install tracing-opentelemetry layer
tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer().with_tracer(
        tracer_provider.tracer("neuron")
    ))
    .init();

// Add the hook to your agent
let agent = AgentLoop::builder(provider, context)
    .hook(OtelHook::default())
    .build();

OtelHook does not configure the OpenTelemetry pipeline itself – it only emits tracing spans. You bring your own exporter (Jaeger, OTLP, Zipkin, etc.) and configure it through the standard OpenTelemetry SDK.

GenAI semantic conventions

The span attributes follow the emerging OpenTelemetry GenAI semantic conventions specification. Key attributes include:

  • gen_ai.system – the provider system (e.g., "anthropic", "openai")
  • gen_ai.request.model – the model identifier
  • gen_ai.usage.input_tokens – input token count
  • gen_ai.usage.output_tokens – output token count
  • gen_ai.response.stop_reason – why the model stopped generating
  • gen_ai.tool.name – the name of the tool being called

Using with neuron-runtime’s TracingHook

neuron-runtime also ships a TracingHook for basic tracing span emission. OtelHook and TracingHook serve different purposes:

  • TracingHook – lightweight, emits simple tracing spans for local debugging. No GenAI semantic conventions. Ships with neuron-runtime.
  • OtelHook – full OpenTelemetry instrumentation with GenAI semantic conventions. Designed for production observability pipelines. Ships with neuron-otel.

You can use both simultaneously – they are independent hooks:

use neuron_otel::OtelHook;
use neuron_runtime::TracingHook;

let agent = AgentLoop::builder(provider, context)
    .hook(TracingHook::default())  // Local debug logging
    .hook(OtelHook::default())     // Production OTel export
    .build();

Installation

Add neuron-otel directly:

[dependencies]
neuron-otel = "*"

Or use the umbrella crate with the otel feature:

[dependencies]
neuron = { features = ["anthropic", "otel"] }

API reference

Design Decisions

neuron’s architecture reflects a set of deliberate trade-offs. This page explains the key decisions and the reasoning behind them.

“serde, not serde_json”

neuron is a library of building blocks, not a framework.

The serde crate defines the Serialize and Deserialize traits. serde_json implements them for JSON. neuron follows the same pattern: neuron-types defines the Provider, Tool, and ContextStrategy traits. Provider crates (neuron-provider-anthropic, neuron-provider-openai, etc.) implement them.

This means you can pull in a single block – say, neuron-tool for the tool registry and middleware pipeline – without buying into an opinionated agent framework. You compose the blocks yourself, or use a framework built on top.

The scope test: If removing a feature forces every user to reimplement 200+ lines of non-trivial code (type erasure, middleware chaining, protocol handling), it belongs in neuron. If removing it forces 20-50 lines of straightforward composition, it belongs in an SDK layer above.

Block decomposition: one crate, one concern

Each crate owns exactly one concern:

| Crate | Concern |
|---|---|
| neuron-types | Types and trait definitions (zero logic) |
| neuron-provider-anthropic | Anthropic API implementation |
| neuron-provider-openai | OpenAI API implementation |
| neuron-provider-ollama | Ollama (local models) implementation |
| neuron-tool | Tool registry, type erasure, middleware |
| neuron-mcp | MCP protocol bridge (wraps rmcp) |
| neuron-context | Context compaction strategies |
| neuron-loop | The agentic while-loop |
| neuron-runtime | Sessions, guardrails, durability |
| neuron | Umbrella re-export |

Crates depend only on neuron-types and the crates directly below them in the dependency graph. No circular dependencies. Adding a new provider never touches the tool system. Adding a new compaction strategy never touches the loop.

Provider-per-crate (the serde pattern)

The Provider trait lives in neuron-types. Each cloud API gets its own crate:

// neuron-types/src/traits.rs
pub trait Provider: Send + Sync {
    fn complete(
        &self,
        request: CompletionRequest,
    ) -> impl Future<Output = Result<CompletionResponse, ProviderError>> + Send;

    fn complete_stream(
        &self,
        request: CompletionRequest,
    ) -> impl Future<Output = Result<StreamHandle, ProviderError>> + Send;
}

The trait is intentionally not object-safe (it uses RPITIT). You compose with generics (fn run<P: Provider>(provider: &P)), which gives the compiler full visibility for optimization.
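For example, a runner stays generic over the trait instead of taking a trait object. A minimal self-contained sketch (the trait body is reduced to a sync method for brevity; the real trait returns impl Future as shown above):

```rust
// Abbreviated stand-in for the Provider trait, sync for illustration.
trait Provider {
    fn model_name(&self) -> &'static str;
}

struct Mock;

impl Provider for Mock {
    fn model_name(&self) -> &'static str {
        "mock"
    }
}

// Generic, not `dyn`: the compiler monomorphizes `run` for each concrete
// provider type, so calls are statically dispatched with no vtable.
fn run<P: Provider>(provider: &P) -> String {
    format!("running with {}", provider.model_name())
}
```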

Why not a single provider crate with feature flags? Because provider APIs evolve independently. An Anthropic-specific feature (prompt caching, extended thinking) should not force a recompile of OpenAI code. Separate crates give you separate version timelines.

Message structure: flat struct over variant-per-role

neuron uses a flat Message struct:

pub struct Message {
    pub role: Role,
    pub content: Vec<ContentBlock>,
}

The alternative – one enum variant per role (UserMessage, AssistantMessage, SystemMessage) – creates a combinatorial explosion of conversion code. Rig uses the variant-per-role approach and needs roughly 300 lines of conversion logic per provider. The flat struct maps naturally to every provider API we studied (Anthropic, OpenAI, Ollama) with minimal translation.

Tool middleware: axum’s from_fn, not tower’s Service/Layer

The tool middleware pipeline uses a callback-based pattern identical to axum’s middleware::from_fn:

async fn logging_middleware(
    tool_name: &str,
    input: serde_json::Value,
    ctx: &ToolContext,
    next: ToolMiddlewareNext<'_>,
) -> Result<ToolOutput, ToolError> {
    println!("calling {tool_name}");
    let result = next.run(tool_name, input, ctx).await;
    println!("result: {result:?}");
    result
}

tower’s Service and Layer traits are designed for high-throughput request/response pipelines where the overhead of trait objects and Pin<Box<...>> matters. Tool calls happen at most a few times per LLM turn, so the axum-style callback is simpler to write, simpler to read, and follows a pattern the tokio team validated for exactly this kind of middleware.

DurableContext wraps side effects, not just observes them

Early designs had a single DurabilityHook that observed LLM calls and tool executions. This fails for Temporal replay: an observation hook cannot prevent a side effect from re-executing during replay.

The solution is DurableContext, which wraps side effects:

pub trait DurableContext: Send + Sync {
    fn execute_llm_call(
        &self,
        request: CompletionRequest,
        options: ActivityOptions,
    ) -> impl Future<Output = Result<CompletionResponse, DurableError>> + Send;

    fn execute_tool(
        &self,
        tool_name: &str,
        input: serde_json::Value,
        ctx: &ToolContext,
        options: ActivityOptions,
    ) -> impl Future<Output = Result<ToolOutput, DurableError>> + Send;
}

When a DurableContext is present, the agentic loop calls through it instead of directly calling the provider or tools. The durable engine (Temporal, Restate, Inngest) can journal the result, and on replay, return the journaled result without re-executing the side effect.

A separate ObservabilityHook trait handles logging, metrics, and telemetry. It returns HookAction (Continue, Skip, or Terminate) but does not wrap execution.

RPITIT native async traits

neuron uses Rust 2024 edition with native impl Future return types in traits (RPITIT). There is no #[async_trait] anywhere in the codebase:

pub trait Provider: Send + Sync {
    fn complete(
        &self,
        request: CompletionRequest,
    ) -> impl Future<Output = Result<CompletionResponse, ProviderError>> + Send;
}

This avoids the heap allocation that #[async_trait] forces (one Box::pin per call). The trade-off is that these traits are not object-safe – you must use generics, not dyn Provider. For type-erased dispatch, neuron provides ToolDyn with an explicit Box::pin at the erasure boundary only.

ToolError::ModelRetry for self-correction

Adopted from Pydantic AI’s pattern, ModelRetry lets a tool tell the model to try again with different arguments:

pub enum ToolError {
    NotFound(String),
    InvalidInput(String),
    ExecutionFailed(Box<dyn std::error::Error + Send + Sync>),
    PermissionDenied(String),
    Cancelled,
    ModelRetry(String),  // <-- hint for the model
}

When a tool returns ModelRetry("date must be in YYYY-MM-DD format"), the loop does not propagate this as an error. Instead, it converts the hint into an error tool result and sends it back to the model. The model sees the hint, adjusts its arguments, and calls the tool again.

This keeps self-correction logic out of the tool implementation. The tool just says “try again, here’s why” and the loop handles the retry protocol.
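As an illustration, a date-taking tool body might validate its input and return the hint. A self-contained sketch (the stand-in enum abbreviates the real ToolError, and the tool name and validation logic are illustrative):

```rust
// Abbreviated stand-in for neuron-types' ToolError.
#[derive(Debug)]
enum ToolError {
    ModelRetry(String),
}

// Illustrative tool body: reject malformed dates with a hint the model can act on.
fn lookup_events(date: &str) -> Result<String, ToolError> {
    let parts: Vec<&str> = date.split('-').collect();
    let well_formed = parts.len() == 3
        && parts[0].len() == 4
        && parts[1].len() == 2
        && parts[2].len() == 2
        && parts.iter().all(|p| p.chars().all(|c| c.is_ascii_digit()));
    if !well_formed {
        // The loop converts this into an error tool result, not a hard failure,
        // so the model can retry with corrected arguments.
        return Err(ToolError::ModelRetry(
            "date must be in YYYY-MM-DD format".to_string(),
        ));
    }
    Ok(format!("events on {date}"))
}
```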

Server-side context compaction

The Anthropic API supports server-side context management: the client sends a context_management field, and the server may respond with StopReason::Compaction plus a ContentBlock::Compaction summary.

neuron models this with dedicated types:

pub struct ContextManagement {
    pub edits: Vec<ContextEdit>,
}

pub enum ContextEdit {
    Compact { strategy: String },
}

pub enum StopReason {
    EndTurn,
    ToolUse,
    MaxTokens,
    StopSequence,
    ContentFilter,
    Compaction,  // <-- server compacted context
}

pub enum ContentBlock {
    // ...
    Compaction { content: String },
}

When the loop receives StopReason::Compaction, it continues automatically – the server has already compacted the context, and the response contains the compaction summary. Token usage during compaction is tracked per-iteration via UsageIteration.

This is distinct from client-side compaction (the ContextStrategy trait), which the loop manages locally. Both can coexist: the provider handles server-side compaction transparently, while the context strategy handles client-side compaction when needed.
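The loop's branch on the stop reason looks roughly like this (a sketch: the enum is abbreviated from the definition above, and the action strings are illustrative, not real API):

```rust
// Abbreviated stand-in for the StopReason enum shown above.
#[derive(Debug, PartialEq)]
enum StopReason {
    EndTurn,
    ToolUse,
    Compaction,
}

// Illustrative dispatch: what the loop does next for each stop reason.
fn next_step(reason: &StopReason) -> &'static str {
    match reason {
        StopReason::EndTurn => "finish: return the final response",
        StopReason::ToolUse => "execute the requested tools, then call the model again",
        // Server-side compaction: the context is already compacted, so continue.
        StopReason::Compaction => "continue the loop automatically",
    }
}
```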

Dependency Graph

neuron’s crates form a strict upward-pointing dependency tree. Every arrow points toward the foundation (neuron-types), never downward. There are no circular dependencies.

The graph

neuron-types                    (zero deps, the foundation)
neuron-tool-macros              (zero deps, proc macro)
    ^
    |-- neuron-provider-*       (each implements Provider trait)
    |-- neuron-otel             (OTel instrumentation, GenAI semantic conventions)
    |-- neuron-context          (compaction strategies, token counting)
    +-- neuron-tool             (Tool trait, registry, middleware; optional dep on neuron-tool-macros)
            ^
            |-- neuron-mcp      (wraps rmcp, bridges to Tool trait)
            |-- neuron-loop     (provider loop with tool dispatch)
            +-- neuron-runtime  (sessions, DurableContext, guardrails, sandbox)
                    ^
                neuron          (umbrella re-export)
                    ^
                YOUR PROJECT    (SDK, CLI, TUI, GUI)

Layer by layer

neuron-types (foundation)

Zero dependencies on other neuron crates. Contains all types and trait definitions:

  • Types: Message, CompletionRequest, CompletionResponse, TokenUsage, ToolDefinition, ToolOutput, ContentBlock, StopReason
  • Traits: Provider, EmbeddingProvider, Tool, ToolDyn, ContextStrategy, ObservabilityHook, DurableContext, PermissionPolicy
  • Errors: ProviderError, ToolError, LoopError, ContextError, DurableError, HookError, McpError, EmbeddingError, StorageError, SandboxError

Every other crate depends on neuron-types. Nothing else.

Provider crates (leaf nodes)

Each provider crate implements the Provider trait for one API:

| Crate | Provider |
|---|---|
| neuron-provider-anthropic | Anthropic Messages API |
| neuron-provider-openai | OpenAI Chat Completions / Responses API |
| neuron-provider-ollama | Ollama local inference |

Provider crates depend only on neuron-types (plus their HTTP client and API-specific serialization). They never depend on each other or on higher-level neuron crates.

Adding a new provider means creating a new crate that implements Provider. No existing code changes.

neuron-otel (leaf node)

Implements the ObservabilityHook trait, emitting structured tracing spans for LLM calls, tool executions, and loop iterations with gen_ai.* attributes that follow the emerging OpenTelemetry GenAI semantic conventions specification.

Depends only on neuron-types (plus tracing and opentelemetry for span emission). Like provider crates, it is a leaf node with no knowledge of other neuron crates.

neuron-tool-macros (leaf node)

Proc macro crate providing #[neuron_tool] for deriving Tool implementations from annotated async functions. Zero workspace dependencies.

neuron-tool

Implements the tool system:

  • ToolRegistry – stores Arc<dyn ToolDyn> for dynamic dispatch
  • Tool middleware pipeline (axum-style from_fn)
  • Type erasure via the ToolDyn blanket impl

Depends on neuron-types and optionally on neuron-tool-macros (via macros feature flag).

neuron-mcp

Wraps the rmcp crate (the official Rust MCP SDK) and bridges MCP tools into neuron’s ToolDyn trait. Depends on neuron-types, neuron-tool, and rmcp.

neuron-context (leaf node)

Implements ContextStrategy for client-side context compaction. Some strategies (like summarization) optionally use a Provider for LLM calls, but the dependency is on the trait, not on any concrete provider crate.

neuron-loop

The agentic while-loop that composes a provider and tool registry. This is the ~300-line commodity loop that every agent framework converges on. It depends on:

  • neuron-types (for trait definitions)
  • neuron-tool (for ToolRegistry)

The loop is generic over <P: Provider, C: ContextStrategy> and accepts a ToolRegistry. neuron-context is a dev-dependency only (for tests).

neuron-runtime

Adds cross-cutting runtime concerns:

  • Sessions – persistent conversation state via StorageError-aware backends
  • DurableContext – wraps side effects for Temporal/Restate replay
  • ObservabilityHook – logging, metrics, telemetry
  • Guardrails – input/output validation
  • PermissionPolicy – tool call authorization
  • Sandbox – isolated tool execution environments

Depends on neuron-types and neuron-tool. neuron-loop and neuron-context are dev-dependencies only (for tests).

neuron (umbrella)

Re-exports public items from all crates under a single neuron dependency. Feature flags control which provider crates are included:

[dependencies]
neuron = { version = "0.2", features = ["anthropic", "openai"] }

Design rules

Arrows only point up. A crate at layer N may depend on crates at layer N-1 or below, never at layer N or above. This is enforced by Cargo.toml dependencies – Cargo rejects circular crate dependencies outright.

Each block knows only about neuron-types and the blocks it directly depends on. neuron-tool has no idea that neuron-loop exists. neuron-provider-anthropic has no idea that neuron-runtime exists. This means you can use any block independently.

Provider crates are fully independent. Provider crates do not depend on the tool crate, the MCP crate, or each other. neuron-mcp, neuron-loop, and neuron-runtime share a dependency on neuron-tool but are independent of each other.

Practical implications

Using just the tool system:

[dependencies]
neuron-types = "0.2"
neuron-tool = "0.2"

Using just a provider for raw LLM calls:

[dependencies]
neuron-types = "0.2"
neuron-provider-anthropic = "0.2"

Using the full stack:

[dependencies]
neuron = { version = "0.2", features = ["anthropic", "openai", "mcp"] }

The dependency graph ensures that pulling in one block never forces you to compile unrelated blocks.

Comparison with Other Frameworks

neuron takes a different approach from most agent frameworks. This page compares its architecture with other popular options in the Rust and Python ecosystems.

Honest note: neuron is at an early stage. This comparison focuses on architectural differences, not feature completeness. Where other frameworks have more mature implementations, we say so.

Summary matrix

| Feature | neuron (Rust) | Rig (Rust) | ADK-Rust (Google) | OpenAI Agents SDK (Python) | Pydantic AI (Python) |
|---|---|---|---|---|---|
| Architecture | Independent crates (building blocks) | Monolithic library | Multi-crate with DAG engine | Single package | Single package |
| Provider abstraction | Trait in types crate, impl per crate | Trait + built-in impls | Google-focused, extensible | OpenAI-only | Multi-provider |
| Tool system | Typed trait + type erasure + middleware | Typed trait, no middleware | Typed with annotations | Function decorators | Function decorators with typed args |
| Middleware | axum-style from_fn pipeline | None | None | Hooks | None |
| Usage limits | UsageLimits (tokens, requests, tool calls) | None | None | None | UsageLimits (tokens, requests) |
| Tool timeouts | TimeoutMiddleware (per-tool configurable) | None | None | None | None |
| Context management | Client-side + server-side compaction | Manual | Built-in | Built-in | Manual |
| Durable execution | DurableContext trait (Temporal/Restate) | None | None | None | None |
| Async model | RPITIT (native, no alloc) | #[async_trait] (boxed) | #[async_trait] (boxed) | Python async | Python async |
| OpenTelemetry | neuron-otel with GenAI semantic conventions | None | None | Built-in tracing | None |
| MCP support | Via neuron-mcp (wraps rmcp) | Community | Limited | Built-in | Limited |
| Graph/DAG | Not included (SDK layer) | Not included | LangGraph port | Not included | Not included |
| Maturity | Early | Established | Early | Established | Established |

Detailed comparisons

Rig (Rust)

Rig is the most established Rust agent framework. It provides a solid multi-provider abstraction and a typed tool system.

Where Rig excels:

  • Mature ecosystem with multiple provider implementations
  • Good documentation and examples
  • Proven in production use cases

Where neuron differs:

  • Crate independence. Rig is a monolithic library – you depend on rig-core and get everything. neuron lets you pull in just the tool system, or just a provider, without the rest.
  • Message model. Rig uses a variant-per-role enum (UserMessage, AssistantMessage), which requires roughly 300 lines of conversion code per provider. neuron uses a flat Message { role, content } struct that maps directly to every API.
  • Tool middleware. Rig has no middleware pipeline. Adding logging, rate limiting, or permission checks requires wrapping each tool individually. neuron’s middleware pipeline applies cross-cutting concerns to all tools.
  • Async model. Rig uses #[async_trait], which heap-allocates on every call. neuron uses RPITIT (native Rust 2024 async traits) with zero overhead for non-erased dispatch.

ADK-Rust (Google’s Agent Development Kit)

Google’s ADK-Rust is a multi-crate Rust framework that includes a port of LangGraph’s DAG execution engine.

Where ADK-Rust excels:

  • Comprehensive multi-crate architecture
  • Built-in DAG/graph orchestration for complex workflows
  • Strong Google Cloud integration

Where neuron differs:

  • No graph layer. ADK-Rust’s LangGraph port is its most complex component and, based on community feedback, its least-used. neuron deliberately omits graph orchestration – most agent use cases are sequential loops, not DAGs.
  • Block independence. ADK-Rust’s crates have tighter coupling than neuron’s. neuron’s leaf crates (providers, tools, MCP) have zero knowledge of each other.
  • Durable execution. neuron’s DurableContext trait is designed specifically for Temporal/Restate integration. ADK-Rust does not have a durability abstraction.

OpenAI Agents SDK (Python)

The OpenAI Agents SDK provides a clean Python API for building agents with strong support for handoff protocols between agents.

Where Agents SDK excels:

  • Elegant handoff protocol for multi-agent systems
  • Built-in MCP support
  • Well-documented, easy to get started
  • Built-in tracing

Where neuron differs:

  • Language. neuron is Rust, giving you compile-time safety, zero-cost abstractions, and predictable performance. The Agents SDK is Python-only.
  • Provider lock-in. The Agents SDK is designed for OpenAI’s API. neuron’s Provider trait is provider-agnostic from the foundation.
  • Building blocks vs. framework. The Agents SDK is an opinionated framework with a specific agent lifecycle model. neuron gives you the pieces to build your own lifecycle.

Pydantic AI (Python)

Pydantic AI brings typed tool arguments and structured output validation to Python agents. neuron adopted its ModelRetry self-correction pattern.

Where Pydantic AI excels:

  • Typed tool arguments with runtime validation (Pydantic models)
  • Multi-provider support
  • Clean API for structured output
  • The ModelRetry pattern for tool self-correction

Where neuron differs:

  • Compile-time types. Pydantic validates at runtime. neuron’s Tool trait uses schemars::JsonSchema for schema generation and serde::Deserialize for deserialization, both checked at compile time.
  • ModelRetry adoption. neuron’s ToolError::ModelRetry(String) is directly inspired by Pydantic AI. When a tool returns ModelRetry, the hint is converted to an error tool result so the model can self-correct.
  • UsageLimits adoption. neuron’s UsageLimits is inspired by Pydantic AI’s budget enforcement, extended with tool call limits.
  • Middleware. Pydantic AI has no tool middleware pipeline. neuron provides TimeoutMiddleware, StructuredOutputValidator, and RetryLimitedValidator as composable middleware.

What neuron does not do

Being honest about scope:

  • neuron is not a framework. It does not give you a run_agent() function that handles everything. You compose the blocks.
  • neuron does not include a CLI, TUI, or GUI. Those are built on top of the blocks.
  • neuron does not include RAG pipelines. Retrieval is a tool or context strategy implementation, not a core block.
  • neuron does not include sub-agent orchestration. Multi-agent handoff is straightforward composition of AgentLoop + ToolRegistry and belongs in an SDK layer.

Choosing the right tool

  • If you want a batteries-included Rust agent framework today: Rig is more mature and has a larger ecosystem.
  • If you want composable building blocks you can adopt incrementally: neuron lets you use exactly the pieces you need.
  • If you need durable execution (Temporal/Restate): neuron is the only Rust option with a dedicated DurableContext trait.
  • If you work primarily in Python: Pydantic AI and the OpenAI Agents SDK are excellent choices with larger communities.
  • If you need DAG/graph orchestration: ADK-Rust includes a LangGraph port. neuron does not include a graph layer by design.

Error Handling

All neuron error types live in neuron-types and use thiserror for derivation. This page documents every error enum, its variants, and how to handle them.

Error hierarchy

LoopError                       (top-level, from the agentic loop)
    |-- ProviderError           (LLM provider failures)
    |-- ToolError               (tool execution failures)
    +-- ContextError            (context compaction failures)
            +-- ProviderError   (when summarization fails)

DurableError                    (durable execution failures)
HookError                       (observability hook failures)
McpError                        (MCP protocol failures)
EmbeddingError                  (embedding provider failures)
StorageError                    (session storage failures)
SandboxError                    (sandbox execution failures)

LoopError is the primary error type you encounter when running the agentic loop. It wraps ProviderError, ToolError, and ContextError via From implementations, so ? propagation works naturally.

The remaining error types (DurableError, HookError, McpError, EmbeddingError, StorageError, SandboxError) are standalone – they appear in their respective subsystems and do not nest under LoopError.
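The From wiring is what makes ? propagation work across subsystem boundaries. A self-contained sketch of the pattern (the stand-in enums abbreviate the real types in neuron-types):

```rust
// Abbreviated stand-ins for the real error enums.
#[derive(Debug)]
enum ProviderError {
    RateLimit,
}

#[derive(Debug)]
enum LoopError {
    Provider(ProviderError),
}

impl From<ProviderError> for LoopError {
    fn from(e: ProviderError) -> Self {
        LoopError::Provider(e)
    }
}

// Illustrative provider call that fails with a rate limit.
fn complete() -> Result<String, ProviderError> {
    Err(ProviderError::RateLimit)
}

// `?` converts the ProviderError into LoopError via the From impl above.
fn run_turn() -> Result<String, LoopError> {
    let response = complete()?;
    Ok(response)
}
```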


ProviderError

Errors from LLM provider operations (completions and streaming).

pub enum ProviderError {
    // --- Retryable ---
    Network(Box<dyn std::error::Error + Send + Sync>),
    RateLimit { retry_after: Option<Duration> },
    ModelLoading(String),
    Timeout(Duration),
    ServiceUnavailable(String),

    // --- Terminal ---
    Authentication(String),
    InvalidRequest(String),
    ModelNotFound(String),
    InsufficientResources(String),

    // --- Other ---
    StreamError(String),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

Variants

| Variant | Description | Retryable? |
|---|---|---|
| Network | Connection reset, DNS failure, TLS error. Wraps the underlying transport error. | Yes |
| RateLimit | Provider returned 429. retry_after contains the suggested delay if the API provided one. | Yes |
| ModelLoading | Model is cold-starting (common with Ollama and serverless endpoints). | Yes |
| Timeout | Request exceeded the configured timeout. Contains the duration that elapsed. | Yes |
| ServiceUnavailable | Provider returned 503 or equivalent. | Yes |
| Authentication | Invalid API key, expired token, or insufficient permissions (401/403). | No |
| InvalidRequest | Malformed request: bad parameters, unsupported model configuration, schema violations. | No |
| ModelNotFound | The requested model identifier does not exist on this provider. | No |
| InsufficientResources | Quota exceeded or billing limit reached. Distinct from rate limiting. | No |
| StreamError | Error during SSE streaming after the connection was established. | No |
| Other | Catch-all for provider-specific errors that do not fit other variants. | No |

is_retryable()

impl ProviderError {
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            Self::Network(_)
                | Self::RateLimit { .. }
                | Self::ModelLoading(_)
                | Self::Timeout(_)
                | Self::ServiceUnavailable(_)
        )
    }
}

Use is_retryable() to decide whether to retry a failed request. neuron does not include built-in retry logic – use tower::retry, a durable engine’s retry policy, or a simple loop:

let mut attempts = 0;
let response = loop {
    match provider.complete(request.clone()).await {
        Ok(resp) => break resp,
        Err(e) if e.is_retryable() && attempts < 3 => {
            attempts += 1;
            tokio::time::sleep(Duration::from_secs(1 << attempts)).await;
        }
        Err(e) => return Err(e),
    }
};

EmbeddingError

Errors from embedding provider operations.

pub enum EmbeddingError {
    Authentication(String),
    RateLimit { retry_after: Option<Duration> },
    InvalidRequest(String),
    Network(Box<dyn std::error::Error + Send + Sync>),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

Variants

| Variant | Description | Retryable? |
|---|---|---|
| Authentication | Invalid API key or expired token. | No |
| RateLimit | Provider returned 429. | Yes |
| InvalidRequest | Bad input (e.g., empty input array, unsupported model). | No |
| Network | Connection-level failure. | Yes |
| Other | Catch-all. | No |

is_retryable()

impl EmbeddingError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::RateLimit { .. } | Self::Network(_))
    }
}

ToolError

Errors from tool operations (registration, validation, execution).

pub enum ToolError {
    NotFound(String),
    InvalidInput(String),
    ExecutionFailed(Box<dyn std::error::Error + Send + Sync>),
    PermissionDenied(String),
    Cancelled,
    ModelRetry(String),
}

Variants

| Variant | Description |
|---|---|
| NotFound | The tool name in the model’s ToolUse block does not match any registered tool. |
| InvalidInput | The JSON arguments failed deserialization into the tool’s Args type. |
| ExecutionFailed | The tool ran but returned an error. Wraps the tool’s specific error type. |
| PermissionDenied | The PermissionPolicy denied this tool call. |
| Cancelled | The tool execution was cancelled via the CancellationToken in ToolContext. |
| ModelRetry | The tool is requesting the model to retry with different arguments. |

ModelRetry: the self-correction pattern

ModelRetry is special. It does not propagate as an error to the caller. Instead, the agentic loop intercepts it and converts the hint string into an error tool result that is sent back to the model:

use neuron_types::ToolError;

// Inside a tool implementation:
fn validate_date(input: &str) -> Result<(), ToolError> {
    if !input.contains('-') {
        return Err(ToolError::ModelRetry(
            "Date must be in YYYY-MM-DD format, e.g. 2025-01-15".into()
        ));
    }
    Ok(())
}

The model sees the hint as a tool result with is_error: true and can adjust its next tool call accordingly. This keeps self-correction logic simple: the tool says what went wrong, and the loop handles the retry protocol.
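To make the interception concrete, here is a minimal sketch of what the loop does with a tool outcome. The ToolError and ToolResult shapes below are simplified local stand-ins for illustration, and intercept is a hypothetical name, not the real loop internals:

```rust
// Hypothetical, simplified mirrors of ToolError and a tool-result type;
// the real definitions live in neuron-types and carry more variants.
#[derive(Debug)]
enum ToolError {
    ExecutionFailed(String),
    ModelRetry(String),
}

#[derive(Debug)]
struct ToolResult {
    content: String,
    is_error: bool,
}

// Sketch of the interception: ModelRetry becomes an error tool result
// that flows back to the model; every other variant propagates as an error.
fn intercept(outcome: Result<String, ToolError>) -> Result<ToolResult, ToolError> {
    match outcome {
        Ok(content) => Ok(ToolResult { content, is_error: false }),
        Err(ToolError::ModelRetry(hint)) => Ok(ToolResult { content: hint, is_error: true }),
        Err(other) => Err(other),
    }
}
```

The asymmetry is the point: ModelRetry turns into an Ok value (an error-flagged tool result) so the conversation continues, while real failures still surface as errors.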


LoopError

The top-level error type returned by the agentic loop.

pub enum LoopError {
    Provider(ProviderError),
    Tool(ToolError),
    Context(ContextError),
    MaxTurns(usize),
    UsageLimitExceeded(String),
    HookTerminated(String),
    Cancelled,
}

Variants

| Variant | Description |
|---|---|
| Provider | An LLM call failed. Check is_retryable() on the inner ProviderError. |
| Tool | A tool call failed (excluding ModelRetry, which is handled internally). |
| Context | Context compaction failed. |
| MaxTurns | The loop hit the configured turn limit. Contains the limit value. |
| UsageLimitExceeded | A token, request, or tool call budget was exceeded. Contains a descriptive message (e.g., "output token limit exceeded: 50123 > 50000"). |
| HookTerminated | An ObservabilityHook returned HookAction::Terminate. Contains the reason. |
| Cancelled | The loop’s cancellation token was triggered. |

From implementations

LoopError implements From<ProviderError>, From<ToolError>, and From<ContextError>, so you can use ? to propagate errors from any of these subsystems:

use neuron_types::{LoopError, ProviderError};

fn example() -> Result<(), LoopError> {
    let provider_result: Result<_, ProviderError> = Err(
        ProviderError::Authentication("invalid key".into())
    );
    provider_result?; // Automatically converted to LoopError::Provider
    Ok(())
}

Handling LoopError

use neuron_types::LoopError;

match loop_result {
    Ok(response) => { /* success */ }
    Err(LoopError::Provider(e)) if e.is_retryable() => {
        // Transient provider failure -- retry the whole loop or
        // let a durable engine handle it.
    }
    Err(LoopError::Provider(e)) => {
        // Terminal provider failure -- fix config and retry.
        eprintln!("Provider error: {e}");
    }
    Err(LoopError::MaxTurns(limit)) => {
        // The agent ran for too many turns without completing.
        eprintln!("Hit {limit} turn limit");
    }
    Err(LoopError::UsageLimitExceeded(msg)) => {
        // A token, request, or tool call budget was exceeded.
        eprintln!("Usage limit: {msg}");
    }
    Err(LoopError::HookTerminated(reason)) => {
        // A guardrail or hook stopped the loop.
        eprintln!("Terminated: {reason}");
    }
    Err(LoopError::Cancelled) => {
        // Graceful shutdown via cancellation token.
    }
    Err(e) => {
        eprintln!("Loop error: {e}");
    }
}

ContextError

Errors from context management operations.

pub enum ContextError {
    CompactionFailed(String),
    Provider(ProviderError),
}

| Variant | Description |
|---|---|
| CompactionFailed | The compaction strategy itself failed (e.g., produced invalid output). |
| Provider | A provider call during summarization-based compaction failed. Wraps ProviderError, so you can check is_retryable() on the inner error. |
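Because only the Provider variant can be transient, a caller deciding whether to retry compaction just unwraps that variant and defers to the inner is_retryable(). A sketch using simplified local mirrors of the documented enums (the real types live in neuron-types, and compaction_is_retryable is a hypothetical helper name):

```rust
// Local sketch mirroring the shapes documented above.
#[derive(Debug)]
enum ProviderError {
    RateLimit,
    Authentication(String),
}

impl ProviderError {
    fn is_retryable(&self) -> bool {
        matches!(self, Self::RateLimit)
    }
}

#[derive(Debug)]
enum ContextError {
    CompactionFailed(String),
    Provider(ProviderError),
}

// Only the Provider variant can be transient: defer to the inner error.
// CompactionFailed means the strategy itself is broken, so retrying won't help.
fn compaction_is_retryable(err: &ContextError) -> bool {
    match err {
        ContextError::Provider(p) => p.is_retryable(),
        ContextError::CompactionFailed(_) => false,
    }
}
```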

DurableError

Errors from durable execution operations (Temporal, Restate, Inngest).

pub enum DurableError {
    ActivityFailed(String),
    Cancelled,
    SignalTimeout,
    ContinueAsNew(String),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

| Variant | Description |
|---|---|
| ActivityFailed | A durable activity (LLM call or tool execution) failed after exhausting retries. |
| Cancelled | The workflow was cancelled externally. |
| SignalTimeout | wait_for_signal() timed out waiting for an external signal. |
| ContinueAsNew | The workflow needs to continue as a new execution to avoid history bloat. |
| Other | Catch-all for engine-specific errors. |
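Note that ContinueAsNew, like ModelRetry, is a control-flow signal rather than a genuine failure: the hosting engine is expected to restart the workflow with fresh history. A purely illustrative sketch, with no real engine involved and with run_workflow and drive as made-up names:

```rust
// Illustrative only: a fake workflow that requests ContinueAsNew until
// its state reaches a threshold, and a driver that treats that variant
// as a restart signal rather than a failure.
#[derive(Debug)]
enum DurableError {
    ActivityFailed(String),
    ContinueAsNew(String),
}

fn run_workflow(turns_so_far: u32) -> Result<u32, DurableError> {
    if turns_so_far < 3 {
        // History is getting long; ask to be restarted with fresh history.
        Err(DurableError::ContinueAsNew(format!("restart at turn {}", turns_so_far + 1)))
    } else {
        Ok(turns_so_far)
    }
}

// Driver sketch: re-invoke on ContinueAsNew, surface everything else.
fn drive(mut turns: u32) -> Result<u32, DurableError> {
    loop {
        match run_workflow(turns) {
            Err(DurableError::ContinueAsNew(_)) => turns += 1,
            other => return other,
        }
    }
}
```

In a real deployment the engine (Temporal, Restate, Inngest) performs this restart itself; the sketch only shows why the variant must not be handled as a terminal error.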

McpError

Errors from MCP (Model Context Protocol) operations.

pub enum McpError {
    Connection(String),
    Initialization(String),
    ToolCall(String),
    Transport(String),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

| Variant | Description |
|---|---|
| Connection | Failed to connect to the MCP server. |
| Initialization | The MCP handshake (initialize / initialized) failed. |
| ToolCall | An MCP tools/call request failed. |
| Transport | Transport-level error (stdio pipe broken, HTTP connection dropped). |
| Other | Catch-all. |

HookError

Errors from observability hooks.

pub enum HookError {
    Failed(String),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

| Variant | Description |
|---|---|
| Failed | The hook encountered an error during execution. |
| Other | Catch-all for hook-specific errors. |

Hook errors do not stop the loop by default. The loop logs them and continues. To stop the loop from a hook, return HookAction::Terminate instead of returning an error.
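The distinction can be sketched with simplified local types. The real hook API is a trait in neuron-types with more structure than this; HookAction, HookError, and the apply_hook helper below are assumptions made for illustration:

```rust
// Assumed, simplified shapes mirroring the documented behavior.
#[derive(Debug, PartialEq)]
enum HookAction {
    Continue,
    Terminate(String),
}

#[derive(Debug)]
enum HookError {
    Failed(String),
}

// Sketch of the loop's policy: a hook *error* is logged and ignored,
// while Terminate yields the reason that becomes LoopError::HookTerminated.
fn apply_hook(outcome: Result<HookAction, HookError>) -> Option<String> {
    match outcome {
        Ok(HookAction::Terminate(reason)) => Some(reason),
        Ok(HookAction::Continue) => None,
        Err(e) => {
            eprintln!("hook error (loop continues): {e:?}");
            None
        }
    }
}
```

A broken tracing exporter should not kill an agent run, which is why the error path and the deliberate-termination path are kept separate.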


StorageError

Errors from session storage operations.

pub enum StorageError {
    NotFound(String),
    Serialization(String),
    Io(std::io::Error),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

| Variant | Description |
|---|---|
| NotFound | The requested session does not exist in storage. |
| Serialization | Failed to serialize or deserialize session data. |
| Io | Filesystem I/O error (for file-based storage backends). |
| Other | Catch-all for backend-specific errors. |
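A file-backed backend would typically promote a missing file to the domain-level NotFound variant rather than surfacing it as a raw I/O error. The load_session function and sessions/ path scheme below are hypothetical; only the error mapping mirrors the documented enum:

```rust
use std::io;

// Hypothetical file-backed backend sketch.
#[derive(Debug)]
enum StorageError {
    NotFound(String),
    Io(io::Error),
}

fn load_session(id: &str) -> Result<String, StorageError> {
    std::fs::read_to_string(format!("sessions/{id}.json")).map_err(|e| {
        // A missing file means the session does not exist; anything else
        // (permissions, disk failures) stays a raw I/O error.
        if e.kind() == io::ErrorKind::NotFound {
            StorageError::NotFound(id.to_string())
        } else {
            StorageError::Io(e)
        }
    })
}
```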

SandboxError

Errors from sandbox operations (isolated tool execution environments).

pub enum SandboxError {
    ExecutionFailed(String),
    SetupFailed(String),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

| Variant | Description |
|---|---|
| ExecutionFailed | Tool execution failed within the sandbox. |
| SetupFailed | Sandbox creation or teardown failed. |
| Other | Catch-all. |

Design principles

Two levels max. Error enums are at most two levels deep. LoopError::Context wraps ContextError, which wraps ProviderError. There is no deeper nesting. This keeps match arms readable.
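In practice this means a single nested pattern reaches the deepest error. A sketch using local mirrors of the documented enums (the real types live in neuron-types; describe is an illustrative helper):

```rust
// Local mirrors of the documented nesting, trimmed to a few variants.
#[derive(Debug)]
enum ProviderError {
    RateLimit,
    Timeout,
}

#[derive(Debug)]
enum ContextError {
    CompactionFailed(String),
    Provider(ProviderError),
}

#[derive(Debug)]
enum LoopError {
    Context(ContextError),
    Cancelled,
}

// Two levels is the deepest match you ever need:
// LoopError -> ContextError -> ProviderError.
fn describe(e: &LoopError) -> &'static str {
    match e {
        LoopError::Context(ContextError::Provider(ProviderError::RateLimit)) => {
            "rate-limited while summarizing for compaction"
        }
        LoopError::Context(ContextError::CompactionFailed(_)) => "compaction failed",
        LoopError::Cancelled => "cancelled",
        _ => "other",
    }
}
```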

thiserror everywhere. Every error enum derives thiserror::Error. Display messages are concise and include the variant’s data. Source errors are linked with #[source] or #[from] for proper error chain reporting.

Retryable classification at the source. ProviderError and EmbeddingError provide is_retryable() because they know which failures are transient. Callers do not need to pattern-match on specific variants to decide whether to retry.

No built-in retry. neuron exposes is_retryable() but does not include retry middleware. Use tower::retry, a durable engine’s retry policy, or write a simple loop. Retry logic is inherently policy-specific (backoff strategy, max attempts, circuit breaking) and belongs in the application layer.
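If you want something slightly more reusable than the inline loop shown earlier, a small generic helper over an is_retryable-style predicate is enough. This retry_with function is a sketch in application-layer code, not a neuron API; an async version would use tokio::time::sleep:

```rust
use std::thread::sleep;
use std::time::Duration;

// Generic retry sketch: the caller supplies both the operation and the
// predicate that decides which errors are transient.
fn retry_with<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
    retryable: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if retryable(&e) && attempt + 1 < max_attempts => {
                attempt += 1;
                // Exponential backoff; milliseconds here so the sketch runs
                // fast -- production code would use seconds plus jitter.
                sleep(Duration::from_millis(1 << attempt));
            }
            Err(e) => return Err(e),
        }
    }
}
```

With ProviderError, the predicate would simply be `|e| e.is_retryable()`.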

ModelRetry is not an error. Despite living in ToolError, ModelRetry is a control flow signal, not a failure. The loop intercepts it before it reaches the caller. If you handle ToolError directly (outside the loop), treat ModelRetry as a hint to feed back to the model, not as an error to log.