AgentGuard
Local Sidecar for Real-Time AI Agent Security
AgentGuard is a local security sidecar that monitors AI agent sessions in real time. It ingests agent events (tool calls, tool results, LLM outputs), builds structured trajectories, and runs them through a 2.8B-parameter Mamba-2 model to detect prompt injection, exfiltration, and tool-call hijacking. In guardian mode, it actively blocks malicious tool calls before execution. The system is fail-open: if the sidecar is unreachable or times out, tool calls proceed normally. It is designed for local-first AI tools where the user is always trusted and threats originate from tool results.
Features
Real-time event ingestion
Fire-and-forget POST /events endpoint ingests user messages, tool calls, tool results, and LLM outputs into per-session trajectories.
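As a rough sketch of what a fire-and-forget client for this endpoint might look like (the field names `session_id`, `type`, and `data` are assumptions, not the documented wire format):

```python
import json
import urllib.request

SIDECAR_URL = "http://127.0.0.1:7437"  # sidecar port from this README


def build_event(session_id: str, kind: str, payload: dict) -> dict:
    """Shape one agent event for POST /events (field names are guesses)."""
    return {"session_id": session_id, "type": kind, "data": payload}


def post_event(event: dict, timeout: float = 0.2) -> None:
    """Fire-and-forget: swallow all errors so monitoring never blocks work."""
    try:
        req = urllib.request.Request(
            f"{SIDECAR_URL}/events",
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=timeout)
    except Exception:
        pass  # fail-open: any sidecar error is ignored
```

The short timeout and blanket exception handler are what make the call safe to issue from inside an agent loop.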
Chain-of-thought security verdicts
POST /classify generates a chain-of-thought reasoning trace that follows user intent through the session's tool calls, then outputs a BENIGN/THREAT verdict with a confidence score, threat type, severity, and recommended action.
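An illustrative response shape: only the fields this README names (verdict, confidence, threat type, severity, recommended action) are grounded; their exact JSON keys here are guesses.

```python
# Hypothetical /classify response body; key names are assumptions.
EXAMPLE_VERDICT = {
    "verdict": "THREAT",           # BENIGN or THREAT
    "confidence": 0.93,
    "threat_type": "exfiltration",
    "severity": "high",
    "action": "BLOCK",             # KILL, BLOCK, or ALERT
}


def is_threat(verdict: dict, min_confidence: float = 0.5) -> bool:
    """Treat a response as a threat only above a confidence floor."""
    return (
        verdict.get("verdict") == "THREAT"
        and verdict.get("confidence", 0.0) >= min_confidence
    )
```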
Guardian mode (active blocking)
When enabled, the OpenClaw plugin calls /classify before every tool call and acts on the verdict — KILL terminates the session, BLOCK prevents the call, ALERT logs a warning.
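A minimal sketch of how a guardian-mode hook could dispatch on the verdict. The exception types here are hypothetical; only the KILL/BLOCK/ALERT semantics come from this README.

```python
import logging


class SessionKilled(Exception):
    """Raised to tear down the entire agent session (KILL)."""


class ToolCallBlocked(Exception):
    """Raised to cancel a single tool call (BLOCK)."""


def enforce(verdict: dict) -> None:
    """Act on a /classify verdict before the tool call runs."""
    action = verdict.get("action")
    if action == "KILL":
        raise SessionKilled(verdict.get("threat_type", "unknown"))
    if action == "BLOCK":
        raise ToolCallBlocked(verdict.get("threat_type", "unknown"))
    if action == "ALERT":
        logging.warning("agentguard alert: %s", verdict)
    # any other action: allow the tool call to proceed
```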
Fail-open design
If the sidecar is unreachable or times out (configurable, default 2000ms), tool calls proceed normally. Security monitoring never blocks legitimate work.
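The fail-open contract can be expressed as a small wrapper: run the classification with a deadline and fall back to an allow verdict on any timeout or error. This is a sketch of the policy, not the plugin's actual code; the 2000 ms default matches the README.

```python
from concurrent.futures import ThreadPoolExecutor

# Fallback verdict used whenever the sidecar cannot answer in time.
ALLOW = {"verdict": "BENIGN", "action": "ALLOW", "reason": "fail-open"}


def classify_or_allow(classify_fn, tool_call, timeout_ms: int = 2000) -> dict:
    """Return classify_fn's verdict, or ALLOW if it errors or misses the deadline."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(classify_fn, tool_call)
        try:
            return future.result(timeout=timeout_ms / 1000.0)
        except Exception:
            return ALLOW  # unreachable, crashed, or timed out: never block work
```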
OpenClaw plugin
Drop-in OpenClaw extension that forwards all agent events to the sidecar. Configurable passive (log-only) or guardian (active blocking) modes.
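The plugin's actual configuration keys are not documented here; purely as an illustration, a passive/guardian toggle might look something like:

```json
{
  "plugins": {
    "agentguard-collector": {
      "sidecar_url": "http://127.0.0.1:7437",
      "mode": "guardian",
      "classify_timeout_ms": 2000
    }
  }
}
```

All key names above are hypothetical; check the plugin's own documentation for the real schema.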
Trajectory logging
All agent trajectories and security verdicts are logged as JSONL to a configurable output directory for audit and analysis.
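JSONL means one JSON object per line, so the audit logs can be consumed with a few lines of Python. A minimal reader sketch (the record fields themselves are whatever the sidecar wrote, not specified here):

```python
import json


def load_records(path):
    """Yield each trajectory/verdict record from a JSONL audit log."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```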
Performance
Architecture
OpenClaw Agent sends events to the agentguard-collector plugin, which forwards them via fire-and-forget POST /events to the sidecar. In guardian mode, the plugin issues a synchronous POST /classify before each tool call and acts on the verdict (KILL/BLOCK/ALERT).
The sidecar (FastAPI on port 7437) maintains per-session trajectories and runs them through the AgentGuard 2.8B Mamba-2 model. The model generates chain-of-thought security reasoning followed by a structured BENIGN/THREAT verdict.
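The per-session bookkeeping described above can be pictured as an append-only map from session id to event list. This is a rough sketch of the idea, not the sidecar's real data model:

```python
from collections import defaultdict


class TrajectoryStore:
    """Append-only event lists keyed by session id (illustrative only)."""

    def __init__(self):
        self._sessions = defaultdict(list)

    def append(self, session_id: str, event: dict) -> None:
        """Record one agent event in arrival order."""
        self._sessions[session_id].append(event)

    def trajectory(self, session_id: str) -> list:
        """Return a copy of the session's events for classification."""
        return list(self._sessions[session_id])
```

Each POST /events call would append to the store; POST /classify would read the full trajectory back out and feed it to the model.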
Stack
Quick start
git clone https://github.com/Guney-olu/agentguard
cd agentguard
pip install "torch>=2.0" transformers safetensors fastapi uvicorn requests
# Tensor Parallel - 2 GPUs
python -m agentguard --model /path/to/model --mode tensor --tp-size 2