AgentGuard
Local Sidecar for Real-Time AI Agent Security
AgentGuard is a local security sidecar that monitors AI agent sessions in real time. It ingests agent events (tool calls, tool results, LLM outputs), builds structured trajectories, and runs them through a 2.8B-parameter Mamba-2 model to detect prompt injection, exfiltration, and tool-call hijacking. In guardian mode, it actively blocks malicious tool calls before execution. The system is fail-open: if the sidecar is unreachable or times out, tool calls proceed normally. It is designed for local-first AI tools where the user is always trusted and threats originate from tool results.
Features
Real-time event ingestion
Fire-and-forget POST /events endpoint ingests user messages, tool calls, tool results, and LLM outputs into per-session trajectories.
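As a rough sketch of what a fire-and-forget client for this endpoint might look like (the field names `session_id`, `type`, and `data` are assumptions, not the documented wire format):

```python
import json
import urllib.request

SIDECAR_URL = "http://127.0.0.1:7437"  # sidecar port from this README


def build_event(session_id: str, kind: str, payload: dict) -> dict:
    """Shape one agent event for POST /events (field names are guesses)."""
    return {"session_id": session_id, "type": kind, "data": payload}


def post_event(event: dict, timeout: float = 0.2) -> None:
    """Fire-and-forget: swallow all errors so monitoring never blocks work."""
    try:
        req = urllib.request.Request(
            f"{SIDECAR_URL}/events",
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=timeout)
    except Exception:
        pass  # fail-open: any sidecar error is ignored
```

The short timeout and blanket exception handler are what make the call safe to issue from inside an agent loop.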
Chain-of-thought security verdicts
POST /classify generates a chain-of-thought reasoning trace that follows user intent through the session's tool calls, then outputs a BENIGN/THREAT verdict with a confidence score, threat type, severity, and recommended action.
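An illustrative response shape: only the fields this README names (verdict, confidence, threat type, severity, recommended action) are grounded; their exact JSON keys here are guesses.

```python
# Hypothetical /classify response body; key names are assumptions.
EXAMPLE_VERDICT = {
    "verdict": "THREAT",           # BENIGN or THREAT
    "confidence": 0.93,
    "threat_type": "exfiltration",
    "severity": "high",
    "action": "BLOCK",             # KILL, BLOCK, or ALERT
}


def is_threat(verdict: dict, min_confidence: float = 0.5) -> bool:
    """Treat a response as a threat only above a confidence floor."""
    return (
        verdict.get("verdict") == "THREAT"
        and verdict.get("confidence", 0.0) >= min_confidence
    )
```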
Guardian mode (active blocking)
When enabled, the OpenClaw plugin calls /classify before every tool call and acts on the verdict — KILL terminates the session, BLOCK prevents the call, ALERT logs a warning.
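A minimal sketch of how a guardian-mode hook could dispatch on the verdict. The exception types here are hypothetical; only the KILL/BLOCK/ALERT semantics come from this README.

```python
import logging


class SessionKilled(Exception):
    """Raised to tear down the entire agent session (KILL)."""


class ToolCallBlocked(Exception):
    """Raised to cancel a single tool call (BLOCK)."""


def enforce(verdict: dict) -> None:
    """Act on a /classify verdict before the tool call runs."""
    action = verdict.get("action")
    if action == "KILL":
        raise SessionKilled(verdict.get("threat_type", "unknown"))
    if action == "BLOCK":
        raise ToolCallBlocked(verdict.get("threat_type", "unknown"))
    if action == "ALERT":
        logging.warning("agentguard alert: %s", verdict)
    # any other action: allow the tool call to proceed
```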
Fail-open design
If the sidecar is unreachable or times out (configurable, default 2000ms), tool calls proceed normally. Security monitoring never blocks legitimate work.
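The fail-open contract can be expressed as a small wrapper: run the classification with a deadline and fall back to an allow verdict on any timeout or error. This is a sketch of the policy, not the plugin's actual code; the 2000 ms default matches the README.

```python
from concurrent.futures import ThreadPoolExecutor

# Fallback verdict used whenever the sidecar cannot answer in time.
ALLOW = {"verdict": "BENIGN", "action": "ALLOW", "reason": "fail-open"}


def classify_or_allow(classify_fn, tool_call, timeout_ms: int = 2000) -> dict:
    """Return classify_fn's verdict, or ALLOW if it errors or misses the deadline."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(classify_fn, tool_call)
        try:
            return future.result(timeout=timeout_ms / 1000.0)
        except Exception:
            return ALLOW  # unreachable, crashed, or timed out: never block work
```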
OpenClaw plugin
Drop-in OpenClaw extension that forwards all agent events to the sidecar. Configurable passive (log-only) or guardian (active blocking) modes.
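The plugin's actual configuration keys are not documented here; purely as an illustration, a passive/guardian toggle might look something like:

```json
{
  "plugins": {
    "agentguard-collector": {
      "sidecar_url": "http://127.0.0.1:7437",
      "mode": "guardian",
      "classify_timeout_ms": 2000
    }
  }
}
```

All key names above are hypothetical; check the plugin's own documentation for the real schema.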
Trajectory logging
All agent trajectories and security verdicts are logged as JSONL to a configurable output directory for audit and analysis.
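JSONL means one JSON object per line, so the audit logs can be consumed with a few lines of Python. A minimal reader sketch (the record fields themselves are whatever the sidecar wrote, not specified here):

```python
import json


def load_records(path):
    """Yield each trajectory/verdict record from a JSONL audit log."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```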
Performance
Architecture
OpenClaw Agent sends events to the agentguard-collector plugin, which forwards them via fire-and-forget POST /events to the sidecar. In guardian mode, the plugin issues a synchronous POST /classify before each tool call and acts on the verdict (KILL/BLOCK/ALERT).
The sidecar (FastAPI on port 7437) maintains per-session trajectories and runs them through the AgentGuard 2.8B Mamba-2 model. The model generates chain-of-thought security reasoning followed by a structured BENIGN/THREAT verdict.
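The per-session bookkeeping described above can be pictured as an append-only map from session id to event list. This is a rough sketch of the idea, not the sidecar's real data model:

```python
from collections import defaultdict


class TrajectoryStore:
    """Append-only event lists keyed by session id (illustrative only)."""

    def __init__(self):
        self._sessions = defaultdict(list)

    def append(self, session_id: str, event: dict) -> None:
        """Record one agent event in arrival order."""
        self._sessions[session_id].append(event)

    def trajectory(self, session_id: str) -> list:
        """Return a copy of the session's events for classification."""
        return list(self._sessions[session_id])
```

Each POST /events call would append to the store; POST /classify would read the full trajectory back out and feed it to the model.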
Stack
Quick start
git clone https://github.com/Guney-olu/agentguard
cd agentguard
pip install "torch>=2.0" transformers safetensors fastapi uvicorn requests
# Tensor Parallel - 2 GPUs
python -m agentguard --model /path/to/model --mode tensor --tp-size 2