← home

Projects

Open source infrastructure and tooling built at tiny compute research.

NanoSLG

v0.5research-preview

Minimal Multi-Mode Parallel LLM Inference Server

Lightweight LLM inference server with pipeline parallelism, tensor parallelism, hybrid TP+PP, and a dual-backend KV cache that auto-selects FlashInfer or contiguous SDPA based on GPU architecture.

Categoryinference
StackPython 3.10+, PyTorch 2.0+, CUDA / NCCL, FlashInfer
Single request throughput21.8 tok/s
Batch x4 throughput76.0 tok/s
TTFT (single)52ms
githubdetails →

AgentGuard

research-preview

Local Sidecar for Real-Time AI Agent Security

Local sidecar for real-time AI agent security. Monitors agent tool calls via Mamba-2 SSM, detects prompt injection, and blocks attacks before they execute.

Categorysecurity
StackPython, PyTorch, Transformers, Mamba-2
Classify latency~187ms
Memory complexityO(1)
Model size2.8B
githubdetails →