Projects
Open-source infrastructure and tooling built at tiny compute research.
Parameter-Efficient Continuous Vision-Language-Action Model
Extremely parameter-efficient vision-language-action (VLA) model for robotic control. It generates continuous actions with flow matching, integrating the learned velocity field with an ODE solver instead of predicting discrete action tokens. LoRA fine-tunes a 2.6B-parameter Gemma backbone on a single 16 GB T4 GPU without catastrophic forgetting of its VLM capabilities.
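As a rough illustration of ODE-based action generation, the sketch below integrates a velocity field from Gaussian noise at t = 0 to an action at t = 1 with explicit Euler steps. In one common flow-matching formulation, a network v(x_t, t) is trained to regress onto x_1 - x_0 along the straight path x_t = (1 - t) x_0 + t x_1. Here a hypothetical closed-form field, `straight_line_velocity`, stands in for that trained, observation-conditioned network. The helper names and step count are assumptions, not this project's code.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]


def euler_flow_sample(velocity: VelocityField, x0: Vector, num_steps: int = 10) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 (action)
    using explicit Euler steps of size 1 / num_steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_line_velocity(target: Vector) -> VelocityField:
    """Hypothetical stand-in for a trained network: the velocity that moves x
    along a straight line so that it reaches `target` exactly at t = 1."""
    def v(x: Vector, t: float) -> Vector:
        # t < 1 at every Euler step, so the division is safe.
        return [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]
    return v


if __name__ == "__main__":
    random.seed(0)
    action = [0.1, -0.4, 0.25]                        # e.g. a small joint-delta vector
    noise = [random.gauss(0.0, 1.0) for _ in action]  # x_0 ~ N(0, I)
    print(euler_flow_sample(straight_line_velocity(action), noise, num_steps=10))
```

Because the output is a real-valued vector produced by a few solver steps, there is no tokenizer or vocabulary over actions; the number of steps trades latency against integration accuracy.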
Minimal Multi-Mode Parallel LLM Inference Server
Lightweight LLM inference server supporting pipeline parallelism (PP), tensor parallelism (TP), and hybrid TP+PP. Its dual-backend KV cache automatically selects either FlashInfer or SDPA (scaled dot-product attention) over a contiguous cache, depending on the GPU architecture.
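A minimal sketch of the two decisions described above, under stated assumptions. The first is laying out a hybrid TP+PP grid, where ranks in the same pipeline stage form one tensor-parallel group and each stage owns a contiguous slice of layers. The second is choosing the KV-cache backend from the GPU's compute capability. The cutoff `FLASHINFER_MIN_CC = (8, 0)` is a hypothetical threshold, and all function names are illustrative rather than the server's actual API.

```python
from typing import List, Tuple

# Hypothetical cutoff: assume the FlashInfer path requires compute capability >= 8.0.
# The real server's selection rule may differ.
FLASHINFER_MIN_CC = (8, 0)


def select_kv_backend(compute_capability: Tuple[int, int], flashinfer_installed: bool) -> str:
    """Pick the attention / KV-cache backend for one GPU."""
    if flashinfer_installed and compute_capability >= FLASHINFER_MIN_CC:
        return "flashinfer"  # FlashInfer attention kernels
    return "sdpa"            # contiguous KV cache + scaled dot-product attention


def rank_to_coords(rank: int, tp: int, pp: int) -> Tuple[int, int]:
    """Map a global rank to (pipeline_stage, tp_rank) on a pp x tp grid.
    Ranks in the same stage are adjacent, so each TP group is contiguous."""
    if not 0 <= rank < tp * pp:
        raise ValueError(f"rank {rank} out of range for tp={tp}, pp={pp}")
    return rank // tp, rank % tp


def partition_layers(n_layers: int, pp: int) -> List[Tuple[int, int]]:
    """Split n_layers into pp contiguous [start, end) ranges.
    When the split is uneven, earlier stages take one extra layer each."""
    base, extra = divmod(n_layers, pp)
    ranges, start = [], 0
    for stage in range(pp):
        end = start + base + (1 if stage < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    tp, pp, n_layers = 2, 4, 26
    stages = partition_layers(n_layers, pp)
    for rank in range(tp * pp):
        stage, tp_rank = rank_to_coords(rank, tp, pp)
        print(rank, stage, tp_rank, stages[stage])
    print(select_kv_backend((7, 5), flashinfer_installed=True))  # e.g. a T4 (sm_75)
```

Keeping TP groups rank-adjacent is a common choice because TP all-reduces happen every layer and benefit most from the fastest interconnect, while PP only sends activations between neighbouring stages.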
AgentGuard
Research preview
Local Sidecar for Real-Time AI Agent Security
Runs locally alongside an AI agent, monitors the agent's tool calls in real time with a Mamba-2 state-space model (SSM), detects prompt injection, and blocks attacks before they execute.
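The sidecar pattern can be sketched as a gate that every tool call passes through before it runs. The gate updates a fixed-size running state once per call and refuses the call when the state crosses a threshold. In this sketch, the keyword scorer `toy_risk_score` and the exponentially decaying running score are hypothetical stand-ins. They only mimic the constant-cost, per-event state update that makes a recurrent model such as an SSM cheap on a stream; they are not Mamba-2 and not AgentGuard's detector. All names, the phrase list, and the threshold and decay values are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for the learned detector: flags a few injection phrases.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "exfiltrate", "rm -rf")


def toy_risk_score(text: str) -> float:
    """Return 1.0 if the text contains a suspicious phrase, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(p in lowered for p in SUSPICIOUS_PHRASES) else 0.0


@dataclass
class Sidecar:
    threshold: float = 0.5
    decay: float = 0.8   # fraction of past risk carried into the next call
    state: float = 0.0   # fixed-size running state, updated once per tool call
    score_fn: Callable[[str], float] = toy_risk_score

    def check(self, tool_name: str, args: str) -> bool:
        """Update the running state with this call; return True if it may run."""
        self.state = self.decay * self.state + self.score_fn(f"{tool_name} {args}")
        return self.state < self.threshold


def guarded_call(sidecar: Sidecar, tool: Callable[[str], str], tool_name: str, args: str) -> str:
    """Run `tool(args)` only if the sidecar allows it; otherwise raise before execution."""
    if not sidecar.check(tool_name, args):
        raise PermissionError(f"blocked tool call: {tool_name}")
    return tool(args)


if __name__ == "__main__":
    guard = Sidecar()
    print(guarded_call(guard, str.upper, "echo", "hello"))
    try:
        guarded_call(guard, str.upper, "shell", "rm -rf /")
    except PermissionError as err:
        print(err)
```

Because the decision happens inside `guarded_call` before `tool` is invoked, a blocked call never executes. The carried-over state also means a call right after a flagged one can stay blocked until the risk decays below the threshold.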