
Projects

Open source infrastructure and tooling built at tiny compute research.

NanoVLA-Flow

v1.0 · experimental

Parameter-Efficient Continuous Vision-Language-Action Model

A parameter-efficient VLA model that produces robot control actions with a continuous Flow Matching ODE solver instead of discrete action tokens. It fine-tunes a 2.6B Gemma backbone with LoRA on a single 16GB T4 without catastrophic forgetting of VLM capabilities.

Category: robotics
Stack: Python, PyTorch, Transformers, Gemma-4-E2B
A-OKVQA (MC Accuracy): 55.11%
A-OKVQA (Direct Answer): 19.33%
Flow Trajectory MSE: 0.156
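
To illustrate what sampling an action with a Flow Matching ODE solver means, here is a minimal sketch in plain Python. It is not the NanoVLA-Flow implementation: `euler_flow_sample` and `make_toy_velocity` are hypothetical names, the fixed-step Euler solver is an assumption, and the toy velocity field stands in for the learned network that would be conditioned on image and language inputs.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def euler_flow_sample(velocity: VelocityFn, x0: Vector, num_steps: int = 10) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t takes values 0, dt, ..., 1 - dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> VelocityFn:
    """Toy stand-in for a learned velocity network (assumption, not the real model).

    For the straight-line path x_t = (1 - t) * x0 + t * target, the velocity
    at a point x and time t is (target - x) / (1 - t).
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Start from Gaussian noise and flow toward a continuous 3-D action.
target_action = [0.25, -0.5, 1.0]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target_action]
action = euler_flow_sample(make_toy_velocity(target_action), noise, num_steps=10)
```

The output is a vector of real-valued actions rather than a sequence of token ids, which is the contrast with discrete action tokens drawn above.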

NanoSLG

v0.5 · research-preview

Minimal Multi-Mode Parallel LLM Inference Server

Lightweight LLM inference server with pipeline parallelism (PP), tensor parallelism (TP), hybrid TP+PP, and a dual-backend KV cache that auto-selects FlashInfer or contiguous SDPA based on GPU architecture.

Category: inference
Stack: Python 3.10+, PyTorch 2.0+, CUDA / NCCL, FlashInfer
Single request throughput: 21.8 tok/s
Batch x4 throughput: 76.0 tok/s
TTFT (single): 52 ms
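
The backend auto-selection described above could look roughly like the sketch below. The listing does not state which architectures map to which backend, so the `(8, 0)` compute-capability threshold, the `KVBackend` enum, and `select_kv_backend` are all assumptions made for illustration and are not NanoSLG's actual API.

```python
from enum import Enum
from typing import Tuple


class KVBackend(Enum):
    FLASHINFER = "flashinfer"
    SDPA = "sdpa"


# Assumed threshold: treat compute capability 8.0 and newer as FlashInfer-capable.
# The real cutoff used by the project is not given in the listing.
ASSUMED_MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def select_kv_backend(
    compute_capability: Tuple[int, int],
    flashinfer_installed: bool,
    min_capability: Tuple[int, int] = ASSUMED_MIN_CAPABILITY,
) -> KVBackend:
    """Pick a KV cache backend from the GPU's (major, minor) compute capability.

    Falls back to contiguous SDPA when FlashInfer is missing or the GPU is
    older than the assumed threshold. Tuples compare lexicographically.
    """
    if flashinfer_installed and compute_capability >= min_capability:
        return KVBackend.FLASHINFER
    return KVBackend.SDPA


# With PyTorch, the capability tuple could come from
# torch.cuda.get_device_capability(); it is hard-coded here so the sketch
# runs without a GPU.
backend_t4 = select_kv_backend((7, 5), flashinfer_installed=True)
backend_a100 = select_kv_backend((8, 0), flashinfer_installed=True)
```

Keeping the decision in one pure function makes the fallback path easy to test without GPU hardware.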

AgentGuard

research-preview

Local Sidecar for Real-Time AI Agent Security

Monitors agent tool calls with a Mamba-2 SSM, detects prompt injection, and blocks attacks before they execute.

Category: security
Stack: Python, PyTorch, Transformers, Mamba-2
Classify latency: ~187 ms
Memory complexity: O(1)
Model size: 2.8B
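
The idea of blocking a tool call before it executes can be sketched as a thin wrapper around the tool. Everything here is hypothetical: `ToolCall`, `guarded_execute`, and the `0.5` threshold are illustrative names and values, and `toy_injection_score` is a keyword heuristic standing in for the Mamba-2 classifier, not a description of how AgentGuard scores calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    name: str
    arguments: Dict[str, Any]


class BlockedToolCall(Exception):
    """Raised when a tool call is rejected before execution."""


def toy_injection_score(call: ToolCall) -> float:
    """Stand-in scorer (assumption): flags a few obviously suspicious phrases."""
    text = " ".join(str(v) for v in call.arguments.values()).lower()
    suspicious = ("ignore previous instructions", "exfiltrate", "rm -rf")
    return 1.0 if any(phrase in text for phrase in suspicious) else 0.0


def guarded_execute(
    call: ToolCall,
    tool: Callable[..., Any],
    score: Callable[[ToolCall], float] = toy_injection_score,
    threshold: float = 0.5,
) -> Any:
    """Score the call first; run the tool only if the score is below the threshold."""
    if score(call) >= threshold:
        raise BlockedToolCall(f"blocked call to {call.name!r}")
    return tool(**call.arguments)


def echo(message: str) -> str:
    """Trivial example tool."""
    return message


result = guarded_execute(ToolCall("echo", {"message": "hello"}), echo)
```

Because the check runs before `tool(...)` is invoked, a rejected call has no side effects, which is the property the description above relies on.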