Projects

Open source infrastructure and tooling built at tiny compute research.

NanoVLA-Flow

v1.0experimental

Parameter-Efficient Continuous Vision-Language-Action Model

Extremely parameter-efficient VLA model that uses continuous Flow Matching ODE solvers for robotic control instead of discrete action tokens. LoRA fine-tunes a 2.6B Gemma backbone on a single 16GB T4 — without catastrophic forgetting of VLM capabilities.

Categoryrobotics

StackPython, PyTorch, Transformers, Gemma-4-E2B

A-OKVQA (MC Accuracy)55.11%

A-OKVQA (Direct Answer)19.33%

Flow Trajectory MSE0.156

github details →

NanoSLG

v0.5research-preview

Minimal Multi-Mode Parallel LLM Inference Server

Lightweight LLM inference server with pipeline parallelism, tensor parallelism, hybrid TP+PP, and a dual-backend KV cache that auto-selects FlashInfer or contiguous SDPA based on GPU architecture.

Categoryinference

StackPython 3.10+, PyTorch 2.0+, CUDA / NCCL, FlashInfer

Single request throughput21.8 tok/s

Batch x4 throughput76.0 tok/s

TTFT (single)52ms

github details →

AgentGuard

research-preview

Local Sidecar for Real-Time AI Agent Security

Local sidecar for real-time AI agent security. Monitors agent tool calls via Mamba-2 SSM, detects prompt injection, and blocks attacks before they execute.

Categorysecurity

StackPython, PyTorch, Transformers, Mamba-2

Classify latency~187ms

Memory complexityO(1)

Model size2.8B

github details →

April AI

v1.0research-preview

Native macOS Assistant with Local Memory and Deep Agents

A fast, screen-aware thinking partner for macOS. Features Gemini Live for low-latency voice, local SQLite memory indexing, and deep sandbox agents for expensive tasks.

Categoryassistant

StackSwift, SwiftUI, macOS Native, SQLite

github details →