Work Experience
Skills
Check out my latest work
I've worked on a variety of projects, from simple websites to complex web applications. Here are a few of my favorites.
Production voice agent that runs full user-research interviews. Cascaded STT→LLM→TTS streaming with dynamic TTS chunking, VAD + multilingual turn detection, barge-in and silence recovery — end-to-end first response under one second. Screen-share vision lets the agent watch, pause capture, and probe on-screen behavior in realtime.
A pluggable agent runtime built from scratch — single event loop (AgentLoop), unified event stream (AgentEvent), and three swappable contracts (Tool / LlmProvider / SessionStore); no LangChain. On top of it: a Slack-native research agent that parses natural language into tool calls via Gemini function-calling and executes them inside ephemeral E2B sandboxes, with webhook-driven session resume for long tasks.
Wrapped the entire SaaS as an agent-callable MCP server: 30 decorator-registered tools over SSE / StreamableHTTP / stdio, approved through the official OpenAI Apps review. Unified agent & CLI auth with RFC 8414/9728 OAuth discovery and stateless compact tokens (~75% smaller than JWT). All LLM traffic governed through Cloudflare AI Gateway: multi-model routing, rate limiting, cost observability, ~86% prompt-cache hit across 759K+ calls.
Built Cookiy's participant-recruitment vertical solo: adapter + runtime-registry pattern normalizing Prolific / CloudResearch / CINT into one domain model (new supplier ≈ one adapter), a composable screener engine (piped questions, min-selections, 3-state screen-in), and a concurrency-safe matching & billing engine — DB transactions, conditional atomic updates, idempotent wallet dual-write with drift=0. 50k+ respondents and 700+ studies served in production.
I'm a heavy user of AI coding tools — this dashboard makes it measurable. A GitHub Actions pipeline auto-syncs my daily Claude Code token usage to a public page: ~2B tokens and $1.6k+ API-equivalent cost in a single month, broken down by input/output/cache and per-day cost.
An ard-spec v0.9 conformant Agentic Resource Discovery (ARD) registry in TypeScript — crawls ai-catalog.json, BM25 search, federation between registries, self-publishes via .well-known. Verified with the official conformance CLI.
Obsidian plugin that auto-uploads pasted images to a GitHub repo and rewrites links to permanent jsDelivr CDN URLs — keeping vaults lightweight while notes stay portable.
Research & Publications
Alongside product engineering, I work on research about agentic systems — how agents fail, and how those failures become training signals.
AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
Under review at SIGKDD 2026 · arXiv preprint · cs.LG / cs.AI
Co-authored a multi-agent framework for automating SRE diagnosis with LLMs: distills expert knowledge into open-source models via GRPO, isolates execution behind a read-write separated architecture, and converts failed operation attempts into training signals — 66.3% success on benchmark tasks, with failure analysis alone contributing +4.8pp.
Get in Touch
Want to chat? Just shoot me a dm with a direct question on twitter and I'll respond whenever I can. I will ignore all soliciting.

