Work
Systems I've led & built
A selection of production systems I've architected or led to delivery, and the open-source work I'm publishing alongside them. The thread: agentic AI, inference at scale, and the economics that make both viable.
In production
- Cascaded multi-provider inference platform: 70% cost cut vs. Bedrock
Quality-based routing across small → mid → frontier models, with cost controls and built-in observability. Adopted as the org-wide reference standard for LLM cost engineering and model governance.
- LangGraph multi-agent on-call system: 50% manual effort removed
Role-defined agents with per-output evaluation, fallback routing, and CI/CD observability. Productized into a reusable capability adopted across teams.
- Voice-based customer service agent: STT→LLM→TTS real-time pipeline
Production voice agent with multi-turn dialogue management, intent routing, barge-in handling, latency optimization, and safety guardrails. Productized for reuse across support workflows.
- Multi-tenant LLM inference platform: 200M+ requests / day
Per-tenant SLA management with CPU/GPU-optimized async batching, serving multiple internal teams. 90% translation cost reduction at ultra-high throughput with SRE-level reliability.
- Real-time AI task allocation: 30% resource cost cut
Task-allocation system for thousands of UK field engineers — four years in production, optimizing dispatch at national scale.
- No-code enterprise onboarding: 4mo → 2wk onboarding time
Configurable onboarding module adopted as BT's company-wide standard, built through direct technical discovery with enterprise telecom clients. Contributed to three new contracts.
Open source
- A simplified LLM inference simulator: batching, KV-cache, and throughput/latency trade-offs worked through from first principles.
- An evaluation harness for agentic systems: per-output scoring, guardrail verdicts (auto / require-approval / forbidden), and CI gating.
- A URL shortener built to showcase a cloud-native, Kubernetes-deployed service end to end: container build, deploy, and observability.
More at github.com/kumarpraveendev.
Themes
- Inference economics
- Agentic system design
- Evaluation & governance
- Cross-border engineering leadership