Venkata Rami Reddy Kallu
LLM Inference • Serving Systems • Safety • Privacy
Apex, North Carolina

Senior AI Engineer | LLM Inference, Serving Systems, Safety & Privacy

I design and build production-grade AI systems across LLM inference, serving infrastructure, and safe tool execution. My work spans runtime optimization, auditable AI workflows, and privacy-preserving ML for text and audio applications.

What I build

My work sits at the boundary between engineering and applied research, with a focus on systems that are measurable, reliable, auditable, and safe by design.

  • LLM inference and serving systems
  • Safe and governed tool-using AI runtimes
  • Benchmarking, runtime validation, and deployment workflows
  • Privacy-preserving synthetic voice detection

I’m especially interested in AI inference, serving platforms, runtime enforcement, and privacy-conscious production AI systems.

Now

What I’m actively building and exploring:

  • PolicyGraph: a reproducible safety runtime for MCP tool execution
  • Inference systems work around vLLM, TensorRT-LLM, Triton, and Kubernetes
  • OWASP-aligned evaluation and contract tests for agent safety
  • Follow-up research on runtime enforcement patterns and privacy-preserving AI

Focus

LLM Inference • Serving Systems • Kubernetes • AI Safety • AI Governance • Tool-Using Agents • Privacy-Preserving ML • Synthetic Voice Detection

Featured work

Selected highlights across inference, safety, and privacy

LLM Inference Systems Lab

Hands-on work across modern LLM serving stacks and deployment workflows.

  • vLLM, TensorRT-LLM, and Triton-based serving experiments
  • Quantization, KV-cache behavior, and long-context memory tradeoffs
  • Kubernetes-based deployment and benchmarking workflows
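As a worked example of the memory tradeoffs these experiments measure, the KV-cache footprint of a decoder-only transformer can be estimated directly from its configuration. A minimal sketch (illustrative only, not code from the lab itself), using a Llama-2-7B-style configuration:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size: one key and one value tensor per layer."""
    # 2x accounts for the separate K and V tensors
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Llama-2-7B-style config: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes)
per_token = kv_cache_bytes(32, 32, 128, seq_len=1)
full_4k = kv_cache_bytes(32, 32, 128, seq_len=4096)
print(per_token)          # 524288 bytes, i.e. 512 KiB per token
print(full_4k / 1024**3)  # 2.0 GiB at a 4096-token context
```

Numbers like these are why quantized KV caches and paged memory management (as in vLLM) matter: per-token cache cost scales linearly with context length and batch size.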

PolicyGraph

A policy-gated runtime for safe and auditable MCP tool execution.

  • Default-deny tool authorization
  • Schema-constrained planning
  • Typed output validation
  • Evidence-locked summaries
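PolicyGraph's implementation is not reproduced here, but the default-deny idea in the first bullet can be sketched in a few lines: a tool call is rejected unless a policy explicitly allows it (all names below are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    """Default-deny authorization: only explicitly allowed tools may run."""
    allowed_tools: set[str] = field(default_factory=set)

    def authorize(self, tool_name: str) -> bool:
        # Anything not on the allowlist is denied, including unknown tools
        return tool_name in self.allowed_tools

policy = ToolPolicy(allowed_tools={"read_file"})
assert policy.authorize("read_file")
assert not policy.authorize("delete_file")  # denied by default
```

The inverse (default-allow with a blocklist) fails open when a new tool appears; default-deny fails closed, which is the property an auditable runtime needs.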

Privacy-Preserving Synthetic Voice Detection

Systems work, with a filed patent, on detecting spoofed speech while minimizing speaker-identity leakage.

  • Detect AI-generated or spoofed speech reliably
  • Reduce sensitive identity leakage
  • Support privacy-sensitive deployment environments

Selected capabilities

A concise view of the areas I work across

Inference & Serving

vLLM • TensorRT-LLM • Triton • Quantization • KV Cache • Benchmarking • Long Context

Safety, Governance & Privacy

Runtime Validation • Tool Governance • Auditable AI • Evidence-Backed Summaries • Privacy-Preserving ML • Synthetic Voice Detection

Research & IP

Production systems with research depth

PolicyGraph Paper + Patent Direction

My broader work explores how to make AI systems safer and more enforceable in practice through policy-gated execution, runtime validation, and privacy-preserving system design.

  • PolicyGraph submitted to TechRxiv
  • Filed systems patent on privacy-preserving synthetic voice detection
  • Ongoing follow-up work on runtime enforcement patterns for safe AI systems