Venkata Rami Reddy Kallu
LLM Inference • Serving Systems • Safety • Privacy
Apex, North Carolina

Senior AI Engineer | LLM Inference, Serving Systems, Safety & Privacy

I design and build production-grade AI systems across LLM inference, serving infrastructure, and safe tool execution. My work spans runtime optimization, auditable AI workflows, and privacy-preserving ML for text and audio applications.

What I build

My work sits at the boundary between engineering and applied research, with a focus on systems that are measurable, reliable, auditable, and safe by design.

  • LLM inference and serving systems
  • Safe and governed tool-using AI runtimes
  • Benchmarking, runtime validation, and deployment workflows
  • Privacy-preserving synthetic voice detection

I’m especially interested in AI inference, serving platforms, runtime enforcement, and privacy-conscious production AI systems.

Now

What I’m actively building and exploring:

  • PolicyGraph: a reproducible safety runtime for MCP tool execution
  • Inference systems work around vLLM, TensorRT-LLM, Triton, and Kubernetes
  • OWASP-aligned evaluation and contract tests for agent safety
  • Follow-up research on runtime enforcement patterns and privacy-preserving AI

Focus

LLM Inference • Serving Systems • Kubernetes • AI Safety • AI Governance • Tool-Using Agents • Privacy-Preserving ML • Synthetic Voice Detection

Featured work

Selected highlights across inference, safety, and privacy

LLM Inference Systems Lab

Hands-on work across modern LLM serving stacks and deployment workflows.

  • vLLM, TensorRT-LLM, and Triton-based serving experiments
  • Quantization, KV-cache behavior, and long-context memory tradeoffs
  • Kubernetes-based deployment and benchmarking workflows
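As a worked example of the memory tradeoffs these experiments measure, the KV-cache footprint of a decoder-only transformer can be estimated directly from its configuration. A minimal sketch (illustrative only, not code from the lab itself), using a Llama-2-7B-style configuration:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size: one key and one value tensor per layer."""
    # 2x accounts for the separate K and V tensors
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Llama-2-7B-style config: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes)
per_token = kv_cache_bytes(32, 32, 128, seq_len=1)
full_4k = kv_cache_bytes(32, 32, 128, seq_len=4096)
print(per_token)          # 524288 bytes, i.e. 512 KiB per token
print(full_4k / 1024**3)  # 2.0 GiB at a 4096-token context
```

Numbers like these are why quantized KV caches and paged memory management (as in vLLM) matter: per-token cache cost scales linearly with context length and batch size.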

PolicyGraph

A policy-gated runtime for safe and auditable MCP tool execution.

  • Default-deny tool authorization
  • Schema-constrained planning
  • Typed output validation
  • Evidence-locked summaries
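PolicyGraph's implementation is not reproduced here, but the default-deny idea in the first bullet can be sketched in a few lines: a tool call is rejected unless a policy explicitly allows it (all names below are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    """Default-deny authorization: only explicitly allowed tools may run."""
    allowed_tools: set[str] = field(default_factory=set)

    def authorize(self, tool_name: str) -> bool:
        # Anything not on the allowlist is denied, including unknown tools
        return tool_name in self.allowed_tools

policy = ToolPolicy(allowed_tools={"read_file"})
assert policy.authorize("read_file")
assert not policy.authorize("delete_file")  # denied by default
```

The inverse (default-allow with a blocklist) fails open when a new tool appears; default-deny fails closed, which is the property an auditable runtime needs.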

Privacy-Preserving Synthetic Voice Detection

Systems work, with a filed patent, on detecting spoofed speech while minimizing speaker-identity leakage.

  • Detect AI-generated or spoofed speech reliably
  • Reduce sensitive identity leakage
  • Support privacy-sensitive deployment environments

Selected capabilities

A concise view of the areas I work across

Inference & Serving

vLLM • TensorRT-LLM • Triton • Quantization • KV Cache • Benchmarking • Long Context

Safety, Governance & Privacy

Runtime Validation • Tool Governance • Auditable AI • Evidence-Backed Summaries • Privacy-Preserving ML • Synthetic Voice Detection

Research & IP

Production systems with research depth

PolicyGraph Paper + Patent Direction

My broader work explores how to make AI systems safer and more enforceable in practice through policy-gated execution, runtime validation, and privacy-preserving system design.

  • PolicyGraph submitted to TechRxiv
  • Filed systems patent on privacy-preserving synthetic voice detection
  • Ongoing follow-up work on runtime enforcement patterns for safe AI systems