Case Studies

Projects & outcomes

Real work. Real challenges. The details that matter — not marketing copy.

Kubernetes · Cloud Migration · Video Infrastructure

High-Scale Video Platform Migration to Kubernetes

Designed and executed a complete infrastructure migration for a high-scale video platform — moving from a legacy on-prem environment to a production Kubernetes cluster on the cloud. Built the entire video processing pipeline (ingest, transcode, storage, CDN delivery) as cloud-native workloads.

  • ~40% infrastructure cost reduction
  • 10× faster deployment time
  • Zero downtime during migration
  • < 90s autoscaling response time

The Challenge

The client was running a high-traffic video processing platform on aging on-prem hardware. The system was brittle, hard to scale during traffic spikes, and required manual intervention for deployments. Processing queues would back up under load, and there was no reliable failover. They needed a path to cloud-native infrastructure without disrupting live video delivery.

What I Did

  • Assessed existing infrastructure and mapped all workloads, dependencies, and data flows
  • Designed target architecture on cloud Kubernetes (EKS) with autoscaling worker pools
  • Built full IaC with Terraform: VPC, node groups, storage, networking
  • Containerized all services and built Helm charts for each workload
  • Implemented ArgoCD for GitOps-based deployments across environments
  • Built video processing pipeline with autoscaling job workers (ffmpeg-based)
  • Set up CDN integration for video delivery and origin failover
  • Executed zero-downtime cutover with DNS-based traffic shifting
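The queue-driven autoscaling from the steps above can be sketched as the rule a KEDA queue scaler effectively applies: scale worker replicas to the ceiling of queue depth over a target per-worker load, clamped to bounds. A minimal Python sketch — the target load and replica bounds here are illustrative defaults, not the client's actual values:

```python
import math

def desired_replicas(queue_depth: int,
                     jobs_per_worker: int = 5,
                     min_replicas: int = 2,
                     max_replicas: int = 50) -> int:
    """Queue-depth scaling rule: ceil(depth / target per-worker load),
    clamped to [min_replicas, max_replicas]. Illustrative parameters."""
    wanted = math.ceil(queue_depth / jobs_per_worker)
    return max(min_replicas, min(wanted, max_replicas))
```

For example, an empty queue holds the floor of 2 warm workers, a backlog of 120 jobs scales to 24 replicas, and a spike of 1,000 jobs caps at the 50-replica ceiling. In the real deployment this rule lives in a KEDA ScaledObject rather than application code.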

Stack & Tools

AWS EKS · Terraform · ArgoCD · Helm · KEDA · ffmpeg · S3 + CloudFront · GitHub Actions

GPU Infrastructure · Kubernetes · AI / ML

NVIDIA A100 GPU Integration on Kubernetes with MIG Partitioning

Designed and deployed a production multi-tenant GPU cluster on Kubernetes using NVIDIA A100s with full MIG (Multi-Instance GPU) partitioning — matching MIG profile sizes to model sizes so every GPU cycle counts. Small models get small slices; large models get the full card.
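The "up to 7×" figure falls straight out of MIG geometry: an A100 exposes seven compute slices, so the smallest profile (1g.10gb) yields seven isolated instances per card, while larger profiles consume more slices each. A quick sketch of that capacity arithmetic, using the standard A100 80GB profile table:

```python
# A100 80GB MIG profiles: name -> (compute slices, memory in GB)
A100_PROFILES = {
    "1g.10gb": (1, 10),
    "2g.20gb": (2, 20),
    "3g.40gb": (3, 40),
    "4g.40gb": (4, 40),
    "7g.80gb": (7, 80),
}
TOTAL_SLICES = 7  # compute slices available on one A100

def instances_per_gpu(profile: str) -> int:
    """Maximum instances of a given MIG profile on a single A100."""
    slices, _mem_gb = A100_PROFILES[profile]
    return TOTAL_SLICES // slices
```

Floor division over the seven slices reproduces NVIDIA's maximum instance counts for these profiles: 7 × 1g.10gb, 3 × 2g.20gb, 2 × 3g.40gb, and a single 4g.40gb or 7g.80gb per card.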

  • +65% GPU utilization improvement
  • Up to 7× models served per A100
  • −55% inference cost per request
  • Full MIG tenant isolation

The Challenge

The client was building a multi-tenant AI inference platform and needed to serve dozens of models simultaneously — from lightweight 7B models to large 70B+ models — on a fixed pool of NVIDIA A100 80GB GPUs. Giving each model a full GPU was wasteful and expensive. Running everything on shared GPUs without isolation caused memory conflicts and unstable latency. They needed fine-grained, isolated GPU partitioning with Kubernetes-native scheduling.

What I Did

  • Deployed NVIDIA GPU Operator on Kubernetes to manage drivers, container runtime, and device plugins automatically
  • Enabled MIG mode on all A100 nodes and planned profile allocation based on model size tiers
  • Configured 1g.10gb MIG instances for small models (≤7B params) — up to 7 instances per GPU
  • Configured 2g.20gb MIG instances for mid-size models (7B–13B params)
  • Configured 4g.40gb MIG instances for large models (30B–40B params)
  • Reserved full 7g.80gb instances for 70B+ models needing the entire card
  • Applied custom Kubernetes node labels per MIG profile for precise pod scheduling
  • Built a dynamic MIG reconfiguration pipeline using mig-parted to reshape profiles on demand without node reboots
  • Set up resource quotas and LimitRanges per namespace to enforce fair GPU allocation across teams
  • Integrated vLLM inference server as the serving layer, pinned to specific MIG instances via device plugin
  • Built Prometheus + Grafana dashboards for per-MIG GPU utilization, memory, and inference throughput
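The tiering in the steps above amounts to a lookup from parameter count to MIG profile, plus a node-label selector for scheduling. A minimal Python sketch: the tier boundaries between the listed ranges are interpolated where the bullets leave a gap (13–30B), and the label key is hypothetical, standing in for the custom node labels mentioned above:

```python
def mig_profile_for(model_params_b: float) -> str:
    """Pick an A100 MIG profile from model size (billions of params),
    following the tiers used in this deployment."""
    if model_params_b <= 7:
        return "1g.10gb"   # small models: up to 7 per GPU
    if model_params_b <= 13:
        return "2g.20gb"   # mid-size models
    if model_params_b <= 40:
        return "4g.40gb"   # large models (gap between tiers interpolated)
    return "7g.80gb"       # 70B+ needs the entire card

def node_selector(model_params_b: float) -> dict:
    """Hypothetical nodeSelector mapping a pod onto nodes
    pre-partitioned with the matching MIG profile."""
    return {"example.com/mig-profile": mig_profile_for(model_params_b)}
```

A 7B model lands on a 1g.10gb slice and a 70B model claims the full 7g.80gb instance; the returned selector would go into the pod spec so the device plugin hands out the right MIG device.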

Stack & Tools

NVIDIA A100 80GB · NVIDIA GPU Operator · MIG / mig-parted · Kubernetes · vLLM · KEDA · Prometheus · Grafana · Helm · Terraform

Working on something similar?

Let's talk. Book a free discovery call and we'll figure out if I'm the right fit for your project.

Book a Call