Case Studies

Projects & outcomes

Real work. Real challenges. The details that matter — not marketing copy.

Kubernetes · Cloud Migration · Video Infrastructure

High-Scale Video Platform Migration to Kubernetes

Designed and executed a complete infrastructure migration for a high-scale video platform — moving from a legacy on-prem environment to a production Kubernetes cluster on the cloud. Built the entire video processing pipeline (ingest, transcode, storage, CDN delivery) as cloud-native workloads.

  • ~40% infrastructure cost reduction
  • 10× faster deployment time
  • Zero downtime during migration
  • < 90s autoscaling response time

The Challenge

The client was running a high-traffic video processing platform on aging on-prem hardware. The system was brittle, hard to scale during traffic spikes, and required manual intervention for deployments. Processing queues would back up under load, and there was no reliable failover. They needed a path to cloud-native infrastructure without disrupting live video delivery.

What I Did

  • Assessed existing infrastructure and mapped all workloads, dependencies, and data flows
  • Designed target architecture on cloud Kubernetes (EKS) with autoscaling worker pools
  • Built full IaC with Terraform: VPC, node groups, storage, networking
  • Containerized all services and built Helm charts for each workload
  • Implemented ArgoCD for GitOps-based deployments across environments
  • Built video processing pipeline with autoscaling job workers (ffmpeg-based)
  • Set up CDN integration for video delivery and origin failover
  • Executed zero-downtime cutover with DNS-based traffic shifting
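The queue-driven autoscaling from the steps above can be sketched as the rule a KEDA queue scaler effectively applies: scale worker replicas to the ceiling of queue depth over a target per-worker load, clamped to bounds. A minimal Python sketch — the target load and replica bounds here are illustrative defaults, not the client's actual values:

```python
import math

def desired_replicas(queue_depth: int,
                     jobs_per_worker: int = 5,
                     min_replicas: int = 2,
                     max_replicas: int = 50) -> int:
    """Queue-depth scaling rule: ceil(depth / target per-worker load),
    clamped to [min_replicas, max_replicas]. Illustrative parameters."""
    wanted = math.ceil(queue_depth / jobs_per_worker)
    return max(min_replicas, min(wanted, max_replicas))
```

For example, an empty queue holds the floor of 2 warm workers, a backlog of 120 jobs scales to 24 replicas, and a spike of 1,000 jobs caps at the 50-replica ceiling. In the real deployment this rule lives in a KEDA ScaledObject rather than application code.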

Stack & Tools

AWS EKS · Terraform · ArgoCD · Helm · KEDA · ffmpeg · S3 + CloudFront · GitHub Actions

GPU Infrastructure · Kubernetes · AI / ML

NVIDIA A100 GPU Integration on Kubernetes with MIG Partitioning

Designed and deployed a production multi-tenant GPU cluster on Kubernetes using NVIDIA A100s with full MIG (Multi-Instance GPU) partitioning — matching MIG profile sizes to model sizes so every GPU cycle counts. Small models get small slices; large models get the full card.
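The "up to 7×" figure falls straight out of MIG geometry: an A100 exposes seven compute slices, so the smallest profile (1g.10gb) yields seven isolated instances per card, while larger profiles consume more slices each. A quick sketch of that capacity arithmetic, using the standard A100 80GB profile table:

```python
# A100 80GB MIG profiles: name -> (compute slices, memory in GB)
A100_PROFILES = {
    "1g.10gb": (1, 10),
    "2g.20gb": (2, 20),
    "3g.40gb": (3, 40),
    "4g.40gb": (4, 40),
    "7g.80gb": (7, 80),
}
TOTAL_SLICES = 7  # compute slices available on one A100

def instances_per_gpu(profile: str) -> int:
    """Maximum instances of a given MIG profile on a single A100."""
    slices, _mem_gb = A100_PROFILES[profile]
    return TOTAL_SLICES // slices
```

Floor division over the seven slices reproduces NVIDIA's maximum instance counts for these profiles: 7 × 1g.10gb, 3 × 2g.20gb, 2 × 3g.40gb, and a single 4g.40gb or 7g.80gb per card.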

  • +65% GPU utilization improvement
  • Up to 7× models served per A100
  • −55% inference cost per request
  • Full MIG tenant isolation

The Challenge

The client was building a multi-tenant AI inference platform and needed to serve dozens of models simultaneously — from lightweight 7B models to large 70B+ models — on a fixed pool of NVIDIA A100 80GB GPUs. Giving each model a full GPU was wasteful and expensive. Running everything on shared GPUs without isolation caused memory conflicts and unstable latency. They needed fine-grained, isolated GPU partitioning with Kubernetes-native scheduling.

What I Did

  • Deployed NVIDIA GPU Operator on Kubernetes to manage drivers, container runtime, and device plugins automatically
  • Enabled MIG mode on all A100 nodes and planned profile allocation based on model size tiers
  • Configured 1g.10gb MIG instances for small models (≤7B params) — up to 7 instances per GPU
  • Configured 2g.20gb MIG instances for mid-size models (7B–13B params)
  • Configured 4g.40gb MIG instances for large models (30B–40B params)
  • Reserved full 7g.80gb instances for 70B+ models needing the entire card
  • Applied custom Kubernetes node labels per MIG profile for precise pod scheduling
  • Built a dynamic MIG reconfiguration pipeline using mig-parted to reshape profiles on demand without node reboots
  • Set up resource quotas and LimitRanges per namespace to enforce fair GPU allocation across teams
  • Integrated vLLM inference server as the serving layer, pinned to specific MIG instances via device plugin
  • Built Prometheus + Grafana dashboards for per-MIG GPU utilization, memory, and inference throughput
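The tiering in the steps above amounts to a lookup from parameter count to MIG profile, plus a node-label selector for scheduling. A minimal Python sketch: the tier boundaries between the listed ranges are interpolated where the bullets leave a gap (13–30B), and the label key is hypothetical, standing in for the custom node labels mentioned above:

```python
def mig_profile_for(model_params_b: float) -> str:
    """Pick an A100 MIG profile from model size (billions of params),
    following the tiers used in this deployment."""
    if model_params_b <= 7:
        return "1g.10gb"   # small models: up to 7 per GPU
    if model_params_b <= 13:
        return "2g.20gb"   # mid-size models
    if model_params_b <= 40:
        return "4g.40gb"   # large models (gap between tiers interpolated)
    return "7g.80gb"       # 70B+ needs the entire card

def node_selector(model_params_b: float) -> dict:
    """Hypothetical nodeSelector mapping a pod onto nodes
    pre-partitioned with the matching MIG profile."""
    return {"example.com/mig-profile": mig_profile_for(model_params_b)}
```

A 7B model lands on a 1g.10gb slice and a 70B model claims the full 7g.80gb instance; the returned selector would go into the pod spec so the device plugin hands out the right MIG device.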

Stack & Tools

NVIDIA A100 80GB · NVIDIA GPU Operator · MIG / mig-parted · Kubernetes · vLLM · KEDA · Prometheus · Grafana · Helm · Terraform

Working on something similar?

Let's talk. Book a free discovery call and we'll figure out if I'm the right fit for your project.

Book a Call