Migrating from dcgm-exporter + Prometheus to keda-gpu-scaler
Replace the dcgm-exporter → Prometheus → KEDA Prometheus scaler pipeline with keda-gpu-scaler.
What You're Replacing
BEFORE:
GPU Pod → dcgm-exporter (DaemonSet) → Prometheus (StatefulSet) → PromQL → KEDA Prometheus trigger → HPA
AFTER:
GPU Pod → keda-gpu-scaler (DaemonSet) → KEDA External trigger → HPA
You're removing 3 components from the scaling path: dcgm-exporter, Prometheus, and the PromQL query layer.
Step 1: Deploy keda-gpu-scaler
Keep your existing dcgm-exporter + Prometheus pipeline running during migration. Deploy keda-gpu-scaler alongside it:
# Helm
# --set-string keeps the label value a string, which nodeSelector requires
helm install keda-gpu-scaler deploy/helm/keda-gpu-scaler \
  --namespace keda \
  --set-string nodeSelector."nvidia\.com/gpu\.present"=true
# Or manifests
kubectl apply -f deploy/manifests.yaml
Verify it's running on your GPU nodes:
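One way to check this is sketched below as a small shell function. It assumes the scaler pods run in the `keda` namespace and carry the `app=keda-gpu-scaler` label (the label used by the log command in Step 4), and that GPU nodes carry the `nvidia.com/gpu.present=true` label from the Helm command above. The function name `check_scaler_coverage` is hypothetical.

```shell
# Sketch only: namespace and labels are taken from the commands in this guide;
# adjust them if your install differs.
check_scaler_coverage() {
  # Count nodes carrying the GPU label used as the nodeSelector in Step 1
  gpu_nodes=$(kubectl get nodes -l nvidia.com/gpu.present=true --no-headers \
    | wc -l | tr -d ' ')
  # Count scaler pods that have reached the Running phase
  running=$(kubectl get pods -n keda -l app=keda-gpu-scaler \
    --field-selector=status.phase=Running --no-headers | wc -l | tr -d ' ')
  echo "scaler pods running: $running / GPU nodes: $gpu_nodes"
  # Succeed only when every GPU node has a running scaler pod
  [ "$running" -eq "$gpu_nodes" ]
}
```

Calling `check_scaler_coverage` prints both counts and returns non-zero when they differ. For a per-node view, `kubectl get pods -n keda -l app=keda-gpu-scaler -o wide` lists the node each pod landed on.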
Step 2: Update Your ScaledObject
Replace the Prometheus trigger with an external trigger.
Before (Prometheus trigger)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-scaler
spec:
  scaleTargetRef:
    name: vllm-deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: DCGM_FI_DEV_GPU_UTIL
        query: avg(DCGM_FI_DEV_GPU_UTIL{pod=~"vllm-.*"})
        threshold: "80"
After (external trigger)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-scaler
spec:
  scaleTargetRef:
    name: vllm-deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: external
      metadata:
        scalerAddress: "keda-gpu-scaler.keda.svc.cluster.local:6000"
        profile: "vllm-inference"
Or with explicit metric configuration:
triggers:
  - type: external
    metadata:
      scalerAddress: "keda-gpu-scaler.keda.svc.cluster.local:6000"
      metricType: "gpu_utilization"
      targetValue: "80"
      activationThreshold: "5"
Step 3: Metric Mapping
DCGM metric → keda-gpu-scaler equivalent:
| DCGM Metric (Prometheus) | keda-gpu-scaler equivalent | Notes |
|---|---|---|
| DCGM_FI_DEV_GPU_UTIL | gpu_utilization | SM utilization % |
| DCGM_FI_DEV_MEM_COPY_UTIL | memory_utilization | Memory controller % |
| DCGM_FI_DEV_FB_USED | memory_used_mib | Frame buffer used in MiB |
| DCGM_FI_DEV_GPU_TEMP | temperature | GPU die temp in Celsius |
| DCGM_FI_DEV_POWER_USAGE | power_draw | Power in Watts |
| DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_TOTAL | memory_used_percent | Computed by scaler |
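The last row is the only derived metric in the table. As a worked illustration, the sketch below assumes memory_used_percent is frame buffer used divided by frame buffer total, times 100, with both values in MiB; the scaler's exact rounding is not documented here, and the function name is hypothetical.

```shell
# Assumption: memory_used_percent = FB used / FB total * 100, both in MiB.
memory_used_percent() {
  # $1 = frame buffer used (MiB), $2 = frame buffer total (MiB)
  awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}
```

For example, `memory_used_percent 30720 40960` prints 75.0, so a GPU with 40 GiB of frame buffer and 30 GiB in use would sit just below a targetValue of "80".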
Step 4: Verify Scaling Behavior
Watch the HPA and scaler logs side by side:
# Scaler logs
kubectl logs -n keda -l app=keda-gpu-scaler -f
# HPA status
kubectl get hpa -w
# ScaledObject status
kubectl get scaledobject vllm-scaler -o yaml | grep -A10 status
Run a load test against your inference endpoint and confirm replicas scale up when GPU utilization rises.
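To make the check repeatable, the replica count can be polled while the load test runs. The helper below is a hypothetical sketch: it assumes the Deployment is named `vllm-deployment` as in the ScaledObject example and lives in the current namespace, and the default poll count and delay are arbitrary.

```shell
# Hypothetical helper: polls the Deployment from the ScaledObject example
# until it reports at least the wanted number of replicas.
wait_for_replicas() {
  want=$1          # minimum replica count to wait for
  tries=${2:-30}   # number of polls before giving up
  delay=${3:-10}   # seconds between polls
  i=0
  while [ "$i" -lt "$tries" ]; do
    # .status.replicas is empty when the Deployment has no pods; treat that as 0
    have=$(kubectl get deployment vllm-deployment -o jsonpath='{.status.replicas}')
    echo "replicas: ${have:-0}"
    if [ "${have:-0}" -ge "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # The wanted count was never reached within the allotted polls
  return 1
}
```

For instance, `wait_for_replicas 2` returns success as soon as the Deployment reports two or more replicas and fails after 30 polls otherwise.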
Step 5: Remove dcgm-exporter (Optional)
Once you've confirmed scaling works through keda-gpu-scaler, you can remove dcgm-exporter from the scaling path.
If you still need dcgm-exporter for Grafana dashboards or monitoring (not scaling), keep it running. It doesn't conflict with keda-gpu-scaler: both read from NVML independently.
If you only had dcgm-exporter for KEDA scaling:
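# Remove dcgm-exporter DaemonSet
# (if the NVIDIA GPU Operator manages dcgm-exporter, disable it in the operator's
# configuration instead, since the operator recreates a deleted DaemonSet)
kubectl delete daemonset dcgm-exporter -n gpu-operator
# Remove the Prometheus scrape config for dcgm metrics (if dedicated)
# Remove the KEDA TriggerAuthentication for Prometheus (if used)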
What You Gain
| | dcgm-exporter + Prometheus | keda-gpu-scaler |
|---|---|---|
| Metric latency | 15-30s (scrape interval) | Sub-second |
| Components in scaling path | 5 | 2 |
| Configuration | PromQL query per workload | 3-line trigger or profile name |
| Failure mode | Prometheus outage = no scaling | Node-local, no single point of failure |
| Maintenance | PromQL queries break on DCGM version upgrades | No queries to maintain |
Rollback
If you need to revert, just switch the ScaledObject trigger back to type: prometheus and delete the keda-gpu-scaler DaemonSet. No data migration required.
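The revert can be sketched as two commands, wrapped here in a hypothetical function. It assumes the dcgm-exporter and Prometheus pipeline is still running (Step 5 was skipped), that `scaledobject-prometheus.yaml` is a file you kept containing the "Before" ScaledObject from Step 2 (the filename is illustrative), and that the scaler was installed with the Helm release name and namespace from Step 1.

```shell
# Sketch under assumptions: scaledobject-prometheus.yaml is a hypothetical file
# holding the Prometheus-trigger ScaledObject from Step 2.
rollback_to_prometheus() {
  # Restore the Prometheus trigger first so scaling keeps a metric source;
  # stop here if the apply fails
  kubectl apply -f scaledobject-prometheus.yaml || return 1
  # Then remove the scaler (Helm installs only)
  helm uninstall keda-gpu-scaler --namespace keda
}
```

If the scaler was installed from manifests rather than Helm, `kubectl delete -f deploy/manifests.yaml` would take the place of the `helm uninstall` line.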