Horizontal Pod Autoscaler (HPA)
Scale the number of pod replicas based on CPU, memory, or custom metrics.
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a Deployment (or another scalable workload such as a StatefulSet) based on observed metrics such as CPU utilization, memory usage, or custom metrics. It is the most commonly used autoscaler in Kubernetes.
How It Works
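HPA runs a control loop every 15 seconds by default (configurable through the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period). On each pass it queries the Metrics Server for current utilization and calculates the desired replica count using the formula: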
desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue))
Example: 3 replicas at 80% CPU, target 50%
= ceil(3 * (80 / 50)) = ceil(4.8) = 5 replicas
The HPA then updates the replica count on the target Deployment. A stabilization window (5 min for scale-down by default) prevents rapid flapping.
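As a rough illustration of the calculation above, the following bash sketch reproduces the formula with integer arithmetic. The function name desired_replicas, its argument order, and its default bounds are hypothetical. The 10% tolerance check mirrors the controller's default tolerance, and the stabilization window is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch of the HPA replica calculation (not the controller's actual code).
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC [MIN] [MAX]
desired_replicas() {
  local current=$1 metric=$2 target=$3
  local min=${4:-1} max=${5:-1000}   # illustrative defaults, not Kubernetes defaults

  # Within the default 10% tolerance the replica count is left unchanged:
  # |metric/target - 1| <= 0.1  is equivalent to  |metric - target| * 10 <= target
  local diff=$(( metric > target ? metric - target : target - metric ))
  if (( diff * 10 <= target )); then
    echo "$current"
    return
  fi

  # ceil(current * metric / target) without floating point.
  local desired=$(( (current * metric + target - 1) / target ))

  # Clamp to the minReplicas/maxReplicas bounds of the HPA.
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}

desired_replicas 3 80 50 2 20   # prints 5 (the worked example above)
desired_replicas 5 30 50 2 20   # prints 3 (load drops to 30%)
desired_replicas 4 52 50 2 20   # prints 4 (within tolerance, no change)
```

The third call shows why small fluctuations around the target do not change the replica count: a 52% reading against a 50% target falls inside the tolerance band.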
Pods must have resources.requests defined for the metrics being tracked; otherwise HPA will show <unknown> for those metrics.
When to Use
- Stateless web apps, REST APIs, microservices
- Load correlates with CPU, memory, or request rate
- Variable or unpredictable traffic patterns
- You need fast scale-out (seconds to minutes)
When NOT to Use
- Singleton workloads that can't run multiple replicas
- Databases or stateful workloads (use VPA)
- You need scale-to-zero (use KEDA)
- I/O-bound workloads where CPU doesn't reflect load
Real-World Example
Netflix-style Streaming Service
A video transcoding service scales from 25 to 250 pods during peak evening hours based on a custom metric, active_streams_per_pod. When the average exceeds 200 streams per pod, HPA adds replicas. The scale-down stabilization window makes the service shrink gradually after midnight, which avoids terminating pods that still serve active streams.
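A scenario like this could be expressed with a Pods metric, roughly as sketched below. The HPA and Deployment names (transcoder-hpa, transcoder) are hypothetical, and the sketch assumes a custom metrics adapter (for example Prometheus Adapter) already exposes active_streams_per_pod through the custom metrics API; Metrics Server alone does not serve custom metrics.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: transcoder-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: transcoder              # hypothetical Deployment
  minReplicas: 25
  maxReplicas: 250
  metrics:
    - type: Pods
      pods:
        metric:
          name: active_streams_per_pod   # must be served by a custom metrics adapter
        target:
          type: AverageValue
          averageValue: "200"            # scale out when the per-pod average exceeds 200
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # default scale-down window, stated explicitly
```

With a Pods metric the controller averages the value across all pods of the target and compares it with averageValue, so the same replica formula applies as for CPU utilization.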
Step-by-Step Implementation
1. Ensure Metrics Server is installed
# Check if metrics-server is running
kubectl get deployment metrics-server -n kube-system
# If not installed
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
2. Deploy with resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web-api
          image: myregistry/web-api:1.4.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "250m"      # REQUIRED for HPA
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
3. Create the HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
4. Verify
kubectl apply -f hpa.yaml
kubectl get hpa web-api-hpa --watch
kubectl describe hpa web-api-hpa
Common Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Missing resource requests | HPA shows <unknown> for metrics | Add resources.requests.cpu to every container |
| Metrics Server not installed | "unable to get metrics" error | Install Metrics Server in kube-system |
| Pod flapping | Replicas oscillate rapidly | Add stabilizationWindowSeconds to behavior |
| Memory as primary metric | Unnecessary scaling, because garbage-collected runtimes often hold on to memory after load drops | Use CPU or custom metrics as primary |
| Insufficient cluster capacity | New pods stuck in Pending | Pair with Cluster Autoscaler |
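Several of the pitfalls in the table can be checked from the command line. The helper below is a minimal sketch: the function name hpa_diagnose and its defaults are hypothetical (the defaults reuse the web-api names from the walkthrough), while the kubectl subcommands themselves are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper that runs one check per pitfall from the table.
# Usage: hpa_diagnose [NAMESPACE] [DEPLOYMENT] [HPA]
hpa_diagnose() {
  local ns=${1:-default} deploy=${2:-web-api} hpa=${3:-web-api-hpa}

  # Metrics Server not installed: the metrics API service is missing or unavailable.
  kubectl get apiservice v1beta1.metrics.k8s.io

  # Missing resource requests: prints each container's requests (empty if unset).
  kubectl -n "$ns" get deployment "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[*].resources.requests}{"\n"}'

  # Flapping or <unknown> metrics: conditions and recent events of the HPA.
  kubectl -n "$ns" describe hpa "$hpa"

  # Insufficient cluster capacity: pods the scheduler could not place.
  kubectl -n "$ns" get pods --field-selector=status.phase=Pending
}
```

It would be called as `hpa_diagnose default web-api web-api-hpa`. The output of each command still needs to be read by hand; the sketch only gathers the relevant views in one place.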