Horizontal Pod Autoscaler (HPA)

Scale the number of pod replicas based on CPU, memory, or custom metrics.

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a Deployment based on observed metrics such as CPU utilization, memory usage, or custom metrics. It is the most commonly used autoscaler in Kubernetes.

How It Works

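HPA runs a control loop every 15 seconds by default (configurable with the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period). On each pass it queries the Metrics Server for current utilization and calculates the desired replica count using the formula: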

text

desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue))

Example: 3 replicas at 80% CPU, target 50%
= ceil(3 * (80 / 50)) = ceil(4.8) = 5 replicas

The HPA then updates the replica count on the target Deployment. A stabilization window (5 min for scale-down by default) prevents rapid flapping.
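
As a quick check of the arithmetic, the formula can be reproduced with shell integer math. This is a minimal sketch, not the controller's implementation: the function name desired_replicas is hypothetical, it assumes positive whole-number inputs, and it ignores what the real controller layers on top (its tolerance band, the min/max bounds, and stabilization).

```bash
#!/usr/bin/env bash
# Hypothetical helper: ceil(currentReplicas * currentMetric / targetMetric)
# computed with integer arithmetic only.
desired_replicas() {
  local current_replicas=$1 current_metric=$2 target_metric=$3
  # Ceiling division for positive integers: (a + b - 1) / b
  echo $(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
}

desired_replicas 3 80 50   # example from the text: prints 5
desired_replicas 5 40 50   # load drops to 40%: prints 4
```

The second call shows the same formula driving scale-down: once utilization falls below the target, the ratio drops under 1 and the desired count shrinks.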

Info: Every container must have resources.requests defined for the metrics being tracked; otherwise HPA will show <unknown>.
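
The requirement follows from how a Utilization target is measured: usage is expressed as a percentage of the request, so without a request there is nothing to divide by. A rough sketch of that calculation for a single container follows. The function name and the usage figures are made up for illustration, and the real controller aggregates across all pods rather than looking at one container.

```bash
#!/usr/bin/env bash
# Hypothetical helper: CPU utilization as a percentage of the request.
# Both arguments are in millicores (e.g. 250 for "250m").
cpu_utilization_percent() {
  local usage_m=$1 request_m=$2
  if [ "$request_m" -le 0 ]; then
    # No request set: utilization is undefined, shown by HPA as <unknown>
    echo "<unknown>"
    return 0
  fi
  # Integer division truncates any fractional percent
  echo $(( usage_m * 100 / request_m ))
}

cpu_utilization_percent 200 250   # 200m used of a 250m request: prints 80
cpu_utilization_percent 200 0     # no request: prints <unknown>
```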

When to Use

Stateless web apps, REST APIs, microservices
Load correlates with CPU, memory, or request rate
Variable or unpredictable traffic patterns
You need fast scale-out (seconds to minutes)

When NOT to Use

Singleton workloads that can't run multiple replicas
Databases or stateful workloads (use VPA)
You need scale-to-zero (use KEDA)
I/O-bound workloads where CPU doesn't reflect load

Real-World Example

Netflix-style Streaming Service

A video transcoding service scales from 25 to 250 pods during peak evening hours based on a custom metric, active_streams_per_pod. When the average exceeds the target of 200 streams per pod, HPA adds replicas. After midnight, the scale-down stabilization window and rate-limiting policies keep the scale-down gradual, reducing the risk of terminating pods that still serve active streams.
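
A scenario like this might be expressed as a Pods-type metric in an autoscaling/v2 HPA. The manifest below is only a sketch: the Deployment name transcoder and the HPA name transcoder-hpa are hypothetical, and it assumes a custom metrics adapter (for example Prometheus Adapter) already exposes active_streams_per_pod through the custom metrics API. The --dry-run=client flag is used so that nothing is created.

```bash
# Sketch only: names are hypothetical; requires a custom metrics adapter.
kubectl apply --dry-run=client -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: transcoder-hpa          # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: transcoder            # hypothetical
  minReplicas: 25
  maxReplicas: 250
  metrics:
  - type: Pods
    pods:
      metric:
        name: active_streams_per_pod
      target:
        type: AverageValue
        averageValue: "200"     # target average streams per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # same as the default scale-down window
EOF
```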

Step-by-Step Implementation

1. Ensure Metrics Server is installed

bash

# Check if metrics-server is running
kubectl get deployment metrics-server -n kube-system

# If not installed
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
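
After installing, it can help to confirm that the metrics API is actually serving data before creating an HPA. A possible check, assuming the default install into kube-system:

```bash
# Wait for the metrics-server rollout to finish (gives up after 2 minutes)
kubectl rollout status deployment/metrics-server -n kube-system --timeout=120s

# The resource metrics API should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io

# If metrics are flowing, this prints CPU and memory usage per node
kubectl top nodes
```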

2. Deploy with resource requests

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
      - name: web-api
        image: myregistry/web-api:1.4.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"       # REQUIRED for HPA
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
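
To confirm the requests made it into the live object, one option is to apply the manifest and print each container's requests. The filename deployment.yaml is an assumption; use whatever name the manifest above was saved under.

```bash
# Apply the Deployment (filename is hypothetical)
kubectl apply -f deployment.yaml

# Print each container name with its resource requests
kubectl get deployment web-api \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'

# Once Metrics Server has scraped the pods, show their current usage
kubectl top pods -l app=web-api
```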

3. Create the HPA

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
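
For quick experiments, roughly the same CPU target can be created imperatively with kubectl autoscale. Treat this as a sketch rather than an equivalent: the generated HPA takes the Deployment's name (web-api, not web-api-hpa), it uses default scaling behavior instead of the behavior block above, and flag names may vary between kubectl versions.

```bash
# Create an HPA targeting 50% average CPU utilization, 2 to 20 replicas
kubectl autoscale deployment web-api --cpu-percent=50 --min=2 --max=20

# Inspect the generated object
kubectl get hpa web-api -o yaml
```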

4. Verify

bash

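# Apply the HPA manifest from step 3 (saved here as hpa.yaml)
kubectl apply -f hpa.yaml
# Watch targets and replica counts update (Ctrl+C to stop)
kubectl get hpa web-api-hpa --watch
# Inspect conditions and scaling events
kubectl describe hpa web-api-hpa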
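
To see scaling happen, some load is needed. One common approach is a throwaway busybox pod that requests the service in a loop. This sketch assumes the Deployment is exposed through a Service named web-api on port 8080 (created here with kubectl expose) and that the application answers HTTP requests at /; neither is stated above.

```bash
# Expose the Deployment inside the cluster (assumed Service name: web-api)
kubectl expose deployment web-api --port=8080

# Generate load from a temporary pod; stop with Ctrl+C
kubectl run load-generator --rm -i --tty --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://web-api:8080/; done"
```

In a second terminal, kubectl get hpa web-api-hpa --watch should show utilization rise and the replica count follow. After the load stops, scale-down begins only once the stabilization window has passed.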

Common Pitfalls

Pitfall	Symptom	Fix
Missing resource requests	HPA shows <unknown> for metrics	Add resources.requests.cpu to every container
Metrics Server not installed	"unable to get metrics" error	Install Metrics Server in kube-system
Pod flapping	Replicas oscillate rapidly	Add stabilizationWindowSeconds to behavior
Memory as primary metric	Unnecessary scaling (GC behavior)	Use CPU or custom metrics as primary
Insufficient cluster capacity	New pods stuck in Pending	Pair with Cluster Autoscaler
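
A few commands that may help tell these pitfalls apart, using the web-api names from the steps above:

```bash
# Missing requests or metrics: look for <unknown> in the TARGETS column
kubectl get hpa web-api-hpa

# Scaling decisions and metric errors are recorded as events on the HPA
kubectl get events \
  --field-selector involvedObject.kind=HorizontalPodAutoscaler,involvedObject.name=web-api-hpa

# Insufficient capacity: list pods the scheduler could not place
kubectl get pods -l app=web-api --field-selector=status.phase=Pending
```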
