Vertical Pod Autoscaler (VPA)

Automatically adjust CPU and memory requests for containers.

The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory resource requests for containers based on historical and real-time usage. It makes pods "bigger" or "smaller" instead of adding more replicas.

How It Works

VPA consists of three components: the Recommender (monitors usage and computes recommended requests), the Updater (evicts pods whose current requests diverge significantly from the recommendation), and the Admission Controller (mutates pod specs at creation to apply the recommended values).

Update modes: Off (computes recommendations but never applies them), Initial (applies recommendations only when a pod is created), Recreate (evicts running pods so they are recreated with new requests), Auto (currently the same as Recreate).

Warning: VPA may restart pods to apply new resource values. Ensure your application handles restarts gracefully. Never run VPA and HPA on the same CPU/memory metric.
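
One way to respect the HPA warning above is to give each autoscaler its own resource. The sketch below assumes a hypothetical stateless Deployment named web: VPA manages only memory requests through controlledResources, while an HPA scales replicas on CPU utilization. The names, replica counts, and the 70% target are illustrative assumptions, not values from this guide.

```yaml
# Hypothetical example: VPA owns memory, HPA owns CPU, so they never act on the same metric.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa              # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"     # apply to every container in the pod
      controlledResources: ["memory"]   # leave CPU requests untouched
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2             # illustrative
  maxReplicas: 10            # illustrative
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # illustrative target
```

Because HPA computes CPU utilization relative to the CPU request, leaving that request outside VPA's control keeps the HPA signal stable.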

When to Use

Databases, caches, stateful singletons
Workloads whose correct resource requests are unknown
Over-provisioned pods that need right-sizing to save costs
Workloads with varying resource needs over time

When NOT to Use

Stateless apps that can scale horizontally (use HPA)
Disruption-intolerant workloads
Workloads already scaled by HPA on CPU or memory (the two autoscalers conflict)

Real-World Example

PostgreSQL Batch Processing

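A PostgreSQL pod needs 500m CPU during the day for OLTP queries but 2 CPU at night for ETL batch jobs. Sizing statically for the peak means requesting 2 CPU around the clock, four times the daytime need. VPA adjusts resource requests based on observed usage instead, so the pod does not have to stay provisioned for peak; a 500m request in place of 2 CPU reserves 75% less CPU during the day. Each adjustment in Recreate or Auto mode restarts the pod.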

Step-by-Step Implementation

1. Install VPA

bash

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Verify
kubectl get pods -n kube-system | grep vpa

2. Start with recommendation-only mode

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: postgres
  updatePolicy:
    updateMode: "Off"   # Recommendation only
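
In Off mode nothing is changed on the pods; the recommendation is published on the VPA object's status, which can be read with kubectl get verticalpodautoscaler postgres-vpa -o yaml. The fragment below sketches the shape of that status. All quantities are made-up placeholders, not measured output.

```yaml
# Illustrative shape of the status written by the Recommender (values are placeholders)
status:
  recommendation:
    containerRecommendations:
    - containerName: postgres
      lowerBound:            # below this, requests are considered too small
        cpu: 300m
        memory: 600Mi
      target:                # the values VPA would apply
        cpu: 600m
        memory: 1Gi
      uncappedTarget:        # target before minAllowed/maxAllowed are applied
        cpu: 600m
        memory: 1Gi
      upperBound:            # above this, requests are considered too large
        cpu: "2"
        memory: 2Gi
```

Comparing target with the requests currently set on the Deployment gives a rough preview of what switching to Auto would change.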

3. Enable auto-updates with bounds

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: postgres
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: postgres
      minAllowed:
        cpu: "250m"
        memory: "512Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"
      controlledResources: ["cpu", "memory"]

Common Pitfalls

Pitfall	Symptom	Fix
Pod eviction disruptions	Pods restart unexpectedly	Start in "Off" mode; set a PodDisruptionBudget
Conflicting with HPA	Oscillating behavior	Never run both on CPU/memory for the same deployment
No minAllowed set	VPA recommends tiny resources	Always set minAllowed in resourcePolicy
Slow to converge	Inaccurate recommendations	Let VPA observe for 24-48 hours before enabling Auto
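
For the first pitfall, a PodDisruptionBudget limits how many pods the Updater may evict at once, since VPA evictions go through the eviction API. The sketch assumes the target pods carry a hypothetical label app: postgres and that the workload runs more than one replica.

```yaml
# Hypothetical PodDisruptionBudget for pods managed by postgres-vpa
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb         # hypothetical name
spec:
  maxUnavailable: 1          # allow at most one voluntary eviction at a time
  selector:
    matchLabels:
      app: postgres          # assumed pod label; match your Deployment's template labels
```

With a single replica, a budget that allows no unavailable pods (for example minAvailable: 1) would block VPA evictions entirely, so recommendations would only take effect when the pod restarts for another reason.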