Vertical Pod Autoscaler (VPA)
Automatically adjust CPU and memory requests for containers.
The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory resource requests for containers based on historical and real-time usage. It makes pods "bigger" or "smaller" instead of adding more replicas.
How It Works
VPA consists of three components: the Recommender (monitors usage and computes recommended requests), the Updater (evicts pods whose current requests diverge significantly from the recommendation), and the Admission Controller (mutates pod specs on creation to apply the recommended values).
Update modes: Off (recommendations only), Initial (sets at creation only), Recreate (evicts and recreates pods), Auto (currently same as Recreate).
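As a minimal sketch of how a mode is selected, the fragment below sets `updateMode: "Initial"`, so recommendations are applied only when a pod is created and running pods are not evicted. The names `postgres-vpa` and `postgres` are assumptions that mirror the example used later in this guide; any of the other modes listed above can be substituted.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa          # assumed name, matching the later example
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: postgres            # assumed target workload
  updatePolicy:
    # One of "Off", "Initial", "Recreate", "Auto"
    updateMode: "Initial"
```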
When to Use
- Databases, caches, stateful singletons
- Workloads whose correct resource requests are unknown
- Right-sizing over-provisioned pods to save costs
- Workloads with varying resource needs over time
When NOT to Use
- Stateless apps that can scale horizontally (use HPA)
- Disruption-intolerant workloads
- Already using HPA on CPU/memory (they conflict)
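If a workload needs both autoscalers, one commonly described way to avoid the conflict is to give each a different resource. The sketch below is an assumption, not a setup taken from this guide: a hypothetical `web` Deployment whose HPA scales replicas on CPU utilization while the VPA is restricted to memory through `controlledResources`. The names, replica counts, and the 70% target are placeholders.

```yaml
# VPA manages memory requests only (hypothetical "web" workload).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"              # default policy for all containers
        controlledResources: ["memory"] # leave CPU requests untouched
---
# HPA scales the replica count on CPU utilization only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2        # placeholder
  maxReplicas: 10       # placeholder
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # placeholder target
```

Because HPA utilization is measured relative to the CPU request, keeping CPU out of the VPA's control leaves the HPA's baseline fixed.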
Real-World Example
PostgreSQL Batch Processing
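A PostgreSQL pod needs 500m CPU during the day for OLTP queries but 2 CPU at night for ETL batch jobs. Provisioning statically for the nightly peak means requesting 2 CPU around the clock. With VPA adjusting requests based on observed usage, the daytime request can drop toward 500m, a saving of up to 75% of the daytime CPU request ((2000m - 500m) / 2000m). Treat this as an upper bound: recommendations are computed from usage history, and in Recreate or Auto mode each change is applied by evicting and recreating the pod.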
Step-by-Step Implementation
1. Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Verify
kubectl get pods -n kube-system | grep vpa
2. Start with recommendation-only mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: postgres
  updatePolicy:
    updateMode: "Off"  # Recommendation only
3. Enable auto-updates with bounds
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: postgres
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: postgres
        minAllowed:
          cpu: "250m"
          memory: "512Mi"
        maxAllowed:
          cpu: "4"
          memory: "8Gi"
        controlledResources: ["cpu", "memory"]
Common Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Pod eviction disruptions | Pods restart unexpectedly | Start in "Off" mode; set a PodDisruptionBudget before enabling evictions |
| Conflicting with HPA | Oscillating behavior | Never run both on CPU/memory for the same deployment |
| No minAllowed set | VPA recommends tiny resources | Always set minAllowed in resourcePolicy |
| Slow to converge | Inaccurate recommendations | Let VPA observe for 24-48 hours before enabling Auto |
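The PodDisruptionBudget mentioned in the first row could look like the sketch below. It assumes the postgres pods carry an `app: postgres` label, which this guide does not specify, and the name `postgres-pdb` is likewise hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb        # hypothetical name
spec:
  maxUnavailable: 1         # allow at most one pod to be evicted at a time
  selector:
    matchLabels:
      app: postgres         # assumed pod label
```

The Updater removes pods through the eviction API, so a budget like this limits how many pods it can take down at once. `maxUnavailable` is used here rather than `minAvailable: 1` because, on a single-replica workload, the latter would block every eviction and the recommendations would never be applied.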