A/B Testing Deployment
Route user segments to different versions to measure business impact.
Unlike a canary release, which validates technical health, an A/B testing deployment measures user behavior: conversion rates, engagement, and revenue.
How It Works
Traffic is routed by user attributes (geography, device, user ID hash, cookie) rather than by a random percentage of requests, so each user consistently sees the same version. Both versions run simultaneously, each instrumented with analytics. Once enough data has been collected to reach statistical significance, the winning version is promoted.
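The user ID hash option can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `assign_variant`, the experiment name used as a salt, and the 50/50 split are all assumptions. The returned group name is meant to be what an upstream component would place in a routing header or cookie.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to "control" or "experiment".

    Hypothetical helper: the same (experiment, user_id) pair always lands
    in the same group, which keeps routing sticky across sessions.
    """
    # Salt with the experiment name so group membership in one experiment
    # does not predict membership in another.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to a bucket in 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "experiment" if bucket < treatment_share * 10_000 else "control"


if __name__ == "__main__":
    # Example: decide which group a (made-up) user belongs to.
    print(assign_variant("user-42", "checkout-button"))
```

Because the assignment depends only on the user ID and the experiment name, no per-user state has to be stored to keep a user in the same group.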
When to Use
- UI/UX changes where business metrics matter more than error rates
- Pricing or checkout flow experiments
- Feature launches where user reception is uncertain
- You have an analytics platform for experiment analysis
When NOT to Use
- Backend infrastructure changes with no user-facing impact
- Bug fixes (just deploy them)
- You lack analytics infrastructure for measuring outcomes
- Legal/compliance changes that must apply to all users
Real-World Examples
Amazon - Checkout Button Placement
Amazon tested checkout button placement across 50 million users. Version A had the button above the fold, Version B below. The A/B test ran for 2 weeks and found a 3.2% conversion lift with the above-fold placement, translating to hundreds of millions in additional revenue.
Uber - Surge Pricing Display
Uber A/B tested surge pricing display formats: multiplier (2.3x) vs. flat fare estimate ($34.50). The flat fare format showed 18% higher ride acceptance rates, leading to a global rollout.
Step-by-Step Implementation
1. Deploy both versions with distinct labels
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
      version: v1
  template:
    metadata:
      labels:
        app: checkout
        version: v1
    spec:
      containers:
        - name: checkout
          image: myregistry/checkout:1.0.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
      version: v2
  template:
    metadata:
      labels:
        app: checkout
        version: v2
    spec:
      containers:
        - name: checkout
          image: myregistry/checkout:2.0.0-experiment
```
2. Route by header or cookie (NGINX Ingress)
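```yaml
# Canary Ingress: it only takes effect alongside a primary (non-canary)
# Ingress for the same host and path, which here would route to v1.
# Requests carrying the header "X-User-Group: experiment" go to checkout-v2.
# Assumes a Service named checkout-v2 exposes the v2 pods on port 80.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkout-ab
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-User-Group"
    nginx.ingress.kubernetes.io/canary-by-header-value: "experiment"
spec:
  rules:
    - host: checkout.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: checkout-v2
                port:
                  number: 80
```
3. Analyze results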
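Before promoting a winner, the significance check is usually left to the analytics platform, but the underlying idea can be sketched with a two-proportion z-test on exported conversion counts. The function name and the counts in the example are hypothetical, and the pooled z-test is one common choice rather than the method any particular platform uses.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and users in version A (control)
    conv_b / n_b: conversions and users in version B (experiment)
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variance: conversion rate is 0% or 100% in both groups")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: 10.0% vs 11.0% conversion on 10,000 users each.
    z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value below the threshold chosen before the experiment started (0.05 is a common convention) suggests the difference is unlikely to be noise. Checking repeatedly and stopping at the first significant result inflates the false-positive rate, so the test is best run once the planned sample size has been reached.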
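```shell
# Check sample sizes and statistical significance first.
# This is typically done in an analytics platform (Amplitude, Mixpanel, etc.).
# Once the experiment concludes, promote the winner (this example assumes v2 won).
# Repoint the primary route at checkout-v2 before scaling v1 down;
# otherwise users still routed to v1 will find no running pods.
kubectl scale deployment checkout-v2 --replicas=6
kubectl scale deployment checkout-v1 --replicas=0
```
Common Pitfalls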
| Pitfall | Symptom | Fix |
|---|---|---|
| Insufficient sample size | Results are not statistically significant | Calculate required sample size before starting; run longer if needed |
| User experience leakage | Users see both versions across sessions | Ensure sticky routing via cookies or user ID hash |
| Too many concurrent experiments | Confounding variables, unreliable results | Limit overlapping experiments; use a proper experimentation framework |
| Ignoring segment bias | Results skewed by non-representative segments | Randomize user assignment; validate that segment demographics match across versions |
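
For the first pitfall, the required sample size can be estimated before the experiment starts. The sketch below uses the standard normal-approximation formula for comparing two proportions. The function name, the default significance level (0.05), and the default power (0.8) are assumptions chosen for illustration, not values from the text above.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant to detect a relative lift.

    baseline_rate: current conversion rate, e.g. 0.10 for 10%
    relative_lift: smallest lift worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # Critical values for a two-sided test at level alpha and the target power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Made-up inputs: 10% baseline conversion, +10% relative lift.
    print(required_sample_size(0.10, 0.10))
```

With these inputs the estimate is on the order of 15,000 users per variant. Halving the detectable lift roughly quadruples the requirement, which is why small expected effects call for long-running experiments.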