Docker health checks are a building block for self-healing container infrastructure. They let Docker detect application failures that a still-running process would otherwise hide, and orchestrators such as Docker Swarm can act on that signal by replacing unhealthy containers so your services remain available. This guide covers everything from basic HEALTHCHECK implementation to advanced production patterns.
The HEALTHCHECK Instruction
The HEALTHCHECK instruction in a Dockerfile tells Docker how to test whether a container is still working correctly:
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
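Docker tracks three states: starting (until the first check passes or the start period ends), healthy (checks pass), and unhealthy (the number of consecutive failures reaches --retries). The --start-period is critical for applications with slow startup times: failures during this window do not count toward the retry limit. Once a check succeeds, the container is considered started, and any later failure counts even if the start period has not yet elapsed.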
| Option | Default | Description |
|---|---|---|
| --interval | 30s | Time between checks |
| --timeout | 30s | Per-check timeout |
| --start-period | 0s | Grace period at startup |
| --retries | 3 | Consecutive failures before unhealthy |
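These options together determine how long a failure can go unnoticed. The function below is a rough sketch of that arithmetic, not an official formula: it assumes every failing check runs until it hits the timeout and that the next check starts one full interval after the previous one finishes. The function name is illustrative.

```shell
# Rough upper bound, in seconds, on the time between an application
# failing and Docker marking the container unhealthy.
# Assumption: each failing check consumes the full timeout, and each
# check starts one interval after the previous check completes.
worst_case_detection_seconds() {
    interval=$1   # --interval, in seconds
    timeout=$2    # --timeout, in seconds
    retries=$3    # --retries
    echo $(( retries * (interval + timeout) ))
}
```

With the values from the example above, worst_case_detection_seconds 30 3 3 prints 99, so a hung application could serve errors for over a minute and a half before its status changes.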
Curl vs TCP vs Custom Checks
The choice of check method depends on your application type:
# HTTP check (most common)
HEALTHCHECK CMD curl -f http://localhost:8080/api/health || exit 1
# TCP socket check (lightweight)
HEALTHCHECK CMD nc -z localhost 5432 || exit 1
# Custom script (complex logic)
HEALTHCHECK CMD /usr/local/bin/healthcheck.sh
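HTTP checks with curl -f validate the status code only: curl exits non-zero when the server returns 400 or higher, but it does not inspect the response body. TCP checks are protocol-agnostic and lightweight, though they only confirm that the port accepts connections. Custom scripts enable application-specific logic, such as validating the response body or verifying database connectivity, cache responsiveness, or queue depths.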
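As an illustration of what a custom script can add over a bare curl -f, the sketch below also inspects the response body. It assumes the health endpoint returns JSON containing "status":"ok", which is a convention chosen for this example and not something Docker requires. Both function names are hypothetical.

```shell
# Succeeds only if the body read from stdin reports status "ok".
# The {"status":"ok"} shape is an assumed convention for this example.
body_reports_ok() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Fetch the endpoint and validate the body. The check fails on HTTP
# errors (-f), on requests slower than 2 seconds (--max-time), and on
# a body that does not report "ok" (an empty body fails the grep too).
check_http_body() {
    url=$1
    curl -fsS --max-time 2 "$url" | body_reports_ok
}
```

A healthcheck.sh built on this could end with check_http_body http://localhost:8080/api/health || exit 1, keeping the exit code at 0 or 1 as Docker expects.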
Startup Probes vs Readiness Probes vs Liveness Probes
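Kubernetes distinguishes startup, readiness, and liveness probes. Docker, Docker Compose, and Swarm have no equivalent three-probe model: each container gets a single health check, with start_period providing the startup grace period: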
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/ready"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
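In Kubernetes terms, a startup probe handles slow initialization, a readiness probe controls whether traffic routes to the container, and a liveness probe triggers restarts on failure. Docker's single health check covers these roles only in part. The start_period field replaces the need for a separate startup probe in many cases. Swarm uses the health status both to withhold traffic from tasks that are not healthy and to replace unhealthy ones. Plain Docker and Docker Compose only record the status and do not restart an unhealthy container on their own.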
Docker Compose Health Check Configuration
Health checks integrate naturally with service orchestration in docker-compose.yml:
services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
The depends_on with condition: service_healthy ensures the app container only starts after the database passes its health check, preventing connection errors at boot.
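Outside Compose, for example in a CI script that starts containers with docker run, the same gating can be approximated by polling the health status. This is a minimal sketch under that assumption; wait_healthy is a hypothetical helper, and the one-second polling step is arbitrary.

```shell
# Poll a container's health status until it is healthy or time runs out.
# Returns 0 when healthy, 1 when unhealthy or when the deadline passes.
wait_healthy() {
    container=$1
    deadline=${2:-60}   # seconds to wait before giving up
    elapsed=0
    while [ "$elapsed" -lt "$deadline" ]; do
        # Status is one of: starting, healthy, unhealthy. It is empty
        # if the container does not exist or defines no health check.
        status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
        case $status in
            healthy)   return 0 ;;
            unhealthy) return 1 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

A script could then run wait_healthy db 30 || exit 1 before starting the application container.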
Monitoring and Alerting Integration
| Tool | Purpose | Integration |
|---|---|---|
| Prometheus | Metrics collection | Container metrics via cAdvisor |
| Grafana | Visualization | Health dashboards |
| Docker Events API | Real-time alerts | Container state streams |
| cAdvisor | Resource monitoring | Exposed metrics endpoint |
Use docker events --filter event=health_status to stream health changes. For production, route health status to alerting systems that notify your team when containers become unhealthy.
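One way to feed that stream into an alerting path is to format each event as a container name followed by the action, then filter for transitions to unhealthy. The sketch below assumes the action text ends in "unhealthy" (for example "health_status: unhealthy"), which is worth confirming against your Docker version. The echo stands in for whatever notifier you actually use.

```shell
# Read "<name> <action>" lines on stdin and print an alert for each
# transition to unhealthy. Lines for other transitions are ignored.
alert_on_unhealthy() {
    while read -r name action; do
        case $action in
            *unhealthy) echo "ALERT: $name is unhealthy" ;;
        esac
    done
}
```

It can be wired up as docker events --filter event=health_status --format '{{.Actor.Attributes.name}} {{.Action}}' | alert_on_unhealthy.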
Common Pitfalls
Misconfigured timing is the most common source of trouble: a very short --interval runs the check command almost continuously and can exhaust resources, while a --timeout that is too tight produces false alerts under load. DNS resolution failures inside containers can break curl-based checks. Watch for deadlock scenarios where service A's health check depends on service B, which in turn depends on A. Use docker inspect --format='{{json .State.Health}}' <container> to debug health check status.
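When the full JSON is more than you need, narrower templates can pull out just the fields that matter during debugging. The two helpers below are an illustrative sketch with hypothetical names; they rely on the Status, FailingStreak, and Log fields of the container's health state.

```shell
# Print "<status> <failing streak>" for a container, for example
# "unhealthy 4" after four consecutive failed checks.
health_summary() {
    docker inspect --format '{{.State.Health.Status}} {{.State.Health.FailingStreak}}' "$1"
}

# Print the exit code and captured output of each recorded check run,
# which usually shows why a check is failing (DNS error, timeout, ...).
health_log() {
    docker inspect --format '{{range .State.Health.Log}}{{.ExitCode}} {{.Output}}{{end}}' "$1"
}
```

Running health_log against a failing container is often the quickest way to see the actual curl or script error behind an unhealthy status.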
Advanced Patterns
Composite health checks aggregate multiple subsystem statuses into a single result. Self-healing setups combine health checks with auto-restart strategies and exponential backoff. Note that Docker's restart policies react to a container exiting, not to an unhealthy status, so on a standalone host something else has to act on the health signal. In Docker Swarm, the orchestrator stops routing traffic to unhealthy tasks and replaces them automatically.
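A composite check can be as simple as running several smaller checks in sequence and failing on the first one that fails. The sketch below shows that shape; run_checks is a hypothetical helper, and the individual checks passed to it are assumed to be commands or shell functions that return 0 on success.

```shell
# Run each named check command in turn and report the first failure on
# stderr. Returns 0 only if every check succeeds and 1 otherwise, the
# two exit codes Docker interprets as healthy and unhealthy.
run_checks() {
    for check in "$@"; do
        if ! "$check" >/dev/null 2>&1; then
            echo "health check failed: $check" >&2
            return 1
        fi
    done
    return 0
}
```

With hypothetical functions check_db and check_cache defined in the same script, the last line of healthcheck.sh could be run_checks check_db check_cache || exit 1. Keeping each subsystem check small also keeps the total runtime comfortably under the configured timeout.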
Conclusion
Proper health check configuration is foundational to reliable Docker deployments. By combining HEALTHCHECK with dependency-aware orchestration and monitoring integration, you can build systems that detect and recover from failures automatically, minimizing downtime and operational overhead.
