Docker Health Checks and Container Monitoring Best Practices

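Docker health checks are essential for building self-healing container infrastructure. They let Docker detect when an application has stopped working even though its process is still running, and mark the container unhealthy. Orchestrators such as Docker Swarm act on that status by replacing the container. Standalone Docker only reports the status and does not restart unhealthy containers on its own. This guide covers everything from basic HEALTHCHECK implementation to advanced production patterns.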

The HEALTHCHECK Instruction

The HEALTHCHECK instruction in a Dockerfile tells Docker how to test whether a container is still working correctly:

HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

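Docker reports three health statuses: starting (the initial status, until a check passes or the retry limit is reached), healthy (a check has passed and fewer than --retries consecutive checks have failed since), and unhealthy (--retries consecutive checks have failed). The check command must exit 0 for healthy or 1 for unhealthy, and exit code 2 is reserved. That is why the example ends with || exit 1: curl reports problems with other codes, such as 7 for a refused connection or 22 for an HTTP error under -f, and the fallback maps them onto 1. The --start-period is critical for applications with slow startup times: failures inside that window do not count toward --retries while the container is still starting, although a passing check ends the grace period early.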
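To make the retry and start-period rules concrete, here is a small shell simulation of how the status evolves over a sequence of check results. It is a simplified model, not Docker's implementation: health_status is a hypothetical name, and it counts the start period in checks rather than comparing timestamps as Docker does.

```shell
#!/bin/sh
# health_status RETRIES GRACE_CHECKS RESULTS   (simplified model, hypothetical helper)
#   RETRIES       the --retries value
#   GRACE_CHECKS  how many of the first checks fall inside --start-period
#                 (assumption: Docker really compares timestamps, not counts)
#   RESULTS       check outcomes in order, e.g. "FFPF" (P = exit 0, F = failure)
# Prints the status after the last check: starting, healthy, or unhealthy.
health_status() {
  retries=$1 grace=$2 results=$3
  state=starting streak=0 n=0
  while [ -n "$results" ]; do
    r=${results%"${results#?}"}   # first character of the remaining results
    results=${results#?}          # drop that character
    n=$((n + 1))
    if [ "$r" = P ]; then
      state=healthy streak=0      # any pass resets the failure streak
    elif [ "$state" = starting ] && [ "$n" -le "$grace" ]; then
      :                           # failure inside the start period while starting: ignored
    else
      streak=$((streak + 1))
      if [ "$streak" -ge "$retries" ]; then state=unhealthy; fi
    fi
  done
  echo "$state"
}

health_status 3 2 FFFFF   # prints unhealthy: two failures ignored, then three counted
health_status 3 5 PFFF    # prints unhealthy: the early pass ended the grace period
```

The second call shows why a quick first success matters: once a check passes, later failures count even if they still fall inside --start-period.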

Option	Default	Description
`--interval`	30s	Time between checks
`--timeout`	30s	Per-check timeout
`--start-period`	0s	Grace period at startup
`--retries`	3	Consecutive failures before unhealthy
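These options combine into a rough detection time. The sketch below estimates how long a container that breaks just after a passing check takes to be marked unhealthy. detection_window is a hypothetical helper, and it assumes Docker waits --interval after each check finishes before starting the next, so a check that hangs until --timeout stretches every cycle.

```shell
#!/bin/sh
# detection_window INTERVAL TIMEOUT RETRIES   (seconds, seconds, count)
# Rough worst-case time from "application breaks" to "unhealthy", assuming the
# break happens just after a passing check and outside the start period.
detection_window() {
  interval=$1 timeout=$2 retries=$3
  fast=$((retries * interval))               # each check fails immediately
  hung=$((retries * (interval + timeout)))   # each check hangs until --timeout
  echo "fast-failing checks: ~${fast}s, hanging checks: ~${hung}s"
}

detection_window 30 3 3   # the Dockerfile example above
# prints: fast-failing checks: ~90s, hanging checks: ~99s
```

Lowering --interval shortens both numbers, at the cost of running the check more often.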

Curl vs TCP vs Custom Checks

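The choice of check method depends on your application type. The lines below are alternatives, not a sequence: a Dockerfile can contain several HEALTHCHECK instructions, but only the last one takes effect.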

# HTTP check (most common)
HEALTHCHECK CMD curl -f http://localhost:8080/api/health || exit 1

# TCP socket check (lightweight)
HEALTHCHECK CMD nc -z localhost 5432 || exit 1

# Custom script (complex logic)
HEALTHCHECK CMD /usr/local/bin/healthcheck.sh

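With curl -f, an HTTP check fails on any status of 400 or above; it looks at the status code only, not the response body. TCP checks only confirm that something accepts connections on the port, which makes them protocol-agnostic and lightweight but blind to application errors. Custom scripts enable application-specific logic, such as verifying database connectivity, cache responsiveness, or queue depths. Whichever method you choose, the tool it calls (curl, nc, or the script's own dependencies) must be installed in the image.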
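A minimal sketch of such a script, showing how a custom check can inspect the response body that curl -f ignores. It assumes, hypothetically, that the application's /api/health endpoint returns JSON containing "status":"ok" when healthy. HEALTH_URL, body_reports_ok, and healthcheck are illustrative names, not anything Docker defines.

```shell
#!/bin/sh
# Sketch of /usr/local/bin/healthcheck.sh (hypothetical).
# Assumes the app's health endpoint returns JSON containing "status":"ok"
# (without spaces) when all of its dependencies are reachable.
HEALTH_URL=${HEALTH_URL:-http://localhost:8080/api/health}

# body_reports_ok BODY: succeed only if the body contains "status":"ok"
body_reports_ok() {
  case $1 in
    *'"status":"ok"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# healthcheck: return 0 (healthy) or 1 (unhealthy); Docker reserves exit code 2
healthcheck() {
  # -f fails on HTTP >= 400, -sS hides progress but keeps error messages,
  # --max-time keeps the request well under the HEALTHCHECK --timeout
  body=$(curl -fsS --max-time 2 "$HEALTH_URL") || return 1
  if ! body_reports_ok "$body"; then
    echo "health endpoint reported a problem: $body"
    return 1
  fi
  return 0
}

# Run the check only when executed as healthcheck.sh, so the functions can
# also be sourced and tested on their own.
case ${0##*/} in
  healthcheck.sh)
    if healthcheck; then exit 0; else exit 1; fi ;;
esac
```

Copy the script into the image, make it executable, and reference it with HEALTHCHECK CMD /usr/local/bin/healthcheck.sh.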

Startup Probes vs Readiness Probes vs Liveness Probes

Kubernetes splits health checking into three probe types. Docker, Docker Compose, and Swarm instead use a single health check, which Compose configures like this:

services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/ready"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s

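In Kubernetes, a startup probe handles slow initialization, a readiness probe controls whether traffic routes to the pod, and a liveness probe triggers restarts on failure. Docker's single check has to stand in for all three: start_period plays the role of a startup probe, Compose's depends_on condition treats the health status as a readiness gate, and Swarm replaces tasks whose containers turn unhealthy, much like a failed liveness probe. Because one check cannot distinguish "not ready yet" from "broken", choose an endpoint whose failure should genuinely take the container out of service.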

Docker Compose Health Check Configuration

Health checks integrate naturally with service orchestration in docker-compose.yml:

services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5

  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]

Declaring depends_on with condition: service_healthy makes Compose start the app container only after the database passes its health check, which prevents connection errors at boot.
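Outside Compose, for instance in a CI script that starts containers with plain docker run, the same gate can be approximated by polling the health status. A minimal sketch; wait_for_healthy and its one-second poll are illustrative choices. With Compose v2 itself, docker compose up --wait already waits for services to become healthy.

```shell
#!/bin/sh
# wait_for_healthy CONTAINER [TIMEOUT_SECONDS]   (hypothetical helper)
# Polls the container's health status once per second. Returns 0 when it is
# "healthy", 1 when it is "unhealthy" or the timeout (default 60s) expires.
wait_for_healthy() {
  container=$1
  limit=${2:-60}
  waited=0
  while [ "$waited" -lt "$limit" ]; do
    # Anything other than healthy/unhealthy (e.g. starting, or no output because
    # the container is missing or has no health check) means: keep polling.
    health=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    case $health in
      healthy) return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out waiting for $container" >&2
  return 1
}

# Example (postgres:16 ships no HEALTHCHECK, so one is added at run time):
# docker run -d --name db -e POSTGRES_PASSWORD=example \
#   --health-cmd 'pg_isready -U postgres' --health-interval 5s postgres:16
# wait_for_healthy db 60 && echo "db is ready"
```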

Monitoring and Alerting Integration

Tool	Purpose	Integration
Prometheus	Metrics collection	Container metrics via cAdvisor
Grafana	Visualization	Health dashboards
Docker Events API	Real-time alerts	Container state streams
cAdvisor	Resource monitoring	Exposed metrics endpoint

Use docker events --filter event=health_status to stream health changes. For production, route health status to alerting systems that notify your team when containers become unhealthy.
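A minimal sketch of that routing: docker events can print just the container name and action through a Go template, and a small filter turns transitions to unhealthy into alerts. alert_on_unhealthy is a hypothetical name, and its echo stands in for whatever notification hook (webhook, pager) you use.

```shell
#!/bin/sh
# alert_on_unhealthy: reads "NAME ACTION" lines such as
# "web health_status: unhealthy" and prints an alert for each transition to unhealthy.
alert_on_unhealthy() {
  while read -r name action; do
    case $action in
      *': unhealthy') echo "ALERT: container $name is unhealthy" ;;  # swap in your notifier
    esac
  done
}

# Long-running usage:
# docker events --filter event=health_status \
#   --format '{{.Actor.Attributes.name}} {{.Action}}' | alert_on_unhealthy
```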

Common Pitfalls

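Misconfigured intervals and timeouts cause false alerts and waste resources: a timeout shorter than the endpoint's normal response time under load marks a slow but working container unhealthy, and because every check starts a new process inside the container, a very short --interval adds constant overhead. Checks that target a hostname rather than localhost or 127.0.0.1 depend on DNS resolution inside the container, so a DNS failure can make a working service look broken. Watch for deadlock scenarios where service A's health check depends on service B, whose check in turn depends on A: neither can report healthy until the other does. Use docker inspect --format='{{json .State.Health}}' to debug health check status.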
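When the JSON is hard to scan, a Go template can print the status followed by the exit code and output of each recent check that Docker keeps in .State.Health.Log (only the last few results are retained). show_health_log is a hypothetical helper name.

```shell
#!/bin/sh
# show_health_log CONTAINER   (hypothetical helper)
# Prints the health status, then one line per recent check: its exit code and
# its output, quoted with %q so multi-line output stays on one line.
# docker inspect fails with a template error if the container has no health check.
show_health_log() {
  docker inspect --format '{{.State.Health.Status}}
{{range .State.Health.Log}}exit={{.ExitCode}} {{printf "%q" .Output}}
{{end}}' "$1"
}

# Usage: show_health_log web
```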

Advanced Patterns

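Composite health checks aggregate multiple subsystem statuses into a single result. For self-healing, note that Docker's restart policies (such as --restart on-failure) react to a container exiting, not to an unhealthy status; they already add a doubling delay between restart attempts. On a standalone host, turning an unhealthy status into a restart therefore requires the application itself, or an external watcher acting on health events. In Docker Swarm, the orchestrator shuts down tasks whose containers become unhealthy and schedules replacements, which also removes them from the service's load-balanced endpoints.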
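A composite check can be as simple as a shell function that runs several subsystem checks and fails if any of them fails. A minimal sketch: composite_check and the commands in the example call are hypothetical, and the list of failures goes to stderr because Docker stores a check's stdout and stderr output in the health log.

```shell
#!/bin/sh
# composite_check CHECK...   (hypothetical helper)
# Runs each argument as a shell command. Returns 0 if all pass; otherwise
# returns 1 and names the failing checks on stderr, so the reason appears
# in the Health.Log entries shown by docker inspect.
composite_check() {
  failed=""
  for check in "$@"; do
    sh -c "$check" >/dev/null 2>&1 || failed="$failed [$check]"
  done
  if [ -n "$failed" ]; then
    echo "unhealthy:$failed" >&2
    return 1
  fi
  return 0
}

# Example call inside a health check script (hostnames are hypothetical):
# composite_check 'pg_isready -h db -U postgres' 'nc -z cache 6379' || exit 1
```

Checking other services from a composite check couples their health to this container's, so the dependency-chain pitfall above applies.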

Conclusion

Proper health check configuration is foundational to reliable Docker deployments. By combining HEALTHCHECK with dependency-aware orchestration and monitoring integration, you can build systems that detect failures quickly and, where an orchestrator or watcher acts on the health status, recover from them automatically, minimizing downtime and operational overhead.
