Docker Health Checks and Container Monitoring Best Practices

Comprehensive guide to Docker health checks including HEALTHCHECK instruction, curl vs TCP checks, startup probes, docker-compose config, monitoring integration, and zero-downtime patterns.

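Docker health checks are essential for building self-healing container infrastructure. They let Docker detect application failures that a running process alone does not reveal, such as a web server that is up but no longer answering requests. The standalone Docker Engine only records the resulting health status; restarting or replacing an unhealthy container is the job of an orchestrator such as Docker Swarm or of external tooling. This guide covers everything from basic HEALTHCHECK implementation to advanced production patterns.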

The HEALTHCHECK Instruction

The HEALTHCHECK instruction in a Dockerfile tells Docker how to test whether a container is still working correctly:

HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

Docker tracks three states: starting (no check has passed yet), healthy (the most recent checks pass), and unhealthy (--retries consecutive checks have failed). The --start-period is critical for applications with slow startup times: failed checks during this window do not count toward the retry limit, while a single passing check marks the container healthy right away.
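The state transitions described above can be sketched as a small model. This is a simplified illustration, not Docker's implementation; the class name HealthTracker and its fields are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Simplified model of how a health status follows from check results."""
    retries: int = 3
    start_period: float = 0.0   # seconds
    status: str = "starting"
    failing_streak: int = 0

    def record(self, passed: bool, elapsed: float) -> str:
        """Record one check result taken `elapsed` seconds after container start."""
        if passed:
            # Any passing check resets the streak and marks the container healthy.
            self.failing_streak = 0
            self.status = "healthy"
            return self.status
        # Failures inside the start period are ignored while still starting.
        if self.status == "starting" and elapsed < self.start_period:
            return self.status
        self.failing_streak += 1
        if self.failing_streak >= self.retries:
            self.status = "unhealthy"
        return self.status


tracker = HealthTracker(retries=3, start_period=40)
tracker.record(False, elapsed=10)   # still "starting", failure not counted
tracker.record(True, elapsed=35)    # "healthy"
```

Once a check has passed, later failures count toward the retry limit even if the start period has not elapsed yet.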

| Option | Default | Description |
| --- | --- | --- |
| --interval | 30s | Time between checks |
| --timeout | 30s | Per-check timeout |
| --start-period | 0s | Grace period at startup |
| --retries | 3 | Consecutive failures before unhealthy |
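These options together determine how long a failure can go unnoticed. A rough upper bound, assuming the failure begins right after a passing check, every failing check runs until its timeout, and scheduling overhead is ignored, can be estimated like this (the function name is illustrative):

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int) -> float:
    """Approximate upper bound on the time until a container is marked unhealthy.

    Each failing check is assumed to start `interval` seconds after the
    previous check finished and to take the full `timeout`.
    """
    return retries * (interval + timeout)


# Values from the HEALTHCHECK example: --interval=30s --timeout=3s --retries=3
print(worst_case_detection_seconds(30, 3, 3))  # prints 99
```

With the default 30s timeout instead of 3s, the same estimate grows to 180 seconds, which is why tightening --timeout is often the first tuning step.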

Curl vs TCP vs Custom Checks

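The choice of check method depends on your application type. The three variants below are alternatives: if a Dockerfile lists more than one HEALTHCHECK, only the last one takes effect.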

# HTTP check (most common)
HEALTHCHECK CMD curl -f http://localhost:8080/api/health || exit 1

# TCP socket check (lightweight)
HEALTHCHECK CMD nc -z localhost 5432 || exit 1

# Custom script (complex logic)
HEALTHCHECK CMD /usr/local/bin/healthcheck.sh

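HTTP checks with curl -f validate the status code only: curl exits non-zero when the server answers with a 4xx or 5xx response. Validating the response body takes extra logic, for example piping the output through grep. TCP checks are protocol-agnostic and lightweight, but they only prove that the port accepts connections. Custom scripts enable application-specific logic: verifying database connectivity, cache responsiveness, or queue depths. Whichever tool the check uses (curl, nc, or a script interpreter) has to be present in the image.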
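For images that ship Python but not curl or nc, both check styles can be written with the standard library. The following is a minimal sketch; the function names, the default timeouts, and the expected_text parameter are assumptions for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url: str, expected_text: str = "", timeout: float = 3.0) -> bool:
    """Return True on a 2xx response whose body contains expected_text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return 200 <= resp.status < 300 and expected_text in body
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers 4xx/5xx responses, refused connections, timeouts, bad URLs.
        return False


# In a real health check script, the result would be turned into an exit code:
#   import sys
#   sys.exit(0 if http_check("http://localhost:8080/api/health") else 1)
```

Docker treats exit code 0 as healthy and 1 as unhealthy, so the script only needs to map the boolean result onto those two values.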

Startup Probes vs Readiness Probes vs Liveness Probes

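Kubernetes distinguishes startup, readiness, and liveness probes. Docker Compose and Swarm have no separate probe types: each service has a single healthcheck, and its start_period covers the startup phase: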

services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/ready"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s

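In the Kubernetes model, a startup probe handles slow initialization, a readiness probe controls whether traffic routes to the container, and a liveness probe triggers restarts on failure. Docker folds these roles into the one health check. The start_period field replaces the need for a separate startup probe in many cases, and in Swarm mode the same health status decides both whether a task receives traffic and whether it gets replaced. Docker Compose on its own only reports the status and does not restart unhealthy containers.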
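Which endpoint the healthcheck test targets determines what a failing check means. The sketch below shows how a hypothetical application might answer the /health and /ready paths used in this guide; the function and its parameters are illustrative and not part of any framework.

```python
def probe_response(path: str, started: bool, dependencies_ok: bool) -> int:
    """Return the HTTP status a hypothetical app sends for a probe path."""
    if path == "/health":
        # Liveness-style: the process is up and able to answer at all.
        return 200
    if path == "/ready":
        # Readiness-style: initialization finished and dependencies reachable.
        return 200 if started and dependencies_ok else 503
    return 404


# During startup, /health already passes while /ready still fails.
print(probe_response("/health", started=False, dependencies_ok=True))  # prints 200
print(probe_response("/ready", started=False, dependencies_ok=True))   # prints 503
```

Pointing the Docker health check at /ready, as in the Compose example above, makes the container count as healthy only once it can actually serve traffic.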


Docker Compose Health Check Configuration

Health checks integrate naturally with service orchestration in docker-compose.yml:

services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5

  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]

The depends_on with condition: service_healthy ensures the app container only starts after the database passes its health check, preventing connection errors at boot.
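Compose performs this wait internally. For scripts that run outside Compose, such as a CI step that must not start tests too early, a similar wait can be sketched as follows. The get_status callable is an assumption: it stands for any function that returns the current health status string, for example by calling docker inspect.

```python
import time
from typing import Callable


def wait_until_healthy(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it reports "healthy".

    Returns False as soon as the status is "unhealthy" or the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy" or time.monotonic() >= deadline:
            return False
        sleep(poll_interval)


# Example with a canned sequence of statuses instead of a real container.
statuses = iter(["starting", "starting", "healthy"])
print(wait_until_healthy(lambda: next(statuses), sleep=lambda s: None))  # prints True
```

Injecting the sleep function keeps the helper easy to test without real delays.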

Monitoring and Alerting Integration

| Tool | Purpose | Integration |
| --- | --- | --- |
| Prometheus | Metrics collection | Container metrics via cAdvisor |
| Grafana | Visualization | Health dashboards |
| Docker Events API | Real-time alerts | Container state streams |
| cAdvisor | Resource monitoring | Exposed metrics endpoint |

Use docker events --filter event=health_status to stream health changes. For production, route health status to alerting systems that notify your team when containers become unhealthy.
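To feed these events into an alerting system, the stream can be emitted as JSON by adding --format '{{json .}}' and parsed line by line. The sketch below assumes the event layout commonly produced by the Docker Engine (an Action such as "health_status: unhealthy" and the container name under Actor.Attributes.name); verify the field names against your Docker version.

```python
import json
from typing import Optional, Tuple


def parse_health_event(line: str) -> Optional[Tuple[str, str]]:
    """Extract (container_name, health_status) from one JSON event line.

    Returns None for events that are not health status changes.
    """
    event = json.loads(line)
    action = event.get("Action", "")
    if not action.startswith("health_status"):
        return None
    # Assumed action format: "health_status: <status>"
    status = action.partition(":")[2].strip()
    name = event.get("Actor", {}).get("Attributes", {}).get("name", "")
    return name, status


# Hand-written sample line for illustration, not captured output.
sample = json.dumps({
    "Type": "container",
    "Action": "health_status: unhealthy",
    "Actor": {"ID": "abc123", "Attributes": {"name": "app"}},
})
print(parse_health_event(sample))  # prints ('app', 'unhealthy')
```

A small forwarder could read the docker events output from standard input, call this function per line, and send a notification whenever the status is unhealthy.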


Common Pitfalls

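Misconfigured timing causes problems in both directions: a very short --interval runs the check command so often that it can exhaust resources, while a --timeout or --retries value that is too tight flags slow but working containers as unhealthy and produces false alerts. DNS resolution failures inside containers can break curl-based checks; targeting 127.0.0.1 instead of a hostname avoids the lookup. Watch for deadlock scenarios where service A’s health check depends on service B, whose check in turn depends on A. Use docker inspect --format='{{json .State.Health}}' <container> to debug health check status.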
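The JSON printed by that docker inspect command holds the current status, the failing streak, and a log of recent check runs. A short helper can condense it into one line. This sketch assumes the Status, FailingStreak, and Log fields (with ExitCode and Output per log entry) as reported by the Docker Engine; the function name is illustrative.

```python
import json


def summarize_health(raw: str) -> str:
    """Condense the JSON of .State.Health into a one-line summary."""
    health = json.loads(raw)
    if not health:
        # docker inspect prints "null" when the container has no health check.
        return "no health check configured"
    log = health.get("Log") or []
    last = log[-1] if log else {}
    output = (last.get("Output") or "").strip()
    return (
        f"status={health.get('Status')} "
        f"failing_streak={health.get('FailingStreak', 0)} "
        f"last_exit={last.get('ExitCode')} last_output={output!r}"
    )


print(summarize_health("null"))  # prints no health check configured
```

The Output field of the last log entry is usually the quickest way to see why a check fails, for instance a curl connection error or a missing binary.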

Advanced Patterns

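Composite health checks aggregate multiple subsystem statuses into a single result. Self-healing setups combine health checks with a restart mechanism. Docker's restart policies react to a container exiting rather than to an unhealthy status, and they add an increasing delay between restart attempts, so restarting on unhealthy needs an orchestrator or a watcher process. In Docker Swarm, the orchestrator stops tasks whose containers become unhealthy, schedules replacements, and routes traffic only to healthy tasks.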
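A composite check can be sketched as a function that runs several named sub-checks and collapses them into one exit code. The names below are illustrative, and the sub-checks are placeholders for real probes of a database, cache, or queue.

```python
from typing import Callable, Dict, Tuple


def composite_check(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run every sub-check and return (exit_code, per-check results).

    The exit code is 0 only if all sub-checks pass; a sub-check that raises
    an exception counts as failed.
    """
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    exit_code = 0 if all(results.values()) else 1
    return exit_code, results


# Placeholder sub-checks standing in for real probes.
code, details = composite_check({"db": lambda: True, "cache": lambda: False})
print(code, details)  # prints 1 {'db': True, 'cache': False}
```

Returning the per-check results alongside the exit code lets the script print which subsystem failed, and that message then shows up in the health check log.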

Conclusion

Proper health check configuration is foundational to reliable Docker deployments. By combining HEALTHCHECK with dependency-aware orchestration and monitoring integration, you can build systems that detect and recover from failures automatically, minimizing downtime and operational overhead.