Health Checks & Application Metrics
Plugged.in exposes health check endpoints for load balancers and monitoring systems, plus comprehensive Node.js runtime metrics for performance tracking.
Health Check Endpoint
GET /api/health
Returns the health status of the application, including a database connectivity check.
Response Format
- status: Overall health status, either healthy or unhealthy
  - 200 OK: Application is healthy
  - 503 Service Unavailable: Application has issues
- timestamp: ISO 8601 timestamp of the health check
- checks: Individual health check results
- uptime (whitelisted IPs only): Process uptime in seconds
- version (whitelisted IPs only): Application version (from the APP_VERSION env var)
- environment (whitelisted IPs only): Runtime environment (development/production)
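For clients that consume the health response, the following is a minimal sketch of a typed validator. The field names come from the response format described here; the type names, the function name, and the shape of each entry inside checks are assumptions made for illustration, not the application's own types.

```typescript
// Hypothetical client-side types; only the field names come from the docs.
type HealthStatus = "healthy" | "unhealthy";

interface HealthResponse {
  status: HealthStatus;
  timestamp: string;                // ISO 8601
  checks: Record<string, unknown>;  // shape of each check result is assumed
  uptime?: number;                  // seconds, whitelisted IPs only
  version?: string;                 // whitelisted IPs only
  environment?: string;             // whitelisted IPs only
}

function isHealthStatus(value: unknown): value is HealthStatus {
  return value === "healthy" || value === "unhealthy";
}

// Validates an already-parsed JSON body. Returns null when a required
// field is missing or malformed.
function parseHealthResponse(body: unknown): HealthResponse | null {
  if (typeof body !== "object" || body === null) return null;
  const raw = body as Record<string, unknown>;

  const status = raw["status"];
  const timestamp = raw["timestamp"];
  const checks = raw["checks"];
  if (!isHealthStatus(status)) return null;
  if (typeof timestamp !== "string" || isNaN(Date.parse(timestamp))) return null;
  if (typeof checks !== "object" || checks === null) return null;

  const result: HealthResponse = {
    status,
    timestamp,
    checks: checks as Record<string, unknown>,
  };

  // Optional fields are copied only when present with the expected type.
  const uptime = raw["uptime"];
  const version = raw["version"];
  const environment = raw["environment"];
  if (typeof uptime === "number") result.uptime = uptime;
  if (typeof version === "string") result.version = version;
  if (typeof environment === "string") result.environment = environment;
  return result;
}
```

A caller outside the whitelist should expect uptime, version, and environment to be undefined, so the optional fields are worth checking before use.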
Security & IP Restrictions
The health endpoint uses the same METRICS_ALLOWED_IPS configuration as the metrics endpoint:
- Whitelisted IPs: Full health status with version, environment, uptime
- All other IPs: Basic health status only (status, timestamp, checks)
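The two response tiers above can be pictured as a simple field filter. This is a sketch only: the type and function names are hypothetical, and how the application actually decides whether a caller is whitelisted is not shown here.

```typescript
// Hypothetical types mirroring the documented fields.
interface FullHealth {
  status: string;
  timestamp: string;
  checks: Record<string, unknown>;
  uptime: number;
  version: string;
  environment: string;
}

type BasicHealth = Pick<FullHealth, "status" | "timestamp" | "checks">;

// Returns the full body for whitelisted callers and strips the
// sensitive fields (uptime, version, environment) for everyone else.
function healthBodyFor(full: FullHealth, isWhitelisted: boolean): FullHealth | BasicHealth {
  if (isWhitelisted) return full;
  const { status, timestamp, checks } = full;
  return { status, timestamp, checks };
}
```

Building the basic body by picking fields, rather than deleting them from the full body, means a field added later is not exposed by accident.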
HEAD /api/health
Lightweight health check that returns only a status code (no body). Typical uses:
- Load balancer health checks
- Kubernetes liveness/readiness probes
- High-frequency monitoring
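Because the HEAD variant carries no body, a probe only has the status code to go on. The sketch below maps the two documented codes (200 and 503) to a result and treats anything else as unknown. The names are hypothetical, and the HTTP transport is injected as a function so the example does not assume a particular client library.

```typescript
type ProbeResult = "healthy" | "unhealthy" | "unknown";

// Maps the documented status codes to a probe result.
function interpretHeadStatus(statusCode: number): ProbeResult {
  if (statusCode === 200) return "healthy";
  if (statusCode === 503) return "unhealthy";
  return "unknown"; // e.g. a proxy error or a redirect
}

// Caller-supplied function that issues a HEAD request and resolves
// with the HTTP status code.
type HeadRequest = (url: string) => Promise<number>;

async function probe(baseUrl: string, head: HeadRequest): Promise<ProbeResult> {
  try {
    return interpretHeadStatus(await head(`${baseUrl}/api/health`));
  } catch {
    return "unknown"; // network failure or timeout
  }
}
```

Keeping "unknown" separate from "unhealthy" lets a monitor distinguish an application that reports problems from one it could not reach at all.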
Application Metrics Endpoint
GET /api/metrics
Exposes Node.js runtime and HTTP metrics in Prometheus format.
IP Whitelist Configuration
Configure allowed IPs in .env using METRICS_ALLOWED_IPS. Supported formats:
- IPv4: 127.0.0.1,10.0.0.1
- IPv6: ::1,fe80::1
- CIDR: 172.17.0.0/16,10.0.0.0/8
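To illustrate how a comma-separated list in these formats might be evaluated, here is a minimal sketch. It is not the application's implementation: the function names are hypothetical, IPv6 entries are compared as plain strings without normalization, and CIDR matching is limited to IPv4.

```typescript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit number,
// or returns null if the string is not a valid IPv4 address.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Checks one whitelist entry: exact match for single addresses,
// prefix match for IPv4 CIDR ranges.
function matchesEntry(clientIp: string, entry: string): boolean {
  const slash = entry.indexOf("/");
  if (slash === -1) return clientIp.toLowerCase() === entry.toLowerCase();

  const prefix = entry.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefix)) return false;
  const bits = Number(prefix);
  const client = ipv4ToInt(clientIp);
  const base = ipv4ToInt(entry.slice(0, slash));
  if (client === null || base === null || bits > 32) return false;

  // Two addresses share a /bits network when they fall in the same
  // block of 2^(32 - bits) addresses.
  const blockSize = Math.pow(2, 32 - bits);
  return Math.floor(client / blockSize) === Math.floor(base / blockSize);
}

// allowList is the raw comma-separated value, e.g. from METRICS_ALLOWED_IPS.
function isAllowed(clientIp: string, allowList: string): boolean {
  return allowList
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .some((entry) => matchesEntry(clientIp, entry));
}
```

For example, isAllowed("172.17.3.4", "127.0.0.1,172.17.0.0/16") is true because the address falls inside the /16 range, while "172.18.0.1" does not match either entry.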
Node.js Runtime Metrics
Process Metrics
Total user CPU time consumed by the process
Total system CPU time consumed by the process
Process start time in seconds since Unix epoch
Resident memory size in bytes
Node.js Heap Metrics
Event Loop Metrics
Event loop lag in seconds (sampled every 10ms)
50th percentile event loop lag
90th percentile event loop lag
99th percentile event loop lag
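To make the percentile metrics concrete, the sketch below computes a nearest-rank percentile over a set of lag samples. How the application derives its own percentiles is not stated here; nearest-rank is one common definition, and the function name is hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile requires at least one sample");
  if (p <= 0 || p > 100) throw new RangeError("p must be in the range (0, 100]");

  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Ten lag samples in milliseconds, unordered.
const lagMs = [5, 1, 3, 2, 4, 10, 9, 8, 7, 6];
const p50 = percentile(lagMs, 50); // 5
const p90 = percentile(lagMs, 90); // 9
const p99 = percentile(lagMs, 99); // 10
```

With only ten samples the 99th percentile is simply the maximum, which is why tail percentiles are only meaningful over a reasonably large sampling window.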
Garbage Collection Metrics
Garbage collection duration by GC type
- Labels: kind (minor/major/incremental/etc.)
- Buckets: 0.001s, 0.01s, 0.1s, 1s, 2s, 5s
HTTP Metrics
Total HTTP requests
- Labels: method, path, status_code
HTTP request duration in seconds
- Labels: method, path, status_code
- Buckets: 0.01s, 0.05s, 0.1s, 0.5s, 1s, 2s, 5s, 10s
Total HTTP errors (4xx + 5xx responses)
- Labels: method, path, error_type
- Error types: client_error (4xx), server_error (5xx), rate_limit, unauthorized
Prometheus Configuration
Add to your prometheus.yml:
Alert Rules
Health Check Alerts
Add to prometheus/rules/pluggedin-app-alerts.yml:
Performance Alerts
Grafana Dashboard
Query Examples
Troubleshooting
Health check returns 503
- Check database connectivity: psql $DATABASE_URL -c "SELECT 1"
- Review application logs for database errors
- Verify database server is running
- Check connection pool settings
Metrics endpoint returns 403 Forbidden
- Verify your IP is in METRICS_ALLOWED_IPS
- Check IP format (IPv4, IPv6, or CIDR)
- For CIDR, ensure proper notation (e.g., 172.17.0.0/16)
- Test from allowed IP: curl -H "X-Forwarded-For: 127.0.0.1" http://localhost:12005/api/metrics
High event loop lag
- Check for blocking synchronous operations
- Review CPU usage: pluggedin_process_cpu_user_seconds_total
- Identify long-running functions
- Consider offloading heavy work to background workers
Memory usage growing over time
- Check for memory leaks with heap snapshots
- Review the pluggedin_nodejs_heap_size_used_bytes trend
- Check GC metrics: pluggedin_nodejs_gc_duration_seconds
- Consider increasing heap size or implementing memory limits
Best Practices
Health Check Frequency
- Load Balancers: Poll every 10-30 seconds using HEAD requests
- Monitoring Systems: Poll every 30-60 seconds using GET requests
- Avoid: Polling more often than every 10 seconds (unnecessary load)
IP Whitelist Security
- Production: Only whitelist your specific monitoring server IPs
- Never: Use 0.0.0.0/0 or overly broad CIDR ranges
- Review: Audit whitelist quarterly, remove unused IPs
Metrics Retention
- Prometheus: 15-30 days for detailed metrics
- Long-term: Export to time-series database for historical analysis
- Aggregation: Use recording rules for frequently-queried metrics
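Returning to the IP whitelist guidance in these best practices, the rule against overly broad CIDR ranges can be checked mechanically during a whitelist audit. The sketch below is one possible approach. The function name and the default threshold of 8 prefix bits are assumptions; the threshold is meant for IPv4 ranges and should be chosen to match your own network layout.

```typescript
// Returns the whitelist entries whose CIDR prefix is shorter than
// minPrefix bits, i.e. ranges that cover more addresses than intended.
// Single addresses (no "/") are never flagged.
function findBroadEntries(allowList: string, minPrefix: number = 8): string[] {
  return allowList
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .filter((entry) => {
      const slash = entry.indexOf("/");
      if (slash === -1) return false;
      const prefix = Number(entry.slice(slash + 1));
      return Number.isInteger(prefix) && prefix < minPrefix;
    });
}
```

With the default threshold, findBroadEntries("127.0.0.1,0.0.0.0/0,172.17.0.0/16") returns only "0.0.0.0/0". Raising minPrefix to 24 would also flag the /16 range.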

