Observability Stack
Plugged.in implements a comprehensive observability stack for OAuth 2.1 operations, enabling real-time monitoring, security event detection, and performance analysis.
Architecture
The observability stack consists of three pillars:
- Logs (Loki): Structured JSON logs for all OAuth operations, security events, and errors
- Metrics (Prometheus): Real-time counters, histograms, and gauges for performance tracking
- Dashboards (Grafana): Unified visualization combining logs and metrics
System Diagram
Key Features
Comprehensive Metrics
17 Prometheus metrics track all OAuth operations:
- OAuth Flows: Initiation, completion, duration (counters + histograms)
- Token Operations: Refresh attempts, success rate, rotation tracking
- PKCE Security: State creation, validation, cleanup metrics
- Security Events: Code injection attempts, token reuse detection, integrity violations
- Discovery: RFC 9728 metadata discovery and its success rate
- Registration: Dynamic client registration (RFC 7591)
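The metrics above follow the Prometheus counter/histogram model. As a rough illustration of what a scrape of the metrics endpoint returns for a labeled counter, here is a minimal TypeScript sketch that renders the Prometheus text exposition format. The LabeledCounter class and its help text are hypothetical, the status label values are assumed, and a production app would normally rely on an established Prometheus client library instead.

```typescript
// Hypothetical, minimal labeled counter for illustration; not the Plugged.in implementation.
type Labels = Record<string, string>;

// Label values must escape backslash, double quote and newline in the text format.
const escapeLabelValue = (v: string): string =>
  v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");

class LabeledCounter {
  private readonly values = new Map<string, number>();
  readonly name: string;
  readonly help: string;

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  // Sort label names so the same label set always maps to the same series key.
  private seriesKey(labels: Labels): string {
    return Object.keys(labels)
      .sort()
      .map((k) => `${k}="${escapeLabelValue(labels[k])}"`)
      .join(",");
  }

  inc(labels: Labels = {}, by = 1): void {
    if (by < 0) throw new Error("counters can only go up");
    const key = this.seriesKey(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  // Renders HELP/TYPE metadata followed by one sample line per series.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const [key, value] of this.values) {
      lines.push(key ? `${this.name}{${key}} ${value}` : `${this.name} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}

// Usage with assumed label values.
const flows = new LabeledCounter("oauth_flows_total", "OAuth flows by outcome");
flows.inc({ status: "success" });
flows.inc({ status: "success" });
flows.inc({ status: "failure" });
console.log(flows.render());
```

Counters only ever increase, so dashboards and alerts typically work with rate() over a window rather than with the raw totals.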
Structured Logging
All logs use JSON format for Loki compatibility:
- OAuth Events: Flow tracking, state transitions, token operations
- Security Events: Suspicious activity, attack detection, compliance violations
- Performance: Timing, duration, resource usage
- Errors: Detailed error context with stack traces
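A structured log line is one JSON object per line, which Promtail can ship and Loki can parse with its json parser. The sketch below shows one way to build such a line in TypeScript; the helper name formatLogLine and the timestamp/level/event field names are assumptions for illustration, not the app's actual log schema.

```typescript
// Hypothetical log schema: the timestamp/level/event field names are assumptions.
type Level = "debug" | "info" | "warn" | "error";

const RESERVED = new Set(["timestamp", "level", "event"]);

// JSON.stringify(new Error("x")) produces "{}", so copy the useful properties explicitly.
function toJsonSafe(value: unknown): unknown {
  if (value instanceof Error) {
    return { name: value.name, message: value.message, stack: value.stack };
  }
  return value;
}

function formatLogLine(
  level: Level,
  event: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  const entry: Record<string, unknown> = { timestamp: now.toISOString(), level, event };
  for (const [key, value] of Object.entries(fields)) {
    // Reserved keys are never overwritten by caller-supplied fields.
    if (!RESERVED.has(key)) entry[key] = toJsonSafe(value);
  }
  // No indentation: one JSON object per line, as line-oriented shippers expect.
  return JSON.stringify(entry);
}

// Usage: a flow-completion event with timing data (event and field names are illustrative).
const line = formatLogLine(
  "info",
  "oauth_flow_completed",
  { flowId: "flow-123", durationMs: 842 },
  new Date("2024-01-01T00:00:00Z"),
);
console.log(line);
// {"timestamp":"2024-01-01T00:00:00.000Z","level":"info","event":"oauth_flow_completed","flowId":"flow-123","durationMs":842}
```

Error objects are copied field by field because their message and stack properties are not enumerable, so JSON.stringify would otherwise drop the error context.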
Automatic Redaction
Sensitive data is automatically redacted from logs:
- Access tokens, refresh tokens
- PKCE code verifiers
- Client secrets
- Authorization codes
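Key-based redaction is one common way to implement this behaviour. The TypeScript sketch below walks a log entry before serialization and masks values whose keys look like tokens, PKCE code verifiers, client secrets, or authorization codes. The key list and the [REDACTED] placeholder are assumptions; the app's actual redaction rules may differ.

```typescript
// Hypothetical list of sensitive keys, compared after normalization so that
// refresh_token, refreshToken and refresh-token are all treated the same.
const SENSITIVE_KEYS = new Set([
  "accesstoken",
  "refreshtoken",
  "codeverifier",
  "clientsecret",
  "code",
  "authorizationcode",
]);

const normalizeKey = (key: string): string => key.toLowerCase().replace(/[_-]/g, "");

function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Recursively replaces sensitive values before an entry is serialized to a log line.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(normalizeKey(key)) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value; // primitives, Dates, Errors, etc. pass through unchanged
}

const entry = redact({
  event: "oauth_token_refreshed",
  refresh_token: "rt-abc",
  tokens: { accessToken: "at-xyz", expires_in: 3600 },
});
console.log(JSON.stringify(entry));
// {"event":"oauth_token_refreshed","refresh_token":"[REDACTED]","tokens":{"accessToken":"[REDACTED]","expires_in":3600}}
```

Matching on keys is cheap but has limits: it also masks unrelated fields named code (such as error codes), and it cannot catch secrets embedded in free-text messages or URLs (for example a ?code= query string), which need separate pattern-based scrubbing.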
Quick Start
Prerequisites
- Prometheus (http://localhost:9090)
- Loki (http://localhost:3100)
- Grafana (http://localhost:3000)
- Promtail (log shipping)
Environment Variables
Add to pluggedin-app/.env:
Verify Setup
- Check Logs:
- Check Metrics:
- Query Loki:
- Access Grafana:
What to Monitor
Critical Metrics
Token Reuse Detection
Metric:
oauth_token_refresh_total{status="reuse_detected"}
Alert when: > 0
Action: Immediate security review; this indicates a replay attack or a race condition
Code Injection Attempts
Metric:
oauth_code_injection_attempts_total
Alert when: > 0
Action: Review security logs, block the attacker IP, and audit user accounts
OAuth Flow Success Rate
Metric:
sum(rate(oauth_flows_total{status="success"}[5m])) / sum(rate(oauth_flows_total[5m]))
Alert when: < 95% over the evaluation window (5m in this example)
Action: Investigate failures and check auth server connectivity
Token Refresh Duration
Metric:
oauth_token_refresh_duration_seconds
Alert when: p99 > 5s
Action: Check network latency to auth servers and database performance
Security Events
Monitor these log events continuously:
- oauth_refresh_token_reuse_detected (P0 - Critical)
- oauth_code_injection_attempt (P0 - Critical)
- oauth_integrity_violation (P1 - High)
- oauth_ownership_violation (P1 - High)
- pkce_replay_detected (P1 - High)
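The Critical Metrics thresholds above can be read as a small set of rules. The TypeScript sketch below evaluates them against a snapshot of metric values, including a p99 estimate from histogram buckets that follows the same linear-interpolation approach as PromQL's histogram_quantile. The snapshot shape, alert names, and sample numbers are hypothetical; in a real deployment these rules would normally live in Prometheus or Grafana alerting rather than in application code.

```typescript
// Hypothetical snapshot of the Critical Metrics; in practice each value would come
// from a PromQL query over a recent window rather than from raw process counters.
interface Bucket {
  le: number; // upper bound in seconds (last bucket is Infinity, like Prometheus "+Inf")
  count: number; // cumulative number of observations <= le
}

interface MetricsSnapshot {
  tokenReuseDetected: number; // oauth_token_refresh_total{status="reuse_detected"}
  codeInjectionAttempts: number; // oauth_code_injection_attempts_total
  flowsSuccess: number; // oauth_flows_total{status="success"} in the window
  flowsTotal: number; // oauth_flows_total in the window
  refreshDurationBuckets: Bucket[]; // oauth_token_refresh_duration_seconds buckets, ascending le
}

// Find the first bucket whose cumulative count reaches the target rank,
// then interpolate linearly inside that bucket (lower bound of the first bucket is 0).
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets.length > 0 ? buckets[buckets.length - 1].count : 0;
  if (total === 0) return NaN;
  const rank = q * total;
  let prevLe = 0;
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      // A rank that lands in the +Inf bucket yields the highest finite bound.
      if (b.le === Infinity) return prevLe;
      const inBucket = b.count - prevCount;
      return inBucket === 0 ? b.le : prevLe + (b.le - prevLe) * ((rank - prevCount) / inBucket);
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}

interface Alert {
  name: string; // hypothetical alert names
  detail: string;
}

function evaluateAlerts(s: MetricsSnapshot): Alert[] {
  const out: Alert[] = [];
  if (s.tokenReuseDetected > 0) {
    out.push({ name: "token_reuse_detected", detail: `${s.tokenReuseDetected} reuse event(s)` });
  }
  if (s.codeInjectionAttempts > 0) {
    out.push({ name: "code_injection_attempt", detail: `${s.codeInjectionAttempts} attempt(s)` });
  }
  if (s.flowsTotal > 0 && s.flowsSuccess / s.flowsTotal < 0.95) {
    const pct = ((100 * s.flowsSuccess) / s.flowsTotal).toFixed(1);
    out.push({ name: "oauth_flow_success_rate_low", detail: `${pct}% < 95%` });
  }
  const p99 = estimateQuantile(0.99, s.refreshDurationBuckets);
  if (p99 > 5) {
    out.push({ name: "token_refresh_p99_high", detail: `p99 ≈ ${p99.toFixed(2)}s > 5s` });
  }
  return out;
}

// Usage with made-up sample numbers: 93% success and a p99 of about 9s both trip alerts.
const buckets: Bucket[] = [
  { le: 0.5, count: 60 },
  { le: 1, count: 80 },
  { le: 5, count: 95 },
  { le: 10, count: 100 },
  { le: Infinity, count: 100 },
];
const alerts = evaluateAlerts({
  tokenReuseDetected: 0,
  codeInjectionAttempts: 0,
  flowsSuccess: 93,
  flowsTotal: 100,
  refreshDurationBuckets: buckets,
});
console.log(alerts.map((a) => a.name).join(", ")); // → oauth_flow_success_rate_low, token_refresh_p99_high
```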
Log Levels
Configure based on environment:
Performance Impact
The observability stack is designed for minimal overhead:
- Logging: ~1-2ms per operation (async I/O)
- Metrics: ~0.1ms per increment (in-memory counters)
- Total: < 0.5% CPU overhead in production
Next Steps
- Log Queries: Learn LogQL queries for OAuth operations
- Metrics & PromQL: Explore Prometheus metrics and queries
- Grafana Dashboards: Build custom dashboards and alerts
- OAuth Security: The OAuth 2.1 security implementation
Troubleshooting
Metrics not appearing in Prometheus
- Check the metrics endpoint: curl http://localhost:12005/metrics
- Verify the Prometheus config targets pluggedin-app
- Check Prometheus logs: docker logs prometheus
- Ensure the app is generating OAuth traffic
Logs not in Loki
- Verify the JSON log format: docker logs pluggedin-app | head -1 | jq .
- Check that the Promtail config includes the app log path
- Review Promtail logs: docker logs promtail
- Test the Loki API: curl http://localhost:3100/ready
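When checking the JSON log format, it helps to inspect more than the first line: line-oriented shippers such as Promtail typically treat each line as a separate entry, so pretty-printed JSON or plain-text startup messages will not parse as JSON objects in Loki. The TypeScript sketch below flags such lines in captured log output; the function name findNonJsonLines is hypothetical.

```typescript
// Returns the 1-based numbers of non-blank lines that are not single-line JSON objects.
function findNonJsonLines(output: string): number[] {
  const bad: number[] = [];
  output.split("\n").forEach((raw, i) => {
    const text = raw.trim();
    if (text === "") return; // ignore blank lines
    try {
      const parsed: unknown = JSON.parse(text);
      // Arrays and bare values are valid JSON but not usable as structured log entries.
      if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) bad.push(i + 1);
    } catch {
      bad.push(i + 1);
    }
  });
  return bad;
}

// Usage: a valid entry, a plain-text message, and a pretty-printed (multi-line) object.
const sample = [
  '{"level":"info","event":"oauth_flow_started"}',
  "Server listening on port 12005",
  "{",
  '  "level": "error"',
  "}",
].join("\n");
console.log(findNonJsonLines(sample)); // lines 2 through 5 are flagged
```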
Grafana can't connect to data sources
- Verify the Prometheus URL: http://prometheus:9090 (Docker network)
- Verify the Loki URL: http://loki:3100 (Docker network)
- Test connectivity: docker exec grafana curl http://prometheus:9090/api/v1/status/config
High log volume
- Set LOG_LEVEL to warn in production
- Configure log sampling in Promtail
- Set a Loki retention policy (default: 30 days)
- Archive old logs to S3/GCS
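Setting LOG_LEVEL to warn reduces volume because the level acts as a minimum-severity threshold. The sketch below assumes the conventional debug < info < warn < error ordering and shows how a warn threshold drops debug and info entries; the level names and the parseLevel/shouldLog helpers are assumptions, not the app's logger API.

```typescript
// Hypothetical level names; assumes the usual ordering debug < info < warn < error.
const LEVELS = ["debug", "info", "warn", "error"] as const;
type Level = (typeof LEVELS)[number];

// Accepts any casing; unknown or missing values fall back to a default.
function parseLevel(raw: string | undefined, fallback: Level): Level {
  const v = (raw ?? "").trim().toLowerCase();
  return (LEVELS as readonly string[]).includes(v) ? (v as Level) : fallback;
}

// An entry is written only if its level is at or above the configured threshold.
function shouldLog(entryLevel: Level, threshold: Level): boolean {
  return LEVELS.indexOf(entryLevel) >= LEVELS.indexOf(threshold);
}

// Usage: the raw string would normally come from the LOG_LEVEL environment variable.
const threshold = parseLevel("WARN", "info");
console.log(LEVELS.filter((level) => shouldLog(level, threshold)).join(", ")); // → warn, error
```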

