Observability Stack

Plugged.in implements a comprehensive observability stack for OAuth 2.1 operations, enabling real-time monitoring, security event detection, and performance analysis.

Architecture

The observability stack consists of three pillars:

Logs (Loki)

Structured JSON logs for all OAuth operations, security events, and errors

Metrics (Prometheus)

Real-time counters, histograms, and gauges for performance tracking

Dashboards (Grafana)

Unified visualization combining logs and metrics for insights

System Diagram

┌─────────────────────────────────────────────────────────────┐
│                     pluggedin-app                           │
│                                                              │
│  ┌───────────────────┐         ┌──────────────────┐        │
│  │  OAuth Operations │────────▶│ Structured Logs  │        │
│  │  (token refresh,  │         │ (Pino + JSON)    │────┐   │
│  │   PKCE, etc.)     │         └──────────────────┘    │   │
│  └───────────────────┘                                  │   │
│           │                                             │   │
│           ▼                                             │   │
│  ┌───────────────────┐                                 │   │
│  │ Prometheus Metrics│──────────┐                      │   │
│  │ (prom-client)     │          │                      │   │
│  └───────────────────┘          │                      │   │
└──────────────────────────────────┼──────────────────────┼───┘
                                   │                      │
                    ┌──────────────▼──────┐     ┌────────▼────────┐
                    │   Prometheus        │     │   Promtail      │
                    │   (Metrics Store)   │     │  (Log Shipper)  │
                    └──────────┬──────────┘     └────────┬────────┘
                               │                         │
                               │          ┌──────────────▼────────┐
                               │          │     Loki              │
                               │          │   (Log Aggregation)   │
                               │          └──────────┬────────────┘
                               │                     │
                    ┌──────────▼─────────────────────▼────────┐
                    │              Grafana                     │
                    │  (Unified Dashboards & Alerting)         │
                    └──────────────────────────────────────────┘

Key Features

📊 Comprehensive Metrics

17 Prometheus metrics track all OAuth operations:
  • OAuth Flows: Initiation, completion, duration (counters + histograms)
  • Token Operations: Refresh attempts, success rate, rotation tracking
  • PKCE Security: State creation, validation, cleanup metrics
  • Security Events: Code injection attempts, token reuse detection, integrity violations
  • Discovery: RFC 9728 metadata discovery, success rates
  • Registration: Dynamic client registration (RFC 7591)

📝 Structured Logging

All logs use JSON format for Loki compatibility:
  • OAuth Events: Flow tracking, state transitions, token operations
  • Security Events: Suspicious activity, attack detection, compliance violations
  • Performance: Timing, duration, resource usage
  • Errors: Detailed error context with stack traces

🔍 Automatic Redaction

Sensitive data is automatically redacted from logs:
  • Access tokens, refresh tokens
  • PKCE code verifiers
  • Client secrets
  • Authorization codes
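
To illustrate what key-based redaction can look like, the sketch below walks a log object and masks any value whose key is on a deny-list. The key names (`access_token`, `refresh_token`, `code_verifier`, `client_secret`, `code`) and the `[REDACTED]` placeholder are assumptions made for this example, not the app's actual configuration. Pino also ships a built-in `redact` option that may serve the same purpose.

```typescript
// Hypothetical deny-list; the key names the app really uses may differ.
const SENSITIVE_KEYS = new Set([
  "access_token",
  "refresh_token",
  "code_verifier",
  "client_secret",
  "code",
]);

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Returns a deep copy of `value` with sensitive fields masked.
function redact(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const redactedEntry = redact({
  event: "oauth_token_refresh",
  token: { access_token: "abc123", expires_in: 3600 },
});
console.log(JSON.stringify(redactedEntry));
```

Masking by key rather than by value pattern keeps the check cheap, but it only catches secrets that are logged under a known field name.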

Quick Start

Prerequisites

# Ensure pluggedin-observability stack is running
cd /path/to/pluggedin-observability
docker-compose up -d
This starts Prometheus, Promtail, Loki, and Grafana.

Environment Variables

Add to pluggedin-app/.env:
# Observability Configuration
SERVICE_NAME=pluggedin-app
APP_VERSION=2.14.0
LOG_LEVEL=info  # trace, debug, info, warn, error

# Optional: Prometheus Push Gateway
PROMETHEUS_PUSH_GATEWAY=http://localhost:9091

Verify Setup

  1. Check Logs:
# View JSON-formatted OAuth logs
docker logs pluggedin-app | grep oauth | jq .
  2. Check Metrics:
# Prometheus metrics endpoint
curl http://localhost:12005/metrics | grep oauth
  3. Query Loki:
# Query OAuth events from last hour
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_name="pluggedin-app"} |= "oauth"' \
  --data-urlencode "start=$(date -u -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date -u +%s)000000000" | jq .
  4. Access Grafana:
Open http://localhost:3000
Default credentials: admin/admin
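
The Loki query in step 3 relies on GNU `date -d`, which is not available on every platform. As a portable alternative, the sketch below builds the same `query_range` URL with nanosecond timestamps. The function name `buildLokiRangeUrl` is hypothetical; the base URL and label selector are copied from the curl example.

```typescript
// Builds a Loki query_range URL covering the last `windowMs` milliseconds.
function buildLokiRangeUrl(
  baseUrl: string,
  logql: string,
  nowMs: number,
  windowMs: number,
): string {
  // Loki accepts start/end as Unix epoch nanoseconds: milliseconds plus six zeros.
  const toNanos = (ms: number): string => `${Math.floor(ms)}000000`;
  const params = new URLSearchParams({
    query: logql,
    start: toNanos(nowMs - windowMs),
    end: toNanos(nowMs),
  });
  return `${baseUrl}/loki/api/v1/query_range?${params.toString()}`;
}

const lokiUrl = buildLokiRangeUrl(
  "http://localhost:3100",
  '{service_name="pluggedin-app"} |= "oauth"',
  Date.now(),
  60 * 60 * 1000, // one hour
);
console.log(lokiUrl);
```

Timestamps are kept as strings because nanosecond epoch values exceed the range in which JavaScript numbers are exact.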

What to Monitor

Critical Metrics

Token Reuse Detection

Metric: oauth_token_refresh_total{status="reuse_detected"}
Alert when: > 0
Action: Immediate security review - indicates replay attack or race condition

Code Injection Attempts

Metric: oauth_code_injection_attempts_total
Alert when: > 0
Action: Review security logs, block attacker IP, audit user accounts

OAuth Flow Success Rate

Metric: oauth_flows_total{status="success"} / oauth_flows_total
Alert when: < 95%
Action: Investigate failures, check auth server connectivity

Token Refresh Duration

Metric: oauth_token_refresh_duration_seconds
Alert when: p99 > 5s
Action: Check network latency to auth servers, database performance
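
To make the four thresholds concrete, the sketch below evaluates them against a snapshot of metric values. The `OAuthSnapshot` shape and the alert names are hypothetical; in a real deployment these checks would more likely be written as Prometheus or Grafana alert rules. Note that the success rate divides successful flows by flows of every status.

```typescript
// Hypothetical snapshot of metric values, keyed by the `status` label where relevant.
interface OAuthSnapshot {
  flowsByStatus: Record<string, number>;   // oauth_flows_total
  refreshByStatus: Record<string, number>; // oauth_token_refresh_total
  codeInjectionAttempts: number;           // oauth_code_injection_attempts_total
  refreshP99Seconds: number;               // p99 of oauth_token_refresh_duration_seconds
}

function evaluateAlerts(s: OAuthSnapshot): string[] {
  const alerts: string[] = [];
  if ((s.refreshByStatus["reuse_detected"] ?? 0) > 0) alerts.push("token_reuse_detected");
  if (s.codeInjectionAttempts > 0) alerts.push("code_injection_attempt");

  // Success rate = successful flows / flows of all statuses.
  const total = Object.values(s.flowsByStatus).reduce((a, b) => a + b, 0);
  const success = s.flowsByStatus["success"] ?? 0;
  if (total > 0 && success / total < 0.95) alerts.push("low_flow_success_rate");

  if (s.refreshP99Seconds > 5) alerts.push("slow_token_refresh");
  return alerts;
}

const firing = evaluateAlerts({
  flowsByStatus: { success: 90, error: 10 },
  refreshByStatus: { success: 40 },
  codeInjectionAttempts: 0,
  refreshP99Seconds: 1.2,
});
console.log(firing); // ["low_flow_success_rate"]
```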

Security Events

Monitor these log events continuously:
  • oauth_refresh_token_reuse_detected (P0 - Critical)
  • oauth_code_injection_attempt (P0 - Critical)
  • oauth_integrity_violation (P1 - High)
  • oauth_ownership_violation (P1 - High)
  • pkce_replay_detected (P1 - High)
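
One way to act on these priorities is to classify each JSON log line as it arrives. The sketch below assumes the event name is stored in a top-level `event` field, which is an assumption about the log schema; the function name `classifyLogLine` is also hypothetical.

```typescript
type Priority = "P0" | "P1";

// Event names and priorities are taken from the list above.
const SECURITY_EVENTS = new Map<string, Priority>([
  ["oauth_refresh_token_reuse_detected", "P0"],
  ["oauth_code_injection_attempt", "P0"],
  ["oauth_integrity_violation", "P1"],
  ["oauth_ownership_violation", "P1"],
  ["pkce_replay_detected", "P1"],
]);

// Returns the priority of a JSON log line, or null when the line is not
// valid JSON or does not carry one of the listed security events.
function classifyLogLine(line: string): Priority | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const event = (parsed as { event?: unknown }).event; // assumed field name
  if (typeof event !== "string") return null;
  return SECURITY_EVENTS.get(event) ?? null;
}

console.log(classifyLogLine('{"level":50,"event":"pkce_replay_detected"}')); // "P1"
```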

Log Levels

Configure based on environment:
# Development
LOG_LEVEL=debug  # Verbose logging for debugging

# Staging
LOG_LEVEL=info   # Standard operational logging

# Production
LOG_LEVEL=warn   # Errors and warnings only (reduces volume)
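
The effect of each setting can be sketched as a threshold comparison. The numeric values below are Pino's defaults for these five levels; the fallback to `info` for a missing or unrecognised value is an assumption for this example, as are the helper names.

```typescript
// Pino's default numeric values for the levels listed in the configuration.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50 } as const;
type Level = keyof typeof LEVELS;

function isLevel(value: string): value is Level {
  return Object.prototype.hasOwnProperty.call(LEVELS, value);
}

// Normalises a raw LOG_LEVEL value; falls back to "info" (assumed default).
function resolveLevel(raw: string | undefined): Level {
  const candidate = (raw ?? "").trim().toLowerCase();
  return isLevel(candidate) ? candidate : "info";
}

// A message is emitted when its level is at or above the configured level.
function shouldLog(configured: Level, message: Level): boolean {
  return LEVELS[message] >= LEVELS[configured];
}

const productionLevel = resolveLevel("warn");
console.log(shouldLog(productionLevel, "info"));  // false
console.log(shouldLog(productionLevel, "error")); // true
```

With `LOG_LEVEL=warn`, `info`-level OAuth flow events are dropped, so any dashboard panel that depends on them will be empty in production.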

Performance Impact

The observability stack is designed for minimal overhead:
  • Logging: ~1-2ms per operation (async I/O)
  • Metrics: ~0.1ms per increment (in-memory counters)
  • Total: < 0.5% CPU overhead in production

Multi-Instance Considerations

Redis for Distributed Metrics

Production Requirement: When running multiple application instances, use a shared metrics backend to aggregate data across instances.
Options:
  1. Prometheus Federation (Recommended):
    # Each instance exposes /api/metrics
    # Prometheus scrapes all instances
    # Grafana queries aggregated data
    
  2. Shared Redis Backend:
    # Configure Redis for rate limiting
    export REDIS_URL="redis://redis-host:6379"
    
    # Rate limiting data is shared
    # Each instance contributes metrics
    

Performance Optimizations

OAuth Config Caching

Each instance maintains an LRU cache (5-minute TTL):
// Automatic caching reduces database load
// Cache invalidates on config updates
// Max 500 cached configurations per instance
Monitoring:
  • Cache hit rate: oauth_config_cache_hits / oauth_config_cache_requests
  • Database query reduction: ~80% for frequently used servers
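
A minimal sketch of an LRU cache with a TTL, using the limits stated above (500 entries, 5 minutes) as defaults. This is an illustration of the technique rather than the app's implementation; the class name `TtlLruCache` and the injectable clock are assumptions added to keep the example testable.

```typescript
class TtlLruCache<V> {
  // Map iteration follows insertion order, so the first key is the least recently used.
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxEntries: number = 500,
    private ttlMs: number = 5 * 60 * 1000,
    private now: () => number = () => Date.now(),
  ) {}

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    // Re-insert to mark the entry as most recently used (TTL is not extended).
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  // Call when a configuration is updated so stale data is not served.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

const configCache = new TtlLruCache<{ authorizationEndpoint: string }>();
configCache.set("server-uuid", { authorizationEndpoint: "https://auth.example.com/authorize" });
console.log(configCache.get("server-uuid"));
```

Because each instance holds its own cache, an update handled by one instance is only invalidated there; other instances serve the old value until the TTL expires.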

Server Ownership Validation

Optimized JOIN query (60-70% faster):
-- Single query replaces 3 sequential queries
SELECT projects.user_id
FROM mcp_servers
INNER JOIN profiles ON mcp_servers.profile_uuid = profiles.uuid
INNER JOIN projects ON profiles.project_uuid = projects.uuid
WHERE mcp_servers.uuid = $1
Monitoring:
  • Query duration: oauth_ownership_validation_duration_seconds
  • Expected p95: < 50ms (was 150ms)

Request Timeouts

All OAuth API calls time out after 10 seconds:
await fetch(endpoint, {
  signal: AbortSignal.timeout(10000)
});
Monitoring:
  • Timeout events: oauth_request_timeout_total
  • Avg duration: oauth_request_duration_seconds
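
When a signal created by `AbortSignal.timeout()` fires, `fetch` rejects with an error named `TimeoutError`, which makes timeouts distinguishable from other failures. The sketch below shows one way to count them; the `timeoutCount` variable is a hypothetical stand-in for incrementing `oauth_request_timeout_total`, and the helper names are assumptions.

```typescript
// Hypothetical in-process stand-in for the oauth_request_timeout_total counter.
let timeoutCount = 0;

// True when the error was raised by an AbortSignal.timeout() signal.
function isTimeoutError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "TimeoutError"
  );
}

// Wraps fetch with the 10-second limit and records timeouts before rethrowing.
async function fetchWithTimeout(url: string, timeoutMs: number = 10_000) {
  try {
    return await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  } catch (err) {
    if (isTimeoutError(err)) timeoutCount += 1;
    throw err; // callers still see the original failure
  }
}

// Usage (endpoint is a placeholder):
//   const response = await fetchWithTimeout("https://auth.example.com/token");
```

Rethrowing keeps the caller's error handling unchanged, so the counter is purely observational.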

Multi-Instance Deployment

For the complete setup guide, see:

Multi-Instance Deployment

Production-ready horizontal scaling with Redis, load balancing, and monitoring

Troubleshooting

Metrics not appearing in Prometheus:
  1. Check metrics endpoint: curl http://localhost:12005/metrics
  2. Verify Prometheus config targets pluggedin-app
  3. Check Prometheus logs: docker logs prometheus
  4. Ensure app is generating OAuth traffic

Logs not appearing in Loki:
  1. Verify JSON log format: docker logs pluggedin-app | head -1 | jq .
  2. Check Promtail config includes app log path
  3. Review Promtail logs: docker logs promtail
  4. Test Loki API: curl http://localhost:3100/ready

Grafana cannot reach its data sources:
  1. Verify Prometheus URL: http://prometheus:9090 (Docker network)
  2. Verify Loki URL: http://loki:3100 (Docker network)
  3. Test connectivity: docker exec grafana curl http://prometheus:9090/api/v1/status/config

Log volume too high:
  1. Raise LOG_LEVEL to warn in production
  2. Configure log sampling in Promtail
  3. Set Loki retention policy (default: 30 days)
  4. Archive old logs to S3/GCS