FortiBlox Nexus Architecture
Comprehensive system architecture diagrams for FortiBlox Nexus infrastructure platform
This document provides detailed architectural diagrams and explanations of the FortiBlox Nexus infrastructure platform. Each diagram illustrates key components, data flows, and system interactions.
1. System Overview
The complete FortiBlox Nexus platform consists of multiple integrated services working together to provide high-performance X1 Blockchain infrastructure.
graph TB
subgraph "Client Layer"
Client[Client Applications]
Browser[Web Browsers]
Server[Server Apps]
end
subgraph "API Gateway Layer"
Gateway[API Gateway<br/>Authentication<br/>Rate Limiting<br/>Routing]
end
subgraph "Service Layer"
RPC[RPC Service<br/>Load Balancer]
Geyser[Geyser REST API<br/>TimescaleDB Queries]
WS[WebSocket Server<br/>Real-time Streaming]
end
subgraph "Data Layer"
Cache[(Redis Cache<br/>5s-5min TTL)]
DB[(TimescaleDB<br/>Historical Data)]
end
subgraph "Blockchain Layer"
Validator1[X1 Validator 1<br/>RPC Endpoint]
Validator2[X1 Validator 2<br/>RPC Endpoint]
Validator3[X1 Validator 3<br/>RPC Endpoint]
GeyserPlugin[Geyser Plugin<br/>Real-time Ingestion]
end
Client --> Gateway
Browser --> Gateway
Server --> Gateway
Gateway --> RPC
Gateway --> Geyser
Gateway --> WS
RPC --> Cache
Geyser --> Cache
Cache -.-> DB
Geyser --> DB
WS --> DB
RPC --> Validator1
RPC --> Validator2
RPC --> Validator3
GeyserPlugin --> DB
Validator1 -.-> GeyserPlugin
Validator2 -.-> GeyserPlugin
Validator3 -.-> GeyserPlugin
style Gateway fill:#4a90e2,stroke:#2e5c8a,color:#fff
style Cache fill:#f39c12,stroke:#d68910,color:#fff
style DB fill:#27ae60,stroke:#1e8449,color:#fff
Key Components
Client Layer:
- Supports multiple client types: web browsers, mobile apps, server applications
- All clients use HTTPS/WSS for secure communication
- API keys authenticate every request
API Gateway:
- Central entry point for all requests
- Validates API keys and checks tier permissions
- Enforces rate limits (10-1000+ req/s depending on tier)
- Routes requests to appropriate services
- Tracks credit consumption
Service Layer:
- RPC Service: Intelligent load balancing across X1 validators
- Geyser REST API: Historical data queries from TimescaleDB
- WebSocket Server: Real-time streaming with pub/sub architecture
Data Layer:
- Redis Cache: Multi-tier caching (5s-5min TTL based on data type)
- TimescaleDB: Time-series optimized PostgreSQL for historical data
Blockchain Layer:
- Multiple X1 validator nodes for redundancy
- Geyser plugin captures real-time blockchain events
- Automatic health monitoring and failover
Performance Characteristics
- API Gateway Latency: <10ms overhead
- Cache Hit Rate: 70-85% for typical workloads
- RPC Latency: 40-100ms (including network)
- Geyser API Latency: 30-80ms (cached), 100-300ms (database)
- WebSocket Latency: <100ms event delivery
Scaling Strategy
- Horizontal scaling of API Gateway (multiple instances)
- Independent scaling of RPC, Geyser, and WebSocket services
- Redis cluster for distributed caching
- TimescaleDB read replicas for query scaling
- Multi-region deployment for global availability
2. RPC Request Flow
Detailed flow of an RPC request through the FortiBlox Nexus infrastructure, showing authentication, routing, caching, and response.
sequenceDiagram
participant C as Client
participant G as API Gateway
participant A as Auth Service
participant RL as Rate Limiter
participant Cache as Redis Cache
participant LB as Load Balancer
participant RPC1 as RPC Node 1
participant RPC2 as RPC Node 2
C->>G: POST /rpc<br/>X-API-Key: fbx_xxx<br/>method: getAccountInfo
Note over G: Step 1: Authentication
G->>A: Validate API Key
A->>A: Check key validity<br/>Check tier permissions<br/>Check network access
A-->>G: Auth Success<br/>Tier: Business<br/>Credits: 28.5M remaining
Note over G: Step 2: Rate Limiting
G->>RL: Check rate limit
RL->>RL: Current: 45/200 req/s<br/>Allow request
RL-->>G: Rate limit OK<br/>Remaining: 155
Note over G: Step 3: Cache Check
G->>Cache: Check cache<br/>Key: getAccountInfo:pubkey:confirmed
alt Cache Hit
Cache-->>G: Cached result (30s old)
Note over G: Latency: ~5ms
G-->>C: 200 OK<br/>X-Cache-Status: HIT<br/>X-RPC-Latency-Ms: 5<br/>Result
else Cache Miss
Cache-->>G: No cached result
Note over G: Step 4: Node Selection
G->>LB: Select healthy RPC node
LB->>LB: Health scores:<br/>Node 1: 95%<br/>Node 2: 88%<br/>Node 3: 20% (skip)
LB-->>G: Route to Node 1
Note over G: Step 5: RPC Call
G->>RPC1: Forward RPC request
Note over RPC1: Process request<br/>Latency: 45ms
RPC1-->>G: RPC Response
Note over G: Step 6: Cache Result
G->>Cache: Store result<br/>TTL: 30s
Cache-->>G: Cached
Note over G: Step 7: Deduct Credits
G->>G: Deduct 1 credit<br/>Update usage metrics
Note over G: Total Latency: ~55ms
G-->>C: 200 OK<br/>X-Cache-Status: MISS<br/>X-RPC-Endpoint: node1<br/>X-RPC-Latency-Ms: 55<br/>X-RateLimit-Remaining: 155<br/>Result
end
Latency Breakdown (Cache Miss)
| Step | Component | Typical Latency |
|---|---|---|
| 1. Authentication | Auth Service | 2-5ms |
| 2. Rate Limiting | Redis | 1-3ms |
| 3. Cache Check | Redis | 1-2ms |
| 4. Node Selection | Load Balancer | <1ms |
| 5. RPC Call | X1 Validator | 40-80ms |
| 6. Cache Storage | Redis | 1-2ms |
| Total | | 45-95ms |
Latency Breakdown (Cache Hit)
| Step | Component | Typical Latency |
|---|---|---|
| 1. Authentication | Auth Service | 2-5ms |
| 2. Rate Limiting | Redis | 1-3ms |
| 3. Cache Retrieval | Redis | 1-2ms |
| Total | | 4-10ms |
Credit Consumption
Different methods consume different amounts of credits:
- Standard methods (getHealth, getSlot, getBalance): 1 credit
- Heavy methods (getBlock, getTransaction, getProgramAccounts): 5 credits
- Transaction submission (sendTransaction, simulateTransaction): 10 credits
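As a rough illustration, the mapping above can be expressed as a simple lookup table (the helper and its fallback behavior are illustrative, not the production implementation):

```typescript
// Illustrative credit-cost table; method names come from the list above.
const CREDIT_COSTS: Record<string, number> = {
  // Standard methods
  getHealth: 1,
  getSlot: 1,
  getBalance: 1,
  // Heavy methods
  getBlock: 5,
  getTransaction: 5,
  getProgramAccounts: 5,
  // Transaction submission
  sendTransaction: 10,
  simulateTransaction: 10,
};

// Unknown methods fall back to the standard cost of 1 credit (assumption).
function creditCost(rpcMethod: string): number {
  return CREDIT_COSTS[rpcMethod] ?? 1;
}
```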
Error Handling
The system handles failures gracefully:
- Invalid API Key: Return 401 immediately (no further processing)
- Rate Limit Exceeded: Return 429 with Retry-After header
- RPC Node Down: Automatic failover to healthy node
- All Nodes Down: Return 503 Service Unavailable
- Timeout: Return 504 Gateway Timeout after 30s
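From the client side, these error cases can be handled roughly as follows. This is a sketch, not an official SDK: the endpoint URL and retry policy are assumptions, while the status codes and the Retry-After header follow the list above.

```typescript
// Hypothetical client wrapper; URL and retry counts are assumptions.
async function rpcWithRetry(method: string, params: unknown[], apiKey: string) {
  for (let attempt = 0; attempt < 3; attempt++) {
    const res = await fetch('https://nexus.fortiblox.com/rpc', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'X-API-Key': apiKey },
      body: JSON.stringify({ jsonrpc: '2.0', id: 1, method, params }),
    });
    if (res.status === 401) throw new Error('Invalid API key'); // no point retrying
    if (res.status === 429) {
      // Honor the Retry-After header before retrying.
      const waitMs = Number(res.headers.get('Retry-After') ?? '1') * 1000;
      await new Promise((r) => setTimeout(r, waitMs));
      continue;
    }
    if (res.status === 503 || res.status === 504) {
      // Transient infrastructure errors: back off briefly and retry.
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
      continue;
    }
    return res.json();
  }
  throw new Error('RPC request failed after retries');
}
```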
3. Geyser Streaming Architecture
The Geyser system captures real-time blockchain data and stores it in TimescaleDB for fast historical queries.
graph TB
subgraph "X1 Blockchain Network"
V1[Validator 1<br/>Block Production]
V2[Validator 2<br/>Block Production]
V3[Validator 3<br/>Block Production]
end
subgraph "Geyser Plugin Layer"
GP1[Geyser Plugin 1<br/>Validator 1]
GP2[Geyser Plugin 2<br/>Validator 2]
GP3[Geyser Plugin 3<br/>Validator 3]
end
subgraph "Data Ingestion Pipeline"
Queue[Message Queue<br/>Kafka/Redis Streams]
Processor[Stream Processor<br/>Deduplication<br/>Transformation]
end
subgraph "Storage Layer"
TSDB[(TimescaleDB<br/>Hypertables)]
subgraph "Tables"
TXTable[Transactions<br/>Indexed by signature]
BlockTable[Blocks<br/>Indexed by slot]
AccountTable[Accounts<br/>Indexed by address]
TokenTable[Token Metadata<br/>Indexed by mint]
end
end
subgraph "Query Layer"
Cache2[(Redis Cache<br/>Query Results)]
GeyserAPI[Geyser REST API<br/>Complex Queries]
end
subgraph "Optimization Layer"
Materialized[Materialized Views<br/>Pre-aggregated Stats]
Indexes[Hypertable Indexes<br/>Time + Attributes]
end
V1 --> GP1
V2 --> GP2
V3 --> GP3
GP1 --> Queue
GP2 --> Queue
GP3 --> Queue
Queue --> Processor
Processor --> TXTable
Processor --> BlockTable
Processor --> AccountTable
Processor --> TokenTable
TXTable --> TSDB
BlockTable --> TSDB
AccountTable --> TSDB
TokenTable --> TSDB
TSDB --> Materialized
TSDB --> Indexes
GeyserAPI --> Cache2
Cache2 -.->|Cache Miss| TSDB
Materialized --> GeyserAPI
Indexes --> GeyserAPI
style Queue fill:#9b59b6,stroke:#7d3c98,color:#fff
style TSDB fill:#27ae60,stroke:#1e8449,color:#fff
style Cache2 fill:#f39c12,stroke:#d68910,color:#fff
Data Flow
- Capture: Geyser plugins on validators capture every transaction, block, and account update
- Queue: Events sent to message queue for reliable delivery
- Process: Stream processor deduplicates, validates, and transforms data
- Store: Data inserted into TimescaleDB hypertables with time-series optimization
- Index: Automatic indexing on time, signature, address, program ID
- Query: REST API serves queries with intelligent caching
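A condensed sketch of the queue → process → store stage is shown below, assuming Redis Streams as the queue and a transactions hypertable keyed by signature. The stream name, table, and column names are illustrative, not the actual schema.

```typescript
import Redis from 'ioredis';
import { Pool } from 'pg';

const redis = new Redis();
const db = new Pool({ connectionString: process.env.DATABASE_URL });

async function processStream(): Promise<void> {
  let lastId = '0-0';
  for (;;) {
    // Block up to 5s waiting for new Geyser events on the stream (name assumed).
    const batch = await redis.xread('BLOCK', 5000, 'STREAMS', 'geyser:transactions', lastId);
    if (!batch) continue;
    for (const [, entries] of batch) {
      for (const [id, fields] of entries) {
        // Assumes each entry is stored as ['payload', '<json>'].
        const event = JSON.parse(fields[1]);
        // ON CONFLICT makes the insert idempotent, which doubles as deduplication.
        await db.query(
          `INSERT INTO transactions (signature, slot, block_time, raw)
           VALUES ($1, $2, to_timestamp($3), $4)
           ON CONFLICT (signature) DO NOTHING`,
          [event.signature, event.slot, event.blockTime, fields[1]],
        );
        lastId = id;
      }
    }
  }
}
```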
Real-time vs Historical Data
Real-time (Processed/Confirmed):
- Latency: 200-400ms behind blockchain
- Commitment: processed or confirmed
- Use case: Live dashboards, real-time monitoring
- Cache TTL: 5-10 seconds
Historical (Finalized):
- Latency: 30-45 seconds behind blockchain
- Commitment: finalized
- Use case: Analytics, permanent records
- Cache TTL: 5 minutes
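In practice, both modes can be served by the same Geyser list endpoint with a different commitment parameter. The sketch below assumes query parameters of this shape; consult the Geyser API reference for the authoritative request format.

```typescript
// Hypothetical helper; base URL and query parameters are assumptions.
async function fetchTransactions(apiKey: string, mode: 'realtime' | 'historical') {
  const commitment = mode === 'realtime' ? 'confirmed' : 'finalized';
  const url = `https://nexus.fortiblox.com/geyser/transactions?commitment=${commitment}&limit=100`;
  const res = await fetch(url, { headers: { 'X-API-Key': apiKey } });
  if (!res.ok) throw new Error(`Geyser query failed: ${res.status}`);
  return res.json();
}
```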
Query Optimization
1. Time-series Partitioning: TimescaleDB automatically partitions data by time (1-day chunks) for fast queries on recent data.
2. Materialized Views: Pre-aggregated statistics updated every minute:
- Transactions per block
- Validator statistics
- Token transfer volumes
- Program usage metrics
3. Multi-level Caching:
- L1: Redis cache (70-85% hit rate; see the cache-aside sketch after this list)
- L2: TimescaleDB query cache
- L3: Materialized views
4. Indexes:
- Time-based index (primary)
- Signature hash index (unique lookups)
- Account address index (account history)
- Program ID index (program activity)
- Composite indexes for common queries
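The L1 Redis layer above is typically populated with a cache-aside pattern: check Redis, fall back to TimescaleDB on a miss, and store the result with a short TTL. A minimal sketch with ioredis follows; the key format and TTL values are illustrative.

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Generic cache-aside helper: serve from Redis when possible, otherwise run the
// database query and cache the result with a short TTL.
async function cachedQuery<T>(key: string, ttlSeconds: number, load: () => Promise<T>): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T; // L1 hit (70-85% of requests)
  const result = await load();                   // falls through to TimescaleDB
  await redis.set(key, JSON.stringify(result), 'EX', ttlSeconds);
  return result;
}

// Usage: 5s TTL for confirmed data, 300s for finalized (matching the TTLs above).
// const txs = await cachedQuery('geyser:txs:confirmed:latest', 5, () => queryDatabase());
```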
Performance Characteristics
- Ingestion Rate: 10,000+ events/second
- Write Latency: <100ms from blockchain to database
- Query Latency: 30-80ms (cached), 100-300ms (database)
- Storage Growth: ~50GB/month for mainnet
- Retention: Unlimited (all tiers include full history)
Scaling Approach
- Write Scaling: Multiple Geyser plugins + message queue buffering
- Read Scaling: TimescaleDB read replicas + Redis cache cluster
- Storage Scaling: Automatic compression + continuous aggregates
- Geographic Scaling: Regional read replicas for global access
4. WebSocket Architecture
Real-time streaming architecture using WebSocket for low-latency event delivery to connected clients.
graph TB
subgraph "Client Connections"
C1[Client 1<br/>WSS Connection]
C2[Client 2<br/>WSS Connection]
C3[Client 3<br/>WSS Connection]
CN[Client N<br/>WSS Connection]
end
subgraph "WebSocket Server Cluster"
WS1[WS Server 1<br/>Node.js/Go]
WS2[WS Server 2<br/>Node.js/Go]
WS3[WS Server 3<br/>Node.js/Go]
end
subgraph "Subscription Management"
SubMgr[Subscription Manager<br/>Redis Pub/Sub]
subgraph "Channels"
TxChannel[transactions<br/>Channel]
BlockChannel[blocks<br/>Channel]
AccountChannel[accounts<br/>Channel]
SlotChannel[slots<br/>Channel]
end
end
subgraph "Event Sources"
GeyserStream[Geyser Plugin<br/>Real-time Events]
DBStream[TimescaleDB<br/>LISTEN/NOTIFY]
end
subgraph "Load Balancer"
LB[Load Balancer<br/>Sticky Sessions]
end
C1 -.-> LB
C2 -.-> LB
C3 -.-> LB
CN -.-> LB
LB --> WS1
LB --> WS2
LB --> WS3
WS1 --> SubMgr
WS2 --> SubMgr
WS3 --> SubMgr
SubMgr --> TxChannel
SubMgr --> BlockChannel
SubMgr --> AccountChannel
SubMgr --> SlotChannel
GeyserStream --> TxChannel
GeyserStream --> BlockChannel
GeyserStream --> AccountChannel
GeyserStream --> SlotChannel
DBStream -.-> SubMgr
style SubMgr fill:#e74c3c,stroke:#c0392b,color:#fff
style LB fill:#3498db,stroke:#2874a6,color:#fff
Connection Lifecycle
sequenceDiagram
participant C as Client
participant LB as Load Balancer
participant WS as WebSocket Server
participant Auth as Auth Service
participant Sub as Subscription Manager
participant Stream as Event Stream
C->>LB: WSS Connection<br/>?api-key=fbx_xxx
LB->>WS: Route to WS Server
WS->>Auth: Validate API Key
Auth->>Auth: Check tier limits<br/>Check connection count
Auth-->>WS: Auth Success<br/>Max Connections: 5<br/>Current: 2/5
WS-->>C: Connection Established<br/>type: connected
Note over C: Subscribe to events
C->>WS: {action: "subscribe"<br/>channel: "transactions"<br/>filters: {...}}
WS->>Sub: Register subscription<br/>Client ID: c1234<br/>Channel: transactions
Sub->>Sub: Add to channel<br/>Apply filters
Sub-->>WS: Subscription active
WS-->>C: {type: "subscribed"<br/>channel: "transactions"<br/>subscriptionId: "sub_xxx"}
Note over Stream: New transaction occurs
Stream->>Sub: Publish transaction event
Sub->>Sub: Match filters<br/>Find subscribers
Sub->>WS: Send to matching clients
WS->>C: {type: "transaction"<br/>data: {...}}
Note over C: Heartbeat every 30s
C->>WS: {action: "ping"}
WS-->>C: {type: "pong"}
Note over C: Unsubscribe
C->>WS: {action: "unsubscribe"<br/>subscriptionId: "sub_xxx"}
WS->>Sub: Remove subscription
Sub-->>WS: Unsubscribed
WS-->>C: {type: "unsubscribed"}
C->>WS: Close connection
WS->>Sub: Remove all subscriptions
WS-->>C: Connection closed
Fan-out Architecture
Each WebSocket server handles 1,000-5,000 concurrent connections:
Single Event Processing:
- Geyser plugin emits transaction event
- Event published to Redis Pub/Sub channel
- All WS servers subscribed to channel receive event
- Each WS server filters event against client subscriptions
- Matching events sent to connected clients
Filtering Strategies:
- Server-side filtering reduces bandwidth
- Client-specific filters (account, program, commitment)
- Subscription multiplexing (one connection, many subscriptions)
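A stripped-down sketch of the fan-out step is shown below, assuming an ioredis subscriber and the ws library. The channel name matches the diagram; the filter shape and event fields are illustrative.

```typescript
import Redis from 'ioredis';
import type WebSocket from 'ws';

interface ClientSub {
  socket: WebSocket;
  filters: { account?: string; programId?: string }; // filter shape is illustrative
}

const clientSubs: ClientSub[] = []; // populated when clients send subscribe messages
const sub = new Redis();

// Every WS server instance subscribes to the shared Redis channel...
sub.subscribe('transactions');

sub.on('message', (_channel, payload) => {
  const event = JSON.parse(payload);
  // ...and filters server-side so each client only receives matching events.
  for (const { socket, filters } of clientSubs) {
    const accountOk = !filters.account || event.accounts?.includes(filters.account);
    const programOk = !filters.programId || event.programId === filters.programId;
    if (accountOk && programOk && socket.readyState === 1 /* OPEN */) {
      socket.send(JSON.stringify({ type: 'transaction', data: event }));
    }
  }
});
```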
Scaling Strategy
Horizontal Scaling:
- Multiple WS server instances behind load balancer
- Redis Pub/Sub broadcasts to all servers
- Each server handles subset of connections
- Sticky sessions keep clients connected to same server
Connection Distribution:
Load Balancer
├── WS Server 1: 2,500 connections
├── WS Server 2: 2,500 connections
├── WS Server 3: 2,500 connections
└── WS Server 4: 2,500 connections
Total: 10,000 concurrent connections
Tier Limits:
| Tier | Max Connections | Messages/Second |
|---|---|---|
| Free | 5 | 100 |
| Developer | 5 | 500 |
| Business | 250 | 2,000 |
| Professional | 250 | 5,000 |
| Enterprise | Custom | Custom |
Performance Characteristics
- Connection Latency: <200ms to establish
- Event Latency: <100ms from blockchain to client
- Message Rate: 100-5,000 msg/s per connection (tier-dependent)
- Bandwidth: ~5KB/message average
- Memory: ~50KB per connection
- CPU: Low (event-driven architecture)
Reliability Features
1. Automatic Reconnection: Clients implement exponential backoff for reconnection
2. Heartbeat/Ping-Pong: 30-second heartbeat detects stale connections
3. Message Acknowledgment: Critical messages require client acknowledgment
4. Connection Recovery: Clients can resume subscriptions after reconnection
5. Graceful Degradation: Rate limiting prevents overwhelming clients
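A client-side sketch of points 1 and 4 (reconnect with exponential backoff, then resume subscriptions) is shown below. It assumes the browser WebSocket API (or an equivalent client in Node); the WSS URL is an assumption, while the message shapes follow the connection lifecycle above.

```typescript
// Reconnecting client sketch; URL is an assumption, message shapes follow the diagram.
function connect(apiKey: string, subscriptions: object[], attempt = 0): void {
  const ws = new WebSocket(`wss://nexus.fortiblox.com/ws?api-key=${apiKey}`);

  ws.onopen = () => {
    attempt = 0; // reset backoff once connected
    for (const sub of subscriptions) {
      ws.send(JSON.stringify({ action: 'subscribe', ...sub })); // resume subscriptions
    }
  };

  ws.onclose = () => {
    // Exponential backoff capped at 30s: 1s, 2s, 4s, ...
    const delay = Math.min(1000 * 2 ** attempt, 30_000);
    setTimeout(() => connect(apiKey, subscriptions, attempt + 1), delay);
  };
}

connect('fbx_xxx', [{ channel: 'transactions', filters: {} }]);
```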
5. Authentication Flow
Comprehensive authentication and authorization flow showing API key validation, tier checking, and credit tracking.
graph TB
Start([Client Request]) --> Auth{Valid API Key?}
Auth -->|No| AuthErr[401 Unauthorized<br/>error: INVALID_API_KEY]
Auth -->|Yes| Active{Key Active?}
Active -->|No| ActiveErr[401 Unauthorized<br/>error: KEY_INACTIVE]
Active -->|Yes| CheckCache{Check Redis Cache}
CheckCache -->|Cache Hit| LoadCache[Load Tier Info<br/>from Cache]
CheckCache -->|Cache Miss| LoadDB[Load from Database<br/>Cache for 5min]
LoadCache --> ValidateTier
LoadDB --> ValidateTier
ValidateTier{Validate Tier<br/>Permissions}
ValidateTier -->|Network Restricted| CheckNetwork{Network<br/>Allowed?}
CheckNetwork -->|No| NetworkErr[403 Forbidden<br/>error: NETWORK_NOT_ALLOWED]
CheckNetwork -->|Yes| CheckDomain
ValidateTier -->|Domain Restricted| CheckDomain{Domain<br/>Allowed?}
CheckDomain -->|No| DomainErr[403 Forbidden<br/>error: DOMAIN_NOT_ALLOWED]
CheckDomain -->|Yes| CheckIP
ValidateTier -->|IP Restricted| CheckIP{IP Address<br/>Allowed?}
CheckIP -->|No| IPErr[403 Forbidden<br/>error: IP_NOT_ALLOWED]
CheckIP -->|Yes| RateLimit
ValidateTier -->|No Restrictions| RateLimit{Rate Limit<br/>Check}
RateLimit -->|Exceeded| RateLimitErr[429 Too Many Requests<br/>error: RATE_LIMIT_EXCEEDED<br/>retry_after: Xs]
RateLimit -->|OK| Credits{Sufficient<br/>Credits?}
Credits -->|No| CreditsErr[402 Payment Required<br/>error: INSUFFICIENT_CREDITS]
Credits -->|Yes| Success[Request Allowed<br/>Proceed to Service]
Success --> Deduct[Deduct Credits<br/>Update Usage]
Deduct --> Response[Process Request<br/>Return Response]
AuthErr --> End([End])
ActiveErr --> End
NetworkErr --> End
DomainErr --> End
IPErr --> End
RateLimitErr --> End
CreditsErr --> End
Response --> End
style Success fill:#27ae60,stroke:#1e8449,color:#fff
style AuthErr fill:#e74c3c,stroke:#c0392b,color:#fff
style ActiveErr fill:#e74c3c,stroke:#c0392b,color:#fff
style NetworkErr fill:#e67e22,stroke:#ca6f1e,color:#fff
style DomainErr fill:#e67e22,stroke:#ca6f1e,color:#fff
style IPErr fill:#e67e22,stroke:#ca6f1e,color:#fff
style RateLimitErr fill:#f39c12,stroke:#d68910,color:#fff
style CreditsErr fill:#e74c3c,stroke:#c0392b,color:#fff
Authentication Steps
1. API Key Validation (2-5ms)
- Extract API key from header (X-API-Key or Authorization Bearer)
- Validate format (fbx_xxx pattern)
- Check key exists in system
- Verify key is active (not revoked/expired)
2. Tier Information Loading (1-3ms)
- Check Redis cache for tier info (5-minute TTL)
- On cache miss, load from database
- Cache includes: tier level, rate limits, credit balance, restrictions
3. Access Control Validation (1-2ms)
- Network Restrictions: Verify requested network allowed (mainnet/devnet/testnet)
- Domain Restrictions: Check Origin/Referer header against whitelist
- IP Restrictions: Validate source IP against allowed ranges
4. Rate Limiting (1-3ms)
- Check current request rate in Redis
- Sliding window algorithm (per-second buckets)
- Allow burst up to 2x limit for short periods
- Return 429 if exceeded with Retry-After header
5. Credit Checking (1-2ms)
- Load current credit balance from cache
- Calculate request cost based on method
- Verify sufficient credits available
- Return 402 if insufficient
6. Credit Deduction (1-2ms)
- Deduct credits from balance
- Update usage metrics
- Log usage for billing
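Compressed into Express middleware, steps 1-3 look roughly like the sketch below. The loadTierInfo helper is hypothetical and stands in for the Redis/database lookup in step 2; the error payloads reuse the codes from the flowchart.

```typescript
import type { Request, Response, NextFunction } from 'express';

interface TierInfo {
  active: boolean;
  restrictions: { networks?: string[] };
}

// Hypothetical lookup standing in for step 2 (Redis cache, database on miss).
declare function loadTierInfo(apiKey: string): Promise<TierInfo | null>;

// Sketch of steps 1-3 only; the production middleware is more involved.
export async function authenticate(req: Request, res: Response, next: NextFunction) {
  const key = req.header('X-API-Key') ?? req.header('Authorization')?.replace(/^Bearer\s+/i, '');
  if (!key || !key.startsWith('fbx_')) {
    return res.status(401).json({ error: 'INVALID_API_KEY' });
  }
  const info = await loadTierInfo(key);
  if (!info || !info.active) {
    return res.status(401).json({ error: 'KEY_INACTIVE' });
  }
  const network = String(req.query.network ?? 'mainnet'); // network selection is an assumption
  if (info.restrictions.networks && !info.restrictions.networks.includes(network)) {
    return res.status(403).json({ error: 'NETWORK_NOT_ALLOWED' });
  }
  res.locals.tier = info; // rate limiting and credit checks (steps 4-6) run next
  next();
}
```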
Redis Cache Strategy
Cached Data (5-minute TTL):
{
  "apiKey": "fbx_xxx",
  "tier": "Business",
  "rateLimit": {
    "requestsPerSecond": 200,
    "burstLimit": 400
  },
  "credits": {
    "total": 30000000,
    "used": 1500000,
    "remaining": 28500000
  },
  "restrictions": {
    "networks": ["mainnet", "devnet"],
    "domains": ["https://myapp.com"],
    "ipRanges": ["203.0.113.0/24"]
  },
  "metadata": {
    "userId": "user_123",
    "keyName": "Production API Key",
    "createdAt": "2025-11-01T00:00:00Z"
  }
}
Rate Limiting Algorithm
Sliding Window:
// Redis keys for rate limiting
const key = `ratelimit:${apiKey}:${currentSecond}`;
const count = await redis.incr(key);
await redis.expire(key, 60); // Keep for 60 seconds

if (count > rateLimit.requestsPerSecond) {
  // Check burst allowance
  const windowTotal = await sumPreviousSeconds(apiKey, 10);
  const avgRate = windowTotal / 10;
  if (avgRate > rateLimit.requestsPerSecond && count > rateLimit.burstLimit) {
    throw new RateLimitError();
  }
}
Rate Limit Headers:
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 187
X-RateLimit-Reset: 1700000000
X-RateLimit-Burst: 400
Credit Tracking
Credit Costs by Method:
| Category | Methods | Credits |
|---|---|---|
| Standard | getHealth, getSlot, getBalance | 1 |
| Heavy | getBlock, getTransaction, getProgramAccounts | 5 |
| Transaction | sendTransaction, simulateTransaction | 10 |
| Geyser Simple | /transaction/:sig, /block/:slot | 5 |
| Geyser List | /transactions, /blocks | 10 |
| Geyser Search | Complex queries with filters | 25 |
Usage Tracking:
- Real-time credit deduction
- Usage metrics logged to TimescaleDB
- Daily usage summaries
- Billing calculated from usage logs
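Credit deduction has to stay correct under concurrent requests. One way to sketch an atomic deduction with ioredis is shown below; the key name is an assumption.

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Illustrative atomic deduction: DECRBY returns the new balance, so an overdraft
// can be detected and rolled back with one command each way.
async function deductCredits(apiKey: string, cost: number): Promise<boolean> {
  const remaining = await redis.decrby(`credits:${apiKey}:remaining`, cost);
  if (remaining < 0) {
    await redis.incrby(`credits:${apiKey}:remaining`, cost); // roll back: insufficient credits
    return false; // caller responds with 402 INSUFFICIENT_CREDITS
  }
  return true;
}
```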
6. Multi-Region Setup
FortiBlox Nexus supports multi-region deployment for global availability and low-latency access.
graph TB
subgraph "Global DNS"
DNS[Route 53<br/>GeoDNS Routing]
end
subgraph "Region: US East"
USE_LB[Load Balancer<br/>us-east-1]
USE_Gateway[API Gateway]
USE_RPC[RPC Service]
USE_Geyser[Geyser API]
USE_WS[WebSocket Server]
USE_Redis[(Redis Primary)]
USE_TSDB[(TimescaleDB Primary)]
USE_Validators[X1 Validators<br/>3 nodes]
end
subgraph "Region: EU West"
EUW_LB[Load Balancer<br/>eu-west-1]
EUW_Gateway[API Gateway]
EUW_RPC[RPC Service]
EUW_Geyser[Geyser API]
EUW_WS[WebSocket Server]
EUW_Redis[(Redis Replica)]
EUW_TSDB[(TimescaleDB Replica)]
EUW_Validators[X1 Validators<br/>2 nodes]
end
subgraph "Region: Asia Pacific"
AP_LB[Load Balancer<br/>ap-southeast-1]
AP_Gateway[API Gateway]
AP_RPC[RPC Service]
AP_Geyser[Geyser API]
AP_WS[WebSocket Server]
AP_Redis[(Redis Replica)]
AP_TSDB[(TimescaleDB Replica)]
AP_Validators[X1 Validators<br/>2 nodes]
end
subgraph "Monitoring & Control"
Monitor[Global Monitoring<br/>Health Checks]
Failover[Failover Controller<br/>Automatic Routing]
end
DNS --> USE_LB
DNS --> EUW_LB
DNS --> AP_LB
USE_LB --> USE_Gateway
USE_Gateway --> USE_RPC
USE_Gateway --> USE_Geyser
USE_Gateway --> USE_WS
USE_RPC --> USE_Redis
USE_Geyser --> USE_Redis
USE_Geyser --> USE_TSDB
USE_RPC --> USE_Validators
EUW_LB --> EUW_Gateway
EUW_Gateway --> EUW_RPC
EUW_Gateway --> EUW_Geyser
EUW_Gateway --> EUW_WS
EUW_RPC --> EUW_Redis
EUW_Geyser --> EUW_Redis
EUW_Geyser --> EUW_TSDB
EUW_RPC --> EUW_Validators
AP_LB --> AP_Gateway
AP_Gateway --> AP_RPC
AP_Gateway --> AP_Geyser
AP_Gateway --> AP_WS
AP_RPC --> AP_Redis
AP_Geyser --> AP_Redis
AP_Geyser --> AP_TSDB
AP_RPC --> AP_Validators
USE_Redis -.->|Replication| EUW_Redis
USE_Redis -.->|Replication| AP_Redis
USE_TSDB -.->|Replication| EUW_TSDB
USE_TSDB -.->|Replication| AP_TSDB
Monitor --> USE_LB
Monitor --> EUW_LB
Monitor --> AP_LB
Failover --> DNS
Monitor --> Failover
style DNS fill:#3498db,stroke:#2874a6,color:#fff
style Monitor fill:#e74c3c,stroke:#c0392b,color:#fff
Geographic Routing
DNS-based Routing:
- Route 53 GeoDNS routes users to nearest region
- Latency-based routing for optimal performance
- Health checks ensure region availability
Latency Benefits:
| Client Region | US East | EU West | Asia Pacific |
|---|---|---|---|
| US East | 10-20ms | 80-100ms | 150-200ms |
| EU West | 80-100ms | 10-20ms | 120-180ms |
| Asia Pacific | 150-200ms | 120-180ms | 10-20ms |
Data Replication
Redis Cache Replication:
- Master-replica replication (async)
- 10-50ms replication lag
- Read from local replica
- Writes to primary only (auth, rate limiting)
TimescaleDB Replication:
- Streaming replication (async)
- 1-5 second replication lag
- Read replicas for query distribution
- Primary in US East for writes
Consistency Model:
- Authentication: Eventual consistency (5-min cache)
- Rate Limiting: Per-region enforcement (may exceed global limit slightly)
- Historical Data: Eventual consistency (1-5s lag)
- Real-time Data: Direct from validators in each region
Failover Logic
Health Monitoring:
// Health check every 10 seconds
const healthCheck = {
  endpoint: "https://us-east.nexus.fortiblox.com/health",
  interval: 10000,
  timeout: 5000,
  consecutiveFailures: 3
};

// Failover trigger
if (consecutiveFailures >= 3) {
  // Remove from DNS
  // Route traffic to healthy regions
  // Alert operations team
}
Automatic Failover:
- Health check detects region failure (3 consecutive failures)
- Remove failing region from DNS
- Traffic redistributed to healthy regions
- Operations team alerted
- Automatic recovery when health restored
Manual Failover:
- Operations dashboard for manual control
- Gradual traffic shifting (0% → 25% → 50% → 100%)
- Rollback capability
Load Distribution
Normal Operation:
- US East: 50% traffic (largest user base)
- EU West: 30% traffic
- Asia Pacific: 20% traffic
During US East Failure:
- EU West: 60% traffic (auto-scaled)
- Asia Pacific: 40% traffic (auto-scaled)
- Automatic capacity scaling in remaining regions
Scaling Strategy
Regional Auto-scaling:
- Monitor CPU, memory, request rate
- Scale API Gateway: 2-10 instances per region
- Scale RPC Service: 2-8 instances per region
- Scale WebSocket: 2-6 instances per region
Cross-region Scaling:
- Primary region handles 80% of capacity
- Secondary regions can scale to 150% of their normal capacity during failover
7. Monitoring & Observability
Comprehensive monitoring and observability infrastructure for FortiBlox Nexus platform.
graph TB
subgraph "Service Layer"
Gateway[API Gateway]
RPC[RPC Service]
Geyser[Geyser API]
WS[WebSocket Server]
end
subgraph "Instrumentation"
Logs[Structured Logging<br/>JSON Format]
Metrics[Metrics Collection<br/>Prometheus]
Traces[Distributed Tracing<br/>OpenTelemetry]
end
subgraph "Collection Layer"
LogCollector[Log Aggregator<br/>Fluentd/Logstash]
MetricsDB[(Prometheus<br/>Time-series DB)]
TraceCollector[Trace Collector<br/>Jaeger]
end
subgraph "Storage Layer"
LogStore[(Elasticsearch<br/>Log Storage)]
MetricsStore[(Prometheus<br/>Long-term Storage)]
TraceStore[(Jaeger Backend<br/>Trace Storage)]
end
subgraph "Analysis & Visualization"
Grafana[Grafana Dashboards<br/>Real-time Metrics]
Kibana[Kibana<br/>Log Analysis]
Jaeger_UI[Jaeger UI<br/>Trace Visualization]
end
subgraph "Alerting"
AlertManager[Alert Manager<br/>Prometheus]
PagerDuty[PagerDuty<br/>On-call Alerts]
Slack[Slack<br/>Team Notifications]
Email[Email Alerts]
end
subgraph "External Monitoring"
Uptime[Uptime Robot<br/>External Checks]
StatusPage[Status Page<br/>status.fortiblox.com]
end
Gateway --> Logs
Gateway --> Metrics
Gateway --> Traces
RPC --> Logs
RPC --> Metrics
RPC --> Traces
Geyser --> Logs
Geyser --> Metrics
Geyser --> Traces
WS --> Logs
WS --> Metrics
WS --> Traces
Logs --> LogCollector
Metrics --> MetricsDB
Traces --> TraceCollector
LogCollector --> LogStore
MetricsDB --> MetricsStore
TraceCollector --> TraceStore
LogStore --> Kibana
MetricsStore --> Grafana
TraceStore --> Jaeger_UI
MetricsDB --> AlertManager
AlertManager --> PagerDuty
AlertManager --> Slack
AlertManager --> Email
Uptime --> StatusPage
AlertManager --> StatusPage
style Grafana fill:#e74c3c,stroke:#c0392b,color:#fff
style AlertManager fill:#f39c12,stroke:#d68910,color:#fff
style StatusPage fill:#27ae60,stroke:#1e8449,color:#fff
Key Metrics Collected
Request Metrics:
// Prometheus metrics
http_requests_total{service="api-gateway", method="POST", endpoint="/rpc", status="200"}
http_request_duration_seconds{service="api-gateway", quantile="0.95"}
http_requests_in_flight{service="api-gateway"}
System Metrics:
- CPU usage (per service)
- Memory usage (per service)
- Disk I/O (database servers)
- Network throughput
- Connection count (WebSocket)
Business Metrics:
- API requests by tier
- Credit consumption rate
- Cache hit rate
- RPC node health scores
- Authentication success/failure rate
Error Metrics:
- Error rate by service
- Error types (401, 429, 500, etc.)
- Failed RPC calls
- WebSocket disconnections
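The request metrics shown above map naturally onto prom-client counters and histograms. A minimal instrumentation sketch follows; the label values and bucket boundaries are illustrative.

```typescript
import client from 'prom-client';

// Mirrors the metric names listed under Request Metrics.
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['service', 'method', 'endpoint', 'status'],
});

const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['service'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
});

// Example observation for one gateway request (45ms, cache miss).
httpRequestsTotal.inc({ service: 'api-gateway', method: 'POST', endpoint: '/rpc', status: '200' });
httpRequestDuration.observe({ service: 'api-gateway' }, 0.045);
```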
Logging Strategy
Structured Logging Format:
{
  "timestamp": "2025-11-24T14:00:00.000Z",
  "level": "info",
  "service": "api-gateway",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "apiKey": "fbx_xxx",
  "tier": "Business",
  "method": "POST",
  "endpoint": "/rpc",
  "rpcMethod": "getAccountInfo",
  "status": 200,
  "duration": 45,
  "cached": false,
  "rpcEndpoint": "http://validator1:8899",
  "credits": 1,
  "userAgent": "axios/1.6.0",
  "message": "RPC request completed successfully"
}
Log Levels:
- DEBUG: Detailed debugging information
- INFO: Normal operation logs (requests, responses)
- WARN: Warning conditions (slow queries, cache misses)
- ERROR: Error conditions (failed requests, timeouts)
- FATAL: Critical errors (service crashes)
Retention:
- DEBUG/INFO: 7 days
- WARN: 30 days
- ERROR/FATAL: 90 days
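The structured format shown above can be produced with any JSON logger; a sketch with pino follows (the field names match the example, but the logger actually used in production is not specified here).

```typescript
import pino from 'pino';
import { randomUUID } from 'node:crypto';

// Base fields are attached to every log line; per-request fields are passed at call time.
const logger = pino({ base: { service: 'api-gateway' } });

logger.info(
  {
    requestId: randomUUID(),
    apiKey: 'fbx_xxx',
    tier: 'Business',
    method: 'POST',
    endpoint: '/rpc',
    rpcMethod: 'getAccountInfo',
    status: 200,
    duration: 45,
    cached: false,
    credits: 1,
  },
  'RPC request completed successfully',
);
```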
Distributed Tracing
Trace Spans:
Request: POST /rpc [Total: 45ms]
├─ authenticate [3ms]
│ ├─ cache-lookup [1ms]
│ └─ validate-permissions [2ms]
├─ rate-limit-check [2ms]
├─ cache-check [2ms]
└─ rpc-call [38ms]
   ├─ load-balance [1ms]
   ├─ http-request [35ms]
   └─ cache-store [2ms]
Trace Context Propagation:
- Unique trace ID generated for each request
- Propagated across all services
- Stored in logs for correlation
- Visualized in Jaeger UI
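With the OpenTelemetry Node API, the span tree above corresponds to nested active spans. The sketch below uses span names from the trace; SDK setup (exporter, sampling) is configured elsewhere and omitted here.

```typescript
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('api-gateway');

// Nested spans mirror the trace tree above; exporter/sampler setup is assumed done elsewhere.
async function handleRpcRequest(): Promise<void> {
  await tracer.startActiveSpan('POST /rpc', async (root) => {
    await tracer.startActiveSpan('authenticate', async (span) => {
      // cache-lookup and validate-permissions happen here
      span.end();
    });
    await tracer.startActiveSpan('rpc-call', async (span) => {
      span.setAttribute('rpc.method', 'getAccountInfo');
      // load-balance, http-request, cache-store
      span.end();
    });
    root.end();
  });
}
```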
Alerting Rules
Critical Alerts (PagerDuty):
- Service down (all instances)
- Error rate >5% for 5 minutes
- Response time >1000ms (p95) for 5 minutes
- Database connection failures
- All RPC nodes unhealthy
Warning Alerts (Slack):
- Error rate >1% for 10 minutes
- Response time >500ms (p95) for 10 minutes
- Cache hit rate <50% for 15 minutes
- Single RPC node unhealthy
- High memory usage (>80%)
Info Alerts (Email):
- Unusual traffic patterns
- New API key created
- Tier upgrade/downgrade
- Daily usage report
Dashboard Examples
1. Service Health Dashboard:
- Request rate (per service)
- Error rate (per service)
- Response time (p50, p95, p99)
- Uptime percentage (24h, 7d, 30d)
2. Infrastructure Dashboard:
- CPU/Memory usage
- Disk usage
- Network throughput
- Database connections
3. Business Metrics Dashboard:
- Requests by tier
- Credit consumption
- Top users by usage
- Revenue metrics
4. RPC Node Dashboard:
- Health scores per node
- Request distribution
- Response times per node
- Error rates per node
Performance Characteristics
- Metrics Collection Overhead: <1% CPU
- Log Collection Overhead: <2% CPU
- Trace Collection Overhead: <3% CPU
- Metrics Retention: 90 days (5-minute resolution)
- Log Retention: 7-90 days (based on level)
- Trace Retention: 7 days (sampled)
External Monitoring
Uptime Robot Checks (every 5 minutes):
- HTTPS endpoint health
- WebSocket connectivity
- DNS resolution
- SSL certificate validity
Status Page Components:
- API Gateway (US East, EU West, Asia Pacific)
- RPC Nodes (mainnet, devnet, testnet)
- Geyser API
- WebSocket Streaming
- Database Services
Summary
FortiBlox Nexus is a comprehensive X1 Blockchain infrastructure platform built with:
- High Performance: Sub-100ms latency for most operations
- High Availability: Multi-region deployment with automatic failover
- Scalability: Horizontal scaling of all components
- Reliability: 99.9% uptime SLA with comprehensive monitoring
- Security: Multi-layer authentication and authorization
- Developer Experience: Simple APIs with detailed documentation
Key Technologies
- API Gateway: Node.js/Express or Go
- RPC Load Balancer: Custom Go service
- Geyser API: Node.js/Express with TypeScript
- WebSocket Server: Node.js/ws or Go
- Caching: Redis (cluster mode)
- Database: TimescaleDB (PostgreSQL extension)
- Monitoring: Prometheus + Grafana
- Logging: ELK Stack (Elasticsearch, Logstash, Kibana)
- Tracing: OpenTelemetry + Jaeger
Architecture Principles
- Microservices: Independent services for each function
- Stateless Services: All state in Redis/TimescaleDB
- Horizontal Scaling: Scale by adding more instances
- Defense in Depth: Multiple layers of security
- Observability First: Comprehensive logging, metrics, and tracing
- Graceful Degradation: Continue operating during partial failures
- Cost Optimization: Intelligent caching reduces infrastructure costs