FortiBlox Nexus Architecture

Comprehensive system architecture diagrams for the FortiBlox Nexus infrastructure platform

This document provides detailed architectural diagrams and explanations of the FortiBlox Nexus infrastructure platform. Each diagram illustrates key components, data flows, and system interactions.

1. System Overview

The complete FortiBlox Nexus platform consists of multiple integrated services working together to provide high-performance X1 Blockchain infrastructure.

graph TB
    subgraph "Client Layer"
        Client[Client Applications]
        Browser[Web Browsers]
        Server[Server Apps]
    end

    subgraph "API Gateway Layer"
        Gateway[API Gateway<br/>Authentication<br/>Rate Limiting<br/>Routing]
    end

    subgraph "Service Layer"
        RPC[RPC Service<br/>Load Balancer]
        Geyser[Geyser REST API<br/>TimescaleDB Queries]
        WS[WebSocket Server<br/>Real-time Streaming]
    end

    subgraph "Data Layer"
        Cache[(Redis Cache<br/>5s-5min TTL)]
        DB[(TimescaleDB<br/>Historical Data)]
    end

    subgraph "Blockchain Layer"
        Validator1[X1 Validator 1<br/>RPC Endpoint]
        Validator2[X1 Validator 2<br/>RPC Endpoint]
        Validator3[X1 Validator 3<br/>RPC Endpoint]
        GeyserPlugin[Geyser Plugin<br/>Real-time Ingestion]
    end

    Client --> Gateway
    Browser --> Gateway
    Server --> Gateway

    Gateway --> RPC
    Gateway --> Geyser
    Gateway --> WS

    RPC --> Cache
    Geyser --> Cache

    Cache -.-> DB
    Geyser --> DB
    WS --> DB

    RPC --> Validator1
    RPC --> Validator2
    RPC --> Validator3

    GeyserPlugin --> DB
    Validator1 -.-> GeyserPlugin
    Validator2 -.-> GeyserPlugin
    Validator3 -.-> GeyserPlugin

    style Gateway fill:#4a90e2,stroke:#2e5c8a,color:#fff
    style Cache fill:#f39c12,stroke:#d68910,color:#fff
    style DB fill:#27ae60,stroke:#1e8449,color:#fff

Key Components

Client Layer:

  • Supports multiple client types: web browsers, mobile apps, server applications
  • All clients use HTTPS/WSS for secure communication
  • API keys authenticate every request

API Gateway:

  • Central entry point for all requests
  • Validates API keys and checks tier permissions
  • Enforces rate limits (10-1000+ req/s depending on tier)
  • Routes requests to appropriate services
  • Tracks credit consumption

Service Layer:

  • RPC Service: Intelligent load balancing across X1 validators
  • Geyser REST API: Historical data queries from TimescaleDB
  • WebSocket Server: Real-time streaming with pub/sub architecture

Data Layer:

  • Redis Cache: Multi-tier caching (5s-5min TTL based on data type)
  • TimescaleDB: Time-series optimized PostgreSQL for historical data

Blockchain Layer:

  • Multiple X1 validator nodes for redundancy
  • Geyser plugin captures real-time blockchain events
  • Automatic health monitoring and failover

Performance Characteristics

  • API Gateway Latency: <10ms overhead
  • Cache Hit Rate: 70-85% for typical workloads
  • RPC Latency: 40-100ms (including network)
  • Geyser API Latency: 30-80ms (cached), 100-300ms (database)
  • WebSocket Latency: <100ms event delivery

Scaling Strategy

  • Horizontal scaling of API Gateway (multiple instances)
  • Independent scaling of RPC, Geyser, and WebSocket services
  • Redis cluster for distributed caching
  • TimescaleDB read replicas for query scaling
  • Multi-region deployment for global availability

2. RPC Request Flow

Detailed flow of an RPC request through the FortiBlox Nexus infrastructure, showing authentication, routing, caching, and response.

sequenceDiagram
    participant C as Client
    participant G as API Gateway
    participant A as Auth Service
    participant RL as Rate Limiter
    participant Cache as Redis Cache
    participant LB as Load Balancer
    participant RPC1 as RPC Node 1
    participant RPC2 as RPC Node 2

    C->>G: POST /rpc<br/>X-API-Key: fbx_xxx<br/>method: getAccountInfo

    Note over G: Step 1: Authentication
    G->>A: Validate API Key
    A->>A: Check key validity<br/>Check tier permissions<br/>Check network access
    A-->>G: Auth Success<br/>Tier: Business<br/>Credits: 28.5M remaining

    Note over G: Step 2: Rate Limiting
    G->>RL: Check rate limit
    RL->>RL: Current: 45/200 req/s<br/>Allow request
    RL-->>G: Rate limit OK<br/>Remaining: 155

    Note over G: Step 3: Cache Check
    G->>Cache: Check cache<br/>Key: getAccountInfo:pubkey:confirmed
    alt Cache Hit
        Cache-->>G: Cached result (30s old)
        Note over G: Latency: ~5ms
        G-->>C: 200 OK<br/>X-Cache-Status: HIT<br/>X-RPC-Latency-Ms: 5<br/>Result
    else Cache Miss
        Cache-->>G: No cached result

        Note over G: Step 4: Node Selection
        G->>LB: Select healthy RPC node
        LB->>LB: Health scores:<br/>Node 1: 95%<br/>Node 2: 88%<br/>Node 3: 20% (skip)
        LB-->>G: Route to Node 1

        Note over G: Step 5: RPC Call
        G->>RPC1: Forward RPC request
        Note over RPC1: Process request<br/>Latency: 45ms
        RPC1-->>G: RPC Response

        Note over G: Step 6: Cache Result
        G->>Cache: Store result<br/>TTL: 30s
        Cache-->>G: Cached

        Note over G: Step 7: Deduct Credits
        G->>G: Deduct 1 credit<br/>Update usage metrics

        Note over G: Total Latency: ~55ms
        G-->>C: 200 OK<br/>X-Cache-Status: MISS<br/>X-RPC-Endpoint: node1<br/>X-RPC-Latency-Ms: 55<br/>X-RateLimit-Remaining: 155<br/>Result
    end

Latency Breakdown (Cache Miss)

Step               | Component     | Typical Latency
-------------------|---------------|----------------
1. Authentication  | Auth Service  | 2-5ms
2. Rate Limiting   | Redis         | 1-3ms
3. Cache Check     | Redis         | 1-2ms
4. Node Selection  | Load Balancer | <1ms
5. RPC Call        | X1 Validator  | 40-80ms
6. Cache Storage   | Redis         | 1-2ms
Total              |               | 45-95ms

Latency Breakdown (Cache Hit)

Step               | Component    | Typical Latency
-------------------|--------------|----------------
1. Authentication  | Auth Service | 2-5ms
2. Rate Limiting   | Redis        | 1-3ms
3. Cache Retrieval | Redis        | 1-2ms
Total              |              | 4-10ms

Credit Consumption

Different methods consume different amounts of credits (a cost-lookup sketch follows this list):

  • Standard methods (getHealth, getSlot, getBalance): 1 credit
  • Heavy methods (getBlock, getTransaction, getProgramAccounts): 5 credits
  • Transaction submission (sendTransaction, simulateTransaction): 10 credits
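
A minimal cost lookup mirroring this schedule might look like the sketch below (TypeScript; the DEFAULT_COST fallback for unlisted methods is an assumption, not documented behavior):

// Credit cost per RPC method, mirroring the schedule above.
const CREDIT_COSTS: Record<string, number> = {
  getHealth: 1,
  getSlot: 1,
  getBalance: 1,
  getBlock: 5,
  getTransaction: 5,
  getProgramAccounts: 5,
  sendTransaction: 10,
  simulateTransaction: 10,
};

// Assumed fallback: unknown methods are billed as standard calls.
const DEFAULT_COST = 1;

function creditCost(method: string): number {
  return CREDIT_COSTS[method] ?? DEFAULT_COST;
}

// Example: creditCost("getBlock") === 5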

Error Handling

The system handles failures gracefully (a client-side retry sketch follows this list):

  1. Invalid API Key: Return 401 immediately (no further processing)
  2. Rate Limit Exceeded: Return 429 with Retry-After header
  3. RPC Node Down: Automatic failover to healthy node
  4. All Nodes Down: Return 503 Service Unavailable
  5. Timeout: Return 504 Gateway Timeout after 30s
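
On the client side, 429 and 5xx responses are typically handled with bounded retries. A hedged sketch follows; the endpoint URL is hypothetical and the retry policy is illustrative rather than prescribed:

// Retry an RPC call, honoring Retry-After on 429 and backing off on 503/504.
async function rpcWithRetry(body: unknown, apiKey: string, maxAttempts = 3): Promise<unknown> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch("https://nexus.fortiblox.com/rpc", { // hypothetical URL
      method: "POST",
      headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
      body: JSON.stringify(body),
    });

    if (res.status === 429) {
      // Rate limited: wait for the server-suggested interval, then retry.
      const retryAfter = Number(res.headers.get("Retry-After") ?? "1");
      await new Promise((r) => setTimeout(r, retryAfter * 1000));
      continue;
    }
    if (res.status === 503 || res.status === 504) {
      // Upstream unavailable or timed out: exponential backoff before retrying.
      await new Promise((r) => setTimeout(r, 500 * 2 ** (attempt - 1)));
      continue;
    }
    if (!res.ok) throw new Error(`RPC failed with status ${res.status}`); // 401/402/403: do not retry
    return res.json();
  }
  throw new Error("RPC failed after retries");
}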

3. Geyser Streaming Architecture

The Geyser system captures real-time blockchain data and stores it in TimescaleDB for fast historical queries.

graph TB
    subgraph "X1 Blockchain Network"
        V1[Validator 1<br/>Block Production]
        V2[Validator 2<br/>Block Production]
        V3[Validator 3<br/>Block Production]
    end

    subgraph "Geyser Plugin Layer"
        GP1[Geyser Plugin 1<br/>Validator 1]
        GP2[Geyser Plugin 2<br/>Validator 2]
        GP3[Geyser Plugin 3<br/>Validator 3]
    end

    subgraph "Data Ingestion Pipeline"
        Queue[Message Queue<br/>Kafka/Redis Streams]
        Processor[Stream Processor<br/>Deduplication<br/>Transformation]
    end

    subgraph "Storage Layer"
        TSDB[(TimescaleDB<br/>Hypertables)]

        subgraph "Tables"
            TXTable[Transactions<br/>Indexed by signature]
            BlockTable[Blocks<br/>Indexed by slot]
            AccountTable[Accounts<br/>Indexed by address]
            TokenTable[Token Metadata<br/>Indexed by mint]
        end
    end

    subgraph "Query Layer"
        Cache2[(Redis Cache<br/>Query Results)]
        GeyserAPI[Geyser REST API<br/>Complex Queries]
    end

    subgraph "Optimization Layer"
        Materialized[Materialized Views<br/>Pre-aggregated Stats]
        Indexes[Hypertable Indexes<br/>Time + Attributes]
    end

    V1 --> GP1
    V2 --> GP2
    V3 --> GP3

    GP1 --> Queue
    GP2 --> Queue
    GP3 --> Queue

    Queue --> Processor

    Processor --> TXTable
    Processor --> BlockTable
    Processor --> AccountTable
    Processor --> TokenTable

    TXTable --> TSDB
    BlockTable --> TSDB
    AccountTable --> TSDB
    TokenTable --> TSDB

    TSDB --> Materialized
    TSDB --> Indexes

    GeyserAPI --> Cache2
    Cache2 -.->|Cache Miss| TSDB
    Materialized --> GeyserAPI
    Indexes --> GeyserAPI

    style Queue fill:#9b59b6,stroke:#7d3c98,color:#fff
    style TSDB fill:#27ae60,stroke:#1e8449,color:#fff
    style Cache2 fill:#f39c12,stroke:#d68910,color:#fff

Data Flow

  1. Capture: Geyser plugins on validators capture every transaction, block, and account update
  2. Queue: Events sent to message queue for reliable delivery
  3. Process: Stream processor deduplicates, validates, and transforms data (see the sketch after this list)
  4. Store: Data inserted into TimescaleDB hypertables with time-series optimization
  5. Index: Automatic indexing on time, signature, address, program ID
  6. Query: REST API serves queries with intelligent caching
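
A minimal sketch of the deduplication in step 3, assuming an ioredis client and an illustrative event shape; each signature is claimed once so duplicate events forwarded by multiple validators are stored only once:

import Redis from "ioredis";

const redis = new Redis();

interface TxEvent {
  signature: string;
  slot: number;
  raw: unknown;
}

// Returns true the first time a signature is seen within the TTL window.
async function isFirstDelivery(event: TxEvent): Promise<boolean> {
  // SET NX EX: atomically claim the signature for 10 minutes (illustrative TTL).
  const claimed = await redis.set(`seen:tx:${event.signature}`, "1", "EX", 600, "NX");
  return claimed === "OK";
}

async function processEvent(event: TxEvent): Promise<void> {
  if (!(await isFirstDelivery(event))) return; // duplicate from another validator
  // ...validate, transform, and insert into the transactions hypertable here.
}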

Real-time vs Historical Data

Real-time (Processed/Confirmed):

  • Latency: 200-400ms behind blockchain
  • Commitment: processed or confirmed
  • Use case: Live dashboards, real-time monitoring
  • Cache TTL: 5-10 seconds

Historical (Finalized):

  • Latency: 30-45 seconds behind blockchain
  • Commitment: finalized
  • Use case: Analytics, permanent records
  • Cache TTL: 5 minutes

Query Optimization

1. Time-series Partitioning: TimescaleDB automatically partitions data by time (1-day chunks) for fast queries on recent data.

2. Materialized Views: Pre-aggregated statistics updated every minute:

  • Transactions per block
  • Validator statistics
  • Token transfer volumes
  • Program usage metrics

3. Multi-level Caching (a read-through sketch for the L1 layer follows at the end of this section):

  • L1: Redis cache (70-85% hit rate)
  • L2: TimescaleDB query cache
  • L3: Materialized views

4. Indexes:

  • Time-based index (primary)
  • Signature hash index (unique lookups)
  • Account address index (account history)
  • Program ID index (program activity)
  • Composite indexes for common queries
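
A read-through sketch for the L1 Redis layer, assuming ioredis; the key naming and TTL values are illustrative, and the queryDb callback stands in for the actual TimescaleDB query:

import Redis from "ioredis";

const redis = new Redis();

// Read-through cache: serve from Redis when possible, otherwise run the
// database query and cache its result for ttlSeconds.
async function cachedQuery<T>(
  key: string,
  ttlSeconds: number,
  queryDb: () => Promise<T>,
): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T;

  const result = await queryDb();
  await redis.set(key, JSON.stringify(result), "EX", ttlSeconds);
  return result;
}

// Example: finalized data uses the longer 5-minute TTL described above.
// const txs = await cachedQuery("txs:finalized:slot:12345", 300, () => queryDb());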

Performance Characteristics

  • Ingestion Rate: 10,000+ events/second
  • Write Latency: <100ms from blockchain to database
  • Query Latency: 30-80ms (cached), 100-300ms (database)
  • Storage Growth: ~50GB/month for mainnet
  • Retention: Unlimited (all tiers include full history)

Scaling Approach

  • Write Scaling: Multiple Geyser plugins + message queue buffering
  • Read Scaling: TimescaleDB read replicas + Redis cache cluster
  • Storage Scaling: Automatic compression + continuous aggregates
  • Geographic Scaling: Regional read replicas for global access

4. WebSocket Architecture

Real-time streaming architecture using WebSocket for low-latency event delivery to connected clients.

graph TB
    subgraph "Client Connections"
        C1[Client 1<br/>WSS Connection]
        C2[Client 2<br/>WSS Connection]
        C3[Client 3<br/>WSS Connection]
        CN[Client N<br/>WSS Connection]
    end

    subgraph "WebSocket Server Cluster"
        WS1[WS Server 1<br/>Node.js/Go]
        WS2[WS Server 2<br/>Node.js/Go]
        WS3[WS Server 3<br/>Node.js/Go]
    end

    subgraph "Subscription Management"
        SubMgr[Subscription Manager<br/>Redis Pub/Sub]

        subgraph "Channels"
            TxChannel[transactions<br/>Channel]
            BlockChannel[blocks<br/>Channel]
            AccountChannel[accounts<br/>Channel]
            SlotChannel[slots<br/>Channel]
        end
    end

    subgraph "Event Sources"
        GeyserStream[Geyser Plugin<br/>Real-time Events]
        DBStream[TimescaleDB<br/>LISTEN/NOTIFY]
    end

    subgraph "Load Balancer"
        LB[Load Balancer<br/>Sticky Sessions]
    end

    C1 -.-> LB
    C2 -.-> LB
    C3 -.-> LB
    CN -.-> LB

    LB --> WS1
    LB --> WS2
    LB --> WS3

    WS1 --> SubMgr
    WS2 --> SubMgr
    WS3 --> SubMgr

    SubMgr --> TxChannel
    SubMgr --> BlockChannel
    SubMgr --> AccountChannel
    SubMgr --> SlotChannel

    GeyserStream --> TxChannel
    GeyserStream --> BlockChannel
    GeyserStream --> AccountChannel
    GeyserStream --> SlotChannel

    DBStream -.-> SubMgr

    style SubMgr fill:#e74c3c,stroke:#c0392b,color:#fff
    style LB fill:#3498db,stroke:#2874a6,color:#fff

Connection Lifecycle

sequenceDiagram
    participant C as Client
    participant LB as Load Balancer
    participant WS as WebSocket Server
    participant Auth as Auth Service
    participant Sub as Subscription Manager
    participant Stream as Event Stream

    C->>LB: WSS Connection<br/>?api-key=fbx_xxx
    LB->>WS: Route to WS Server

    WS->>Auth: Validate API Key
    Auth->>Auth: Check tier limits<br/>Check connection count
    Auth-->>WS: Auth Success<br/>Max Connections: 5<br/>Current: 2/5

    WS-->>C: Connection Established<br/>type: connected

    Note over C: Subscribe to events
    C->>WS: {action: "subscribe"<br/>channel: "transactions"<br/>filters: {...}}

    WS->>Sub: Register subscription<br/>Client ID: c1234<br/>Channel: transactions
    Sub->>Sub: Add to channel<br/>Apply filters
    Sub-->>WS: Subscription active

    WS-->>C: {type: "subscribed"<br/>channel: "transactions"<br/>subscriptionId: "sub_xxx"}

    Note over Stream: New transaction occurs
    Stream->>Sub: Publish transaction event
    Sub->>Sub: Match filters<br/>Find subscribers
    Sub->>WS: Send to matching clients
    WS->>C: {type: "transaction"<br/>data: {...}}

    Note over C: Heartbeat every 30s
    C->>WS: {action: "ping"}
    WS-->>C: {type: "pong"}

    Note over C: Unsubscribe
    C->>WS: {action: "unsubscribe"<br/>subscriptionId: "sub_xxx"}
    WS->>Sub: Remove subscription
    Sub-->>WS: Unsubscribed
    WS-->>C: {type: "unsubscribed"}

    C->>WS: Close connection
    WS->>Sub: Remove all subscriptions
    WS-->>C: Connection closed
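
From the client's perspective, the lifecycle above reduces to connect, subscribe, handle events, and heartbeat. A sketch using the ws package; the endpoint URL is hypothetical and the message shapes follow the diagram:

import WebSocket from "ws";

// Hypothetical endpoint; the query-string API key matches the diagram above.
const ws = new WebSocket("wss://nexus.fortiblox.com/ws?api-key=fbx_xxx");

ws.on("open", () => {
  // Subscribe to the transactions channel with a server-side filter.
  ws.send(JSON.stringify({
    action: "subscribe",
    channel: "transactions",
    filters: { commitment: "confirmed" },
  }));

  // Heartbeat every 30 seconds, as in the lifecycle above.
  setInterval(() => ws.send(JSON.stringify({ action: "ping" })), 30_000);
});

ws.on("message", (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.type === "subscribed") console.log("subscription id:", msg.subscriptionId);
  if (msg.type === "transaction") console.log("new transaction:", msg.data);
});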

Fan-out Architecture

Each WebSocket server handles 1,000-5,000 concurrent connections.

Single Event Processing:

  1. Geyser plugin emits transaction event
  2. Event published to Redis Pub/Sub channel
  3. All WS servers subscribed to channel receive event
  4. Each WS server filters event against client subscriptions
  5. Matching events sent to connected clients

Filtering Strategies (see the fan-out sketch after this list):

  • Server-side filtering reduces bandwidth
  • Client-specific filters (account, program, commitment)
  • Subscription multiplexing (one connection, many subscriptions)
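
A hedged sketch of this fan-out path on a single WS server, assuming Redis Pub/Sub via ioredis and an in-memory subscription map; the channel name matches the diagram, while the filter field is illustrative:

import Redis from "ioredis";
import WebSocket from "ws";

interface Subscription {
  socket: WebSocket;
  accounts?: string[]; // optional per-client account filter (illustrative)
}

const subscriptions = new Map<string, Subscription[]>(); // channel -> subscribers
const sub = new Redis(); // dedicated connection in subscriber mode

sub.subscribe("transactions");
sub.on("message", (channel, payload) => {
  const event = JSON.parse(payload);
  for (const s of subscriptions.get(channel) ?? []) {
    // Server-side filtering: only forward events the client asked for.
    if (s.accounts && !s.accounts.includes(event.account)) continue;
    if (s.socket.readyState === WebSocket.OPEN) {
      s.socket.send(JSON.stringify({ type: "transaction", data: event }));
    }
  }
});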

Scaling Strategy

Horizontal Scaling:

  • Multiple WS server instances behind load balancer
  • Redis Pub/Sub broadcasts to all servers
  • Each server handles subset of connections
  • Sticky sessions keep clients connected to same server

Connection Distribution:

Load Balancer
├── WS Server 1: 2,500 connections
├── WS Server 2: 2,500 connections
├── WS Server 3: 2,500 connections
└── WS Server 4: 2,500 connections
Total: 10,000 concurrent connections

Tier Limits:

Tier         | Max Connections | Messages/Second
-------------|-----------------|----------------
Free         | 5               | 100
Developer    | 5               | 500
Business     | 250             | 2,000
Professional | 250             | 5,000
Enterprise   | Custom          | Custom

Performance Characteristics

  • Connection Latency: <200ms to establish
  • Event Latency: <100ms from blockchain to client
  • Message Rate: 100-5,000 msg/s per connection (tier-dependent)
  • Bandwidth: ~5KB/message average
  • Memory: ~50KB per connection
  • CPU: Low (event-driven architecture)

Reliability Features

1. Automatic Reconnection: Clients implement exponential backoff when reconnecting (see the sketch after this list)

2. Heartbeat/Ping-Pong: 30-second heartbeat detects stale connections

3. Message Acknowledgment: Critical messages require client acknowledgment

4. Connection Recovery: Clients can resume subscriptions after reconnection

5. Graceful Degradation: Rate limiting prevents overwhelming clients
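
A client-side reconnection loop with exponential backoff (feature 1) might be sketched as follows; the delay parameters and endpoint URL are illustrative:

import WebSocket from "ws";

function connect(url: string, attempt = 0): void {
  const ws = new WebSocket(url);

  ws.on("open", () => {
    attempt = 0; // reset backoff after a successful connection
    // ...re-send subscriptions here to resume streams (feature 4).
  });

  ws.on("close", () => {
    // Exponential backoff capped at 30s: 1s, 2s, 4s, ... 30s.
    const delay = Math.min(1000 * 2 ** attempt, 30_000);
    setTimeout(() => connect(url, attempt + 1), delay);
  });

  ws.on("error", () => ws.close());
}

connect("wss://nexus.fortiblox.com/ws?api-key=fbx_xxx"); // hypothetical endpoint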


5. Authentication Flow

Comprehensive authentication and authorization flow showing API key validation, tier checking, and credit tracking.

graph TB
    Start([Client Request]) --> Auth{Valid API Key?}

    Auth -->|No| AuthErr[401 Unauthorized<br/>error: INVALID_API_KEY]
    Auth -->|Yes| Active{Key Active?}

    Active -->|No| ActiveErr[401 Unauthorized<br/>error: KEY_INACTIVE]
    Active -->|Yes| CheckCache{Check Redis Cache}

    CheckCache -->|Cache Hit| LoadCache[Load Tier Info<br/>from Cache]
    CheckCache -->|Cache Miss| LoadDB[Load from Database<br/>Cache for 5min]

    LoadCache --> ValidateTier
    LoadDB --> ValidateTier

    ValidateTier{Validate Tier<br/>Permissions}

    ValidateTier -->|Network Restricted| CheckNetwork{Network<br/>Allowed?}
    CheckNetwork -->|No| NetworkErr[403 Forbidden<br/>error: NETWORK_NOT_ALLOWED]
    CheckNetwork -->|Yes| CheckDomain

    ValidateTier -->|Domain Restricted| CheckDomain{Domain<br/>Allowed?}
    CheckDomain -->|No| DomainErr[403 Forbidden<br/>error: DOMAIN_NOT_ALLOWED]
    CheckDomain -->|Yes| CheckIP

    ValidateTier -->|IP Restricted| CheckIP{IP Address<br/>Allowed?}
    CheckIP -->|No| IPErr[403 Forbidden<br/>error: IP_NOT_ALLOWED]
    CheckIP -->|Yes| RateLimit

    ValidateTier -->|No Restrictions| RateLimit{Rate Limit<br/>Check}

    RateLimit -->|Exceeded| RateLimitErr[429 Too Many Requests<br/>error: RATE_LIMIT_EXCEEDED<br/>retry_after: Xs]
    RateLimit -->|OK| Credits{Sufficient<br/>Credits?}

    Credits -->|No| CreditsErr[402 Payment Required<br/>error: INSUFFICIENT_CREDITS]
    Credits -->|Yes| Success[Request Allowed<br/>Proceed to Service]

    Success --> Deduct[Deduct Credits<br/>Update Usage]
    Deduct --> Response[Process Request<br/>Return Response]

    AuthErr --> End([End])
    ActiveErr --> End
    NetworkErr --> End
    DomainErr --> End
    IPErr --> End
    RateLimitErr --> End
    CreditsErr --> End
    Response --> End

    style Success fill:#27ae60,stroke:#1e8449,color:#fff
    style AuthErr fill:#e74c3c,stroke:#c0392b,color:#fff
    style ActiveErr fill:#e74c3c,stroke:#c0392b,color:#fff
    style NetworkErr fill:#e67e22,stroke:#ca6f1e,color:#fff
    style DomainErr fill:#e67e22,stroke:#ca6f1e,color:#fff
    style IPErr fill:#e67e22,stroke:#ca6f1e,color:#fff
    style RateLimitErr fill:#f39c12,stroke:#d68910,color:#fff
    style CreditsErr fill:#e74c3c,stroke:#c0392b,color:#fff

Authentication Steps

1. API Key Validation (2-5ms)

  • Extract API key from header (X-API-Key or Authorization Bearer)
  • Validate format (fbx_xxx pattern)
  • Check key exists in system
  • Verify key is active (not revoked/expired)

2. Tier Information Loading (1-3ms)

  • Check Redis cache for tier info (5-minute TTL)
  • On cache miss, load from database
  • Cache includes: tier level, rate limits, credit balance, restrictions

3. Access Control Validation (1-2ms)

  • Network Restrictions: Verify requested network allowed (mainnet/devnet/testnet)
  • Domain Restrictions: Check Origin/Referer header against whitelist
  • IP Restrictions: Validate source IP against allowed ranges

4. Rate Limiting (1-3ms)

  • Check current request rate in Redis
  • Sliding window algorithm (per-second buckets)
  • Allow burst up to 2x limit for short periods
  • Return 429 if exceeded with Retry-After header

5. Credit Checking (1-2ms)

  • Load current credit balance from cache
  • Calculate request cost based on method
  • Verify sufficient credits available
  • Return 402 if insufficient

6. Credit Deduction (1-2ms)

  • Deduct credits from balance
  • Update usage metrics
  • Log usage for billing
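
Composed as Express-style middleware, the steps above chain in order and fail fast on the first error. A minimal sketch, not the actual gateway code:

import express from "express";

const app = express();
app.use(express.json());

// Step 1: reject missing or malformed keys immediately (format check only, as a sketch).
app.use((req, res, next) => {
  const key = req.header("X-API-Key");
  if (!key || !key.startsWith("fbx_")) {
    res.status(401).json({ error: "INVALID_API_KEY" });
    return;
  }
  next();
});

// Steps 2-5 (tier lookup, restrictions, rate limiting, credit check) would be further
// middleware in the same style, returning 403, 429, or 402 as in the diagram above.

app.post("/rpc", (req, res) => {
  // Step 6 runs after the request is served: deduct credits and record usage.
  res.json({ ok: true }); // placeholder: the real handler proxies to an RPC node
});

app.listen(8080); // illustrative port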

Redis Cache Strategy

Cached Data (5-minute TTL):

{
  "apiKey": "fbx_xxx",
  "tier": "Business",
  "rateLimit": {
    "requestsPerSecond": 200,
    "burstLimit": 400
  },
  "credits": {
    "total": 30000000,
    "used": 1500000,
    "remaining": 28500000
  },
  "restrictions": {
    "networks": ["mainnet", "devnet"],
    "domains": ["https://myapp.com"],
    "ipRanges": ["203.0.113.0/24"]
  },
  "metadata": {
    "userId": "user_123",
    "keyName": "Production API Key",
    "createdAt": "2025-11-01T00:00:00Z"
  }
}

Rate Limiting Algorithm

Sliding Window:

// Per-second counter keyed by API key and the current one-second bucket
const currentSecond = Math.floor(Date.now() / 1000);
const key = `ratelimit:${apiKey}:${currentSecond}`;
const count = await redis.incr(key);
await redis.expire(key, 60); // keep buckets for 60 seconds so the window can be summed

if (count > rateLimit.requestsPerSecond) {
  // Over the per-second limit: check the burst allowance over the last 10 buckets
  const windowTotal = await sumPreviousSeconds(apiKey, 10);
  const avgRate = windowTotal / 10;

  // Reject only if the sustained rate is too high and the burst ceiling is exceeded
  if (avgRate > rateLimit.requestsPerSecond && count > rateLimit.burstLimit) {
    throw new RateLimitError();
  }
}

Rate Limit Headers:

X-RateLimit-Limit: 200
X-RateLimit-Remaining: 187
X-RateLimit-Reset: 1700000000
X-RateLimit-Burst: 400

Credit Tracking

Credit Costs by Method:

Category      | Methods                                      | Credits
--------------|----------------------------------------------|--------
Standard      | getHealth, getSlot, getBalance               | 1
Heavy         | getBlock, getTransaction, getProgramAccounts | 5
Transaction   | sendTransaction, simulateTransaction         | 10
Geyser Simple | /transaction/:sig, /block/:slot              | 5
Geyser List   | /transactions, /blocks                       | 10
Geyser Search | Complex queries with filters                 | 25

Usage Tracking:

  • Real-time credit deduction
  • Usage metrics logged to TimescaleDB
  • Daily usage summaries
  • Billing calculated from usage logs

6. Multi-Region Setup

FortiBlox Nexus supports multi-region deployment for global availability and low-latency access.

graph TB
    subgraph "Global DNS"
        DNS[Route 53<br/>GeoDNS Routing]
    end

    subgraph "Region: US East"
        USE_LB[Load Balancer<br/>us-east-1]
        USE_Gateway[API Gateway]
        USE_RPC[RPC Service]
        USE_Geyser[Geyser API]
        USE_WS[WebSocket Server]
        USE_Redis[(Redis Primary)]
        USE_TSDB[(TimescaleDB Primary)]
        USE_Validators[X1 Validators<br/>3 nodes]
    end

    subgraph "Region: EU West"
        EUW_LB[Load Balancer<br/>eu-west-1]
        EUW_Gateway[API Gateway]
        EUW_RPC[RPC Service]
        EUW_Geyser[Geyser API]
        EUW_WS[WebSocket Server]
        EUW_Redis[(Redis Replica)]
        EUW_TSDB[(TimescaleDB Replica)]
        EUW_Validators[X1 Validators<br/>2 nodes]
    end

    subgraph "Region: Asia Pacific"
        AP_LB[Load Balancer<br/>ap-southeast-1]
        AP_Gateway[API Gateway]
        AP_RPC[RPC Service]
        AP_Geyser[Geyser API]
        AP_WS[WebSocket Server]
        AP_Redis[(Redis Replica)]
        AP_TSDB[(TimescaleDB Replica)]
        AP_Validators[X1 Validators<br/>2 nodes]
    end

    subgraph "Monitoring & Control"
        Monitor[Global Monitoring<br/>Health Checks]
        Failover[Failover Controller<br/>Automatic Routing]
    end

    DNS --> USE_LB
    DNS --> EUW_LB
    DNS --> AP_LB

    USE_LB --> USE_Gateway
    USE_Gateway --> USE_RPC
    USE_Gateway --> USE_Geyser
    USE_Gateway --> USE_WS

    USE_RPC --> USE_Redis
    USE_Geyser --> USE_Redis
    USE_Geyser --> USE_TSDB
    USE_RPC --> USE_Validators

    EUW_LB --> EUW_Gateway
    EUW_Gateway --> EUW_RPC
    EUW_Gateway --> EUW_Geyser
    EUW_Gateway --> EUW_WS

    EUW_RPC --> EUW_Redis
    EUW_Geyser --> EUW_Redis
    EUW_Geyser --> EUW_TSDB
    EUW_RPC --> EUW_Validators

    AP_LB --> AP_Gateway
    AP_Gateway --> AP_RPC
    AP_Gateway --> AP_Geyser
    AP_Gateway --> AP_WS

    AP_RPC --> AP_Redis
    AP_Geyser --> AP_Redis
    AP_Geyser --> AP_TSDB
    AP_RPC --> AP_Validators

    USE_Redis -.->|Replication| EUW_Redis
    USE_Redis -.->|Replication| AP_Redis
    USE_TSDB -.->|Replication| EUW_TSDB
    USE_TSDB -.->|Replication| AP_TSDB

    Monitor --> USE_LB
    Monitor --> EUW_LB
    Monitor --> AP_LB

    Failover --> DNS
    Monitor --> Failover

    style DNS fill:#3498db,stroke:#2874a6,color:#fff
    style Monitor fill:#e74c3c,stroke:#c0392b,color:#fff

Geographic Routing

DNS-based Routing:

  • Route 53 GeoDNS routes users to nearest region
  • Latency-based routing for optimal performance
  • Health checks ensure region availability

Latency Benefits:

Client Region | US East   | EU West   | Asia Pacific
--------------|-----------|-----------|-------------
US East       | 10-20ms   | 80-100ms  | 150-200ms
EU West       | 80-100ms  | 10-20ms   | 120-180ms
Asia Pacific  | 150-200ms | 120-180ms | 10-20ms

Data Replication

Redis Cache Replication:

  • Master-replica replication (async)
  • 10-50ms replication lag
  • Read from local replica
  • Writes to primary only (auth, rate limiting)

TimescaleDB Replication:

  • Streaming replication (async)
  • 1-5 second replication lag
  • Read replicas for query distribution
  • Primary in US East for writes

Consistency Model:

  • Authentication: Eventual consistency (5-min cache)
  • Rate Limiting: Per-region enforcement (may exceed global limit slightly)
  • Historical Data: Eventual consistency (1-5s lag)
  • Real-time Data: Direct from validators in each region

Failover Logic

Health Monitoring:

// Health check every 10 seconds
const healthCheck = {
  endpoint: "https://us-east.nexus.fortiblox.com/health",
  interval: 10000,         // ms between checks
  timeout: 5000,           // ms before a check is considered failed
  consecutiveFailures: 3   // failures required to trigger failover
};

// Failover trigger
if (consecutiveFailures >= healthCheck.consecutiveFailures) {
  // Remove the failing region from DNS
  // Route traffic to healthy regions
  // Alert the operations team
}

Automatic Failover:

  1. Health check detects region failure (3 consecutive failures)
  2. Remove failing region from DNS
  3. Traffic redistributed to healthy regions
  4. Operations team alerted
  5. Automatic recovery when health restored

Manual Failover:

  • Operations dashboard for manual control
  • Gradual traffic shifting (0% → 25% → 50% → 100%)
  • Rollback capability

Load Distribution

Normal Operation:

  • US East: 50% traffic (largest user base)
  • EU West: 30% traffic
  • Asia Pacific: 20% traffic

During US East Failure:

  • EU West: 60% traffic (auto-scaled)
  • Asia Pacific: 40% traffic (auto-scaled)
  • Automatic capacity scaling in remaining regions

Scaling Strategy

Regional Auto-scaling:

  • Monitor CPU, memory, request rate
  • Scale API Gateway: 2-10 instances per region
  • Scale RPC Service: 2-8 instances per region
  • Scale WebSocket: 2-6 instances per region

Cross-region Scaling:

  • Primary region handles 80% of capacity
  • Secondary regions can handle 150% during failover

7. Monitoring & Observability

Comprehensive monitoring and observability infrastructure for FortiBlox Nexus platform.

graph TB
    subgraph "Service Layer"
        Gateway[API Gateway]
        RPC[RPC Service]
        Geyser[Geyser API]
        WS[WebSocket Server]
    end

    subgraph "Instrumentation"
        Logs[Structured Logging<br/>JSON Format]
        Metrics[Metrics Collection<br/>Prometheus]
        Traces[Distributed Tracing<br/>OpenTelemetry]
    end

    subgraph "Collection Layer"
        LogCollector[Log Aggregator<br/>Fluentd/Logstash]
        MetricsDB[(Prometheus<br/>Time-series DB)]
        TraceCollector[Trace Collector<br/>Jaeger]
    end

    subgraph "Storage Layer"
        LogStore[(Elasticsearch<br/>Log Storage)]
        MetricsStore[(Prometheus<br/>Long-term Storage)]
        TraceStore[(Jaeger Backend<br/>Trace Storage)]
    end

    subgraph "Analysis & Visualization"
        Grafana[Grafana Dashboards<br/>Real-time Metrics]
        Kibana[Kibana<br/>Log Analysis]
        Jaeger_UI[Jaeger UI<br/>Trace Visualization]
    end

    subgraph "Alerting"
        AlertManager[Alert Manager<br/>Prometheus]
        PagerDuty[PagerDuty<br/>On-call Alerts]
        Slack[Slack<br/>Team Notifications]
        Email[Email Alerts]
    end

    subgraph "External Monitoring"
        Uptime[Uptime Robot<br/>External Checks]
        StatusPage[Status Page<br/>status.fortiblox.com]
    end

    Gateway --> Logs
    Gateway --> Metrics
    Gateway --> Traces
    RPC --> Logs
    RPC --> Metrics
    RPC --> Traces
    Geyser --> Logs
    Geyser --> Metrics
    Geyser --> Traces
    WS --> Logs
    WS --> Metrics
    WS --> Traces

    Logs --> LogCollector
    Metrics --> MetricsDB
    Traces --> TraceCollector

    LogCollector --> LogStore
    MetricsDB --> MetricsStore
    TraceCollector --> TraceStore

    LogStore --> Kibana
    MetricsStore --> Grafana
    TraceStore --> Jaeger_UI

    MetricsDB --> AlertManager
    AlertManager --> PagerDuty
    AlertManager --> Slack
    AlertManager --> Email

    Uptime --> StatusPage
    AlertManager --> StatusPage

    style Grafana fill:#e74c3c,stroke:#c0392b,color:#fff
    style AlertManager fill:#f39c12,stroke:#d68910,color:#fff
    style StatusPage fill:#27ae60,stroke:#1e8449,color:#fff

Key Metrics Collected

Request Metrics:

# Example Prometheus series
http_requests_total{service="api-gateway", method="POST", endpoint="/rpc", status="200"}
http_request_duration_seconds{service="api-gateway", quantile="0.95"}
http_requests_in_flight{service="api-gateway"}
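
Instrumentation producing series like those above might use prom-client as sketched below; the metric names match the examples, while the label values and percentiles are illustrative:

import client from "prom-client";

// A request counter and a latency summary matching the series above
// (the summary is what produces the quantile="0.95" label).
const httpRequestsTotal = new client.Counter({
  name: "http_requests_total",
  help: "Total HTTP requests handled",
  labelNames: ["service", "method", "endpoint", "status"],
});

const httpRequestDuration = new client.Summary({
  name: "http_request_duration_seconds",
  help: "HTTP request latency in seconds",
  labelNames: ["service"],
  percentiles: [0.5, 0.95, 0.99],
});

// Record one completed request, e.g. from gateway middleware.
function recordRequest(status: number, durationMs: number): void {
  httpRequestsTotal.inc({
    service: "api-gateway",
    method: "POST",
    endpoint: "/rpc",
    status: String(status),
  });
  httpRequestDuration.observe({ service: "api-gateway" }, durationMs / 1000);
}

// Expose everything via client.register.metrics() on a /metrics endpoint for Prometheus to scrape.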

System Metrics:

  • CPU usage (per service)
  • Memory usage (per service)
  • Disk I/O (database servers)
  • Network throughput
  • Connection count (WebSocket)

Business Metrics:

  • API requests by tier
  • Credit consumption rate
  • Cache hit rate
  • RPC node health scores
  • Authentication success/failure rate

Error Metrics:

  • Error rate by service
  • Error types (401, 429, 500, etc.)
  • Failed RPC calls
  • WebSocket disconnections

Logging Strategy

Structured Logging Format:

{
  "timestamp": "2025-11-24T14:00:00.000Z",
  "level": "info",
  "service": "api-gateway",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "apiKey": "fbx_xxx",
  "tier": "Business",
  "method": "POST",
  "endpoint": "/rpc",
  "rpcMethod": "getAccountInfo",
  "status": 200,
  "duration": 45,
  "cached": false,
  "rpcEndpoint": "http://validator1:8899",
  "credits": 1,
  "userAgent": "axios/1.6.0",
  "message": "RPC request completed successfully"
}

Log Levels:

  • DEBUG: Detailed debugging information
  • INFO: Normal operation logs (requests, responses)
  • WARN: Warning conditions (slow queries, cache misses)
  • ERROR: Error conditions (failed requests, timeouts)
  • FATAL: Critical errors (service crashes)

Retention:

  • DEBUG/INFO: 7 days
  • WARN: 30 days
  • ERROR/FATAL: 90 days

Distributed Tracing

Trace Spans:

Request: POST /rpc [Total: 45ms]
  ├─ authenticate [3ms]
  │   ├─ cache-lookup [1ms]
  │   └─ validate-permissions [2ms]
  ├─ rate-limit-check [2ms]
  ├─ cache-check [2ms]
  └─ rpc-call [38ms]
      ├─ load-balance [1ms]
      ├─ http-request [35ms]
      └─ cache-store [2ms]

Trace Context Propagation (a span-creation sketch follows this list):

  • Unique trace ID generated for each request
  • Propagated across all services
  • Stored in logs for correlation
  • Visualized in Jaeger UI
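
Creating spans like those in the tree above with the OpenTelemetry API might look like the following sketch; tracer and exporter setup is assumed to happen elsewhere, and forwardToNode is a placeholder:

import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("api-gateway");

// Parent span for the request with child spans per step, mirroring the trace tree above.
async function handleRpc(body: unknown): Promise<unknown> {
  return tracer.startActiveSpan("POST /rpc", async (root) => {
    try {
      await tracer.startActiveSpan("authenticate", async (span) => {
        // ...cache-lookup and validate-permissions would run here
        span.end();
      });
      return await tracer.startActiveSpan("rpc-call", async (span) => {
        const result = await forwardToNode(body); // placeholder for the upstream HTTP call
        span.end();
        return result;
      });
    } finally {
      root.end();
    }
  });
}

// Illustrative stand-in for the load-balanced RPC call.
async function forwardToNode(body: unknown): Promise<unknown> {
  return body;
}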

Alerting Rules

Critical Alerts (PagerDuty):

  • Service down (all instances)
  • Error rate >5% for 5 minutes
  • Response time >1000ms (p95) for 5 minutes
  • Database connection failures
  • All RPC nodes unhealthy

Warning Alerts (Slack):

  • Error rate >1% for 10 minutes
  • Response time >500ms (p95) for 10 minutes
  • Cache hit rate <50% for 15 minutes
  • Single RPC node unhealthy
  • High memory usage (>80%)

Info Alerts (Email):

  • Unusual traffic patterns
  • New API key created
  • Tier upgrade/downgrade
  • Daily usage report

Dashboard Examples

1. Service Health Dashboard:

  • Request rate (per service)
  • Error rate (per service)
  • Response time (p50, p95, p99)
  • Uptime percentage (24h, 7d, 30d)

2. Infrastructure Dashboard:

  • CPU/Memory usage
  • Disk usage
  • Network throughput
  • Database connections

3. Business Metrics Dashboard:

  • Requests by tier
  • Credit consumption
  • Top users by usage
  • Revenue metrics

4. RPC Node Dashboard:

  • Health scores per node
  • Request distribution
  • Response times per node
  • Error rates per node

Performance Characteristics

  • Metrics Collection Overhead: <1% CPU
  • Log Collection Overhead: <2% CPU
  • Trace Collection Overhead: <3% CPU
  • Metrics Retention: 90 days (5-minute resolution)
  • Log Retention: 7-90 days (based on level)
  • Trace Retention: 7 days (sampled)

External Monitoring

Uptime Robot Checks (every 5 minutes):

  • HTTPS endpoint health
  • WebSocket connectivity
  • DNS resolution
  • SSL certificate validity

Status Page Components:

  • API Gateway (US East, EU West, Asia Pacific)
  • RPC Nodes (mainnet, devnet, testnet)
  • Geyser API
  • WebSocket Streaming
  • Database Services

Summary

FortiBlox Nexus is a comprehensive X1 Blockchain infrastructure platform built with:

  • High Performance: Sub-100ms latency for most operations
  • High Availability: Multi-region deployment with automatic failover
  • Scalability: Horizontal scaling of all components
  • Reliability: 99.9% uptime SLA with comprehensive monitoring
  • Security: Multi-layer authentication and authorization
  • Developer Experience: Simple APIs with detailed documentation

Key Technologies

  • API Gateway: Node.js/Express or Go
  • RPC Load Balancer: Custom Go service
  • Geyser API: Node.js/Express with TypeScript
  • WebSocket Server: Node.js/ws or Go
  • Caching: Redis (cluster mode)
  • Database: TimescaleDB (PostgreSQL extension)
  • Monitoring: Prometheus + Grafana
  • Logging: ELK Stack (Elasticsearch, Logstash, Kibana)
  • Tracing: OpenTelemetry + Jaeger

Architecture Principles

  1. Microservices: Independent services for each function
  2. Stateless Services: All state in Redis/TimescaleDB
  3. Horizontal Scaling: Scale by adding more instances
  4. Defense in Depth: Multiple layers of security
  5. Observability First: Comprehensive logging, metrics, and tracing
  6. Graceful Degradation: Continue operating during partial failures
  7. Cost Optimization: Intelligent caching reduces infrastructure costs

Next Steps