FortiBlox Nexus Architecture
Comprehensive system architecture diagrams for FortiBlox Nexus infrastructure platform
This document provides detailed architectural diagrams and explanations of the FortiBlox Nexus infrastructure platform. Each diagram illustrates key components, data flows, and system interactions.
1. System Overview
The complete FortiBlox Nexus platform consists of multiple integrated services working together to provide high-performance X1 Blockchain infrastructure.
graph TB
subgraph "Client Layer"
Client[Client Applications]
Browser[Web Browsers]
Server[Server Apps]
end
subgraph "API Gateway Layer"
Gateway[API Gateway<br/>Authentication<br/>Rate Limiting<br/>Routing]
end
subgraph "Service Layer"
RPC[RPC Service<br/>Load Balancer]
Geyser[Geyser REST API<br/>TimescaleDB Queries]
WS[WebSocket Server<br/>Real-time Streaming]
end
subgraph "Data Layer"
Cache[(Redis Cache<br/>5s-5min TTL)]
DB[(TimescaleDB<br/>Historical Data)]
end
subgraph "Blockchain Layer"
Validator1[X1 Validator 1<br/>RPC Endpoint]
Validator2[X1 Validator 2<br/>RPC Endpoint]
Validator3[X1 Validator 3<br/>RPC Endpoint]
GeyserPlugin[Geyser Plugin<br/>Real-time Ingestion]
end
Client --> Gateway
Browser --> Gateway
Server --> Gateway
Gateway --> RPC
Gateway --> Geyser
Gateway --> WS
RPC --> Cache
Geyser --> Cache
Cache -.-> DB
Geyser --> DB
WS --> DB
RPC --> Validator1
RPC --> Validator2
RPC --> Validator3
GeyserPlugin --> DB
Validator1 -.-> GeyserPlugin
Validator2 -.-> GeyserPlugin
Validator3 -.-> GeyserPlugin
style Gateway fill:#4a90e2,stroke:#2e5c8a,color:#fff
style Cache fill:#f39c12,stroke:#d68910,color:#fff
style DB fill:#27ae60,stroke:#1e8449,color:#fff
Key Components
Client Layer:
- Supports multiple client types: web browsers, mobile apps, server applications
- All clients use HTTPS/WSS for secure communication
- API keys authenticate every request
API Gateway:
- Central entry point for all requests
- Validates API keys and checks tier permissions
- Enforces rate limits (10-1000+ req/s depending on tier)
- Routes requests to appropriate services
- Tracks credit consumption
Service Layer:
- RPC Service: Intelligent load balancing across X1 validators
- Geyser REST API: Historical data queries from TimescaleDB
- WebSocket Server: Real-time streaming with pub/sub architecture
Data Layer:
- Redis Cache: Multi-tier caching (5s-5min TTL based on data type)
- TimescaleDB: Time-series optimized PostgreSQL for historical data
Blockchain Layer:
- Multiple X1 validator nodes for redundancy
- Geyser plugin captures real-time blockchain events
- Automatic health monitoring and failover
Performance Characteristics
- API Gateway Latency: <10ms overhead
- Cache Hit Rate: 70-85% for typical workloads
- RPC Latency: 40-100ms (including network)
- Geyser API Latency: 30-80ms (cached), 100-300ms (database)
- WebSocket Latency: <100ms event delivery
Scaling Strategy
- Horizontal scaling of API Gateway (multiple instances)
- Independent scaling of RPC, Geyser, and WebSocket services
- Redis cluster for distributed caching
- TimescaleDB read replicas for query scaling
- Multi-region deployment for global availability
2. RPC Request Flow
Detailed flow of an RPC request through the FortiBlox Nexus infrastructure, showing authentication, routing, caching, and response.
sequenceDiagram
participant C as Client
participant G as API Gateway
participant A as Auth Service
participant RL as Rate Limiter
participant Cache as Redis Cache
participant LB as Load Balancer
participant RPC1 as RPC Node 1
participant RPC2 as RPC Node 2
C->>G: POST /rpc<br/>X-API-Key: fbx_xxx<br/>method: getAccountInfo
Note over G: Step 1: Authentication
G->>A: Validate API Key
A->>A: Check key validity<br/>Check tier permissions<br/>Check network access
A-->>G: Auth Success<br/>Tier: Business<br/>Credits: 28.5M remaining
Note over G: Step 2: Rate Limiting
G->>RL: Check rate limit
RL->>RL: Current: 45/200 req/s<br/>Allow request
RL-->>G: Rate limit OK<br/>Remaining: 155
Note over G: Step 3: Cache Check
G->>Cache: Check cache<br/>Key: getAccountInfo:pubkey:confirmed
alt Cache Hit
Cache-->>G: Cached result (30s old)
Note over G: Latency: ~5ms
G-->>C: 200 OK<br/>X-Cache-Status: HIT<br/>X-RPC-Latency-Ms: 5<br/>Result
else Cache Miss
Cache-->>G: No cached result
Note over G: Step 4: Node Selection
G->>LB: Select healthy RPC node
LB->>LB: Health scores:<br/>Node 1: 95%<br/>Node 2: 88%<br/>Node 3: 20% (skip)
LB-->>G: Route to Node 1
Note over G: Step 5: RPC Call
G->>RPC1: Forward RPC request
Note over RPC1: Process request<br/>Latency: 45ms
RPC1-->>G: RPC Response
Note over G: Step 6: Cache Result
G->>Cache: Store result<br/>TTL: 30s
Cache-->>G: Cached
Note over G: Step 7: Deduct Credits
G->>G: Deduct 1 credit<br/>Update usage metrics
Note over G: Total Latency: ~55ms
G-->>C: 200 OK<br/>X-Cache-Status: MISS<br/>X-RPC-Endpoint: node1<br/>X-RPC-Latency-Ms: 55<br/>X-RateLimit-Remaining: 155<br/>Result
end
Latency Breakdown (Cache Miss)
| Step | Component | Typical Latency |
|---|---|---|
| 1. Authentication | Auth Service | 2-5ms |
| 2. Rate Limiting | Redis | 1-3ms |
| 3. Cache Check | Redis | 1-2ms |
| 4. Node Selection | Load Balancer | <1ms |
| 5. RPC Call | X1 Validator | 40-80ms |
| 6. Cache Storage | Redis | 1-2ms |
| Total | | 45-95ms |
Latency Breakdown (Cache Hit)
| Step | Component | Typical Latency |
|---|---|---|
| 1. Authentication | Auth Service | 2-5ms |
| 2. Rate Limiting | Redis | 1-3ms |
| 3. Cache Retrieval | Redis | 1-2ms |
| Total | | 4-10ms |
Credit Consumption
Different methods consume different amounts of credits:
- Standard methods (getHealth, getSlot, getBalance): 1 credit
- Heavy methods (getBlock, getTransaction, getProgramAccounts): 5 credits
- Transaction submission (sendTransaction, simulateTransaction): 10 credits
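As a rough illustration, the mapping above can be expressed as a simple lookup table (the helper and its fallback behavior are illustrative, not the production implementation):

```typescript
// Illustrative credit-cost table; method names come from the list above.
const CREDIT_COSTS: Record<string, number> = {
  // Standard methods
  getHealth: 1,
  getSlot: 1,
  getBalance: 1,
  // Heavy methods
  getBlock: 5,
  getTransaction: 5,
  getProgramAccounts: 5,
  // Transaction submission
  sendTransaction: 10,
  simulateTransaction: 10,
};

// Unknown methods fall back to the standard cost of 1 credit (assumption).
function creditCost(rpcMethod: string): number {
  return CREDIT_COSTS[rpcMethod] ?? 1;
}
```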
Error Handling
The system handles failures gracefully:
- Invalid API Key: Return 401 immediately (no further processing)
- Rate Limit Exceeded: Return 429 with Retry-After header
- RPC Node Down: Automatic failover to healthy node
- All Nodes Down: Return 503 Service Unavailable
- Timeout: Return 504 Gateway Timeout after 30s
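From the client side, these error cases can be handled roughly as follows. This is a sketch, not an official SDK: the endpoint URL and retry policy are assumptions, while the status codes and the Retry-After header follow the list above.

```typescript
// Hypothetical client wrapper; URL and retry counts are assumptions.
async function rpcWithRetry(method: string, params: unknown[], apiKey: string) {
  for (let attempt = 0; attempt < 3; attempt++) {
    const res = await fetch('https://nexus.fortiblox.com/rpc', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'X-API-Key': apiKey },
      body: JSON.stringify({ jsonrpc: '2.0', id: 1, method, params }),
    });
    if (res.status === 401) throw new Error('Invalid API key'); // no point retrying
    if (res.status === 429) {
      // Honor the Retry-After header before retrying.
      const waitMs = Number(res.headers.get('Retry-After') ?? '1') * 1000;
      await new Promise((r) => setTimeout(r, waitMs));
      continue;
    }
    if (res.status === 503 || res.status === 504) {
      // Transient infrastructure errors: back off briefly and retry.
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
      continue;
    }
    return res.json();
  }
  throw new Error('RPC request failed after retries');
}
```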
3. Geyser Streaming Architecture
The Geyser system captures real-time blockchain data and stores it in TimescaleDB for fast historical queries.
graph TB
subgraph "X1 Blockchain Network"
V1[Validator 1<br/>Block Production]
V2[Validator 2<br/>Block Production]
V3[Validator 3<br/>Block Production]
end
subgraph "Geyser Plugin Layer"
GP1[Geyser Plugin 1<br/>Validator 1]
GP2[Geyser Plugin 2<br/>Validator 2]
GP3[Geyser Plugin 3<br/>Validator 3]
end
subgraph "Data Ingestion Pipeline"
Queue[Message Queue<br/>Kafka/Redis Streams]
Processor[Stream Processor<br/>Deduplication<br/>Transformation]
end
subgraph "Storage Layer"
TSDB[(TimescaleDB<br/>Hypertables)]
subgraph "Tables"
TXTable[Transactions<br/>Indexed by signature]
BlockTable[Blocks<br/>Indexed by slot]
AccountTable[Accounts<br/>Indexed by address]
TokenTable[Token Metadata<br/>Indexed by mint]
end
end
subgraph "Query Layer"
Cache2[(Redis Cache<br/>Query Results)]
GeyserAPI[Geyser REST API<br/>Complex Queries]
end
subgraph "Optimization Layer"
Materialized[Materialized Views<br/>Pre-aggregated Stats]
Indexes[Hypertable Indexes<br/>Time + Attributes]
end
V1 --> GP1
V2 --> GP2
V3 --> GP3
GP1 --> Queue
GP2 --> Queue
GP3 --> Queue
Queue --> Processor
Processor --> TXTable
Processor --> BlockTable
Processor --> AccountTable
Processor --> TokenTable
TXTable --> TSDB
BlockTable --> TSDB
AccountTable --> TSDB
TokenTable --> TSDB
TSDB --> Materialized
TSDB --> Indexes
GeyserAPI --> Cache2
Cache2 -.->|Cache Miss| TSDB
Materialized --> GeyserAPI
Indexes --> GeyserAPI
style Queue fill:#9b59b6,stroke:#7d3c98,color:#fff
style TSDB fill:#27ae60,stroke:#1e8449,color:#fff
style Cache2 fill:#f39c12,stroke:#d68910,color:#fff
Data Flow
- Capture: Geyser plugins on validators capture every transaction, block, and account update
- Queue: Events sent to message queue for reliable delivery
- Process: Stream processor deduplicates, validates, and transforms data
- Store: Data inserted into TimescaleDB hypertables with time-series optimization
- Index: Automatic indexing on time, signature, address, program ID
- Query: REST API serves queries with intelligent caching
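A condensed sketch of the queue → process → store stage is shown below, assuming Redis Streams as the queue and a transactions hypertable keyed by signature. The stream name, table, and column names are illustrative, not the actual schema.

```typescript
import Redis from 'ioredis';
import { Pool } from 'pg';

const redis = new Redis();
const db = new Pool({ connectionString: process.env.DATABASE_URL });

async function processStream(): Promise<void> {
  let lastId = '0-0';
  for (;;) {
    // Block up to 5s waiting for new Geyser events on the stream (name assumed).
    const batch = await redis.xread('BLOCK', 5000, 'STREAMS', 'geyser:transactions', lastId);
    if (!batch) continue;
    for (const [, entries] of batch) {
      for (const [id, fields] of entries) {
        // Assumes each entry is stored as ['payload', '<json>'].
        const event = JSON.parse(fields[1]);
        // ON CONFLICT makes the insert idempotent, which doubles as deduplication.
        await db.query(
          `INSERT INTO transactions (signature, slot, block_time, raw)
           VALUES ($1, $2, to_timestamp($3), $4)
           ON CONFLICT (signature) DO NOTHING`,
          [event.signature, event.slot, event.blockTime, fields[1]],
        );
        lastId = id;
      }
    }
  }
}
```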
Real-time vs Historical Data
Real-time (Processed/Confirmed):
- Latency: 200-400ms behind blockchain
- Commitment: processed or confirmed
- Use case: Live dashboards, real-time monitoring
- Cache TTL: 5-10 seconds
Historical (Finalized):
- Latency: 30-45 seconds behind blockchain
- Commitment: finalized
- Use case: Analytics, permanent records
- Cache TTL: 5 minutes
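In practice, both modes can be served by the same Geyser list endpoint with a different commitment parameter. The sketch below assumes query parameters of this shape; consult the Geyser API reference for the authoritative request format.

```typescript
// Hypothetical helper; base URL and query parameters are assumptions.
async function fetchTransactions(apiKey: string, mode: 'realtime' | 'historical') {
  const commitment = mode === 'realtime' ? 'confirmed' : 'finalized';
  const url = `https://nexus.fortiblox.com/geyser/transactions?commitment=${commitment}&limit=100`;
  const res = await fetch(url, { headers: { 'X-API-Key': apiKey } });
  if (!res.ok) throw new Error(`Geyser query failed: ${res.status}`);
  return res.json();
}
```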
Query Optimization
1. Time-series Partitioning: TimescaleDB automatically partitions data by time (1-day chunks) for fast queries on recent data.
2. Materialized Views: Pre-aggregated statistics updated every minute:
- Transactions per block
- Validator statistics
- Token transfer volumes
- Program usage metrics
3. Multi-level Caching:
- L1: Redis cache (70-85% hit rate; see the cache-aside sketch after this list)
- L2: TimescaleDB query cache
- L3: Materialized views
4. Indexes:
- Time-based index (primary)
- Signature hash index (unique lookups)
- Account address index (account history)
- Program ID index (program activity)
- Composite indexes for common queries
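The L1 Redis layer above is typically populated with a cache-aside pattern: check Redis, fall back to TimescaleDB on a miss, and store the result with a short TTL. A minimal sketch with ioredis follows; the key format and TTL values are illustrative.

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Generic cache-aside helper: serve from Redis when possible, otherwise run the
// database query and cache the result with a short TTL.
async function cachedQuery<T>(key: string, ttlSeconds: number, load: () => Promise<T>): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T; // L1 hit (70-85% of requests)
  const result = await load();                   // falls through to TimescaleDB
  await redis.set(key, JSON.stringify(result), 'EX', ttlSeconds);
  return result;
}

// Usage: 5s TTL for confirmed data, 300s for finalized (matching the TTLs above).
// const txs = await cachedQuery('geyser:txs:confirmed:latest', 5, () => queryDatabase());
```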
Performance Characteristics
- Ingestion Rate: 10,000+ events/second
- Write Latency: <100ms from blockchain to database
- Query Latency: 30-80ms (cached), 100-300ms (database)
- Storage Growth: ~50GB/month for mainnet
- Retention: Unlimited (all tiers include full history)
Scaling Approach
- Write Scaling: Multiple Geyser plugins + message queue buffering
- Read Scaling: TimescaleDB read replicas + Redis cache cluster
- Storage Scaling: Automatic compression + continuous aggregates
- Geographic Scaling: Regional read replicas for global access
4. WebSocket Architecture
Real-time streaming architecture using WebSocket for low-latency event delivery to connected clients.
graph TB
subgraph "Client Connections"
C1[Client 1<br/>WSS Connection]
C2[Client 2<br/>WSS Connection]
C3[Client 3<br/>WSS Connection]
CN[Client N<br/>WSS Connection]
end
subgraph "WebSocket Server Cluster"
WS1[WS Server 1<br/>Node.js/Go]
WS2[WS Server 2<br/>Node.js/Go]
WS3[WS Server 3<br/>Node.js/Go]
end
subgraph "Subscription Management"
SubMgr[Subscription Manager<br/>Redis Pub/Sub]
subgraph "Channels"
TxChannel[transactions<br/>Channel]
BlockChannel[blocks<br/>Channel]
AccountChannel[accounts<br/>Channel]
SlotChannel[slots<br/>Channel]
end
end
subgraph "Event Sources"
GeyserStream[Geyser Plugin<br/>Real-time Events]
DBStream[TimescaleDB<br/>LISTEN/NOTIFY]
end
subgraph "Load Balancer"
LB[Load Balancer<br/>Sticky Sessions]
end
C1 -.-> LB
C2 -.-> LB
C3 -.-> LB
CN -.-> LB
LB --> WS1
LB --> WS2
LB --> WS3
WS1 --> SubMgr
WS2 --> SubMgr
WS3 --> SubMgr
SubMgr --> TxChannel
SubMgr --> BlockChannel
SubMgr --> AccountChannel
SubMgr --> SlotChannel
GeyserStream --> TxChannel
GeyserStream --> BlockChannel
GeyserStream --> AccountChannel
GeyserStream --> SlotChannel
DBStream -.-> SubMgr
style SubMgr fill:#e74c3c,stroke:#c0392b,color:#fff
style LB fill:#3498db,stroke:#2874a6,color:#fff
Connection Lifecycle
sequenceDiagram
participant C as Client
participant LB as Load Balancer
participant WS as WebSocket Server
participant Auth as Auth Service
participant Sub as Subscription Manager
participant Stream as Event Stream
C->>LB: WSS Connection<br/>?api-key=fbx_xxx
LB->>WS: Route to WS Server
WS->>Auth: Validate API Key
Auth->>Auth: Check tier limits<br/>Check connection count
Auth-->>WS: Auth Success<br/>Max Connections: 5<br/>Current: 2/5
WS-->>C: Connection Established<br/>type: connected
Note over C: Subscribe to events
C->>WS: {action: "subscribe"<br/>channel: "transactions"<br/>filters: {...}}
WS->>Sub: Register subscription<br/>Client ID: c1234<br/>Channel: transactions
Sub->>Sub: Add to channel<br/>Apply filters
Sub-->>WS: Subscription active
WS-->>C: {type: "subscribed"<br/>channel: "transactions"<br/>subscriptionId: "sub_xxx"}
Note over Stream: New transaction occurs
Stream->>Sub: Publish transaction event
Sub->>Sub: Match filters<br/>Find subscribers
Sub->>WS: Send to matching clients
WS->>C: {type: "transaction"<br/>data: {...}}
Note over C: Heartbeat every 30s
C->>WS: {action: "ping"}
WS-->>C: {type: "pong"}
Note over C: Unsubscribe
C->>WS: {action: "unsubscribe"<br/>subscriptionId: "sub_xxx"}
WS->>Sub: Remove subscription
Sub-->>WS: Unsubscribed
WS-->>C: {type: "unsubscribed"}
C->>WS: Close connection
WS->>Sub: Remove all subscriptions
WS-->>C: Connection closed
Fan-out Architecture
Each WebSocket server handles 1,000-5,000 concurrent connections:
Single Event Processing:
- Geyser plugin emits transaction event
- Event published to Redis Pub/Sub channel
- All WS servers subscribed to channel receive event
- Each WS server filters event against client subscriptions
- Matching events sent to connected clients
Filtering Strategies:
- Server-side filtering reduces bandwidth
- Client-specific filters (account, program, commitment)
- Subscription multiplexing (one connection, many subscriptions)
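A stripped-down sketch of the fan-out step is shown below, assuming an ioredis subscriber and the ws library. The channel name matches the diagram; the filter shape and event fields are illustrative.

```typescript
import Redis from 'ioredis';
import type WebSocket from 'ws';

interface ClientSub {
  socket: WebSocket;
  filters: { account?: string; programId?: string }; // filter shape is illustrative
}

const clientSubs: ClientSub[] = []; // populated when clients send subscribe messages
const sub = new Redis();

// Every WS server instance subscribes to the shared Redis channel...
sub.subscribe('transactions');

sub.on('message', (_channel, payload) => {
  const event = JSON.parse(payload);
  // ...and filters server-side so each client only receives matching events.
  for (const { socket, filters } of clientSubs) {
    const accountOk = !filters.account || event.accounts?.includes(filters.account);
    const programOk = !filters.programId || event.programId === filters.programId;
    if (accountOk && programOk && socket.readyState === 1 /* OPEN */) {
      socket.send(JSON.stringify({ type: 'transaction', data: event }));
    }
  }
});
```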
Scaling Strategy
Horizontal Scaling:
- Multiple WS server instances behind load balancer
- Redis Pub/Sub broadcasts to all servers
- Each server handles subset of connections
- Sticky sessions keep clients connected to same server
Connection Distribution:
Load Balancer
├── WS Server 1: 2,500 connections
├── WS Server 2: 2,500 connections
├── WS Server 3: 2,500 connections
└── WS Server 4: 2,500 connections
Total: 10,000 concurrent connections
Tier Limits:
| Tier | Max Connections | Messages/Second |
|---|---|---|
| Free | 5 | 100 |
| Developer | 5 | 500 |
| Business | 250 | 2,000 |
| Professional | 250 | 5,000 |
| Enterprise | Custom | Custom |
Performance Characteristics
- Connection Latency: <200ms to establish
- Event Latency: <100ms from blockchain to client
- Message Rate: 100-5,000 msg/s per connection (tier-dependent)
- Bandwidth: ~5KB/message average
- Memory: ~50KB per connection
- CPU: Low (event-driven architecture)
Reliability Features
1. Automatic Reconnection: Clients implement exponential backoff for reconnection
2. Heartbeat/Ping-Pong: 30-second heartbeat detects stale connections
3. Message Acknowledgment: Critical messages require client acknowledgment
4. Connection Recovery: Clients can resume subscriptions after reconnection
5. Graceful Degradation: Rate limiting prevents overwhelming clients
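A client-side sketch of points 1 and 4 (reconnect with exponential backoff, then resume subscriptions) is shown below. It assumes the browser WebSocket API (or an equivalent client in Node); the WSS URL is an assumption, while the message shapes follow the connection lifecycle above.

```typescript
// Reconnecting client sketch; URL is an assumption, message shapes follow the diagram.
function connect(apiKey: string, subscriptions: object[], attempt = 0): void {
  const ws = new WebSocket(`wss://nexus.fortiblox.com/ws?api-key=${apiKey}`);

  ws.onopen = () => {
    attempt = 0; // reset backoff once connected
    for (const sub of subscriptions) {
      ws.send(JSON.stringify({ action: 'subscribe', ...sub })); // resume subscriptions
    }
  };

  ws.onclose = () => {
    // Exponential backoff capped at 30s: 1s, 2s, 4s, ...
    const delay = Math.min(1000 * 2 ** attempt, 30_000);
    setTimeout(() => connect(apiKey, subscriptions, attempt + 1), delay);
  };
}

connect('fbx_xxx', [{ channel: 'transactions', filters: {} }]);
```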
5. Authentication Flow
Comprehensive authentication and authorization flow showing API key validation, tier checking, and credit tracking.
graph TB
Start([Client Request]) --> Auth{Valid API Key?}
Auth -->|No| AuthErr[401 Unauthorized<br/>error: INVALID_API_KEY]
Auth -->|Yes| Active{Key Active?}
Active -->|No| ActiveErr[401 Unauthorized<br/>error: KEY_INACTIVE]
Active -->|Yes| CheckCache{Check Redis Cache}
CheckCache -->|Cache Hit| LoadCache[Load Tier Info<br/>from Cache]
CheckCache -->|Cache Miss| LoadDB[Load from Database<br/>Cache for 5min]
LoadCache --> ValidateTier
LoadDB --> ValidateTier
ValidateTier{Validate Tier<br/>Permissions}
ValidateTier -->|Network Restricted| CheckNetwork{Network<br/>Allowed?}
CheckNetwork -->|No| NetworkErr[403 Forbidden<br/>error: NETWORK_NOT_ALLOWED]
CheckNetwork -->|Yes| CheckDomain
ValidateTier -->|Domain Restricted| CheckDomain{Domain<br/>Allowed?}
CheckDomain -->|No| DomainErr[403 Forbidden<br/>error: DOMAIN_NOT_ALLOWED]
CheckDomain -->|Yes| CheckIP
ValidateTier -->|IP Restricted| CheckIP{IP Address<br/>Allowed?}
CheckIP -->|No| IPErr[403 Forbidden<br/>error: IP_NOT_ALLOWED]
CheckIP -->|Yes| RateLimit
ValidateTier -->|No Restrictions| RateLimit{Rate Limit<br/>Check}
RateLimit -->|Exceeded| RateLimitErr[429 Too Many Requests<br/>error: RATE_LIMIT_EXCEEDED<br/>retry_after: Xs]
RateLimit -->|OK| Credits{Sufficient<br/>Credits?}
Credits -->|No| CreditsErr[402 Payment Required<br/>error: INSUFFICIENT_CREDITS]
Credits -->|Yes| Success[Request Allowed<br/>Proceed to Service]
Success --> Deduct[Deduct Credits<br/>Update Usage]
Deduct --> Response[Process Request<br/>Return Response]
AuthErr --> End([End])
ActiveErr --> End
NetworkErr --> End
DomainErr --> End
IPErr --> End
RateLimitErr --> End
CreditsErr --> End
Response --> End
style Success fill:#27ae60,stroke:#1e8449,color:#fff
style AuthErr fill:#e74c3c,stroke:#c0392b,color:#fff
style ActiveErr fill:#e74c3c,stroke:#c0392b,color:#fff
style NetworkErr fill:#e67e22,stroke:#ca6f1e,color:#fff
style DomainErr fill:#e67e22,stroke:#ca6f1e,color:#fff
style IPErr fill:#e67e22,stroke:#ca6f1e,color:#fff
style RateLimitErr fill:#f39c12,stroke:#d68910,color:#fff
style CreditsErr fill:#e74c3c,stroke:#c0392b,color:#fff
Authentication Steps
1. API Key Validation (2-5ms)
- Extract API key from header (X-API-Key or Authorization Bearer)
- Validate format (fbx_xxx pattern)
- Check key exists in system
- Verify key is active (not revoked/expired)
2. Tier Information Loading (1-3ms)
- Check Redis cache for tier info (5-minute TTL)
- On cache miss, load from database
- Cache includes: tier level, rate limits, credit balance, restrictions
3. Access Control Validation (1-2ms)
- Network Restrictions: Verify requested network allowed (mainnet/devnet/testnet)
- Domain Restrictions: Check Origin/Referer header against whitelist
- IP Restrictions: Validate source IP against allowed ranges
4. Rate Limiting (1-3ms)
- Check current request rate in Redis
- Sliding window algorithm (per-second buckets)
- Allow burst up to 2x limit for short periods
- Return 429 if exceeded with Retry-After header
5. Credit Checking (1-2ms)
- Load current credit balance from cache
- Calculate request cost based on method
- Verify sufficient credits available
- Return 402 if insufficient
6. Credit Deduction (1-2ms)
- Deduct credits from balance
- Update usage metrics
- Log usage for billing
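Compressed into Express middleware, steps 1-3 look roughly like the sketch below. The loadTierInfo helper is hypothetical and stands in for the Redis/database lookup in step 2; the error payloads reuse the codes from the flowchart.

```typescript
import type { Request, Response, NextFunction } from 'express';

interface TierInfo {
  active: boolean;
  restrictions: { networks?: string[] };
}

// Hypothetical lookup standing in for step 2 (Redis cache, database on miss).
declare function loadTierInfo(apiKey: string): Promise<TierInfo | null>;

// Sketch of steps 1-3 only; the production middleware is more involved.
export async function authenticate(req: Request, res: Response, next: NextFunction) {
  const key = req.header('X-API-Key') ?? req.header('Authorization')?.replace(/^Bearer\s+/i, '');
  if (!key || !key.startsWith('fbx_')) {
    return res.status(401).json({ error: 'INVALID_API_KEY' });
  }
  const info = await loadTierInfo(key);
  if (!info || !info.active) {
    return res.status(401).json({ error: 'KEY_INACTIVE' });
  }
  const network = String(req.query.network ?? 'mainnet'); // network selection is an assumption
  if (info.restrictions.networks && !info.restrictions.networks.includes(network)) {
    return res.status(403).json({ error: 'NETWORK_NOT_ALLOWED' });
  }
  res.locals.tier = info; // rate limiting and credit checks (steps 4-6) run next
  next();
}
```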
Redis Cache Strategy
Cached Data (5-minute TTL):
{
  "apiKey": "fbx_xxx",
  "tier": "Business",
  "rateLimit": {
    "requestsPerSecond": 200,
    "burstLimit": 400
  },
  "credits": {
    "total": 30000000,
    "used": 1500000,
    "remaining": 28500000
  },
  "restrictions": {
    "networks": ["mainnet", "devnet"],
    "domains": ["https://myapp.com"],
    "ipRanges": ["203.0.113.0/24"]
  },
  "metadata": {
    "userId": "user_123",
    "keyName": "Production API Key",
    "createdAt": "2025-11-01T00:00:00Z"
  }
}
Rate Limiting Algorithm
Sliding Window:
// Redis keys for rate limiting
const key = `ratelimit:${apiKey}:${currentSecond}`;
const count = await redis.incr(key);
await redis.expire(key, 60); // Keep for 60 seconds

if (count > rateLimit.requestsPerSecond) {
  // Check burst allowance
  const windowTotal = await sumPreviousSeconds(apiKey, 10);
  const avgRate = windowTotal / 10;
  if (avgRate > rateLimit.requestsPerSecond && count > rateLimit.burstLimit) {
    throw new RateLimitError();
  }
}
Rate Limit Headers:
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 187
X-RateLimit-Reset: 1700000000
X-RateLimit-Burst: 400
Credit Tracking
Credit Costs by Method:
| Category | Methods | Credits |
|---|---|---|
| Standard | getHealth, getSlot, getBalance | 1 |
| Heavy | getBlock, getTransaction, getProgramAccounts | 5 |
| Transaction | sendTransaction, simulateTransaction | 10 |
| Geyser Simple | /transaction/:sig, /block/:slot | 5 |
| Geyser List | /transactions, /blocks | 10 |
| Geyser Search | Complex queries with filters | 25 |
Usage Tracking:
- Real-time credit deduction
- Usage metrics logged to TimescaleDB
- Daily usage summaries
- Billing calculated from usage logs
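Credit deduction has to stay correct under concurrent requests. One way to sketch an atomic deduction with ioredis is shown below; the key name is an assumption.

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Illustrative atomic deduction: DECRBY returns the new balance, so an overdraft
// can be detected and rolled back with one command each way.
async function deductCredits(apiKey: string, cost: number): Promise<boolean> {
  const remaining = await redis.decrby(`credits:${apiKey}:remaining`, cost);
  if (remaining < 0) {
    await redis.incrby(`credits:${apiKey}:remaining`, cost); // roll back: insufficient credits
    return false; // caller responds with 402 INSUFFICIENT_CREDITS
  }
  return true;
}
```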
6. Multi-Region Setup
FortiBlox Nexus supports multi-region deployment for global availability and low-latency access.
graph TB
subgraph "Global DNS"
DNS[Route 53<br/>GeoDNS Routing]
end
subgraph "Region: US East"
USE_LB[Load Balancer<br/>us-east-1]
USE_Gateway[API Gateway]
USE_RPC[RPC Service]
USE_Geyser[Geyser API]
USE_WS[WebSocket Server]
USE_Redis[(Redis Primary)]
USE_TSDB[(TimescaleDB Primary)]
USE_Validators[X1 Validators<br/>3 nodes]
end
subgraph "Region: EU West"
EUW_LB[Load Balancer<br/>eu-west-1]
EUW_Gateway[API Gateway]
EUW_RPC[RPC Service]
EUW_Geyser[Geyser API]
EUW_WS[WebSocket Server]
EUW_Redis[(Redis Replica)]
EUW_TSDB[(TimescaleDB Replica)]
EUW_Validators[X1 Validators<br/>2 nodes]
end
subgraph "Region: Asia Pacific"
AP_LB[Load Balancer<br/>ap-southeast-1]
AP_Gateway[API Gateway]
AP_RPC[RPC Service]
AP_Geyser[Geyser API]
AP_WS[WebSocket Server]
AP_Redis[(Redis Replica)]
AP_TSDB[(TimescaleDB Replica)]
AP_Validators[X1 Validators<br/>2 nodes]
end
subgraph "Monitoring & Control"
Monitor[Global Monitoring<br/>Health Checks]
Failover[Failover Controller<br/>Automatic Routing]
end
DNS --> USE_LB
DNS --> EUW_LB
DNS --> AP_LB
USE_LB --> USE_Gateway
USE_Gateway --> USE_RPC
USE_Gateway --> USE_Geyser
USE_Gateway --> USE_WS
USE_RPC --> USE_Redis
USE_Geyser --> USE_Redis
USE_Geyser --> USE_TSDB
USE_RPC --> USE_Validators
EUW_LB --> EUW_Gateway
EUW_Gateway --> EUW_RPC
EUW_Gateway --> EUW_Geyser
EUW_Gateway --> EUW_WS
EUW_RPC --> EUW_Redis
EUW_Geyser --> EUW_Redis
EUW_Geyser --> EUW_TSDB
EUW_RPC --> EUW_Validators
AP_LB --> AP_Gateway
AP_Gateway --> AP_RPC
AP_Gateway --> AP_Geyser
AP_Gateway --> AP_WS
AP_RPC --> AP_Redis
AP_Geyser --> AP_Redis
AP_Geyser --> AP_TSDB
AP_RPC --> AP_Validators
USE_Redis -.->|Replication| EUW_Redis
USE_Redis -.->|Replication| AP_Redis
USE_TSDB -.->|Replication| EUW_TSDB
USE_TSDB -.->|Replication| AP_TSDB
Monitor --> USE_LB
Monitor --> EUW_LB
Monitor --> AP_LB
Failover --> DNS
Monitor --> Failover
style DNS fill:#3498db,stroke:#2874a6,color:#fff
style Monitor fill:#e74c3c,stroke:#c0392b,color:#fff
Geographic Routing
DNS-based Routing:
- Route 53 GeoDNS routes users to nearest region
- Latency-based routing for optimal performance
- Health checks ensure region availability
Latency Benefits:
| Client Region | US East | EU West | Asia Pacific |
|---|---|---|---|
| US East | 10-20ms | 80-100ms | 150-200ms |
| EU West | 80-100ms | 10-20ms | 120-180ms |
| Asia Pacific | 150-200ms | 120-180ms | 10-20ms |
Data Replication
Redis Cache Replication:
- Master-replica replication (async)
- 10-50ms replication lag
- Read from local replica
- Writes to primary only (auth, rate limiting)
TimescaleDB Replication:
- Streaming replication (async)
- 1-5 second replication lag
- Read replicas for query distribution
- Primary in US East for writes
Consistency Model:
- Authentication: Eventual consistency (5-min cache)
- Rate Limiting: Per-region enforcement (may exceed global limit slightly)
- Historical Data: Eventual consistency (1-5s lag)
- Real-time Data: Direct from validators in each region
Failover Logic
Health Monitoring:
// Health check every 10 seconds
const healthCheck = {
  endpoint: "https://us-east.nexus.fortiblox.com/health",
  interval: 10000,
  timeout: 5000,
  consecutiveFailures: 3
};

// Failover trigger
if (consecutiveFailures >= 3) {
  // Remove from DNS
  // Route traffic to healthy regions
  // Alert operations team
}
Automatic Failover:
- Health check detects region failure (3 consecutive failures)
- Remove failing region from DNS
- Traffic redistributed to healthy regions
- Operations team alerted
- Automatic recovery when health restored
Manual Failover:
- Operations dashboard for manual control
- Gradual traffic shifting (0% → 25% → 50% → 100%)
- Rollback capability
Load Distribution
Normal Operation:
- US East: 50% traffic (largest user base)
- EU West: 30% traffic
- Asia Pacific: 20% traffic
During US East Failure:
- EU West: 60% traffic (auto-scaled)
- Asia Pacific: 40% traffic (auto-scaled)
- Automatic capacity scaling in remaining regions
Scaling Strategy
Regional Auto-scaling:
- Monitor CPU, memory, request rate
- Scale API Gateway: 2-10 instances per region
- Scale RPC Service: 2-8 instances per region
- Scale WebSocket: 2-6 instances per region
Cross-region Scaling:
- Primary region handles 80% of capacity
- Secondary regions can scale to 150% of their normal capacity during failover
7. Monitoring & Observability
Comprehensive monitoring and observability infrastructure for FortiBlox Nexus platform.
graph TB
subgraph "Service Layer"
Gateway[API Gateway]
RPC[RPC Service]
Geyser[Geyser API]
WS[WebSocket Server]
end
subgraph "Instrumentation"
Logs[Structured Logging<br/>JSON Format]
Metrics[Metrics Collection<br/>Prometheus]
Traces[Distributed Tracing<br/>OpenTelemetry]
end
subgraph "Collection Layer"
LogCollector[Log Aggregator<br/>Fluentd/Logstash]
MetricsDB[(Prometheus<br/>Time-series DB)]
TraceCollector[Trace Collector<br/>Jaeger]
end
subgraph "Storage Layer"
LogStore[(Elasticsearch<br/>Log Storage)]
MetricsStore[(Prometheus<br/>Long-term Storage)]
TraceStore[(Jaeger Backend<br/>Trace Storage)]
end
subgraph "Analysis & Visualization"
Grafana[Grafana Dashboards<br/>Real-time Metrics]
Kibana[Kibana<br/>Log Analysis]
Jaeger_UI[Jaeger UI<br/>Trace Visualization]
end
subgraph "Alerting"
AlertManager[Alert Manager<br/>Prometheus]
PagerDuty[PagerDuty<br/>On-call Alerts]
Slack[Slack<br/>Team Notifications]
Email[Email Alerts]
end
subgraph "External Monitoring"
Uptime[Uptime Robot<br/>External Checks]
StatusPage[Status Page<br/>status.fortiblox.com]
end
Gateway --> Logs
Gateway --> Metrics
Gateway --> Traces
RPC --> Logs
RPC --> Metrics
RPC --> Traces
Geyser --> Logs
Geyser --> Metrics
Geyser --> Traces
WS --> Logs
WS --> Metrics
WS --> Traces
Logs --> LogCollector
Metrics --> MetricsDB
Traces --> TraceCollector
LogCollector --> LogStore
MetricsDB --> MetricsStore
TraceCollector --> TraceStore
LogStore --> Kibana
MetricsStore --> Grafana
TraceStore --> Jaeger_UI
MetricsDB --> AlertManager
AlertManager --> PagerDuty
AlertManager --> Slack
AlertManager --> Email
Uptime --> StatusPage
AlertManager --> StatusPage
style Grafana fill:#e74c3c,stroke:#c0392b,color:#fff
style AlertManager fill:#f39c12,stroke:#d68910,color:#fff
style StatusPage fill:#27ae60,stroke:#1e8449,color:#fff
Key Metrics Collected
Request Metrics:
// Prometheus metrics
http_requests_total{service="api-gateway", method="POST", endpoint="/rpc", status="200"}
http_request_duration_seconds{service="api-gateway", quantile="0.95"}
http_requests_in_flight{service="api-gateway"}
System Metrics:
- CPU usage (per service)
- Memory usage (per service)
- Disk I/O (database servers)
- Network throughput
- Connection count (WebSocket)
Business Metrics:
- API requests by tier
- Credit consumption rate
- Cache hit rate
- RPC node health scores
- Authentication success/failure rate
Error Metrics:
- Error rate by service
- Error types (401, 429, 500, etc.)
- Failed RPC calls
- WebSocket disconnections
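The request metrics shown above map naturally onto prom-client counters and histograms. A minimal instrumentation sketch follows; the label values and bucket boundaries are illustrative.

```typescript
import client from 'prom-client';

// Mirrors the metric names listed under Request Metrics.
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['service', 'method', 'endpoint', 'status'],
});

const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['service'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
});

// Example observation for one gateway request (45ms, cache miss).
httpRequestsTotal.inc({ service: 'api-gateway', method: 'POST', endpoint: '/rpc', status: '200' });
httpRequestDuration.observe({ service: 'api-gateway' }, 0.045);
```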
Logging Strategy
Structured Logging Format:
{
  "timestamp": "2025-11-24T14:00:00.000Z",
  "level": "info",
  "service": "api-gateway",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "apiKey": "fbx_xxx",
  "tier": "Business",
  "method": "POST",
  "endpoint": "/rpc",
  "rpcMethod": "getAccountInfo",
  "status": 200,
  "duration": 45,
  "cached": false,
  "rpcEndpoint": "http://validator1:8899",
  "credits": 1,
  "userAgent": "axios/1.6.0",
  "message": "RPC request completed successfully"
}
Log Levels:
- DEBUG: Detailed debugging information
- INFO: Normal operation logs (requests, responses)
- WARN: Warning conditions (slow queries, cache misses)
- ERROR: Error conditions (failed requests, timeouts)
- FATAL: Critical errors (service crashes)
Retention:
- DEBUG/INFO: 7 days
- WARN: 30 days
- ERROR/FATAL: 90 days
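The structured format shown above can be produced with any JSON logger; a sketch with pino follows (the field names match the example, but the logger actually used in production is not specified here).

```typescript
import pino from 'pino';
import { randomUUID } from 'node:crypto';

// Base fields are attached to every log line; per-request fields are passed at call time.
const logger = pino({ base: { service: 'api-gateway' } });

logger.info(
  {
    requestId: randomUUID(),
    apiKey: 'fbx_xxx',
    tier: 'Business',
    method: 'POST',
    endpoint: '/rpc',
    rpcMethod: 'getAccountInfo',
    status: 200,
    duration: 45,
    cached: false,
    credits: 1,
  },
  'RPC request completed successfully',
);
```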
Distributed Tracing
Trace Spans:
Request: POST /rpc [Total: 45ms]
├─ authenticate [3ms]
│ ├─ cache-lookup [1ms]
│ └─ validate-permissions [2ms]
├─ rate-limit-check [2ms]
├─ cache-check [2ms]
└─ rpc-call [38ms]
   ├─ load-balance [1ms]
   ├─ http-request [35ms]
   └─ cache-store [2ms]
Trace Context Propagation:
- Unique trace ID generated for each request
- Propagated across all services
- Stored in logs for correlation
- Visualized in Jaeger UI
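With the OpenTelemetry Node API, the span tree above corresponds to nested active spans. The sketch below uses span names from the trace; SDK setup (exporter, sampling) is configured elsewhere and omitted here.

```typescript
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('api-gateway');

// Nested spans mirror the trace tree above; exporter/sampler setup is assumed done elsewhere.
async function handleRpcRequest(): Promise<void> {
  await tracer.startActiveSpan('POST /rpc', async (root) => {
    await tracer.startActiveSpan('authenticate', async (span) => {
      // cache-lookup and validate-permissions happen here
      span.end();
    });
    await tracer.startActiveSpan('rpc-call', async (span) => {
      span.setAttribute('rpc.method', 'getAccountInfo');
      // load-balance, http-request, cache-store
      span.end();
    });
    root.end();
  });
}
```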
Alerting Rules
Critical Alerts (PagerDuty):
- Service down (all instances)
- Error rate >5% for 5 minutes
- Response time >1000ms (p95) for 5 minutes
- Database connection failures
- All RPC nodes unhealthy
Warning Alerts (Slack):
- Error rate >1% for 10 minutes
- Response time >500ms (p95) for 10 minutes
- Cache hit rate <50% for 15 minutes
- Single RPC node unhealthy
- High memory usage (>80%)
Info Alerts (Email):
- Unusual traffic patterns
- New API key created
- Tier upgrade/downgrade
- Daily usage report
Dashboard Examples
1. Service Health Dashboard:
- Request rate (per service)
- Error rate (per service)
- Response time (p50, p95, p99)
- Uptime percentage (24h, 7d, 30d)
2. Infrastructure Dashboard:
- CPU/Memory usage
- Disk usage
- Network throughput
- Database connections
3. Business Metrics Dashboard:
- Requests by tier
- Credit consumption
- Top users by usage
- Revenue metrics
4. RPC Node Dashboard:
- Health scores per node
- Request distribution
- Response times per node
- Error rates per node
Performance Characteristics
- Metrics Collection Overhead: <1% CPU
- Log Collection Overhead: <2% CPU
- Trace Collection Overhead: <3% CPU
- Metrics Retention: 90 days (5-minute resolution)
- Log Retention: 7-90 days (based on level)
- Trace Retention: 7 days (sampled)
External Monitoring
Uptime Robot Checks (every 5 minutes):
- HTTPS endpoint health
- WebSocket connectivity
- DNS resolution
- SSL certificate validity
Status Page Components:
- API Gateway (US East, EU West, Asia Pacific)
- RPC Nodes (mainnet, devnet, testnet)
- Geyser API
- WebSocket Streaming
- Database Services
Summary
FortiBlox Nexus is a comprehensive X1 Blockchain infrastructure platform built with:
- High Performance: Sub-100ms latency for most operations
- High Availability: Multi-region deployment with automatic failover
- Scalability: Horizontal scaling of all components
- Reliability: 99.9% uptime SLA with comprehensive monitoring
- Security: Multi-layer authentication and authorization
- Developer Experience: Simple APIs with detailed documentation
Key Technologies
- API Gateway: Node.js/Express or Go
- RPC Load Balancer: Custom Go service
- Geyser API: Node.js/Express with TypeScript
- WebSocket Server: Node.js/ws or Go
- Caching: Redis (cluster mode)
- Database: TimescaleDB (PostgreSQL extension)
- Monitoring: Prometheus + Grafana
- Logging: ELK Stack (Elasticsearch, Logstash, Kibana)
- Tracing: OpenTelemetry + Jaeger
Architecture Principles
- Microservices: Independent services for each function
- Stateless Services: All state in Redis/TimescaleDB
- Horizontal Scaling: Scale by adding more instances
- Defense in Depth: Multiple layers of security
- Observability First: Comprehensive logging, metrics, and tracing
- Graceful Degradation: Continue operating during partial failures
- Cost Optimization: Intelligent caching reduces infrastructure costs