High Availability
HA Building Blocks
- Gateway: horizontally scalable stateless wire endpoints
- Storage Engine: replicated storage semantics with leader/replica behavior
- Coordinator: stateless routing/orchestration replicas
- etcd: metadata and coordination backend
Validation
To validate cluster behavior locally, run:
make e2e-local-cluster
For production, run failure drills (node termination, network disruption, process restarts) and verify correctness plus latency SLOs.
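A failure drill can be reduced to a small harness: send a steady stream of probe requests, inject the failure partway through, then compare the observed error rate and worst-case latency against the SLO. The sketch below is illustrative only. `run_drill`, `within_slo`, the `probe` and `inject_failure` callables, and the thresholds are hypothetical names and values, not part of this repository; a real drill would wire `inject_failure` to whatever terminates a node or disrupts the network in your environment.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DrillResult:
    total: int
    failures: int
    latencies_ms: List[float]

    @property
    def error_rate(self) -> float:
        # Fraction of probes that failed; 0.0 when no probes were sent.
        return self.failures / self.total if self.total else 0.0


def run_drill(
    probe: Callable[[], bool],
    inject_failure: Callable[[], None],
    requests: int = 100,
    inject_at: int = 10,
) -> DrillResult:
    """Send `requests` probes and trigger the failure once, before probe `inject_at`."""
    latencies: List[float] = []
    failures = 0
    for i in range(requests):
        if i == inject_at:
            inject_failure()  # hypothetical hook: kill a node, drop traffic, etc.
        start = time.perf_counter()
        try:
            ok = probe()
        except Exception:
            ok = False  # an exception counts as a failed request
        latencies.append((time.perf_counter() - start) * 1000.0)
        if not ok:
            failures += 1
    return DrillResult(requests, failures, latencies)


def within_slo(result: DrillResult, max_error_rate: float, max_latency_ms: float) -> bool:
    """True when both the error budget and the worst-case latency bound hold."""
    worst = max(result.latencies_ms, default=0.0)
    return result.error_rate <= max_error_rate and worst <= max_latency_ms
```

Keeping the probe and the fault injection as plain callables lets the same harness run against a local cluster and a production-like environment; only the two callables change.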
Operational Signals
Track:
- command error rate by service
- leader/replica health indicators
- request latency p95/p99 under failover
- cursor/query retries and backpressure trends
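As a rough illustration of how two of these signals can be derived from raw samples, the sketch below computes a nearest-rank percentile (usable for p95/p99) and a per-service error rate. It assumes latencies arrive as a list of millisecond values and request outcomes as `(service, ok)` pairs; both shapes, and the function names, are assumptions made for this example rather than the format any service here actually emits. In practice a metrics backend would usually compute these for you.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile requires at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def error_rate_by_service(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each service name to the fraction of its requests that failed."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for service, ok in events:
        totals[service] += 1
        if not ok:
            errors[service] += 1
    return {service: errors[service] / totals[service] for service in totals}
```

Computing the percentiles over a window that brackets a failover, rather than over the whole day, keeps a short latency spike from being averaged away.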
Source of Truth
- services/storage-engine/src/raft/mod.rs
- services/coordinator/internal/region/failover.go
- services/gateway/src/retry/mod.rs
- tests/e2e/matrix/raft/test_failover.py