Introducing Thermocline - The Document Database, Reimagined
We're excited to announce Thermocline, a fully open-source, AI-native document database with automatic tiered storage, built-in vector search, MVCC time travel, and Raft consensus replication.
Thermocline is not a proxy or compatibility layer — it is a complete database built from the ground up with its own native storage engine. Applications connect using standard MongoDB drivers with only a connection string change.
The Problem We're Solving
Document databases revolutionized application development, but modern applications face challenges that no existing document database adequately addresses:
- The Cost Crisis — 80% of data is accessed less than once per month, yet all of it sits on expensive, high-performance storage
- No Native AI Integration — AI-powered applications need vector search and embedding storage; self-hosted MongoDB has no native vector capability
- No Time Travel — Regulatory compliance, debugging, and analytics frequently require querying data as it existed at a specific point in time
- Vendor Lock-in — MongoDB Atlas pricing is opaque, with hidden charges for data transfer, IOPS, and backup
- Operational Complexity — Sharding forces an anxious, high-stakes shard key choice, and replica set failover can take 10+ seconds
How Thermocline Works
Thermocline is a standalone database. Applications connect directly using any MongoDB driver — the only change is the connection string. Under the hood, Thermocline provides:
- A native LSM-tree storage engine with a write-ahead log (WAL), delivering sub-millisecond reads and >100K writes/sec
- Automatic tiering — data flows from hot (NVMe) to cold (Parquet on object storage) based on policies
- Transparent federation — a single query seamlessly spans hot and cold tiers
- Built-in vector search — HNSW and DiskANN indexes for semantic similarity
- MVCC time travel — query data as it existed at any historical timestamp
The result is a 60-80% cost reduction, along with capabilities no other document database offers.
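To make the tiering idea concrete, here is a minimal sketch of an age-based policy decision. The `TieringPolicy` class, the `max_idle` field, and the `choose_tier` function are hypothetical names invented for illustration; Thermocline's actual policy format is not shown in this post and may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical policy: archive documents not accessed within max_idle."""
    max_idle: timedelta


def choose_tier(last_accessed: datetime, now: datetime, policy: TieringPolicy) -> str:
    """Return "cold" when the document has been idle longer than the policy allows."""
    return "cold" if now - last_accessed > policy.max_idle else "hot"


policy = TieringPolicy(max_idle=timedelta(days=30))
now = datetime(2025, 1, 31)
print(choose_tier(datetime(2025, 1, 20), now, policy))  # hot (idle for 11 days)
print(choose_tier(datetime(2024, 11, 1), now, policy))  # cold (idle for 91 days)
```

A real lifecycle controller would likely weigh more signals than idle time (collection, size, compliance rules), but the shape of the decision is the same: a policy plus document metadata yields a target tier.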
Key Features
- Native Storage Engine — LSM-tree + WAL with sub-millisecond reads and crash recovery
- Built-in Vector Search — HNSW and DiskANN indexes with a $vectorSearch aggregation stage
- MVCC Time Travel — Query data at any historical timestamp within a configurable retention window
- Raft Consensus — Automatic leader election with <5s failover and no split-brain
- ACID Transactions — Multi-document snapshot isolation with >50K TPS
- WAL Change Streams — Guaranteed-order event streams for RAG pipelines and event-driven architectures
- MongoDB Compatible — Full wire protocol and MQL support; no application changes beyond the connection string
- SSPL Licensed — Open source with full access to the source code
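As an illustration of the $vectorSearch stage mentioned above, the sketch below builds an aggregation pipeline as plain Python data. The field names (`index`, `path`, `queryVector`, `numCandidates`, `limit`) follow MongoDB Atlas's $vectorSearch stage; that Thermocline accepts exactly this shape is an assumption, and the index name, field names, and candidate multiplier are hypothetical.

```python
from typing import Any


def vector_search_pipeline(query_vector: list[float], limit: int = 5) -> list[dict[str, Any]]:
    """Build an aggregation pipeline with a $vectorSearch stage.

    Field names follow MongoDB Atlas's $vectorSearch stage; whether Thermocline
    accepts exactly this shape is an assumption.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        {
            "$vectorSearch": {
                "index": "embedding_index",   # hypothetical index name
                "path": "embedding",          # hypothetical field holding the vector
                "queryVector": query_vector,
                "numCandidates": limit * 20,  # candidates examined before ranking
                "limit": limit,
            }
        },
        {"$project": {"title": 1, "_id": 0}},  # hypothetical "title" field
    ]


pipeline = vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
print(pipeline[0]["$vectorSearch"]["numCandidates"])  # 60
```

With a MongoDB driver, a pipeline like this is passed to the collection's aggregate call, for example `collection.aggregate(pipeline)` in PyMongo.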
Architecture
Thermocline consists of these core components:
- Gateway — MongoDB wire protocol termination, TLS, and authentication (Rust)
- Storage Engine — Native LSM-tree with WAL, MVCC, and vector indexes (Rust)
- Query Coordinator — Query analysis, tier routing, and result merging (Go)
- Query Engine — Cold storage execution against Parquet via DataFusion (Rust)
- Lifecycle Controller — Policy-driven archival and data management (Go)
- Replication Manager — Raft consensus, log shipping, and failover (Go + Rust)
- Metadata Store — Cluster state and data locations (etcd)
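The Query Coordinator's result-merging role can be pictured with a small sketch. It assumes each tier returns its results already sorted by the requested key, so the coordinator only needs a streaming merge. The function name and document fields are hypothetical, and this is not a description of Thermocline's actual coordinator, which is written in Go.

```python
import heapq
from typing import Iterable, Iterator


def merge_tiers(hot: Iterable[dict], cold: Iterable[dict], sort_key: str) -> Iterator[dict]:
    """Lazily merge two result streams that are each already sorted by sort_key."""
    return heapq.merge(hot, cold, key=lambda doc: doc[sort_key])


# Hypothetical per-tier results, each sorted by "ts".
hot_results = [{"_id": 3, "ts": 30}, {"_id": 4, "ts": 40}]
cold_results = [{"_id": 1, "ts": 10}, {"_id": 2, "ts": 20}, {"_id": 5, "ts": 35}]

merged = list(merge_tiers(hot_results, cold_results, "ts"))
print([doc["_id"] for doc in merged])  # [1, 2, 3, 5, 4]
```

Because the merge is lazy, a query with a small limit can stop early without draining the cold tier, which is where most of the latency would be expected to sit.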
Getting Started
```bash
# Clone and start with Docker Compose
git clone https://github.com/strongly-ai/thermocline.git
cd thermocline
docker compose up -d

# Connect with any MongoDB driver
mongosh "mongodb://localhost:27017/mydb"
```
Check out our Quick Start Guide for a complete walkthrough including vector search, time travel, and cross-tier queries.
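Since the only application change is the connection string, migrating an existing configuration can be as small as swapping the host portion of the URI. The helper below is a hedged sketch using only the Python standard library; the function name, the old host `mongo.internal`, and the credentials are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def point_at_thermocline(mongo_uri: str, new_hosts: str) -> str:
    """Swap the host list in a mongodb:// URI, keeping credentials, database, and options."""
    parts = urlsplit(mongo_uri)
    # Credentials, if present, sit before the last "@" in the network location.
    userinfo, at, _old_hosts = parts.netloc.rpartition("@")
    return urlunsplit(parts._replace(netloc=f"{userinfo}{at}{new_hosts}"))


# Hypothetical existing URI; only the host changes.
old_uri = "mongodb://app:secret@mongo.internal:27017/mydb?retryWrites=true"
print(point_at_thermocline(old_uri, "localhost:27017"))
# mongodb://app:secret@localhost:27017/mydb?retryWrites=true
```

The resulting string is what you would hand to your existing driver setup; nothing else in the application code needs to know which database is on the other end.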
What's Next
We're actively developing Thermocline v4.0 across six phases. Here's what's coming:
- Full MQL query engine with tiered execution
- ACID transactions with MVCC snapshot isolation
- Raft consensus replication with <5s failover
- HNSW and DiskANN vector search indexes
- Time travel queries and named snapshots
- Horizontal sharding up to 1024 shards
- MongoDB migration CLI (4.4-7.0)
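The snapshot isolation and time travel items above both rest on keeping multiple committed versions of each document. The toy store below sketches that idea under simplifying assumptions: integer commit timestamps, no deletes, and no retention-window garbage collection. The class and method names are hypothetical and do not reflect Thermocline's storage engine.

```python
import bisect
from typing import Any, Optional


class VersionedStore:
    """Toy MVCC store: each key keeps every committed version, ordered by commit timestamp."""

    def __init__(self) -> None:
        self._timestamps: dict[str, list[int]] = {}
        self._values: dict[str, list[Any]] = {}

    def put(self, key: str, value: Any, commit_ts: int) -> None:
        """Append a new version; timestamps must strictly increase per key."""
        timestamps = self._timestamps.setdefault(key, [])
        if timestamps and commit_ts <= timestamps[-1]:
            raise ValueError("commit timestamps must increase per key")
        timestamps.append(commit_ts)
        self._values.setdefault(key, []).append(value)

    def get(self, key: str, as_of: int) -> Optional[Any]:
        """Return the newest version committed at or before as_of, or None."""
        timestamps = self._timestamps.get(key, [])
        index = bisect.bisect_right(timestamps, as_of)
        return self._values[key][index - 1] if index else None


store = VersionedStore()
store.put("order:1", {"status": "pending"}, commit_ts=100)
store.put("order:1", {"status": "shipped"}, commit_ts=200)
print(store.get("order:1", as_of=150))  # {'status': 'pending'}
print(store.get("order:1", as_of=250))  # {'status': 'shipped'}
print(store.get("order:1", as_of=50))   # None
```

A reader pinned to timestamp 150 keeps seeing the pending order even after the later write commits, which is the same property a snapshot-isolated transaction relies on.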
Get Involved
- GitHub Repository — Star us, open issues, submit PRs
- Contributing Guide — How to contribute
- Documentation — Full documentation
- Discussions — Ask questions and share ideas
We believe the document database should be reimagined for AI workloads, cost efficiency, and operational simplicity. Thermocline makes that vision a reality.
— The Thermocline Team