Introducing Thermocline - The Document Database, Reimagined

We're excited to announce Thermocline, a fully open-source, AI-native document database with automatic tiered storage, built-in vector search, MVCC time travel, and Raft consensus replication.

Thermocline is not a proxy or compatibility layer — it is a complete database built from the ground up with its own native storage engine. Applications connect using standard MongoDB drivers with only a connection string change.
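
As a rough illustration of what "only a connection string change" can look like, the sketch below rewrites just the host portion of an existing MongoDB URI with Python's standard library. The host names, credentials, and the `retarget_uri` helper are hypothetical placeholders; in practice the resulting URI would be handed to whichever MongoDB driver the application already uses.

```python
from urllib.parse import urlsplit, urlunsplit


def retarget_uri(uri: str, new_hosts: str) -> str:
    """Return `uri` with its host list replaced by `new_hosts`.

    Credentials, database name, and query options are left untouched.
    """
    parts = urlsplit(uri)
    # netloc looks like "user:pass@host1:27017,host2:27017"; keep the
    # credentials (if any) and swap everything after the last "@".
    creds, sep, _old_hosts = parts.netloc.rpartition("@")
    netloc = f"{creds}{sep}{new_hosts}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical host names and credentials, for illustration only.
old_uri = "mongodb://app:secret@mongo.internal:27017/mydb?appName=orders"
new_uri = retarget_uri(old_uri, "thermocline.internal:27017")
print(new_uri)
```

Everything outside the host list (user, database, options) is carried over unchanged, which is the point of the exercise: the application code that consumes the URI does not need to know which database is on the other end.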

The Problem We're Solving

Document databases revolutionized application development, but modern applications face challenges that no existing document database adequately addresses:

  • The Cost Crisis — 80% of data is accessed less than once per month, yet all of it sits on expensive, high-performance storage
  • No Native AI Integration — AI-powered applications need vector search and embedding storage; self-hosted MongoDB has no native vector capability
  • No Time Travel — Regulatory compliance, debugging, and analytics frequently require querying data as it existed at a specific point in time
  • Vendor Lock-in — Atlas pricing is opaque, with hidden charges for data transfer, IOPS, and backup
  • Operational Complexity — Sharding forces a hard-to-reverse shard key choice up front, and replica set failover can take 10+ seconds

How Thermocline Works

Thermocline is a standalone database. Applications connect directly using any MongoDB driver — the only change is the connection string. Under the hood, Thermocline provides:

  1. A native LSM-tree storage engine with WAL for sub-millisecond reads and >100K writes/sec
  2. Automatic tiering — data flows from hot (NVMe) to cold (Parquet on object storage) based on policies
  3. Transparent federation — a single query seamlessly spans hot and cold tiers
  4. Built-in vector search — HNSW and DiskANN indexes for semantic similarity
  5. MVCC time travel — query data as it existed at any historical timestamp
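
The tiering and federation ideas in the list above can be sketched with a toy in-memory model. This is not Thermocline's implementation: the `TieredCollection` class, the `max_hot_age` policy knob, and the `last_access` field are all assumptions made for illustration, and real tiers would be NVMe and Parquet on object storage rather than two Python dicts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TieredCollection:
    """Toy two-tier store: a hot dict and a cold dict, both keyed by _id."""
    max_hot_age: timedelta                      # hypothetical policy knob
    hot: dict = field(default_factory=dict)
    cold: dict = field(default_factory=dict)

    def insert(self, doc: dict) -> None:
        self.hot[doc["_id"]] = doc              # new writes always land hot

    def apply_policy(self, now: datetime) -> int:
        """Move documents not accessed within max_hot_age to the cold tier."""
        stale = [key for key, doc in self.hot.items()
                 if now - doc["last_access"] > self.max_hot_age]
        for key in stale:
            self.cold[key] = self.hot.pop(key)
        return len(stale)

    def find(self, predicate) -> list:
        """Federated read: scan both tiers and merge into one result list."""
        merged = list(self.hot.values()) + list(self.cold.values())
        return sorted((d for d in merged if predicate(d)), key=lambda d: d["_id"])


now = datetime(2025, 1, 31)
coll = TieredCollection(max_hot_age=timedelta(days=30))
coll.insert({"_id": 1, "status": "open", "last_access": now - timedelta(days=90)})
coll.insert({"_id": 2, "status": "open", "last_access": now - timedelta(days=1)})

moved = coll.apply_policy(now)                  # document 1 goes cold
open_docs = coll.find(lambda d: d["status"] == "open")
print(moved, [d["_id"] for d in open_docs])
```

The caller of `find` never says which tier to read; the merge happens behind a single query interface, which is the property "transparent federation" describes.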

The result is a 60-80% reduction in cost, along with capabilities no other document database offers.
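
To see how a figure in that range can arise, here is a back-of-the-envelope calculation. The per-GB prices are purely illustrative assumptions, not Thermocline benchmarks or any vendor's price list, and the model ignores compute, request, and retrieval charges.

```python
def blended_cost(hot_fraction: float, hot_price: float, cold_price: float) -> float:
    """Per-GB monthly cost when only `hot_fraction` of the data stays hot."""
    return hot_fraction * hot_price + (1.0 - hot_fraction) * cold_price


# Illustrative prices only (USD per GB-month); substitute real figures.
HOT_PRICE = 0.10
COLD_PRICE = 0.02

all_hot = blended_cost(1.0, HOT_PRICE, COLD_PRICE)
tiered = blended_cost(0.2, HOT_PRICE, COLD_PRICE)   # 80% of data moved cold
saving = 1.0 - tiered / all_hot
print(f"saving: {saving:.0%}")
```

With these assumed prices, keeping 20% of the data hot costs 0.2 × 0.10 + 0.8 × 0.02 = 0.036 per GB-month against 0.10 for an all-hot deployment, a saving of 64%. The actual number depends heavily on the real price gap between tiers and on how much data is genuinely cold.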

Key Features

  • Native Storage Engine — LSM-tree + WAL with sub-millisecond reads and crash recovery
  • Built-in Vector Search — HNSW and DiskANN indexes with $vectorSearch aggregation stage
  • MVCC Time Travel — Query data at any historical timestamp within a configurable retention window
  • Raft Consensus — Automatic leader election with <5s failover and no split-brain
  • ACID Transactions — Multi-document snapshot isolation with >50K TPS
  • WAL Change Streams — Guaranteed-order event streams for RAG pipelines and event-driven architectures
  • MongoDB Compatible — Full wire protocol and MQL support; no application changes beyond the connection string
  • SSPL Licensed — Open source with full access to the source code
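
As a sketch of how the `$vectorSearch` stage mentioned above might be used, the helper below builds an aggregation pipeline as plain Python data. The field names (`index`, `path`, `queryVector`, `numCandidates`, `limit`) follow MongoDB Atlas's `$vectorSearch` stage; that Thermocline accepts the same shape is an assumption, and the index name, field name, and `title` projection are hypothetical.

```python
def vector_search_pipeline(query_vector, *, index, path, limit=5, num_candidates=100):
    """Build an aggregation pipeline with a $vectorSearch stage.

    ASSUMPTION: field names mirror MongoDB Atlas's $vectorSearch stage;
    the fields Thermocline accepts may differ.
    """
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep only the fields the caller is likely to need.
        {"$project": {"_id": 1, "title": 1}},
    ]


pipeline = vector_search_pipeline(
    [0.12, -0.03, 0.57],          # toy 3-dimensional embedding
    index="articles_embedding",   # hypothetical index name
    path="embedding",             # hypothetical field holding the vector
    limit=3,
)
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`. Real embeddings have hundreds or thousands of dimensions; the three-element vector here only keeps the example readable.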

Architecture

Thermocline consists of these core components:

  • Gateway — MongoDB wire protocol termination, TLS, and authentication (Rust)
  • Storage Engine — Native LSM-tree with WAL, MVCC, and vector indexes (Rust)
  • Query Coordinator — Query analysis, tier routing, and result merging (Go)
  • Query Engine — Cold storage execution against Parquet via DataFusion (Rust)
  • Lifecycle Controller — Policy-driven archival and data management (Go)
  • Replication Manager — Raft consensus, log shipping, and failover (Go + Rust)
  • Metadata Store — Cluster state and data locations (etcd)
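
To make the MVCC piece of the storage engine more concrete, here is a toy versioned key-value store in which every write appends a timestamped version and a read picks the newest version at or before a requested time. It is a conceptual sketch only: the `VersionedStore` class and integer timestamps are assumptions, and it models neither the retention window nor concurrent transactions.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy MVCC store: every write appends a (timestamp, value) version."""

    def __init__(self):
        self._timestamps = defaultdict(list)   # key -> sorted commit timestamps
        self._values = defaultdict(list)       # key -> values, in the same order

    def put(self, key, value, ts: int) -> None:
        # Assumes commits for a key arrive in increasing timestamp order.
        if self._timestamps[key] and ts <= self._timestamps[key][-1]:
            raise ValueError("timestamps must increase per key")
        self._timestamps[key].append(ts)
        self._values[key].append(value)

    def get(self, key, as_of: int):
        """Return the newest value committed at or before `as_of`, else None."""
        stamps = self._timestamps.get(key, [])
        i = bisect.bisect_right(stamps, as_of)
        return self._values[key][i - 1] if i else None


store = VersionedStore()
store.put("order:1", {"status": "pending"}, ts=100)
store.put("order:1", {"status": "shipped"}, ts=200)

before = store.get("order:1", as_of=50)     # nothing committed yet
between = store.get("order:1", as_of=150)   # sees the first version
latest = store.get("order:1", as_of=200)    # sees the second version
print(before, between, latest)
```

Because old versions are never overwritten, a historical read is just a binary search over commit timestamps. A production engine would also need to garbage-collect versions that fall outside the retention window.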

Getting Started

# Clone and start with Docker Compose
git clone https://github.com/strongly-ai/thermocline.git
cd thermocline
docker compose up -d

# Connect with any MongoDB driver
mongosh "mongodb://localhost:27017/mydb"

Check out our Quick Start Guide for a complete walkthrough including vector search, time travel, and cross-tier queries.

What's Next

We're actively developing Thermocline v4.0 across six phases. Here's what's coming:

  • Full MQL query engine with tiered execution
  • ACID transactions with MVCC snapshot isolation
  • Raft consensus replication with <5s failover
  • HNSW and DiskANN vector search indexes
  • Time travel queries and named snapshots
  • Horizontal sharding up to 1024 shards
  • MongoDB migration CLI (4.4-7.0)

Get Involved

We believe the document database should be reimagined for AI workloads, cost efficiency, and operational simplicity. Thermocline makes that vision a reality.

— The Thermocline Team