Roadmap
Thermocline v4.0 is an AI-native document database with built-in vector search and automatic hot/cold storage tiering. Development is organized into six sequential phases spanning approximately 17 months.
This roadmap is subject to change based on community feedback and adoption patterns.
Key Capabilities at v4.0 GA
- Native LSM-tree storage engine with WAL and crash recovery
- Automatic hot/cold data tiering (NVMe to Parquet on object storage)
- Built-in vector search (HNSW + DiskANN, up to 1 billion vectors)
- ACID transactions with snapshot isolation (>50K TPS)
- Raft consensus replication (<5s failover)
- MVCC time travel queries
- WAL-based change streams
- MongoDB wire protocol compatibility (all drivers, zero app changes)
- SSPL v1 license
Phase 1: Foundation (4 months) — In Development
Core storage engine and wire protocol.
- Write-Ahead Log (WAL) with configurable sync modes
- LSM-tree memtable and SST file management
- BSON document encoding/decoding
- Leveled and universal compaction strategies
- MongoDB wire protocol (OP_MSG, OP_QUERY, OP_GET_MORE)
- Basic CRUD operations (insert, find, update, delete)
- Gateway with TLS termination and SCRAM-SHA-256 auth
- etcd-based metadata store
- S3 cold storage adapter
- Helm chart v0.1
- Docker Compose for local development
- Crash recovery from WAL replay
Exit criteria: CRUD tests pass, wire protocol compatible with mongosh and all official MongoDB drivers, crash recovery verified, point read <1ms p50.
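The WAL-plus-replay recovery path above can be sketched in a few lines. This is an illustrative toy, not Thermocline's storage engine: `MiniWAL`, the JSON record format, and the `put`/`del` opcodes are all assumptions made up for the example. The key invariant it demonstrates is the real one: every mutation is durably logged before it mutates in-memory state, so recovery is just replaying the log from the start.

```python
import json
import os

class MiniWAL:
    """Illustrative append-only write-ahead log with replay-on-recovery.

    A mutation is durably logged before it touches in-memory state, so a
    crash can lose at most an unsynced tail, never an acknowledged write.
    """

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()                      # crash recovery = replay the log
        self.log = open(path, "a")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                line = line.strip()
                if line:
                    self._apply(json.loads(line))

    def _apply(self, rec):
        if rec["op"] == "put":
            self.state[rec["key"]] = rec["value"]
        elif rec["op"] == "del":
            self.state.pop(rec["key"], None)

    def put(self, key, value):
        rec = {"op": "put", "key": key, "value": value}
        self.log.write(json.dumps(rec) + "\n")
        self.log.flush()
        # "always" sync mode; the configurable modes in the roadmap would
        # batch or skip the fsync, trading durability for throughput.
        os.fsync(self.log.fileno())
        self._apply(rec)
```

A real engine logs binary records with checksums and truncates a torn tail during replay, but the acknowledge-after-fsync ordering is the same idea.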
Phase 2: Query Engine (3 months)
Full MQL support and cold storage queries.
- All MongoDB query operators on native storage
- MQL-to-DataFusion translation for cold queries
- Full aggregation pipeline ($match, $group, $sort, $project, $lookup, $unwind, $facet, etc.)
- Tiered query execution with result merging
- Archival pipeline (BSON to Parquet with zone maps and bloom filters)
- B-tree indexes with online builds
- Cursor management and streaming results
- Predicate pushdown and column pruning for Parquet
- CollectionPolicy CRD for archival rules
Exit criteria: >95% MongoDB query compatibility, correct tiered query results, valid Parquet output, cold query on 1M docs <2s p50.
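Zone maps and predicate pushdown, mentioned in the archival bullets above, work together: each Parquet row group stores per-column min/max statistics, and the cold query path skips any group those statistics rule out. A minimal sketch (the `RowGroup` class and `scan_gt` function are invented for illustration, not part of any real Parquet API):

```python
class RowGroup:
    """Illustrative stand-in for a Parquet row group carrying zone-map stats."""

    def __init__(self, rows, column):
        self.rows = rows
        vals = [r[column] for r in rows]
        self.min, self.max = min(vals), max(vals)

def scan_gt(row_groups, column, threshold):
    """Predicate pushdown sketch: skip any row group whose zone map proves
    no row in it can satisfy `column > threshold`."""
    out = []
    for rg in row_groups:
        if rg.max <= threshold:          # entire group pruned by its max stat
            continue
        out.extend(r for r in rg.rows if r[column] > threshold)
    return out
```

Bloom filters play the complementary role for equality predicates, ruling out groups that cannot contain a specific value even when it falls inside the min/max range.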
Phase 3: Transactions & Replication (3 months)
ACID transactions and Raft consensus.
- MVCC version store with configurable retention
- Transaction manager (begin, commit, abort)
- Lock manager with deadlock detection (wait-for graph)
- Snapshot isolation and read committed levels
- Raft consensus (leader election, log replication)
- Membership changes (add/remove nodes)
- Snapshot transfer for new replicas
- Automatic failover with <5s target
- Read replicas with bounded staleness
Exit criteria: ACID verified under concurrent load, Raft failover <5s with zero committed data loss, >50K TPS on 3-node cluster, deadlock resolution <100ms.
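The wait-for-graph deadlock detection named in the lock manager bullet reduces to cycle detection: each blocked transaction points at the transactions holding the locks it wants, and a cycle means no one can make progress. The sketch below shows the core DFS check; the `find_deadlock` name and dict-of-sets graph shape are assumptions for the example, and a real lock manager would then abort a victim from the returned cycle.

```python
def find_deadlock(wait_for):
    """Detect a cycle in a transaction wait-for graph via depth-first search.

    `wait_for` maps a transaction id to the set of transactions it is
    blocked on. Returns the transactions in one cycle, or None.
    """
    GRAY, BLACK = 1, 2        # GRAY = on the current DFS path, BLACK = done
    color = {}
    stack = []

    def dfs(t):
        color[t] = GRAY
        stack.append(t)
        for u in wait_for.get(t, ()):
            if color.get(u) == GRAY:          # back edge: deadlock found
                return stack[stack.index(u):]
            if color.get(u) is None:
                cycle = dfs(u)
                if cycle:
                    return cycle
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t) is None:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None
```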
Phase 4: AI Features (3 months)
Vector search and hybrid queries.
- HNSW index for hot vectors (in-memory, <50M vectors)
- Distance metrics: cosine, euclidean (L2), dot product
- `$vectorSearch` aggregation stage (Atlas-compatible syntax)
- Hybrid search (pre-filter and post-filter modes)
- SQ8 scalar quantization (~4x memory reduction)
- Cold vector search on archived Parquet embeddings
- Online index builds without blocking writes
- Vector index persistence and recovery
Exit criteria: >95% recall at <50ms p50 for 1M vectors, hybrid search correctness, Atlas syntax compatibility, SQ8 <2% recall loss.
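SQ8 scalar quantization earns its ~4x memory reduction by storing one byte per dimension instead of a float32, at a small accuracy cost. A minimal per-vector sketch (function names and the per-vector min/scale encoding are illustrative assumptions; production implementations typically train quantization bounds over the whole dataset):

```python
def sq8_encode(vec):
    """Quantize a float vector to one byte per dimension (SQ8 sketch).

    Keeps the per-vector min and scale so decoding can approximate the
    original values; memory drops ~4x versus float32.
    """
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale

def sq8_decode(codes, lo, scale):
    """Reconstruct an approximation of the original vector."""
    return [lo + c * scale for c in codes]
```

Distance computations then run directly on the byte codes (or the decoded approximations), which is where the small recall loss the exit criteria bound (<2%) comes from.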
Phase 5: Advanced Features (2 months)
Time travel, change streams, and sharding.
- Time travel queries (`readConcern.atClusterTime`)
- Named snapshots (pin a point-in-time)
- Configurable version retention (1-365 days)
- WAL-based change streams with resume tokens
- Change stream filtering and projection
- RAG embedding triggers (auto-generate embeddings on write)
- Hash-based sharding (up to 1024 shards)
- Shard-aware query routing
Exit criteria: Correct historical reads, change stream ordering and resume, even shard distribution (<10% skew), RAG trigger overhead <100ms.
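Hash-based sharding, as in the bullets above, maps each document to a shard by hashing its shard-key value, which is what yields the near-even distribution the exit criteria demand. A sketch (the `shard_for` name is invented; the important detail is using a stable hash rather than Python's per-process randomized `hash()`, so routing stays consistent across processes and restarts):

```python
import hashlib

def shard_for(shard_key_value, num_shards=1024):
    """Route a value to a shard by stable hashing (hash-based sharding sketch)."""
    digest = hashlib.sha256(repr(shard_key_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because a cryptographic hash distributes keys uniformly, skew across shards stays low even for skewed key populations; the trade-off versus range sharding (listed as a post-v4.0 feature) is that hash sharding cannot serve range scans from a single shard.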
Phase 6: Scale & Polish (2 months)
Billion-scale vectors, migration tooling, production hardening.
- DiskANN index for billion-scale vectors (disk-resident)
- Scale testing (100TB hot, 1PB cold, 1B vectors)
- MongoDB migration CLI (`thermocline-cli migrate`)
- Migration tested against MongoDB 4.4-7.0
- Full documentation
- Kubernetes operator with CRDs
- Grafana dashboards
- Production hardening and benchmarks
- Product quantization for vectors
- Multi-tenant resource isolation
Exit criteria: 100M vector search <200ms p50, 100TB hot data stable 72+ hours, migration from MongoDB 4.4-7.0, docs complete, K8s operator manages full lifecycle.
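Product quantization, listed above, compresses further than SQ8 by splitting each vector into m subvectors and replacing each subvector with the index of its nearest centroid in a per-subspace codebook, cutting storage from d floats to m small integers. The sketch below uses tiny fixed codebooks to show the encode/decode round trip; in practice the codebooks are trained (typically with k-means), and everything here (`pq_encode`, equal-sized subvectors) is an illustrative assumption.

```python
def pq_encode(vec, codebooks):
    """Product quantization sketch: one nearest-centroid index per subvector."""
    m = len(codebooks)
    sub = len(vec) // m                      # assumes d divides evenly by m
    codes = []
    for j, cb in enumerate(codebooks):
        s = vec[j * sub:(j + 1) * sub]
        codes.append(min(
            range(len(cb)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(s, cb[c])),
        ))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct the approximation by concatenating the chosen centroids."""
    out = []
    for j, cb in enumerate(codebooks):
        out.extend(cb[codes[j]])
    return out
```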
Post-v4.0 Future
| Feature | Priority |
|---|---|
| Range sharding | High |
| GPU-accelerated vector search | Medium |
| Row-level security | Medium |
| Full-text search | Medium |
| Graph queries ($graphLookup) | Medium |
| Geospatial queries | Medium |
| Time series collections | Medium |
| Multi-region replication | Medium |
| Predictive ML-based tiering | Low |
| Column-level encryption | Low |
Driver Compatibility
All official MongoDB drivers work with Thermocline using standard connection strings — no modifications needed:
| Driver | Status |
|---|---|
| Node.js / Mongoose | Supported |
| Python / PyMongo / Motor | Supported |
| Java | Supported |
| Go | Supported |
| Rust | Supported |
| C# / .NET | Supported |
| C / C++ | Supported |
| Ruby | Supported |
| PHP | Supported |
How to Influence the Roadmap
- Vote on issues — Use reactions on GitHub issues to show interest
- Open feature requests — Describe your use case and requirements
- Contribute — Implementation PRs are always welcome
- Discuss — Join GitHub Discussions to share your perspective
Release Cadence
- Patch releases — As needed for bug fixes and security patches
- Minor releases — Every 2-3 months with new features
- Major releases — When breaking changes are necessary (rare)