📊 MongoDB 5.0+

Time Series Collections

A deep dive into MongoDB's specialized storage for time-stamped data: bucketing, compression, and internal architecture

Part I

Fundamentals

Understanding what Time Series collections are and when to use them

01

What are Time Series Collections?

🎯 Why This Matters

The Problem: Time series data (IoT sensors, metrics, logs) generates massive volumes - often millions of documents per day. Regular MongoDB collections store each document separately, leading to:

❌ 10x more storage than needed
❌ Slow range queries
❌ Index bloat

The Solution: Time Series collections automatically bucket, compress, and optimize this data - giving you 90%+ storage savings and 10x faster queries with zero application changes.

⏱️

Time-Ordered Data

Optimized for data that arrives sequentially over time: IoT sensors, metrics, logs, stock prices, and any timestamped events.

📦

Automatic Bucketing

MongoDB groups documents into "buckets" based on time intervals and metadata, dramatically reducing storage overhead.

🗜️

Column Compression

Data within buckets is stored in columnar format with delta encoding, achieving up to 90%+ compression ratios.

02

Real-World Use Cases

🌡️

IoT Sensor Data

Temperature, humidity, pressure readings

// Document structure
timeField: "timestamp"
metaField: "device"
granularity: "seconds"
High Frequency ~95% Compression
📈

Financial Market Data

Stock prices, trades, order book

// Document structure
timeField: "tradeTime"
metaField: "symbol"
granularity: "seconds"
Ultra High Freq ~80% Compression
🖥️

Application Metrics

CPU, memory, latency metrics

// Document structure
timeField: "collectedAt"
metaField: "host"
granularity: "minutes"
Medium Freq ~90% Compression
📋

Event Logs

Application logs, audit trails

// Document structure
timeField: "eventTime"
metaField: "service"
granularity: "hours"
Variable Freq ~70% Compression
Part II

Core Concepts

How buckets, granularity, data flow, and compression work together

03

From Documents to Buckets

🎯 Why This Matters

Buckets are the secret sauce. Instead of storing 1 million individual documents, MongoDB groups them into ~1,000 buckets. Each bucket holds documents with the same metadata within a time window. This means: fewer documents to scan, better compression, and indexes that are 1000x smaller. Understanding bucket structure helps you choose the right granularity and metadata fields.

📥
Incoming Documents
10:00:01 sensor_A 23.5°C
10:00:02 sensor_B 45.2°C
10:00:03 sensor_A 23.6°C
10:00:04 sensor_B 45.1°C
10:00:05 sensor_A 23.7°C
🔀
Grouping by Metadata + Time
sensor_A group
10:00:01 23.5°C
10:00:03 23.6°C
10:00:05 23.7°C
sensor_B group
10:00:02 45.2°C
10:00:04 45.1°C
📦
Stored as Buckets
Bucket #1 - sensor_A 10:00 - 10:59
Bucket #2 - sensor_B 10:00 - 10:59
💡

Key Insight: Bucket Granularity

MongoDB automatically determines bucket boundaries based on the granularity setting (seconds, minutes, hours). Documents with the same metadata that arrive within the same time window go into the same bucket.
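The grouping above can be sketched in plain JavaScript. This is a simplified model, not MongoDB's implementation; here a bucket key is just the metadata value plus the UTC hour:

```javascript
// Simplified model of bucketing: documents sharing the same metadata
// and the same UTC hour land in the same bucket.
function groupIntoBuckets(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const hourStart = new Date(doc.timestamp);
    hourStart.setUTCMinutes(0, 0, 0); // round down to the hour
    const key = `${doc.meta}|${hourStart.toISOString()}`;
    if (!buckets.has(key)) {
      buckets.set(key, { meta: doc.meta, min: hourStart, measurements: [] });
    }
    buckets.get(key).measurements.push(doc.value);
  }
  return buckets;
}

const docs = [
  { timestamp: "2024-01-01T10:00:01Z", meta: "sensor_A", value: 23.5 },
  { timestamp: "2024-01-01T10:00:02Z", meta: "sensor_B", value: 45.2 },
  { timestamp: "2024-01-01T10:00:03Z", meta: "sensor_A", value: 23.6 },
  { timestamp: "2024-01-01T10:00:04Z", meta: "sensor_B", value: 45.1 },
  { timestamp: "2024-01-01T10:00:05Z", meta: "sensor_A", value: 23.7 },
];
const buckets = groupIntoBuckets(docs);
// Five documents collapse into two buckets: one per sensor per hour.
```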

04

Granularity Deep Dive

⚠️

Common Misconception!

"seconds" granularity does NOT mean buckets close every second!

Granularity controls the bucket time span and timestamp rounding, not how often buckets close.

📏 What Granularity Actually Controls

⏱️
seconds
1 HOUR
Bucket Time Span
Best for: >1 doc/sec
⏱️
minutes
24 HOURS
Bucket Time Span
Best for: 1 doc/min
⏱️
hours
30 DAYS
Bucket Time Span
Best for: <1 doc/hour

🔄 When Do Buckets Actually Close?

A bucket closes when ANY of these conditions is met:

📊
~1000
Documents reached
💾
~125 KB
Size limit reached
Time Span
Exceeded (1h/24h/30d)
📝 Example: "seconds" granularity with 10 docs/second
Bucket fills in: 1000 docs ÷ 10 docs/sec = ~100 seconds

NOT every second! The bucket stays open until it fills up or hits the 1-hour time boundary.
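The closure rules can be expressed as a small predicate. The limits below are the approximate values described above, hard-coded here for illustration:

```javascript
// A bucket closes when ANY limit is hit: ~1000 documents, ~125 KB
// uncompressed size, or the granularity's maximum time span.
const MAX_SPAN_SECONDS = { seconds: 3600, minutes: 86400, hours: 2592000 };

function shouldClose(bucket, granularity) {
  if (bucket.count >= 1000) return "document limit";
  if (bucket.sizeBytes >= 125 * 1024) return "size limit";
  const spanSec = (bucket.latest - bucket.earliest) / 1000;
  if (spanSec > MAX_SPAN_SECONDS[granularity]) return "time span";
  return null; // stays open
}

// 10 docs/sec with "seconds" granularity: the document limit is hit
// after ~100 seconds, long before the 1-hour span limit.
const after100s = {
  count: 1000,
  sizeBytes: 60 * 1024,
  earliest: Date.parse("2024-01-01T10:00:00Z"),
  latest: Date.parse("2024-01-01T10:01:40Z"),
};
// shouldClose(after100s, "seconds") → "document limit"
```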

05

Inside a Bucket Document

Under the hood, MongoDB stores time series data in a special system.buckets.<collection> collection. Each bucket is a single document with a specific structure:

Actual Bucket Document (system.buckets.sensorData)
{
  "_id": ObjectId("..."),           // Encodes bucket start time
  "control": {
    "version": 1,                     // Bucket format version
    "min": {
      "_id": ObjectId("..."),        // Min values for pruning
      "timestamp": ISODate("2024-01-01T10:00:00Z"),
      "temperature": 22.1
    },
    "max": {
      "timestamp": ISODate("2024-01-01T10:59:59Z"),
      "temperature": 28.7
    },
    "closed": false,               // Is bucket accepting writes?
    "count": 847                     // Number of measurements
  },
  "meta": {                            // Your metaField value
    "sensorId": "sensor_001",
    "location": "NYC"
  },
  "data": {                            // Columnar compressed data
    "timestamp": { "0": ISODate(...), "1": ISODate(...), ... },
    "temperature": { "0": 23.5, "1": 23.6, "2": 23.7, ... },
    "humidity": { "0": 45, "1": 46, "2": 44, ... }
  }
}

Bucket Document Structure

BUCKET ID
_id: ObjectId("...")
CONTROL - TIME BOUNDS
control.min.timestamp
control.max.timestamp
METADATA
meta: { sensorId: "A", location: "NYC" }
COLUMNAR DATA
data.timestamp: [t1, t2, t3...]
data.temperature: [23.5, 23.6, 23.7...]
data.humidity: [45, 46, 44...]

Columnar Storage Format

Instead of storing each measurement as a separate document, values are stored column-by-column, enabling efficient compression:

timestamp
10:00:01
10:00:02
10:00:03
10:00:04
10:00:05
sensorId
A
A
A
A
A
temperature
23.5
23.6
23.7
23.8
23.9
humidity
45
46
44
45
47
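Reconstructing row documents from a bucket's columnar arrays is mechanical. A simplified sketch (real buckets use a binary column encoding, not plain objects):

```javascript
// Rebuild per-measurement documents from a bucket's columnar data.
function unpackBucket(bucket) {
  const fields = Object.keys(bucket.data);
  const count = Object.keys(bucket.data[fields[0]]).length;
  const rows = [];
  for (let i = 0; i < count; i++) {
    const doc = { ...bucket.meta }; // metadata is shared by every row
    for (const f of fields) doc[f] = bucket.data[f][String(i)];
    rows.push(doc);
  }
  return rows;
}

const bucket = {
  meta: { sensorId: "A" },
  data: {
    timestamp: { "0": "10:00:01", "1": "10:00:02", "2": "10:00:03" },
    temperature: { "0": 23.5, "1": 23.6, "2": 23.7 },
  },
};
// unpackBucket(bucket) → 3 documents, each carrying sensorId plus
// its own timestamp and temperature.
```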
06

Compression Magic: Delta Encoding

💰 Why This Matters

Compression directly impacts your costs. At scale, storing 1TB of raw sensor data might cost $100/month. With 90% compression, that drops to $10/month. Time series collections achieve this automatically through delta encoding - no application changes needed. Understanding how it works helps you design schemas that compress even better.

Time series data is highly compressible because consecutive values are often similar. MongoDB uses delta encoding to store only the differences between values:

📊 Original Timestamps
t₁ 1702742401000
t₂ 1702742402000
t₃ 1702742403000
t₄ 1702742404000
t₅ 1702742405000
Total: 65 bytes
🗜️ Delta Encoded
base 1702742401000
Δ₁ +1000
Δ₂ +1000
Δ₃ +1000
Δ₄ +1000
Total: 21 bytes
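Delta encoding is easy to model. A round-trip sketch over the timestamps above:

```javascript
// Delta-encode: keep the first value, then store successive differences.
function deltaEncode(values) {
  return values.map((v, i) => (i === 0 ? v : v - values[i - 1]));
}
// Decode by running a cumulative sum over the deltas.
function deltaDecode(deltas) {
  let acc = 0;
  return deltas.map((d) => (acc += d));
}

const timestamps = [1702742401000, 1702742402000, 1702742403000,
                    1702742404000, 1702742405000];
const encoded = deltaEncode(timestamps);
// → [1702742401000, 1000, 1000, 1000, 1000]: one base + tiny deltas
const roundTrip = deltaDecode(encoded);
// roundTrip equals the original timestamps
```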
100 MB (Regular Collection) → 10 MB (Time Series Collection): ~90% smaller

10x Storage Reduction
1000+ Docs per Bucket
90% Fewer Index Entries
50% Faster Queries
Part III

Internal Architecture

Deep dive into how MongoDB manages buckets, memory, and storage

07

Internal Architecture

📝

Application Layer

Your app writes documents with timestamp, metadata, and measurements

🔄

Time Series Engine

Routes documents, manages bucket lifecycle, handles compression

📊

Bucket Manager

Creates, closes, and maintains buckets based on time + metadata

🗜️

Column Compressor

Applies delta encoding, RLE, and zstd compression

📁

system.buckets.*

Internal collection storing compressed bucket documents

🔍

Clustered Index

Time-based clustering for efficient range queries

💾

WiredTiger Storage

Block-compressed storage with efficient I/O patterns

08

Bucket Lifecycle & Memory Management

Why This Matters

Memory management directly affects write performance. MongoDB keeps "open" buckets in RAM for fast inserts. If you have too many unique metadata combinations, you'll exhaust the bucket catalog and trigger constant disk I/O. Understanding the lifecycle helps you design metadata fields that don't explode your memory usage.

🧠 The Bucket Catalog

MongoDB maintains an in-memory Bucket Catalog that tracks all open buckets. This is crucial for high-performance writes - instead of searching disk for the right bucket, MongoDB keeps active buckets readily accessible in RAM.
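A toy model of the catalog's eviction behavior (illustrative only; the real catalog tracks far more state per bucket):

```javascript
// Toy bucket catalog: open buckets live in memory, keyed by metadata.
// When the memory budget is exceeded, the oldest bucket is closed
// (flushed to disk in the real engine).
class BucketCatalog {
  constructor(memoryLimitBytes) {
    this.limit = memoryLimitBytes;
    this.used = 0;
    this.open = new Map(); // metaKey -> { sizeBytes, docs }
    this.closed = [];
  }
  insert(metaKey, docBytes) {
    if (!this.open.has(metaKey)) this.open.set(metaKey, { sizeBytes: 0, docs: 0 });
    const b = this.open.get(metaKey);
    b.sizeBytes += docBytes;
    b.docs += 1;
    this.used += docBytes;
    while (this.used > this.limit && this.open.size > 1) this.closeOldest();
  }
  closeOldest() {
    const [oldestKey] = this.open.keys(); // Map preserves insertion order
    this.used -= this.open.get(oldestKey).sizeBytes;
    this.open.delete(oldestKey);
    this.closed.push(oldestKey);
  }
}

const catalog = new BucketCatalog(1024);
catalog.insert("sensor_A", 600);
catalog.insert("sensor_B", 600); // over budget: sensor_A gets flushed
// catalog.open now holds only sensor_B; catalog.closed is ["sensor_A"]
```

This is why high-cardinality metadata hurts: every distinct metaKey is another open bucket competing for the same memory budget.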

💾 Memory Allocation: for example, 128 MB used of a 256 MB limit (50%).

When memory pressure increases, older buckets are closed and flushed to disk.

📋 Open Bucket Registry 4 open buckets
sensor_A → Building_NYC
10:00:00 - 10:59:59 (current)
847 docs
sensor_B → Building_NYC
10:00:00 - 10:59:59 (current)
632 docs
sensor_A → Building_SF
09:00:00 - 09:59:59
1000 docs
sensor_C → Building_LA
08:00:00 - 08:59:59
Flushed

🔄 Bucket State Machine

Every bucket transitions through distinct states. Understanding this lifecycle is key to optimizing time series performance.

CREATED
New bucket allocated in memory
📝
OPEN
Actively receiving documents
🔒
CLOSING
Preparing for compression
💾
CLOSED
Compressed & persisted to disk

What Triggers Bucket Closure?

📊
Document Limit
~1000 docs
Bucket reaches maximum document count
💾
Size Limit
~125 KB
Uncompressed bucket size exceeds threshold
Time Boundary
Granularity
Timestamp falls outside bucket's time range


09

WiredTiger Storage Integration

When a bucket is closed, it flows through MongoDB's WiredTiger storage engine. Here's the complete journey from memory to disk:

📝
Bucket (Memory)
Raw columnar data in RAM
🗜️
Delta Encoder
Compute deltas for timestamps
📊
RLE Compressor
Run-length encoding
🔧
BSON Serializer
Convert to BSON format
📋
WiredTiger
Block compression (zstd)
💾
Disk
Persistent storage

Compression Pipeline Breakdown

📦
Raw Data
1000 KB
Uncompressed columnar arrays
Δ
Delta Encoded
350 KB
-65% for sequential data
🔁
RLE Applied
180 KB
-50% for repeated values
Final (zstd)
90 KB
-50% block compression
🎯

Total Compression: 91% Reduction

From 1000 KB down to 90 KB. This is why time series collections are so storage-efficient! The combination of columnar storage + delta encoding + RLE + block compression creates massive savings.
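The stage sizes above compound multiplicatively. A quick sketch that reproduces the arithmetic (sizes in KB, taken from the example):

```javascript
// Output size after each stage of the pipeline, in KB:
// raw columnar data → delta encoding → RLE → zstd block compression.
const pipeline = [1000, 350, 180, 90];

// Per-stage savings as rounded percentages: [65, 49, 50]
const stageSavings = pipeline.slice(1).map((size, i) =>
  Math.round((1 - size / pipeline[i]) * 100));

// Total end-to-end reduction as a percentage.
function totalReduction(sizesKB) {
  return Math.round((1 - sizesKB[sizesKB.length - 1] / sizesKB[0]) * 100);
}
// totalReduction(pipeline) → 91, the "91% reduction" above
```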

Part IV

Operations

Creating collections, querying data, indexes, and aggregations

10

Creating a Time Series Collection

MongoDB Shell
// Create a time series collection
db.createCollection("sensorData", {
    timeseries: {
        timeField: "timestamp",        // Required: field containing timestamp
        metaField: "metadata",        // Optional: field for grouping
        granularity: "minutes"       // "seconds" | "minutes" | "hours"
    },
    expireAfterSeconds: 86400 * 30   // Optional: auto-delete after 30 days
});

// Insert a document
db.sensorData.insertOne({
    timestamp: new Date(),
    metadata: {
        sensorId: "sensor_001",
        location: "building_A"
    },
    temperature: 23.5,
    humidity: 45.2,
    pressure: 1013.25
});

Granularity Options

Granularity | Bucket Time Span | Best For                            | Max Docs/Bucket
seconds     | 1 hour           | High-frequency data (100+ docs/min) | ~1000
minutes     | 24 hours         | Medium frequency (1-100 docs/min)   | ~1000
hours       | 30 days          | Low frequency (<1 doc/min)          | ~1000
11

Write & Query Paths

Write Path

Optimal case: Documents arrive in chronological order. MongoDB efficiently appends to the current open bucket.

10:00:01
doc_1 → Bucket A
10:00:02
doc_2 → Bucket A
10:00:03
doc_3 → Bucket A

✓ Fast: Single bucket lookup, append operation

Out-of-order writes: When documents arrive with timestamps outside the current bucket's range, MongoDB must find or create the appropriate bucket.

10:00:01
doc_1 → Bucket A
09:45:00 ⚠️
doc_2 → New Bucket!
10:00:02
doc_3 → Bucket A

⚠️ Creates additional buckets, reducing compression efficiency

MongoDB 6.0+: Closed buckets can be reopened if a new document falls within their time range. This improves handling of late-arriving data.

Before 6.0

Late data → Creates new bucket

❌ More buckets, less compression

MongoDB 6.0+

Late data → Reopens existing bucket

✓ Better bucket utilization

💡

bucketMaxSpanSeconds & bucketRoundingSeconds

In MongoDB 6.3+, you can fine-tune buckets with bucketMaxSpanSeconds (maximum bucket time span) and bucketRoundingSeconds (start-time alignment) for custom granularity beyond seconds/minutes/hours.
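As a sketch of the rounding rule (a simplified model, not MongoDB's internals): bucketRoundingSeconds rounds a new bucket's start time down to the nearest multiple, and bucketMaxSpanSeconds caps how far past that start a document may fall and still land in the same bucket.

```javascript
// Round a timestamp down to a bucket start, then check whether a later
// document still fits within that bucket's span. Illustrative values.
function bucketStart(tsMillis, bucketRoundingSeconds) {
  const r = bucketRoundingSeconds * 1000;
  return Math.floor(tsMillis / r) * r;
}
function fitsInBucket(startMillis, tsMillis, bucketMaxSpanSeconds) {
  return tsMillis >= startMillis &&
         tsMillis < startMillis + bucketMaxSpanSeconds * 1000;
}

const rounding = 300; // 5-minute alignment
const maxSpan = 300;  // 5-minute buckets
const first = Date.parse("2024-01-01T10:02:10Z");
const start = bucketStart(first, rounding); // bucket starts at 10:00:00
// A doc at 10:04:59 fits; a doc at 10:05:00 needs a new bucket.
```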

📥
Insert Document
🔍
Find/Create Bucket
📦
Add to Bucket
💾
Compress & Store
Step 1: Document Arrives
Extract timeField and metaField
MongoDB extracts the timestamp and metadata fields you specified when creating the collection.
Step 2: Bucket Selection
Find matching open bucket
The engine looks for an open bucket with matching metadata and time range. If none exists, a new bucket is created.
Step 3: Bucket Update
Append to columnar arrays
Each field value is appended to its corresponding column array within the bucket document.
Step 4: Bucket Close (When Full)
Apply final compression
When a bucket reaches its capacity or time limit, it's closed and final compression is applied.

Query Path

🔎
Parse Query
📋
Bucket Pruning
📖
Decompress
Reconstruct Docs

Query Optimization

Time-based queries skip entire buckets using the control.min/max bounds. A query for "last 24 hours" only reads recent buckets, not the entire collection!

12

Query Execution & Bucket Pruning

Why This Matters

Bucket pruning is why time series queries are fast. When you query "last 24 hours", MongoDB doesn't scan all your data. Each bucket has control.min and control.max fields that let MongoDB skip entire buckets that can't possibly contain matching documents. A year of data might have 8,760 buckets, but a "last hour" query only touches 1-2 buckets.

🔍 Step-by-Step: How a Read Query Executes

1
Query Arrives
db.sensors.find({
  timestamp: { $gte: ISODate("2024-01-15T10:00:00Z") },
  "metadata.sensorId": "sensor_001"
})

MongoDB parses the query and identifies filterable fields.

2
Secondary Index Lookup (on system.buckets.*)
Index Used
{ meta.sensorId: 1, control.min.timestamp: 1 }

Index returns bucket IDs that MIGHT contain matching documents (based on metadata match).

3
Bucket Pruning using control.min/max
Bucket A
min: Jan 14 08:00
max: Jan 14 09:00
❌ PRUNED
max < query min
Bucket B
min: Jan 14 09:00
max: Jan 14 10:00
❌ PRUNED
max < query min
Bucket C
min: Jan 15 09:30
max: Jan 15 10:30
✅ SCAN
overlaps query
Bucket D
min: Jan 15 10:30
max: Jan 15 11:30
✅ SCAN
after query min

Key insight: MongoDB checks each bucket's control.min.timestamp and control.max.timestamp. If the query range doesn't overlap → bucket is skipped entirely!
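The pruning test is a simple interval-overlap check. A sketch using the four buckets above:

```javascript
// A bucket can be skipped when its [control.min, control.max] time range
// does not overlap the query's time range.
function pruneBuckets(buckets, queryMin, queryMax) {
  return buckets.filter((b) => b.max >= queryMin && b.min <= queryMax);
}

const T = Date.parse;
const bucketList = [
  { name: "A", min: T("2024-01-14T08:00:00Z"), max: T("2024-01-14T09:00:00Z") },
  { name: "B", min: T("2024-01-14T09:00:00Z"), max: T("2024-01-14T10:00:00Z") },
  { name: "C", min: T("2024-01-15T09:30:00Z"), max: T("2024-01-15T10:30:00Z") },
  { name: "D", min: T("2024-01-15T10:30:00Z"), max: T("2024-01-15T11:30:00Z") },
];

// $gte query: lower bound only, so queryMax is unbounded.
const survivors = pruneBuckets(bucketList, T("2024-01-15T10:00:00Z"), Infinity);
// → buckets C and D are scanned; A and B are pruned (max < query min)
```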

4
Decompress & Filter Matching Buckets
Compressed
12 KB
Decompressed
125 KB
Filtered Results
847 docs

Only matching buckets are decompressed. Documents are reconstructed from columnar format and filtered.

📝 Example: Bucket Pruning in Practice

db.sensors.find({ timestamp: { $gte: ISODate("...") } })

Against 30 buckets spanning 30 days, a recent-time-range query scans only 3 buckets and prunes the other 27, avoiding roughly 90% of I/O (~3.6 MB read instead of 36 MB for a full collection scan).
13

Secondary Indexes on Time Series

🔍 Why This Matters

Indexes on time series collections are 1000x smaller than on regular collections because they index buckets, not individual documents. A collection with 10 million documents might have only 10,000 buckets - so your index has 10,000 entries instead of 10 million. But they work differently: indexes are created on the internal system.buckets.* collection, and query planning uses bucket metadata for pruning.

Time series collections support secondary indexes, but they work differently than regular collections:

✅ Automatically Created

Clustered Index
{ meta: 1, timestamp: 1 }

Created automatically on the internal bucket collection

📌 Common Index Patterns

// Index on metadata subfields
db.sensors.createIndex({ "metadata.region": 1 })

// Compound with time (for range + filter)
db.sensors.createIndex({ 
  "metadata.sensorType": 1,
  "timestamp": -1 
})

// Index on measurement fields
db.sensors.createIndex({ "temperature": 1 })

🔍 How Indexes Work Internally

1
Index on Bucket Documents

Indexes are created on system.buckets.*, not the view

2
One Entry Per Bucket

~1000x fewer index entries than regular collections

3
Uses control.min/max

Bucket bounds enable efficient range pruning


⚠️ Important Distinction: The B-Tree Index (indexing structure) and Buckets (time series storage) are separate concepts. The index stores pointers TO bucket documents, not the buckets themselves.

🌳
B-Tree Index
Indexing Layer
{ meta.region: 1 }
The index root maps each meta.region key (us-east, us-west, eu-west) to the ObjectIds of its buckets: the index stores keys plus pointers to bucket documents.
📦
Bucket Storage
system.buckets.*
Bucket 1 us-east
min.t: Jan 15 00:00
max.t: Jan 15 01:00
docs: 847
💤 Compressed
Bucket 2 us-east
min.t: Jan 15 01:00
max.t: Jan 15 02:00
docs: 923
💤 Compressed
Bucket 3 us-west
min.t: Jan 15 00:00
max.t: Jan 15 01:00
docs: 756
💤 Compressed
Bucket 4 us-west
min.t: Jan 15 01:00
max.t: Jan 15 02:00
docs: 812
💤 Compressed
Bucket 5 eu-west
min.t: Jan 15 00:00
max.t: Jan 15 01:00
docs: 634
💤 Compressed
Bucket 6 eu-west
min.t: Jan 15 01:00
max.t: Jan 15 02:00
docs: 701
💤 Compressed
Query Execution Flow
1
Parse Query
Analyze predicates
2
Index Lookup
B-tree traversal
3
Get Pointers
Bucket ObjectIds
4
Prune Buckets
Check min/max
5
Decompress
Filter & return
14

Time Series Aggregations

MongoDB provides powerful aggregation stages designed specifically for time series analysis:

📅

$densify

Fill gaps in time series data by generating documents for missing time intervals.

{ $densify: {
  field: "timestamp",
  range: {
    step: 1,
    unit: "hour",
    bounds: "full"
  }
}}
🔄

$fill

Fill null values using linear interpolation or last observed value (LOCF).

{ $fill: {
  sortBy: { timestamp: 1 },
  output: {
    temperature: {
      method: "linear"
    }
  }
}}
📊

$setWindowFields

Compute moving averages, running totals, and rankings over time windows.

{ $setWindowFields: {
  sortBy: { timestamp: 1 },
  output: {
    movingAvg: {
      $avg: "$temp",
      window: { range: [-1, 0], unit: "hour" }
    }
  }
}}
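To make the fill semantics concrete, here is a plain-JavaScript model of $fill with method: "locf" (last observation carried forward); the field and document shapes are illustrative, not MongoDB code:

```javascript
// Fill null measurements with the last observed value (LOCF),
// mirroring { $fill: { output: { temperature: { method: "locf" } } } }.
// Assumes the series is already sorted by time.
function fillLocf(series, field) {
  let last = null;
  return series.map((doc) => {
    if (doc[field] != null) last = doc[field];
    return { ...doc, [field]: doc[field] != null ? doc[field] : last };
  });
}

const series = [
  { t: 1, temperature: 21.0 },
  { t: 2, temperature: null },
  { t: 3, temperature: null },
  { t: 4, temperature: 22.5 },
];
// fillLocf(series, "temperature") → 21.0, 21.0, 21.0, 22.5
```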
Part V

Comparisons & Trade-offs

Understanding when to use time series collections and their limitations

15

Regular Collection vs Time Series

Aspect              | Regular Collection                 | Time Series Collection
Document Storage    | One document = one BSON document   | Many documents = one bucket document
Field Storage       | Row-oriented (all fields together) | Column-oriented (fields stored separately)
Compression         | Block-level only (WiredTiger)      | Delta + RLE + block compression
Index Entries       | One per document                   | One per bucket (~1000x fewer)
Time Range Queries  | Scans matching documents           | Bucket pruning + clustered access
Write Pattern       | Individual inserts                 | Batched into bucket updates
Storage for 1M docs | ~500 MB                            | ~50 MB (90% savings)
16

MongoDB vs Other Time Series Databases

Feature            | MongoDB               | InfluxDB          | TimescaleDB
Data Model         | Document (BSON)       | Line Protocol     | Relational (SQL)
Query Language     | MQL + Aggregation     | Flux / InfluxQL   | SQL
Schema Flexibility | ⭐⭐⭐⭐⭐            | ⭐⭐⭐            | ⭐⭐
Joins & Relations  | $lookup, embedding    | Limited           | Full SQL joins
Ecosystem          | Atlas, Charts, Search | Telegraf, Grafana | PostgreSQL ecosystem
Best For           | Mixed workloads       | Pure metrics      | SQL shops
Compression        | ~90%                  | ~95%              | ~85%
🎯

MongoDB's Unique Advantage

MongoDB lets you store time series data alongside your operational data in one database. No need for a separate TSDB + complex ETL pipelines. Combine sensor readings with device metadata, user profiles, and application state in unified queries.

17

Limitations & Gotchas

Time series collections have some restrictions to be aware of. Understanding these helps you design better schemas:

🚫

Cannot Modify timeField

Once inserted, the timestamp field cannot be updated. You must delete and re-insert to change it.

🚫

Cannot Modify metaField

The metadata field is immutable after insertion. Plan your metadata structure carefully upfront.

🚫

No Transactions

Time series collections don't support multi-document transactions. Use for append-only workloads.

🚫

No Change Streams (Pre-6.0)

Change streams weren't supported until MongoDB 6.0. Now available with some limitations.

⚠️

Deletes are Expensive

Individual deletes must decompress buckets. Use TTL instead for automatic expiration.

⚠️

Schema Changes Limited

Granularity can only be increased after creation (e.g., seconds to minutes), never decreased, and timeField/metaField cannot be changed. Plan schema carefully.

Pro Tip: Design for Append-Only

Time series collections work best for append-only, immutable data. If you need frequent updates, consider a regular collection with appropriate indexes.

Part VI

Tools & Reference

Best practices, calculators, and interactive demos

18

Best Practices

Choose Right Granularity

Match granularity to your data frequency. Higher frequency = finer granularity for optimal bucketing.

Use MetaField Wisely

Group related measurements together. High-cardinality metadata creates more buckets = less compression.

Insert In Order

Insert documents in time order when possible. Out-of-order inserts may create additional buckets.

Add Secondary Indexes

Create indexes on metaField sub-fields if you query by those values frequently.

Use TTL for Cleanup

Set expireAfterSeconds to automatically remove old data and keep storage costs down.

Query Time Ranges

Always include time range filters in queries to benefit from bucket pruning optimization.

19

Storage Savings Calculator

Estimate your storage savings by switching to time series collections:

📊 Example Storage Comparison (85% compression)

Regular Collection: 73.0 GB
Time Series Collection: 11.0 GB
Storage Saved: 62.0 GB
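The calculator's arithmetic is straightforward. A sketch, with assumed inputs (1 billion documents averaging ~73 bytes) chosen to match the example figures:

```javascript
// Estimate storage for a regular vs. time series collection,
// given a document count, average document size, and compression ratio.
function storageEstimate(docCount, avgDocBytes, compressionPct) {
  const regularGB = (docCount * avgDocBytes) / 1e9;
  const timeSeriesGB = regularGB * (1 - compressionPct / 100);
  return { regularGB, timeSeriesGB, savedGB: regularGB - timeSeriesGB };
}

// Assumed profile: 1 billion documents of ~73 bytes at 85% compression.
const est = storageEstimate(1e9, 73, 85);
// → regular ≈ 73.0 GB, time series ≈ 11.0 GB, saved ≈ 62.0 GB
```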
20

Version History & Evolution

MongoDB 5.0 July 2021
  • ✨ Time Series collections introduced
  • ✨ Automatic bucketing & compression
  • ✨ TTL support for automatic data expiration
  • ✨ Secondary indexes on metadata
MongoDB 6.0 July 2022
  • 🔄 Bucket reopening for out-of-order data
  • 📡 Change streams support
  • 🗑️ Delete operations support
  • 📊 $densify and $fill aggregation stages
MongoDB 6.3 March 2023
  • ⏱️ bucketMaxSpanSeconds for custom granularity
  • 🔄 bucketRoundingSeconds for alignment
  • 📈 Improved query performance
MongoDB 7.0+ August 2023
  • 🔀 Sharding support for time series
  • ⚡ Column store indexes (preview)
  • 📊 Enhanced aggregation pushdown
  • 🚀 Improved compression algorithms
21

Live Visualization

An interactive demo animates documents being inserted, grouped into buckets, and compressed, with live counters for documents inserted, bucket count, compression ratio, and storage saved.