A deep dive into MongoDB's specialized storage for time-stamped data: bucketing, compression, and internal architecture
Understanding what Time Series collections are and when to use them
The Problem: Time series data (IoT sensors, metrics, logs) generates massive volumes - often millions of documents per day. Regular MongoDB collections store each document separately, leading to bloated storage, oversized indexes, and slow time-range scans.
The Solution: Time Series collections automatically bucket, compress, and optimize this data - giving you 90%+ storage savings and 10x faster queries with zero application changes.
Optimized for data that arrives sequentially over time: IoT sensors, metrics, logs, stock prices, and any timestamped events.
MongoDB groups documents into "buckets" based on time intervals and metadata, dramatically reducing storage overhead.
Data within buckets is stored in columnar format with delta encoding, achieving up to 90%+ compression ratios.
Temperature, humidity, pressure readings
Stock prices, trades, order book
CPU, memory, latency metrics
Application logs, audit trails
How buckets, granularity, data flow, and compression work together
Buckets are the secret sauce. Instead of storing 1 million individual documents, MongoDB groups them into ~1,000 buckets. Each bucket holds documents with the same metadata within a time window. This means: fewer documents to scan, better compression, and indexes that are 1000x smaller. Understanding bucket structure helps you choose the right granularity and metadata fields.
MongoDB automatically determines bucket boundaries based on the granularity setting (seconds, minutes, hours). Documents with the same metadata that arrive within the same time window go into the same bucket.
"seconds" granularity does NOT mean buckets close every second!
Granularity controls the bucket time span and timestamp rounding, not how often buckets close.
A bucket closes when ANY of these conditions is met:
- the bucket reaches its maximum measurement count (~1000 documents by default)
- the bucket reaches its maximum size (~125 KB by default)
- an incoming timestamp falls outside the bucket's maximum time span (1 hour for "seconds" granularity)
- memory pressure forces the bucket catalog to close and flush it
NOT every second! The bucket stays open until it fills up or hits the 1-hour time boundary.
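The rounding-versus-span distinction can be sketched in plain JavaScript (this is an illustrative model, not MongoDB's internal code; the rounding, span, and 1000-document values mirror the defaults described here):

```javascript
// Hypothetical sketch of granularity semantics (not MongoDB internals).
// Each granularity rounds the bucket's start time down and allows a max span.
const GRANULARITY = {
  seconds: { roundMs: 60 * 1000,    maxSpanMs: 3600 * 1000 },      // round to minute, 1h span
  minutes: { roundMs: 3600 * 1000,  maxSpanMs: 86400 * 1000 },     // round to hour, 24h span
  hours:   { roundMs: 86400 * 1000, maxSpanMs: 30 * 86400 * 1000 } // round to day, 30d span
};
const MAX_DOCS_PER_BUCKET = 1000;

function bucketStart(ts, granularity) {
  const { roundMs } = GRANULARITY[granularity];
  return new Date(Math.floor(ts.getTime() / roundMs) * roundMs);
}

function shouldClose(bucket, incomingTs, granularity) {
  const { maxSpanMs } = GRANULARITY[granularity];
  return bucket.count >= MAX_DOCS_PER_BUCKET ||       // bucket is full
         incomingTs - bucket.start >= maxSpanMs;      // timestamp outside the span
}

const b = { start: bucketStart(new Date("2024-01-01T10:17:42Z"), "seconds"), count: 999 };
console.log(b.start.toISOString()); // start rounded down to the minute, not the second
console.log(shouldClose(b, new Date("2024-01-01T10:30:00Z"), "seconds")); // false: still open
```

Note that a reading arriving 13 minutes later keeps the bucket open: only filling up or crossing the span boundary closes it.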
Under the hood, MongoDB stores time series data in a special system.buckets.<collection> collection. Each bucket is a single document with a specific structure:
{
"_id": ObjectId("..."), // Encodes bucket start time
"control": {
"version": 1, // Bucket format version
"min": {
"_id": ObjectId("..."), // Min values for pruning
"timestamp": ISODate("2024-01-01T10:00:00Z"),
"temperature": 22.1
},
"max": {
"timestamp": ISODate("2024-01-01T10:59:59Z"),
"temperature": 28.7
},
"closed": false, // Is bucket accepting writes?
"count": 847 // Number of measurements
},
"meta": { // Your metaField value
"sensorId": "sensor_001",
"location": "NYC"
},
"data": { // Columnar compressed data
"timestamp": { "0": ISODate(...), "1": ISODate(...), ... },
"temperature": { "0": 23.5, "1": 23.6, "2": 23.7, ... },
"humidity": { "0": 45, "1": 46, "2": 44, ... }
}
}
Instead of storing each measurement as a separate document, values are stored column-by-column, enabling efficient compression:
Compression directly impacts your costs. At scale, storing 1TB of raw sensor data might cost $100/month. With 90% compression, that drops to $10/month. Time series collections achieve this automatically through delta encoding - no application changes needed. Understanding how it works helps you design schemas that compress even better.
Time series data is highly compressible because consecutive values are often similar. MongoDB uses delta encoding to store only the differences between values:
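A toy illustration of the idea (plain JavaScript, not MongoDB's actual binary codec): storing the first value plus per-step differences turns a slowly changing series into small, repetitive numbers that compress far better than the absolute values:

```javascript
// Toy delta encoder: keep the first value, then store only the differences.
// MongoDB's real columnar codec is binary; this just demonstrates the principle.
function deltaEncode(values) {
  return values.map((v, i) => (i === 0 ? v : +(v - values[i - 1]).toFixed(6)));
}

function deltaDecode(deltas) {
  const out = [];
  deltas.forEach((d, i) => out.push(i === 0 ? d : +(out[i - 1] + d).toFixed(6)));
  return out;
}

const temps = [23.5, 23.6, 23.7, 23.7, 23.8];
const encoded = deltaEncode(temps);
console.log(encoded);              // [23.5, 0.1, 0.1, 0, 0.1] - tiny, repetitive deltas
console.log(deltaDecode(encoded)); // round-trips back to the original readings
```

The repeated 0.1 deltas are exactly what run-length encoding and block compression then exploit.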
Deep dive into how MongoDB manages buckets, memory, and storage
Your app writes documents with timestamp, metadata, and measurements
Routes documents, manages bucket lifecycle, handles compression
Creates, closes, and maintains buckets based on time + metadata
Applies delta encoding, RLE, and zstd compression
Internal collection storing compressed bucket documents
Time-based clustering for efficient range queries
Block-compressed storage with efficient I/O patterns
Memory management directly affects write performance. MongoDB keeps "open" buckets in RAM for fast inserts. If you have too many unique metadata combinations, you'll exhaust the bucket catalog and trigger constant disk I/O. Understanding the lifecycle helps you design metadata fields that don't explode your memory usage.
MongoDB maintains an in-memory Bucket Catalog that tracks all open buckets. This is crucial for high-performance writes - instead of searching disk for the right bucket, MongoDB keeps active buckets readily accessible in RAM.
When memory pressure increases, older buckets are closed and flushed to disk.
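A hypothetical sketch of that catalog (illustrative JavaScript, not MongoDB's implementation): open buckets are keyed by the metaField value, which is why every distinct metadata combination holds its own open bucket in RAM - and why high-cardinality metadata causes constant evictions:

```javascript
// Hypothetical in-memory bucket catalog: metaField value -> open bucket.
// When the catalog is full, an older bucket is closed and flushed to disk.
class BucketCatalog {
  constructor(maxOpenBuckets) {
    this.open = new Map();
    this.maxOpenBuckets = maxOpenBuckets;
    this.closed = 0; // buckets evicted under memory pressure
  }

  insert(doc) {
    const key = JSON.stringify(doc.metadata);
    let bucket = this.open.get(key);
    if (!bucket) {
      if (this.open.size >= this.maxOpenBuckets) {
        // Memory pressure: close + flush the oldest open bucket.
        const oldestKey = this.open.keys().next().value;
        this.open.delete(oldestKey);
        this.closed++;
      }
      bucket = { meta: doc.metadata, measurements: [] };
      this.open.set(key, bucket);
    }
    bucket.measurements.push(doc);
  }
}

const catalog = new BucketCatalog(2); // tiny limit to show the effect
for (const id of ["s1", "s2", "s3"]) {
  catalog.insert({ timestamp: new Date(), metadata: { sensorId: id }, temp: 20 });
}
console.log(catalog.open.size, catalog.closed); // 2 open buckets, 1 forced close
```

With three distinct sensor IDs and room for only two open buckets, the third insert forces an eviction - the same pressure pattern that exploding metadata cardinality creates at scale.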
Every bucket transitions through distinct states. Understanding this lifecycle is key to optimizing time series performance.
When a bucket is closed, it flows through MongoDB's WiredTiger storage engine. Here's the complete journey from memory to disk:
From 1000 KB down to 90 KB. This is why time series collections are so storage-efficient! The combination of columnar storage + delta encoding + RLE + block compression creates massive savings.
Creating collections, querying data, indexes, and aggregations
// Create a time series collection
db.createCollection("sensorData", {
  timeseries: {
    timeField: "timestamp",   // Required: field containing timestamp
    metaField: "metadata",    // Optional: field for grouping
    granularity: "minutes"    // "seconds" | "minutes" | "hours"
  },
  expireAfterSeconds: 86400 * 30  // Optional: auto-delete after 30 days
});

// Insert a document
db.sensorData.insertOne({
  timestamp: new Date(),
  metadata: { sensorId: "sensor_001", location: "building_A" },
  temperature: 23.5,
  humidity: 45.2,
  pressure: 1013.25
});
| Granularity | Bucket Time Span | Best For | Max Docs/Bucket |
|---|---|---|---|
| seconds | 1 hour | High-frequency data (100+ docs/min) | ~1000 |
| minutes | 24 hours | Medium frequency (1-100 docs/min) | ~1000 |
| hours | 30 days | Low frequency (<1 doc/min) | ~1000 |
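A rough estimate of bucket counts follows directly from the table: a bucket closes at ~1000 documents or at the granularity's max time span, whichever comes first. This hypothetical helper (assuming a steady arrival rate per metadata series) shows the arithmetic:

```javascript
// Rough per-series bucket-count estimate (hypothetical helper, steady rate assumed).
const SPAN_SECONDS = { seconds: 3600, minutes: 86400, hours: 2592000 };
const MAX_DOCS = 1000; // default max measurements per bucket

function bucketsPerDay(docsPerMinute, granularity) {
  const docsPerSpan = docsPerMinute * (SPAN_SECONDS[granularity] / 60);
  const spansPerDay = 86400 / SPAN_SECONDS[granularity];
  // Each span needs at least one bucket, plus extras once 1000 docs is exceeded.
  return Math.ceil(Math.max(docsPerSpan / MAX_DOCS, 1) * spansPerDay);
}

console.log(bucketsPerDay(100, "seconds")); // 6000 docs/hour -> 6 buckets/hour -> 144/day
console.log(bucketsPerDay(1, "minutes"));   // 1440 docs/day -> 2 buckets/day
```

This is why granularity should match frequency: 100 docs/min under "hours" granularity would burn through the 1000-doc limit in 10 minutes, creating far more buckets per span than intended.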
Optimal case: Documents arrive in chronological order. MongoDB efficiently appends to the current open bucket.
✓ Fast: Single bucket lookup, append operation
Out-of-order writes: When documents arrive with timestamps outside the current bucket's range, MongoDB must find or create the appropriate bucket.
⚠️ Creates additional buckets, reducing compression efficiency
MongoDB 6.0+: Closed buckets can be reopened if a new document falls within their time range. This improves handling of late-arriving data.
Late data → Creates new bucket
❌ More buckets, less compression
Late data → Reopens existing bucket
✓ Better bucket utilization
In MongoDB 6.3+, you can fine-tune bucket time spans with bucketMaxSpanSeconds for custom granularity beyond seconds/minutes/hours.
Time-based queries skip entire buckets using the control.min/max bounds. A query for "last 24 hours" only reads recent buckets, not the entire collection!
Bucket pruning is why time series queries are fast. When you query "last 24 hours", MongoDB doesn't scan all your data.
Each bucket has control.min and control.max
fields that let MongoDB skip entire buckets that can't possibly contain matching documents.
A year of data might have 8,760 buckets, but a "last hour" query only touches 1-2 buckets.
MongoDB parses the query and identifies filterable fields.
Index returns bucket IDs that MIGHT contain matching documents (based on metadata match).
Key insight: MongoDB checks each bucket's control.min.timestamp and control.max.timestamp.
If the query range doesn't overlap → bucket is skipped entirely!
Only matching buckets are decompressed. Documents are reconstructed from columnar format and filtered.
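The pruning check itself is a simple interval-overlap test against each bucket's control bounds. This sketch (illustrative only - MongoDB performs this inside the query planner) shows how buckets outside the query range are skipped without ever being decompressed:

```javascript
// A bucket can be skipped whenever [control.min.timestamp, control.max.timestamp]
// does not overlap the query's time range.
function overlaps(bucket, queryStart, queryEnd) {
  return bucket.control.min.timestamp <= queryEnd &&
         bucket.control.max.timestamp >= queryStart;
}

const buckets = [
  { control: { min: { timestamp: new Date("2024-01-01T00:00:00Z") },
               max: { timestamp: new Date("2024-01-01T00:59:59Z") } } },
  { control: { min: { timestamp: new Date("2024-01-01T01:00:00Z") },
               max: { timestamp: new Date("2024-01-01T01:59:59Z") } } },
  { control: { min: { timestamp: new Date("2024-01-01T02:00:00Z") },
               max: { timestamp: new Date("2024-01-01T02:59:59Z") } } },
];

const start = new Date("2024-01-01T01:30:00Z");
const end = new Date("2024-01-01T02:15:00Z");
const toScan = buckets.filter((b) => overlaps(b, start, end));
console.log(toScan.length); // only 2 of 3 buckets need decompression
```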
Indexes on time series collections are 1000x smaller than on regular collections because they index buckets, not individual documents.
A collection with 10 million documents might have only 10,000 buckets - so your index has 10,000 entries instead of 10 million.
But they work differently: indexes are created on the internal system.buckets.* collection, and query planning uses bucket metadata for pruning.
Time series collections support secondary indexes, but they work differently from regular collections:
Created automatically on the internal bucket collection
// Index on metadata subfields
db.sensors.createIndex({ "metadata.region": 1 })

// Compound with time (for range + filter)
db.sensors.createIndex({ "metadata.sensorType": 1, "timestamp": -1 })

// Index on measurement fields
db.sensors.createIndex({ "temperature": 1 })
Indexes are created on system.buckets.*, not the view
~1000x fewer index entries than regular collections
Bucket bounds enable efficient range pruning
⚠️ Important Distinction: The B-Tree Index (indexing structure) and Buckets (time series storage) are separate concepts. The index stores pointers TO bucket documents, not the buckets themselves.
MongoDB provides powerful aggregation stages designed specifically for time series analysis:
Fill gaps in time series data by generating documents for missing time intervals.
{ $densify: {
field: "timestamp",
range: {
step: 1,
unit: "hour",
bounds: "full"
}
}}
Fill null values using linear interpolation or last observed value (LOCF).
{ $fill: {
sortBy: { timestamp: 1 },
output: {
temperature: {
method: "linear"
}
}
}}
Compute moving averages, running totals, and rankings over time windows.
{ $setWindowFields: {
sortBy: { timestamp: 1 },
output: {
movingAvg: {
$avg: "$temp",
window: { range: [-1, 0], unit: "hour" }
}
}
}}
Understanding when to use time series collections and their limitations
| Aspect | Regular Collection | Time Series Collection |
|---|---|---|
| Document Storage | One document = one BSON document | Many documents = one bucket document |
| Field Storage | Row-oriented (all fields together) | Column-oriented (fields stored separately) |
| Compression | Block-level only (WiredTiger) | Delta + RLE + Block compression |
| Index Entries | One per document | One per bucket (~1000x fewer) |
| Time Range Queries | Scans matching documents | Bucket pruning + clustered access |
| Write Pattern | Individual inserts | Batched into bucket updates |
| Storage for 1M docs | ~500 MB | ~50 MB (90% savings) |
| Feature | MongoDB | InfluxDB | TimescaleDB |
|---|---|---|---|
| Data Model | Document (BSON) | Line Protocol | Relational (SQL) |
| Query Language | MQL + Aggregation | Flux / InfluxQL | SQL |
| Schema Flexibility | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Joins & Relations | $lookup, embedding | Limited | Full SQL joins |
| Ecosystem | Atlas, Charts, Search | Telegraf, Grafana | PostgreSQL ecosystem |
| Best For | Mixed workloads | Pure metrics | SQL shops |
| Compression | ~90% | ~95% | ~85% |
MongoDB lets you store time series data alongside your operational data in one database. No need for a separate TSDB + complex ETL pipelines. Combine sensor readings with device metadata, user profiles, and application state in unified queries.
Time series collections have some restrictions to be aware of. Understanding these helps you design better schemas:
Once inserted, the timestamp field cannot be updated. You must delete and re-insert to change it.
The metadata field is immutable after insertion. Plan your metadata structure carefully upfront.
Time series collections don't support multi-document transactions. Use for append-only workloads.
Change streams weren't supported until MongoDB 6.0. Now available with some limitations.
Individual deletes must decompress buckets. Use TTL instead for automatic expiration.
Can't change granularity or field mappings after creation. Plan schema carefully.
Time series collections work best for append-only, immutable data. If you need frequent updates, consider a regular collection with appropriate indexes.
Best practices, calculators, and interactive demos
Match granularity to your data frequency. Higher frequency = finer granularity for optimal bucketing.
Group related measurements together. High-cardinality metadata creates more buckets = less compression.
Insert documents in time order when possible. Out-of-order inserts may create additional buckets.
Create indexes on metaField sub-fields if you query by those values frequently.
Set expireAfterSeconds to automatically remove old data and keep storage costs down.
Always include time range filters in queries to benefit from bucket pruning optimization.
Estimate your storage savings by switching to time series collections:
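A minimal version of that estimate (the inputs - average raw document size, retention, and an expected compression ratio - are assumptions you supply, not MongoDB constants):

```javascript
// Minimal storage-savings estimator. The 0.9 default mirrors the ~90%
// compression figure discussed above; adjust it for your own data.
function estimateSavings({ docsPerDay, avgDocBytes, retentionDays, compressionRatio = 0.9 }) {
  const rawBytes = docsPerDay * avgDocBytes * retentionDays;
  const compressedBytes = rawBytes * (1 - compressionRatio);
  const toGB = (b) => +(b / 1024 ** 3).toFixed(2);
  return { rawGB: toGB(rawBytes), timeSeriesGB: toGB(compressedBytes) };
}

console.log(estimateSavings({
  docsPerDay: 1_000_000,  // 1M sensor readings per day
  avgDocBytes: 500,       // ~500 bytes per raw BSON document
  retentionDays: 30       // one month of retained data
}));
// -> { rawGB: 13.97, timeSeriesGB: 1.4 } at the ~90% compression described above
```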