What is MongoDB Aggregation?

MongoDB's Aggregation Framework is a powerful data processing pipeline that allows you to transform and analyze documents in a collection. Think of it as a sophisticated assembly line where your data flows through multiple stages, each performing a specific transformation.

Key Concept: An aggregation pipeline consists of one or more stages that process documents sequentially. Each stage transforms the documents and passes the results to the next stage.

Visual Representation

šŸ“„ Input
Collection Documents
→
Stage 1
Filter
→
Stage 2
Transform
→
Stage 3
Group
→
šŸ“¤ Output
Results

Basic Syntax

db.collection.aggregate([
  { $stage1: { ... } },
  { $stage2: { ... } },
  { $stage3: { ... } }
])

// Example: Find active users and count by country
db.users.aggregate([
  { $match: { status: "active" } },
  { $group: {
      _id: "$country",
      count: { $sum: 1 }
    }
  },
  { $sort: { count: -1 } }
])

Core Characteristics

  • Pipeline-based: Data flows through stages sequentially
  • Declarative: You specify what you want, not how to get it
  • Server-side: Processing happens in MongoDB, reducing network overhead
  • Optimized: MongoDB can reorder and combine stages for efficiency
  • Flexible: Supports complex transformations, joins, and analytics

Why Use Aggregation Pipelines?

Aggregation pipelines solve complex data processing challenges that simple queries cannot handle. They're essential for analytics, reporting, and data transformation tasks.

1. Performance & Efficiency

Server-Side Processing: All computation happens on the database server, minimizing data transfer and leveraging MongoDB's optimized query engine.
// āŒ BAD: Fetch all data and process in application
const users = await db.users.find({}).toArray();
const result = users
  .filter(u => u.age >= 18)
  .reduce((acc, u) => {
    acc[u.country] = (acc[u.country] || 0) + 1;
    return acc;
  }, {});

// āœ… GOOD: Process on database server
const result = await db.users.aggregate([
  { $match: { age: { $gte: 18 } } },
  { $group: { _id: "$country", count: { $sum: 1 } } }
]).toArray();

2. Complex Data Transformations

Aggregation enables sophisticated operations that would be difficult or impossible with simple queries:

  • Grouping & Aggregating: Calculate sums, averages, min/max across groups
  • Joining Collections: Combine data from multiple collections ($lookup)
  • Array Operations: Flatten, filter, and transform array fields
  • Computed Fields: Create new fields based on calculations
  • Text Search & Analysis: Full-text search with scoring

3. Real-World Use Cases

šŸ“Š Analytics & Reporting

Sales by region
User engagement metrics
Revenue trends

šŸ”„ Data Transformation

ETL pipelines
Data normalization
Format conversion

šŸŽÆ Business Intelligence

Customer segmentation
Cohort analysis
Funnel metrics

šŸ” Advanced Queries

Multi-collection joins
Geospatial analysis
Time-series data

4. When to Use Aggregation vs. Find

// Use find() for simple queries
db.users.find({ age: { $gte: 18 }, country: "USA" })

// Use aggregate() when you need:
// - Grouping and calculations
// - Multiple transformations
// - Joins with other collections
// - Complex filtering and reshaping
db.users.aggregate([
  { $match: { age: { $gte: 18 } } },
  { $group: {
      _id: "$country",
      avgAge: { $avg: "$age" },
      total: { $sum: 1 }
    }
  },
  { $sort: { total: -1 } },
  { $limit: 10 }
])

How Aggregation Pipelines Work

Understanding the execution flow of aggregation pipelines is crucial for writing efficient queries and debugging issues.

Pipeline Execution Flow

1ļøāƒ£ Parse
Validate syntax
→
2ļøāƒ£ Optimize
Reorder stages
→
3ļøāƒ£ Execute
Process documents
→
4ļøāƒ£ Return
Send results

Stage-by-Stage Processing

Each stage in the pipeline receives documents from the previous stage, processes them, and outputs transformed documents to the next stage.

// Example: E-commerce order analysis
db.orders.aggregate([
  // Stage 1: Filter orders from 2024
  { $match: {
      orderDate: {
        $gte: new Date("2024-01-01"),
        $lt: new Date("2025-01-01")
      }
    }
  },

  // Stage 2: Group by customer and calculate totals
  { $group: {
      _id: "$customerId",
      totalSpent: { $sum: "$amount" },
      orderCount: { $sum: 1 },
      avgOrderValue: { $avg: "$amount" }
    }
  },

  // Stage 3: Filter high-value customers
  { $match: { totalSpent: { $gte: 1000 } } },

  // Stage 4: Sort by total spent
  { $sort: { totalSpent: -1 } },

  // Stage 5: Limit to top 10
  { $limit: 10 }
])

Pipeline Optimization

MongoDB automatically optimizes pipelines! It can reorder stages, combine operations, and push filters early to reduce the number of documents processed.
  • Early Filtering: $match stages are moved to the beginning when possible
  • Index Usage: Initial $match and $sort can use indexes
  • Stage Coalescing: Adjacent stages may be combined
  • Projection Pushdown: Unnecessary fields are excluded early

Memory Considerations

// By default, each stage limited to 100MB RAM
// Use allowDiskUse for large datasets
db.collection.aggregate(
  [ /* stages */ ],
  { allowDiskUse: true }
)

// Best practices:
// 1. Filter early with $match
// 2. Project only needed fields
// 3. Use indexes when possible
// 4. Limit result set size

Pipeline Stages Overview

MongoDB provides 30+ aggregation stages. Here are the most commonly used ones:

$match

Filters documents (like find())

$group

Groups documents and calculates aggregates

$project

Reshapes documents, includes/excludes fields

$sort

Orders documents

$limit

Limits number of documents

$skip

Skips documents (pagination)

$lookup

Joins with another collection

$unwind

Deconstructs array fields

$addFields

Adds new computed fields

$count

Counts documents

$facet

Multiple parallel pipelines

$bucket

Categorizes into buckets

Stage Categories

šŸ” Filtering

$match
$limit
$skip
$sample

šŸ”„ Transformation

$project
$addFields
$set
$unset

šŸ“Š Grouping

$group
$bucket
$bucketAuto
$count

šŸ”— Joining

$lookup
$graphLookup
$unionWith

šŸ“‹ Ordering

$sort
$sortByCount

šŸŽÆ Array Operations

$unwind
$filter
$map

$match Stage

The $match stage filters documents, similar to the find() query. It's one of the most important stages for performance optimization.

Best Practice: Place $match as early as possible in your pipeline to reduce the number of documents processed by subsequent stages.
šŸ”
$match Visualization
{ $match: { status: "completed" } }

Input Documents 5

status: "completed"
amount: 150
status: "pending"
amount: 200
status: "completed"
amount: 75
status: "cancelled"
amount: 300
status: "completed"
amount: 425
$match

Output Documents ?

Click "Run Stage" to see filtered results

Basic Syntax

{ $match: { <query> } }

// Examples:
{ $match: { status: "active" } }
{ $match: { age: { $gte: 18 } } }
{ $match: {
    $and: [
      { price: { $gte: 100 } },
      { category: "electronics" }
    ]
  }
}

Common Use Cases

// 1. Filter by field value
{ $match: { country: "USA" } }

// 2. Range queries
{ $match: {
    orderDate: {
      $gte: new Date("2024-01-01"),
      $lt: new Date("2024-12-31")
    }
  }
}

// 3. Array contains
{ $match: { tags: "mongodb" } }

// 4. Regular expressions
{ $match: { email: { $regex: "@gmail\\.com$" } } }

// 5. Exists check
{ $match: { phoneNumber: { $exists: true } } }

// 6. Multiple conditions
{ $match: {
    $or: [
      { status: "premium" },
      { totalSpent: { $gte: 1000 } }
    ]
  }
}

Performance Tips

  • Use indexes: $match at the beginning can use indexes
  • Filter early: Reduce documents before expensive operations
  • Combine with $project: Remove unnecessary fields after matching
// āœ… GOOD: Match first, then process
db.orders.aggregate([
  { $match: { status: "completed" } }, // Uses index
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
])

// āŒ BAD: Group all, then filter
db.orders.aggregate([
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $match: { total: { $gte: 1000 } } }
])

$group Stage

The $group stage groups documents by a specified expression and can perform calculations on grouped data.

Key Concept: $group creates a new document for each unique group, with _id representing the group key.
šŸ“Š
$group Visualization
{ $group: { _id: "$category", count: { $sum: 1 }, total: { $sum: "$price" } } }

Input Documents 6

category: "Electronics"
price: 299
category: "Books"
price: 15
category: "Electronics"
price: 599
category: "Clothing"
price: 45
category: "Books"
price: 25
category: "Electronics"
price: 149
$group

Output Documents ?

Click "Run Stage" to see grouped results

Basic Syntax

{ $group: {
    _id: <expression>,  // Group key
    <field1>: { <accumulator>: <expression> },
    <field2>: { <accumulator>: <expression> }
  }
}

Common Accumulators

// $sum - Count or sum values
{ $group: {
    _id: "$category",
    count: { $sum: 1 },
    totalSales: { $sum: "$amount" }
  }
}

// $avg - Calculate average
{ $group: {
    _id: "$department",
    avgSalary: { $avg: "$salary" }
  }
}

// $min and $max - Find extremes
{ $group: {
    _id: "$product",
    minPrice: { $min: "$price" },
    maxPrice: { $max: "$price" }
  }
}

// $push - Create array of values
{ $group: {
    _id: "$customerId",
    orders: { $push: "$orderId" }
  }
}

// $addToSet - Create array of unique values
{ $group: {
    _id: "$userId",
    uniqueProducts: { $addToSet: "$productId" }
  }
}

// $first and $last - Get first/last value
{ $group: {
    _id: "$customerId",
    firstOrder: { $first: "$orderDate" },
    lastOrder: { $last: "$orderDate" }
  }
}

Grouping Examples

// Group by single field
{ $group: {
    _id: "$country",
    count: { $sum: 1 }
  }
}

// Group by multiple fields
{ $group: {
    _id: {
      country: "$country",
      city: "$city"
    },
    count: { $sum: 1 }
  }
}

// Group all documents (count total)
{ $group: {
    _id: null,
    total: { $sum: 1 },
    avgAge: { $avg: "$age" }
  }
}

// Group by date (year/month)
{ $group: {
    _id: {
      year: { $year: "$orderDate" },
      month: { $month: "$orderDate" }
    },
    revenue: { $sum: "$amount" }
  }
}

$project Stage

The $project stage reshapes documents by including, excluding, or adding fields. It's essential for controlling output format and creating computed fields.

āœ‚ļø
$project Visualization
{ $project: { name: 1, total: { $multiply: ["$price", "$qty"] }, _id: 0 } }

Input Documents 3

_id: 1
name: "Laptop"
price: 999
qty: 2
category: "Tech"
_id: 2
name: "Mouse"
price: 29
qty: 5
category: "Tech"
_id: 3
name: "Desk"
price: 250
qty: 1
category: "Furniture"
$project

Output Documents ?

Click "Run Stage" to see reshaped documents

Basic Syntax

{ $project: {
    field1: 1,        // Include field
    field2: 0,        // Exclude field
    newField: <expression>  // Computed field
  }
}

Common Operations

// 1. Include specific fields
{ $project: {
    name: 1,
    email: 1,
    age: 1
  }
}

// 2. Exclude fields (exclude _id)
{ $project: {
    _id: 0,
    password: 0
  }
}

// 3. Rename fields
{ $project: {
    fullName: "$name",
    emailAddress: "$email"
  }
}

// 4. Computed fields
{ $project: {
    name: 1,
    total: { $multiply: ["$price", "$quantity"] },
    discountedPrice: {
      $subtract: [
        "$price",
        { $multiply: ["$price", "$discountPercent"] }
      ]
    }
  }
}

// 5. String operations
{ $project: {
    upperName: { $toUpper: "$name" },
    firstName: { $arrayElemAt: [{ $split: ["$name", " "] }, 0] },
    emailDomain: {
      $arrayElemAt: [
        { $split: ["$email", "@"] },
        1
      ]
    }
  }
}

// 6. Date operations
{ $project: {
    year: { $year: "$createdAt" },
    month: { $month: "$createdAt" },
    dayOfWeek: { $dayOfWeek: "$createdAt" }
  }
}

// 7. Conditional fields
{ $project: {
    name: 1,
    status: {
      $cond: {
        if: { $gte: ["$age", 18] },
        then: "adult",
        else: "minor"
      }
    }
  }
}

Nested Field Access

// Access nested fields with dot notation
{ $project: {
    userName: "$user.name",
    street: "$address.street",
    city: "$address.city"
  }
}

// Array element access
{ $project: {
    firstTag: { $arrayElemAt: ["$tags", 0] },
    tagCount: { $size: "$tags" }
  }
}

$sort Stage

The $sort stage orders documents by specified fields.

ā†•ļø
$sort Visualization
{ $sort: { score: -1 } }  // Descending by score

Input (Unsorted) 5

name: "Alice"
score: 85
name: "Bob"
score: 92
name: "Carol"
score: 78
name: "Dave"
score: 95
name: "Eve"
score: 88
$sort

Output (Sorted) 5

Click sort buttons to see ordering

// 1 = ascending, -1 = descending
{ $sort: { field: 1 | -1 } }

// Examples:
{ $sort: { age: 1 } }              // Ascending by age
{ $sort: { createdAt: -1 } }      // Descending by date
{ $sort: {
    country: 1,
    city: 1,
    name: 1
  }
}  // Multi-field sort
Performance Tip: $sort can use an index if it's the first stage or immediately follows $match. Otherwise, it requires an in-memory sort (limited to 100MB unless allowDiskUse is enabled).

$limit & $skip Stages

These stages control the number of documents returned, commonly used for pagination.

šŸ“„
$limit & $skip Visualization

Input Documents 6

#1 "Product A"
#2 "Product B"
#3 "Product C"
#4 "Product D"
#5 "Product E"
#6 "Product F"
$limit / $skip

Output Documents ?

Click buttons to see $limit and $skip in action

$limit

// Return only first N documents
{ $limit: 10 }

// Example: Top 5 products by sales
db.products.aggregate([
  { $sort: { sales: -1 } },
  { $limit: 5 }
])

$skip

// Skip first N documents
{ $skip: 20 }

// Example: Pagination (page 3, 10 items per page)
const page = 3;
const pageSize = 10;

db.products.aggregate([
  { $sort: { name: 1 } },
  { $skip: (page - 1) * pageSize },
  { $limit: pageSize }
])

Pagination Pattern

// Complete pagination with total count
db.products.aggregate([
  { $match: { category: "electronics" } },
  { $facet: {
      metadata: [
        { $count: "total" }
      ],
      data: [
        { $sort: { price: -1 } },
        { $skip: 20 },
        { $limit: 10 }
      ]
    }
  }
])

$lookup Stage (Join)

The $lookup stage performs a left outer join with another collection, similar to SQL JOIN.

Important: $lookup can impact performance. Ensure proper indexing on the foreign collection's join field.
šŸ”—
$lookup Visualization (JOIN)

Orders Collection

orderId: "ORD-1"
customerId: 101
amount: 250
orderId: "ORD-2"
customerId: 102
amount: 180

Customers Collection

_id: 101
name: "Alice"
email: "alice@..."
_id: 102
name: "Bob"
email: "bob@..."
{ $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } }

Joined Output ?

Click "Run Join" to see the joined documents

Basic Syntax

{ $lookup: {
    from: "<collection to join>",
    localField: "<field from input documents>",
    foreignField: "<field from 'from' collection>",
    as: "<output array field>"
  }
}

Simple Join Example

// Join orders with customer data
db.orders.aggregate([
  { $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerInfo"
    }
  }
])

// Result structure:
{
  _id: ObjectId("..."),
  orderId: "ORD-001",
  customerId: ObjectId("..."),
  amount: 150,
  customerInfo: [
    {
      _id: ObjectId("..."),
      name: "John Doe",
      email: "john@example.com"
    }
  ]
}

Advanced $lookup with Pipeline

// Complex join with filtering and transformation
db.orders.aggregate([
  { $lookup: {
      from: "products",
      let: { orderItems: "$items" },
      pipeline: [
        { $match: {
            $expr: {
              $in: ["$_id", "$$orderItems.productId"]
            }
          }
        },
        { $project: { name: 1, price: 1 } }
      ],
      as: "productDetails"
    }
  }
])

Unwinding Lookup Results

// Convert array to single object (1-to-1 relationship)
db.orders.aggregate([
  { $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },  // Convert array to object
  { $project: {
      orderId: 1,
      amount: 1,
      customerName: "$customer.name",
      customerEmail: "$customer.email"
    }
  }
])

$unwind Stage

The $unwind stage deconstructs an array field, creating a separate document for each array element.

šŸ’„
$unwind Visualization
{ $unwind: "$tags" }

Input Document 1

name: "Laptop Pro"
price: 1299
tags:
electronics sale featured
$unwind

Output Documents ?

Click "Unwind Array" to see explosion

Basic Example

// Input document:
{
  _id: 1,
  name: "Product A",
  tags: ["electronics", "sale", "featured"]
}

// Pipeline:
{ $unwind: "$tags" }

// Output (3 documents):
{ _id: 1, name: "Product A", tags: "electronics" }
{ _id: 1, name: "Product A", tags: "sale" }
{ _id: 1, name: "Product A", tags: "featured" }

Common Use Cases

// 1. Analyze array elements
db.products.aggregate([
  { $unwind: "$tags" },
  { $group: {
      _id: "$tags",
      count: { $sum: 1 }
    }
  },
  { $sort: { count: -1 } }
])

// 2. Process order items
db.orders.aggregate([
  { $unwind: "$items" },
  { $group: {
      _id: "$items.productId",
      totalQuantity: { $sum: "$items.quantity" },
      revenue: { $sum: {
          $multiply: ["$items.price", "$items.quantity"]
        }
      }
    }
  }
])

Advanced Options

// Preserve documents with empty arrays
{ $unwind: {
    path: "$tags",
    preserveNullAndEmptyArrays: true
  }
}

// Include array index
{ $unwind: {
    path: "$items",
    includeArrayIndex: "itemIndex"
  }
}

$addFields Stage

The $addFields stage adds new fields to documents without removing existing fields (unlike $project).

āž•
$addFields Visualization
{ $addFields: { total: { $multiply: ["$price", "$qty"] }, hasDiscount: { $gte: ["$qty", 3] } } }

Input Documents 3

item: "Widget"
price: 10
qty: 5
item: "Gadget"
price: 25
qty: 2
item: "Gizmo"
price: 15
qty: 4
$addFields

Output Documents ?

Click "Add Fields" to see computed fields

// Add computed fields
{ $addFields: {
    total: { $multiply: ["$price", "$quantity"] },
    fullName: { $concat: ["$firstName", " ", "$lastName"] }
  }
}

// Add current date
{ $addFields: {
    processedAt: new Date(),
    year: { $year: "$createdAt" }
  }
}

// Conditional fields
{ $addFields: {
    discount: {
      $cond: {
        if: { $gte: ["$quantity", 10] },
        then: 0.1,
        else: 0
      }
    }
  }
}
$addFields vs $project: Use $addFields when you want to keep all existing fields and just add new ones. Use $project when you need to reshape the entire document.

Interactive Demo: Basic Pipeline

This demo shows a simple aggregation pipeline that filters, groups, and sorts sales data. Watch the data transform through each stage!

Sample Data: Sales Collection

Pipeline Stages

šŸ“„ Input
8 documents
→
$match
Filter completed
→
$group
By product
→
$sort
By totalSales
db.sales.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$product", totalSales: { $sum: "$amount" }, count: { $sum: 1 } } },
  { $sort: { totalSales: -1 } }
])

Current Stage Output

Interactive Demo: Advanced Pipeline

This demo demonstrates a complex pipeline with $lookup, $unwind, and multiple transformations.

Sample Data

Orders Collection

Customers Collection

Pipeline

db.orders.aggregate([
  { $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },
  { $group: {
      _id: "$customer.country",
      totalRevenue: { $sum: "$amount" },
      orderCount: { $sum: 1 },
      avgOrderValue: { $avg: "$amount" }
    }
  },
  { $sort: { totalRevenue: -1 } }
])

Output

Interactive Pipeline Builder

Build your own aggregation pipeline by selecting stages. Click on stages to add them to your pipeline.

Sample Data: Sales Collection

Available Stages

$match

Filter documents

$group

Group and aggregate

$project

Reshape documents

$sort

Order results

$limit

Limit results

$addFields

Add computed fields

Your Pipeline

Click stages above to build your pipeline

Generated Code

Pipeline Results