What is MongoDB Aggregation?
MongoDB's Aggregation Framework is a powerful data processing pipeline that allows you to transform and analyze documents in a collection. Think of it as a sophisticated assembly line where your data flows through multiple stages, each performing a specific transformation.
Visual Representation
Basic Syntax
db.collection.aggregate([ { $stage1: { ... } }, { $stage2: { ... } }, { $stage3: { ... } } ]) // Example: Find active users and count by country db.users.aggregate([ { $match: { status: "active" } }, { $group: { _id: "$country", count: { $sum: 1 } } }, { $sort: { count: -1 } } ])
Core Characteristics
- Pipeline-based: Data flows through stages sequentially
- Declarative: You specify what you want, not how to get it
- Server-side: Processing happens in MongoDB, reducing network overhead
- Optimized: MongoDB can reorder and combine stages for efficiency
- Flexible: Supports complex transformations, joins, and analytics
Why Use Aggregation Pipelines?
Aggregation pipelines solve complex data processing challenges that simple queries cannot handle. They're essential for analytics, reporting, and data transformation tasks.
1. Performance & Efficiency
// ā BAD: Fetch all data and process in application const users = await db.users.find({}).toArray(); const result = users .filter(u => u.age >= 18) .reduce((acc, u) => { acc[u.country] = (acc[u.country] || 0) + 1; return acc; }, {}); // ā GOOD: Process on database server const result = await db.users.aggregate([ { $match: { age: { $gte: 18 } } }, { $group: { _id: "$country", count: { $sum: 1 } } } ]).toArray();
2. Complex Data Transformations
Aggregation enables sophisticated operations that would be difficult or impossible with simple queries:
- Grouping & Aggregating: Calculate sums, averages, min/max across groups
- Joining Collections: Combine data from multiple collections ($lookup)
- Array Operations: Flatten, filter, and transform array fields
- Computed Fields: Create new fields based on calculations
- Text Search & Analysis: Full-text search with scoring
3. Real-World Use Cases
š Analytics & Reporting
š Data Transformation
šÆ Business Intelligence
š Advanced Queries
4. When to Use Aggregation vs. Find
// Use find() for simple queries db.users.find({ age: { $gte: 18 }, country: "USA" }) // Use aggregate() when you need: // - Grouping and calculations // - Multiple transformations // - Joins with other collections // - Complex filtering and reshaping db.users.aggregate([ { $match: { age: { $gte: 18 } } }, { $group: { _id: "$country", avgAge: { $avg: "$age" }, total: { $sum: 1 } } }, { $sort: { total: -1 } }, { $limit: 10 } ])
How Aggregation Pipelines Work
Understanding the execution flow of aggregation pipelines is crucial for writing efficient queries and debugging issues.
Pipeline Execution Flow
Stage-by-Stage Processing
Each stage in the pipeline receives documents from the previous stage, processes them, and outputs transformed documents to the next stage.
// Example: E-commerce order analysis db.orders.aggregate([ // Stage 1: Filter orders from 2024 { $match: { orderDate: { $gte: new Date("2024-01-01"), $lt: new Date("2025-01-01") } } }, // Stage 2: Group by customer and calculate totals { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" }, orderCount: { $sum: 1 }, avgOrderValue: { $avg: "$amount" } } }, // Stage 3: Filter high-value customers { $match: { totalSpent: { $gte: 1000 } } }, // Stage 4: Sort by total spent { $sort: { totalSpent: -1 } }, // Stage 5: Limit to top 10 { $limit: 10 } ])
Pipeline Optimization
- Early Filtering: $match stages are moved to the beginning when possible
- Index Usage: Initial $match and $sort can use indexes
- Stage Coalescing: Adjacent stages may be combined
- Projection Pushdown: Unnecessary fields are excluded early
Memory Considerations
// By default, each stage limited to 100MB RAM // Use allowDiskUse for large datasets db.collection.aggregate( [ /* stages */ ], { allowDiskUse: true } ) // Best practices: // 1. Filter early with $match // 2. Project only needed fields // 3. Use indexes when possible // 4. Limit result set size
Pipeline Stages Overview
MongoDB provides 30+ aggregation stages. Here are the most commonly used ones:
$match
Filters documents (like find())
$group
Groups documents and calculates aggregates
$project
Reshapes documents, includes/excludes fields
$sort
Orders documents
$limit
Limits number of documents
$skip
Skips documents (pagination)
$lookup
Joins with another collection
$unwind
Deconstructs array fields
$addFields
Adds new computed fields
$count
Counts documents
$facet
Multiple parallel pipelines
$bucket
Categorizes into buckets
Stage Categories
š Filtering
š Transformation
š Grouping
š Joining
š Ordering
šÆ Array Operations
$match Stage
The $match stage filters documents, similar to the find() query. It's one of the most important stages for performance optimization.
{ $match: { status: "completed" } }
Input Documents 5
Output Documents ?
Click "Run Stage" to see filtered results
Basic Syntax
{ $match: { <query> } }
// Examples:
{ $match: { status: "active" } }
{ $match: { age: { $gte: 18 } } }
{ $match: {
$and: [
{ price: { $gte: 100 } },
{ category: "electronics" }
]
}
}
Common Use Cases
// 1. Filter by field value { $match: { country: "USA" } } // 2. Range queries { $match: { orderDate: { $gte: new Date("2024-01-01"), $lt: new Date("2024-12-31") } } } // 3. Array contains { $match: { tags: "mongodb" } } // 4. Regular expressions { $match: { email: { $regex: "@gmail\\.com$" } } } // 5. Exists check { $match: { phoneNumber: { $exists: true } } } // 6. Multiple conditions { $match: { $or: [ { status: "premium" }, { totalSpent: { $gte: 1000 } } ] } }
Performance Tips
- Use indexes: $match at the beginning can use indexes
- Filter early: Reduce documents before expensive operations
- Combine with $project: Remove unnecessary fields after matching
// ā GOOD: Match first, then process db.orders.aggregate([ { $match: { status: "completed" } }, // Uses index { $group: { _id: "$customerId", total: { $sum: "$amount" } } } ]) // ā BAD: Group all, then filter db.orders.aggregate([ { $group: { _id: "$customerId", total: { $sum: "$amount" } } }, { $match: { total: { $gte: 1000 } } } ])
$group Stage
The $group stage groups documents by a specified expression and can perform calculations on grouped data.
{ $group: { _id: "$category", count: { $sum: 1 }, total: { $sum: "$price" } } }
Input Documents 6
Output Documents ?
Click "Run Stage" to see grouped results
Basic Syntax
{ $group: {
_id: <expression>, // Group key
<field1>: { <accumulator>: <expression> },
<field2>: { <accumulator>: <expression> }
}
}
Common Accumulators
// $sum - Count or sum values { $group: { _id: "$category", count: { $sum: 1 }, totalSales: { $sum: "$amount" } } } // $avg - Calculate average { $group: { _id: "$department", avgSalary: { $avg: "$salary" } } } // $min and $max - Find extremes { $group: { _id: "$product", minPrice: { $min: "$price" }, maxPrice: { $max: "$price" } } } // $push - Create array of values { $group: { _id: "$customerId", orders: { $push: "$orderId" } } } // $addToSet - Create array of unique values { $group: { _id: "$userId", uniqueProducts: { $addToSet: "$productId" } } } // $first and $last - Get first/last value { $group: { _id: "$customerId", firstOrder: { $first: "$orderDate" }, lastOrder: { $last: "$orderDate" } } }
Grouping Examples
// Group by single field { $group: { _id: "$country", count: { $sum: 1 } } } // Group by multiple fields { $group: { _id: { country: "$country", city: "$city" }, count: { $sum: 1 } } } // Group all documents (count total) { $group: { _id: null, total: { $sum: 1 }, avgAge: { $avg: "$age" } } } // Group by date (year/month) { $group: { _id: { year: { $year: "$orderDate" }, month: { $month: "$orderDate" } }, revenue: { $sum: "$amount" } } }
$project Stage
The $project stage reshapes documents by including, excluding, or adding fields. It's essential for controlling output format and creating computed fields.
{ $project: { name: 1, total: { $multiply: ["$price", "$qty"] }, _id: 0 } }
Input Documents 3
Output Documents ?
Click "Run Stage" to see reshaped documents
Basic Syntax
{ $project: {
field1: 1, // Include field
field2: 0, // Exclude field
newField: <expression> // Computed field
}
}
Common Operations
// 1. Include specific fields { $project: { name: 1, email: 1, age: 1 } } // 2. Exclude fields (exclude _id) { $project: { _id: 0, password: 0 } } // 3. Rename fields { $project: { fullName: "$name", emailAddress: "$email" } } // 4. Computed fields { $project: { name: 1, total: { $multiply: ["$price", "$quantity"] }, discountedPrice: { $subtract: [ "$price", { $multiply: ["$price", "$discountPercent"] } ] } } } // 5. String operations { $project: { upperName: { $toUpper: "$name" }, firstName: { $arrayElemAt: [{ $split: ["$name", " "] }, 0] }, emailDomain: { $arrayElemAt: [ { $split: ["$email", "@"] }, 1 ] } } } // 6. Date operations { $project: { year: { $year: "$createdAt" }, month: { $month: "$createdAt" }, dayOfWeek: { $dayOfWeek: "$createdAt" } } } // 7. Conditional fields { $project: { name: 1, status: { $cond: { if: { $gte: ["$age", 18] }, then: "adult", else: "minor" } } } }
Nested Field Access
// Access nested fields with dot notation { $project: { userName: "$user.name", street: "$address.street", city: "$address.city" } } // Array element access { $project: { firstTag: { $arrayElemAt: ["$tags", 0] }, tagCount: { $size: "$tags" } } }
$sort Stage
The $sort stage orders documents by specified fields.
{ $sort: { score: -1 } } // Descending by score
Input (Unsorted) 5
Output (Sorted) 5
Click sort buttons to see ordering
// 1 = ascending, -1 = descending { $sort: { field: 1 | -1 } } // Examples: { $sort: { age: 1 } } // Ascending by age { $sort: { createdAt: -1 } } // Descending by date { $sort: { country: 1, city: 1, name: 1 } } // Multi-field sort
$limit & $skip Stages
These stages control the number of documents returned, commonly used for pagination.
Input Documents 6
Output Documents ?
Click buttons to see $limit and $skip in action
$limit
// Return only first N documents { $limit: 10 } // Example: Top 5 products by sales db.products.aggregate([ { $sort: { sales: -1 } }, { $limit: 5 } ])
$skip
// Skip first N documents { $skip: 20 } // Example: Pagination (page 3, 10 items per page) const page = 3; const pageSize = 10; db.products.aggregate([ { $sort: { name: 1 } }, { $skip: (page - 1) * pageSize }, { $limit: pageSize } ])
Pagination Pattern
// Complete pagination with total count db.products.aggregate([ { $match: { category: "electronics" } }, { $facet: { metadata: [ { $count: "total" } ], data: [ { $sort: { price: -1 } }, { $skip: 20 }, { $limit: 10 } ] } } ])
$lookup Stage (Join)
The $lookup stage performs a left outer join with another collection, similar to SQL JOIN.
Orders Collection
Customers Collection
{ $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } }
Joined Output ?
Click "Run Join" to see the joined documents
Basic Syntax
{ $lookup: {
from: "<collection to join>",
localField: "<field from input documents>",
foreignField: "<field from 'from' collection>",
as: "<output array field>"
}
}
Simple Join Example
// Join orders with customer data db.orders.aggregate([ { $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customerInfo" } } ]) // Result structure: { _id: ObjectId("..."), orderId: "ORD-001", customerId: ObjectId("..."), amount: 150, customerInfo: [ { _id: ObjectId("..."), name: "John Doe", email: "john@example.com" } ] }
Advanced $lookup with Pipeline
// Complex join with filtering and transformation db.orders.aggregate([ { $lookup: { from: "products", let: { orderItems: "$items" }, pipeline: [ { $match: { $expr: { $in: ["$_id", "$$orderItems.productId"] } } }, { $project: { name: 1, price: 1 } } ], as: "productDetails" } } ])
Unwinding Lookup Results
// Convert array to single object (1-to-1 relationship) db.orders.aggregate([ { $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } }, { $unwind: "$customer" }, // Convert array to object { $project: { orderId: 1, amount: 1, customerName: "$customer.name", customerEmail: "$customer.email" } } ])
$unwind Stage
The $unwind stage deconstructs an array field, creating a separate document for each array element.
{ $unwind: "$tags" }
Input Document 1
Output Documents ?
Click "Unwind Array" to see explosion
Basic Example
// Input document: { _id: 1, name: "Product A", tags: ["electronics", "sale", "featured"] } // Pipeline: { $unwind: "$tags" } // Output (3 documents): { _id: 1, name: "Product A", tags: "electronics" } { _id: 1, name: "Product A", tags: "sale" } { _id: 1, name: "Product A", tags: "featured" }
Common Use Cases
// 1. Analyze array elements db.products.aggregate([ { $unwind: "$tags" }, { $group: { _id: "$tags", count: { $sum: 1 } } }, { $sort: { count: -1 } } ]) // 2. Process order items db.orders.aggregate([ { $unwind: "$items" }, { $group: { _id: "$items.productId", totalQuantity: { $sum: "$items.quantity" }, revenue: { $sum: { $multiply: ["$items.price", "$items.quantity"] } } } } ])
Advanced Options
// Preserve documents with empty arrays { $unwind: { path: "$tags", preserveNullAndEmptyArrays: true } } // Include array index { $unwind: { path: "$items", includeArrayIndex: "itemIndex" } }
$addFields Stage
The $addFields stage adds new fields to documents without removing existing fields (unlike $project).
{ $addFields: { total: { $multiply: ["$price", "$qty"] }, hasDiscount: { $gte: ["$qty", 3] } } }
Input Documents 3
Output Documents ?
Click "Add Fields" to see computed fields
// Add computed fields { $addFields: { total: { $multiply: ["$price", "$quantity"] }, fullName: { $concat: ["$firstName", " ", "$lastName"] } } } // Add current date { $addFields: { processedAt: new Date(), year: { $year: "$createdAt" } } } // Conditional fields { $addFields: { discount: { $cond: { if: { $gte: ["$quantity", 10] }, then: 0.1, else: 0 } } } }
Interactive Demo: Basic Pipeline
This demo shows a simple aggregation pipeline that filters, groups, and sorts sales data. Watch the data transform through each stage!
Sample Data: Sales Collection
Pipeline Stages
db.sales.aggregate([
{ $match: { status: "completed" } },
{ $group: { _id: "$product", totalSales: { $sum: "$amount" }, count: { $sum: 1 } } },
{ $sort: { totalSales: -1 } }
])
Current Stage Output
Interactive Demo: Advanced Pipeline
This demo demonstrates a complex pipeline with $lookup, $unwind, and multiple transformations.
Sample Data
Orders Collection
Customers Collection
Pipeline
db.orders.aggregate([
{ $lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customer"
}
},
{ $unwind: "$customer" },
{ $group: {
_id: "$customer.country",
totalRevenue: { $sum: "$amount" },
orderCount: { $sum: 1 },
avgOrderValue: { $avg: "$amount" }
}
},
{ $sort: { totalRevenue: -1 } }
])
Output
Interactive Pipeline Builder
Build your own aggregation pipeline by selecting stages. Click on stages to add them to your pipeline.
Sample Data: Sales Collection
Available Stages
$match
Filter documents
$group
Group and aggregate
$project
Reshape documents
$sort
Order results
$limit
Limit results
$addFields
Add computed fields
Your Pipeline
Click stages above to build your pipeline