MongoDB Aggregation Pipeline: From Basics to Behind the Scenes


In this blog, you are going to understand the internals of the MongoDB aggregation pipeline. Most of us, when we first face the aggregation pipeline, get confused about why there are $ signs inside the [] brackets. The [] brackets hold the stages, and each stage is written with {} braces. Each stage uses an operator, prefixed with $, to transform the data, e.g. $group, $project, $addFields, $lookup.

The aggregation pipeline looks like:
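A minimal sketch of such a pipeline, written as a Python list of dicts (the shape PyMongo's `collection.aggregate(pipeline)` accepts). The field names `status`, `customerId`, and `amount` are assumptions chosen for illustration, not a real schema.

```python
# Illustrative pipeline: the fields status, customerId and amount are assumed.
pipeline = [
    # Stage 1: keep only the documents that match the filter.
    {"$match": {"status": "paid"}},
    # Stage 2: group by customer and add up the amounts.
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    # Stage 3: order the groups by total, largest first.
    {"$sort": {"total": -1}},
]

# Each stage is a document with exactly one key: the stage operator.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

The outer list is the pipeline, each dict is one stage, and the $-prefixed key names the operation that stage performs.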

Behind the aggregation pipeline there is a mini query engine that handles each step of the aggregation process. For each step below, I have also listed the relevant technical terms.


Step 1: Parsing the Data

Parsing, Logical Plan, AST (Abstract Syntax Tree)

First, MongoDB reads the pipeline and turns it into a structured sequence of stages. This structure is what lets it analyze and execute the query more efficiently.
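To make the parsing step concrete, here is a small sketch in Python. `StageNode` and `parse_pipeline` are hypothetical names for this example; MongoDB's real parser is written in C++ and checks far more than this.

```python
from dataclasses import dataclass


@dataclass
class StageNode:
    name: str     # stage operator, e.g. "$match"
    spec: object  # the stage's arguments, left unparsed in this sketch


def parse_pipeline(pipeline):
    """Validate each raw stage and turn it into a StageNode."""
    nodes = []
    for i, stage in enumerate(pipeline):
        if not isinstance(stage, dict) or len(stage) != 1:
            raise ValueError(f"stage {i} must be a document with exactly one key")
        (name, spec), = stage.items()
        if not name.startswith("$"):
            raise ValueError(f"stage {i}: {name!r} is not a stage operator")
        nodes.append(StageNode(name, spec))
    return nodes
```

Calling `parse_pipeline([{"$match": {"status": "paid"}}])` returns one `StageNode` named `$match`, while a stage without the $ prefix is rejected before anything runs.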

Step 2: Query Optimizer

Predicate Pushdown, Stage Re-ordering

It reorders the stages so the pipeline executes faster, without changing the result. For example, a filter can be moved earlier so that later stages process fewer documents.
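One reordering rule can be sketched in a few lines: moving a $match in front of a $sort that precedes it, which is safe because filtering first yields the same final documents in the same order. The function name `push_match_before_sort` is hypothetical, and the real optimizer applies many more rules than this one.

```python
def push_match_before_sort(pipeline):
    """Return a copy of the pipeline with every $match moved ahead of a preceding $sort."""
    stages = list(pipeline)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages) - 1):
            # A $sort directly followed by a $match: swap them so fewer documents get sorted.
            if "$sort" in stages[i] and "$match" in stages[i + 1]:
                stages[i], stages[i + 1] = stages[i + 1], stages[i]
                changed = True
    return stages


original = [
    {"$sort": {"amount": -1}},
    {"$match": {"status": "paid"}},
]
print(push_match_before_sort(original))
```

After the rewrite the $match runs first, so the sort only has to handle the documents that survive the filter.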

Step 3: Query Planner

IXSCAN, COLLSCAN, B-Tree Index

It decides how to fetch the data. IXSCAN means scanning an index, whereas COLLSCAN means scanning the entire collection. The index used by an IXSCAN can be on any field, not just _id, and it can also be a compound index on several fields, such as { _id: 1, name: 1 }.
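The difference between the two access paths can be sketched with plain Python lists. A sorted list of keys searched with `bisect` stands in for the B-tree here; that is an assumption made to keep the example short, not how MongoDB stores its indexes.

```python
import bisect

docs = [
    {"_id": 1, "name": "Asha"},
    {"_id": 2, "name": "Ben"},
    {"_id": 3, "name": "Asha"},
    {"_id": 4, "name": "Chen"},
]


def coll_scan(docs, field, value):
    """COLLSCAN: look at every document and keep the matches."""
    examined = 0
    hits = []
    for d in docs:
        examined += 1
        if d.get(field) == value:
            hits.append(d)
    return hits, examined


def build_index(docs, field):
    """Sorted keys plus their document positions, a stand-in for a B-tree."""
    pairs = sorted((d[field], pos) for pos, d in enumerate(docs))
    keys = [k for k, _ in pairs]
    positions = [p for _, p in pairs]
    return keys, positions


def index_scan(docs, index, value):
    """IXSCAN: binary-search the key range, then fetch only those documents."""
    keys, positions = index
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    hits = [docs[pos] for pos in positions[lo:hi]]
    return hits, hi - lo


name_index = build_index(docs, "name")
print(coll_scan(docs, "name", "Asha")[1])         # documents examined by the full scan
print(index_scan(docs, name_index, "Asha")[1])    # index keys examined by the index scan
```

Both paths return the same two documents, but the full scan examines all four while the index scan touches only the two matching keys.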

Step 4: Physical Plan

Execution Plan, Operators

It converts the optimized plan into a physical execution plan, where MongoDB decides how each stage will actually run using specific operators and algorithms.

Step 5: Execution Engine

Slot-Based Execution Engine (SBE)

It keeps the values a plan needs in small memory boxes (slots) instead of moving full documents between stages, which makes execution faster and more efficient.

{
    "customerId": 1,
    "amount": 10000,
    "total": 1000
}
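As a rough sketch of the slot idea, suppose a plan only needs `amount` and `total` from the document above. The slot numbering and the `load_slots` helper are assumptions for illustration and heavily simplify what SBE really does.

```python
doc = {"customerId": 1, "amount": 10000, "total": 1000}

# Assumed plan: only two fields are needed, so each gets a fixed slot number up front.
slot_of = {"amount": 0, "total": 1}


def load_slots(doc, slot_of):
    """Copy only the needed field values into a flat list of slots."""
    slots = [None] * len(slot_of)
    for field, slot in slot_of.items():
        slots[slot] = doc.get(field)
    return slots


slots = load_slots(doc, slot_of)
# Later operators read slots by number instead of searching the document by field name.
difference = slots[0] - slots[1]
print(slots, difference)
```

The point is that later steps work on a couple of values at known positions rather than on the whole document.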

Step 6: Iterator Model

Volcano Model, next()

Each stage processes data by pulling one document at a time from the previous stage using the next() function.
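The pull-based model is easy to sketch: every operator exposes `next()` and asks its child for one document whenever it needs more input. The class names `Scan`, `Match`, and `Project` are hypothetical, chosen for this example.

```python
class Scan:
    """Leaf operator: hands out stored documents one at a time."""

    def __init__(self, docs):
        self._docs = iter(docs)

    def next(self):
        return next(self._docs, None)  # None marks the end of the stream


class Match:
    """Pulls from its child until a document passes the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while (doc := self.child.next()) is not None:
            if self.predicate(doc):
                return doc
        return None


class Project:
    """Keeps only the listed fields of each document it pulls."""

    def __init__(self, child, fields):
        self.child = child
        self.fields = fields

    def next(self):
        doc = self.child.next()
        if doc is None:
            return None
        return {f: doc[f] for f in self.fields if f in doc}


def drain(root):
    """Drive the plan by calling next() on the last stage until it is exhausted."""
    results = []
    while (doc := root.next()) is not None:
        results.append(doc)
    return results


docs = [
    {"customerId": 1, "amount": 100},
    {"customerId": 2, "amount": 20},
    {"customerId": 1, "amount": 50},
]
plan = Project(Match(Scan(docs), lambda d: d["amount"] >= 50), ["amount"])
print(drain(plan))
```

Nothing runs until the consumer calls `next()` on the last operator, and each call ripples down the chain only as far as needed to produce one document.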

Step 7: Streaming vs. Blocking

Streaming means a stage processes one document at a time and passes it on immediately. Blocking means a stage must receive all of its input before it can produce any output. Stages like $match and $project are streaming, while $group and $sort are blocking.
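Python generators make the contrast visible. In this sketch the helper `tracked_source` (a hypothetical name) records how many input documents were read before the first output appeared.

```python
def tracked_source(docs, log):
    """Yield documents one by one, recording each read in log."""
    for d in docs:
        log.append(d["amount"])
        yield d


def match_stage(docs, predicate):
    """Streaming: each matching document is passed on as soon as it is seen."""
    for d in docs:
        if predicate(d):
            yield d


def sort_stage(docs, key):
    """Blocking: all input must be consumed before the first output is known."""
    buffered = sorted(docs, key=key)
    yield from buffered


docs = [{"amount": 30}, {"amount": 10}, {"amount": 20}]

stream_log = []
first_streamed = next(match_stage(tracked_source(docs, stream_log), lambda d: True))

block_log = []
first_sorted = next(sort_stage(tracked_source(docs, block_log), key=lambda d: d["amount"]))

print(first_streamed, stream_log)
print(first_sorted, block_log)
```

The streaming stage produced its first document after reading one input, while the sort had to read all three before it could return the smallest.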

Step 8: Stage Algorithms

Hash Aggregation, External Merge Sort, Nested Loop Join

It defines which algorithm each stage uses to perform its operation.

e.g. $group uses a hash table; $sort splits the data into chunks, sorts each chunk, and merges the sorted chunks into the result.
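Both algorithms can be sketched with the standard library. The function names are hypothetical, and the chunks stay in memory here to keep the example short.

```python
import heapq
from collections import defaultdict


def hash_group_sum(docs, key_field, value_field):
    """Hash aggregation: one hash-table entry per group key, updated as documents arrive."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key_field]] += d[value_field]
    return dict(totals)


def chunked_merge_sort(values, chunk_size):
    """Split into chunks, sort each chunk, then merge the sorted chunks."""
    chunks = [
        sorted(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return list(heapq.merge(*chunks))


orders = [
    {"customerId": 1, "amount": 100},
    {"customerId": 2, "amount": 30},
    {"customerId": 1, "amount": 50},
]
print(hash_group_sum(orders, "customerId", "amount"))
print(chunked_merge_sort([5, 2, 9, 1, 7, 3], chunk_size=2))
```

The hash table needs only one pass over the input, and the merge step only ever compares the current head of each sorted chunk.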

Step 9: Expression Evaluation

Expression Tree, Evaluation Engine

It uses an expression evaluation engine to process expressions (like $sum, $multiply) by converting them into expression trees and evaluating them for each document.

What happens internally:

Doc1 → price = 100, quantity = 2 → totalPrice = 200  
Doc2 → price = 50, quantity = 3 → totalPrice = 150  

For each document, MongoDB evaluates expressions like $multiply by reading the field values, applying the operation, and storing the result.
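A tiny recursive evaluator shows the idea. It supports only field paths, literals, `$multiply`, and `$add`; the `evaluate` function is a hypothetical sketch, not MongoDB's evaluation engine.

```python
import math


def evaluate(expr, doc):
    """Evaluate a nested expression against one document."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc[expr[1:]]              # field path such as "$price"
    if isinstance(expr, dict):
        (op, args), = expr.items()        # one operator per expression node
        values = [evaluate(arg, doc) for arg in args]
        if op == "$multiply":
            return math.prod(values)
        if op == "$add":
            return sum(values)
        raise ValueError(f"unsupported operator {op!r}")
    return expr                           # anything else is a literal


total_price = {"$multiply": ["$price", "$quantity"]}
docs = [
    {"price": 100, "quantity": 2},
    {"price": 50, "quantity": 3},
]
print([evaluate(total_price, d) for d in docs])
```

The expression is built once as a tree and then walked again for every document, which reproduces the totals of 200 and 150 from the two documents above.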

Step 10: Memory Management

Memory Threshold, Disk Spill, External Processing

It processes small data in RAM, but when a stage exceeds its memory limit (100 MB per stage), it writes temporary data to disk, provided disk use is allowed (the allowDiskUse option).
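The spill behavior can be sketched as a sort with a small in-memory buffer. The threshold here counts items rather than bytes, and `sort_with_spill` is a hypothetical helper; both are simplifications for illustration.

```python
import heapq
import json
import tempfile


def sort_with_spill(values, max_in_memory):
    """Sort values, writing a sorted run to a temp file whenever the buffer fills up.

    Returns the sorted list and the number of runs that were spilled to disk.
    """
    runs, buffer = [], []

    def spill():
        run = tempfile.TemporaryFile(mode="w+")
        for v in sorted(buffer):
            run.write(json.dumps(v) + "\n")
        run.seek(0)          # rewind so the run can be read back during the merge
        runs.append(run)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()

    if not runs:             # everything fit in memory, no disk needed
        return sorted(buffer), 0

    if buffer:               # flush the leftover values as a final run
        spill()
    streams = [(json.loads(line) for line in run) for run in runs]
    result = list(heapq.merge(*streams))
    for run in runs:
        run.close()
    return result, len(runs)


print(sort_with_spill([5, 3, 8, 1], max_in_memory=10))
print(sort_with_spill([5, 3, 8, 1, 9, 2, 7], max_in_memory=3))
```

Small inputs never touch the disk, while larger ones are written out as sorted runs and merged back at the end.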

Step 11: Final output (Cursor)

Cursor, Batching, Lazy Fetching

Instead of sending all results at once, it returns them gradually, in batches, through a cursor. This saves memory, gives a faster first response, and scales to large datasets.
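Batching with lazy fetching can be sketched with a generator. The names `cursor_batches` and `result_stream` are hypothetical, and the batch size of 2 is only for demonstration.

```python
from itertools import islice


def cursor_batches(results, batch_size):
    """Yield results in batches, pulling from the source only when a batch is requested."""
    it = iter(results)
    while batch := list(islice(it, batch_size)):
        yield batch


produced = []


def result_stream(n):
    """Stand-in for pipeline output that records each document it produces."""
    for i in range(n):
        produced.append(i)
        yield {"_id": i}


cursor = cursor_batches(result_stream(5), batch_size=2)
first_batch = next(cursor)
print(first_batch, produced)
```

After the first batch is fetched only two documents have been produced; the remaining three are computed when the client asks for more.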


The aggregation pipeline is not just a sequence of operations. It is a mini query engine that processes data efficiently using different strategies and algorithms.

Understanding this makes you a better developer when working with MongoDB.