MongoDB - 4. Aggregation Framework

Aggregation framework mainly used for ETL majorly aggregation pipeline. Aggregations operations process data records and return computed results. There are 3 types for doing aggregation in mongoDB.

  • Single Purpose Aggregation

  • Aggregation Pipeline

  • Map Reduce Functions

Single Purpose Aggregation-

There is very simple function like getting no of records from collection or getting distinct records by removing duplicates.



Aggregation Pipeline

I aggregation pipelining there will be phases. Series of actions will be takes place on input and output will be produced. It is very similar to Map- Reduce concept in Hadoop. There will be different ETL phases as below –







Most of the time when we have to fetch the data we compare syntax with the SQL.

So, similar commands are available in MongoDB.

  • WHERE - $match

  • GROUP BY- $group

  • HAVING - $match

  • SELECT - $project

  • ORDER BY - $sort

  • LIMIT - $limit

Map Reduce Functions

MongoDB also provides map-reduce operations to perform aggregation. Sometime Finalize function also used in map –reduce. I am not going into details of this type as we can do all the things by aggregation pipeline which we can achieve with these functions.

Let us take few examples to get into little deeper to understand these concepts.

  1. Get the project total by year

db.project.aggregate([{$group:{_id:"$year","total by year":{$sum:1}}}])

- aggregate by year

2. db.project.aggregate([{$group:{_id:"$company","total by company":{$sum:1}}}])

- aggregate by year

3. db.project.aggregate([{$group:{"_id": {"firm": "$company","year":"$year"}, "total by company":{$sum:1}}}])

- aggregate by year/company

4. db.project.aggregate([

{ "$match": {"year": 2010, "company": "ABC"} },

{ "$unwind": "$Project" },

{ "$match": { "$or": [{"Project.Domain": /Telecom/},{"Project.Domain": /Retail/}] } },

{ "$unwind": "$Project.Client" },


"$group": {

"_id": {

"company": "$company",

"year": "$year",

"domain": "$Project.Domain"


"headcount": { "$sum": "$Project.Client.size" }




-- aggregate by year and company all the project sizes

5. db.trial_test.aggregate([{$group:{_id:"$status","count":{$sum:1}}}])

- aggregate by status

6.[{$group:{_id:"$Year","total airtime":{$sum:"$AirTime"}}}])

- the ID has been described

7.[{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}}])



8.[{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}},{$match: {"_id.Carrier":"DL"}}])



9.[{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}},{$project:{_id:0, "Flight":

{$toLower:"$_id.Carrier"},"Air Time":{$multiply:["$total airtime",60]}}},{$out:"FlightPlan"}])

About Author

✔ 13+ years of experience in the Software Industry; Over 6.5+ years of experience in machine learning and deep learning projects.

✔ Hands-on Data Science practitioner; a Leading team of data scientists, Python developers, UI developers, and business analysts for multi-million dollar projects

✔ Designed and developed DevOps enabled MLOPs strategy and components from scratch which had saved 80% data annotation and 40% development time for clients.

Featured Posts
Search By Tags
No tags yet.