MongoDB - 4. Aggregation Framework

May 1, 2013

 

Aggregation framework mainly used for ETL majorly aggregation pipeline. Aggregations operations process data records and return computed results. There are 3 types for doing aggregation in mongoDB.

  • Single Purpose Aggregation

  • Aggregation Pipeline

  • Map Reduce Functions

 

Single Purpose Aggregation-

There is very simple function like getting no of records from collection or getting distinct records by removing duplicates.

db.mycollection.count()

db.mycollection.distinct()

 

Aggregation Pipeline

I aggregation pipelining there will be phases. Series of actions will be takes place on input and output will be produced. It is very similar to Map- Reduce concept in Hadoop. There will be different ETL phases as below –

MATCH

GROUP

UNWIND

PROJECT

SKIP

LIMIT

 

Most of the time when we have to fetch the data we compare syntax with the SQL.

So, similar commands are available in MongoDB.

  • WHERE - $match

  • GROUP BY- $group

  • HAVING - $match

  • SELECT - $project

  • ORDER BY - $sort

  • LIMIT  - $limit

 

Map Reduce Functions

 

MongoDB also provides map-reduce operations to perform aggregation. Sometime Finalize function also used in map –reduce. I am not going into details of this type as we can do all the things by aggregation pipeline which we can achieve with these functions.

 

 

 

Let us take few examples to get into little deeper to understand these concepts.

 

  1. Get the project total by year

db.project.aggregate([{$group:{_id:"$year","total by year":{$sum:1}}}])

- aggregate by year

 

2. db.project.aggregate([{$group:{_id:"$company","total by company":{$sum:1}}}])

- aggregate by year

 

3. db.project.aggregate([{$group:{"_id": {"firm": "$company","year":"$year"}, "total by company":{$sum:1}}}])

- aggregate by year/company

 

4. db.project.aggregate([

{ "$match": {"year": 2010, "company": "ABC"} },

{ "$unwind": "$Project" },

{ "$match": { "$or": [{"Project.Domain": /Telecom/},{"Project.Domain": /Retail/}] } },

{ "$unwind": "$Project.Client" },

{

"$group": {

"_id": {

"company": "$company",

"year": "$year",

"domain": "$Project.Domain"

},

"headcount": { "$sum": "$Project.Client.size" }

}

}

])

-- aggregate by year and company all the project sizes

 

 

 

5. db.trial_test.aggregate([{$group:{_id:"$status","count":{$sum:1}}}])

- aggregate by status

 

6. db.flights.aggregate([{$group:{_id:"$Year","total airtime":{$sum:"$AirTime"}}}])

- the ID has been described

 

7. db.flights.aggregate([{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}}])

 

Match

=====

8. db.flights.aggregate([{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}},{$match: {"_id.Carrier":"DL"}}])

 

Projection

==========

9. db.flights.aggregate([{$group:{_id:{"Carrier":"$UniqueCarrier"},"total airtime":{$sum:"$AirTime"}}},{$project:{_id:0, "Flight":

{$toLower:"$_id.Carrier"},"Air Time":{$multiply:["$total airtime",60]}}},{$out:"FlightPlan"}])

 

 

 

 

 

 

 

 

Share on Facebook
Share on Twitter
Please reload

About Author

Dattatray Shinde have over 6+ years of experience in Software Design, Development & Maintenance of Web Based Applications; worked on Healthcare, Insurance, E-commerce and Learning Management System domains. Over 2.5 + years as Data Scientist worked mainly in predictive analytics, survey analytics, risk analytics platforms.

Featured Posts

SAS - 1. Starting with SAS ... How to Install University edition?

March 20, 2017

1/5
Please reload

Recent Posts

June 21, 2017

January 19, 2014

Please reload

Search By Tags
Please reload

Connect

Contact Me

Mobile: +91 - 956 130 9595 

dattatrayshindeharipur@gmail.com

  • Google+ Social Icon
  • Facebook Social Icon
  • LinkedIn Social Icon
  • Twitter Social Icon

© 2023 by Dattatray Shinde