♠ MongoDB Aggregation Framework and Map Reduce ♠

May 30, 2021

✨ What is a Database ?

A database is an organized collection of data, generally stored and accessed electronically from a computer system. More complex databases are often developed using formal design and modeling techniques.

✨ What is NoSQL ?

NoSQL is an approach to database design that accommodates a wide variety of data, such as key-value, document, columnar, and graph formats, as well as multimedia and external files. NoSQL databases are purpose-built for specific data models and use flexible schemas, which suits modern applications. Well-known examples include MongoDB, Neo4j, and HyperGraphDB.

✨ What is MongoDB ?

MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.

✨ What is the MongoDB Aggregation Framework ?

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

✨ What is the Aggregation Pipeline ?

MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.

✨ What is the Map-Reduce Function ?

Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
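To make the paradigm concrete, here is a minimal sketch in plain JavaScript that runs without MongoDB. The sample documents, the field names `CountryName` and `Language`, and the helper names (`mapDoc`, `groupByKey`, `reduceValues`) are all assumptions made up for illustration. It only shows the idea: map each record to a key-value pair, group the values by key, then reduce each group to one result.

```javascript
// Made-up sample documents (an assumption, not a real data set).
const docs = [
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "Mexico", Language: "Spanish" },
  { CountryName: "Peru", Language: "Spanish" },
  { CountryName: "France", Language: "French" },
  { CountryName: "Monaco", Language: "French" },
  { CountryName: "Japan", Language: "Japanese" },
];

// Map phase: turn each document into a (key, value) pair.
function mapDoc(doc) {
  return [doc.Language, 1];
}

// Grouping phase: collect all values that share a key.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: condense the values of one key into a single result.
function reduceValues(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Wire the three phases together.
function mapReduce(documents) {
  const groups = groupByKey(documents.map(mapDoc));
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceValues(key, values);
  }
  return result;
}

const countsByLanguage = mapReduce(docs);
console.log(countsByLanguage); // { Spanish: 3, French: 2, Japanese: 1 }
```

MongoDB performs the grouping step for you; you only supply the map and reduce functions.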

Let’s begin with the task: 🤩

✨ Task Description📄

🔅 Use the Aggregation Framework of MongoDB and create a Mapper and Reducer program.

Before we begin, we need to import the data into a MongoDB database.

mongoimport sample.json -d <database_name> -c <collection_name> --jsonArray

👉🏻 Let’s connect to the MongoDB shell :

// to see databases
show dbs
// to see collections
use <database_name>
show collections

👉🏻 First, let’s understand what our data contains and what our goal is :

The data lists countries and the languages spoken in them. Our goal is to find out how many countries speak each language.

👉🏻 We will do this using two of MongoDB’s aggregation approaches :

Aggregation Pipeline
Map-Reduce Function

✨ Now let’s begin with the task…

👉🏻 Method 1: Aggregation Pipeline

db.countries.aggregate([
  // group by Language and count the countries associated with each language
  { $group: { _id: { Language: "$Language" }, totalCountry: { $sum: 1 } } },
  // sort the groups by that count in ascending order
  { $sort: { totalCountry: 1 } }
])
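To see what shape the result takes, here is a plain JavaScript sketch that mimics the two stages in memory. It is not how MongoDB implements them. The sample documents and the helper names (`groupByLanguage`, `sortByTotalAscending`) are assumptions for illustration only.

```javascript
// Made-up sample documents (an assumption, not the real collection).
const countries = [
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "Mexico", Language: "Spanish" },
  { CountryName: "Peru", Language: "Spanish" },
  { CountryName: "France", Language: "French" },
  { CountryName: "Monaco", Language: "French" },
  { CountryName: "Japan", Language: "Japanese" },
];

// Mimics {$group: {_id: {Language: "$Language"}, totalCountry: {$sum: 1}}}.
function groupByLanguage(documents) {
  const totals = new Map();
  for (const doc of documents) {
    totals.set(doc.Language, (totals.get(doc.Language) || 0) + 1);
  }
  return [...totals].map(([language, total]) => ({
    _id: { Language: language },
    totalCountry: total,
  }));
}

// Mimics {$sort: {totalCountry: 1}} (1 means ascending).
function sortByTotalAscending(groups) {
  return [...groups].sort((a, b) => a.totalCountry - b.totalCountry);
}

// Each stage consumes the output of the previous one, like a pipeline.
const pipelineResult = sortByTotalAscending(groupByLanguage(countries));
console.log(JSON.stringify(pipelineResult));
```

With this sample input, the language spoken in the fewest countries comes first and the one spoken in the most comes last, each as a document with an `_id` and a `totalCountry` field.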

👉🏻 Method 2: Map Reduce Function

// Map-reduce functions
// Syntax :
var mapFunction = function() { … };
var reduceFunction = function(key, values) { … };
db.runCommand(
 {
 mapReduce: <input-collection>,
 map: mapFunction,
 reduce: reduceFunction,
 out: { merge: <output-collection> },
 query: <query>
 }
 )

👉🏻 Declaring Map variable :

var mapFunc1 = function() {
  // emit one (Language, 1) pair per country document
  emit(this.Language, 1);
};
// the mapper keys each document by Language; every emitted 1 stands for one country

👉🏻 Declaring Reduce variable :

var ReduceFunc1 = function(keyLang, valuesCount) {
  // add up the 1s emitted for this language
  return Array.sum(valuesCount);
};
// after grouping, the reducer counts the countries per language from the mapper's output

👉🏻 Using Map Reduce Function :

db.countries.mapReduce( 
   mapFunc1,
   ReduceFunc1, 
   {out: "map_reduced"} 
)
// now running the map-reduce function and saving the result in the map_reduced collection
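One thing to watch for when writing a reducer: MongoDB may call the reduce function more than once for the same key and feed earlier reduce output back in as one of the values. The plain JavaScript sketch below imitates that with a hypothetical helper, `reduceInChunks`, and compares two ways of counting. The helper and the sample values are assumptions for illustration, not MongoDB internals.

```javascript
// Fragile: counts how many values it was handed on this call.
function reduceByLength(key, values) {
  return values.length;
}

// Stable: assumes the map step emitted 1 per document, then adds them up.
function reduceBySum(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical helper: reduce the values chunk by chunk, then reduce the
// partial results again, imitating a re-reduce.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
  }
  return reduceFn(key, partials);
}

// Five made-up countries that share one language.
const emittedNames = ["A", "B", "C", "D", "E"]; // map emitted country names
const emittedOnes = [1, 1, 1, 1, 1];            // map emitted 1 per country

const lengthResult = reduceInChunks(reduceByLength, "Spanish", emittedNames, 2);
const sumResult = reduceInChunks(reduceBySum, "Spanish", emittedOnes, 2);

console.log(lengthResult); // 3 (wrong: it counted the partial results)
console.log(sumResult);    // 5 (correct)
```

Emitting 1 and summing gives the same answer no matter how the values are split up, which is why it is the safer pattern for counting.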