♠ MongoDB Aggregation Framework and Map Reduce ♠
✨ What is a Database ?
A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.
✨ What is NoSQL ?
NoSQL is an approach to database design that accommodates a wide variety of data models, such as key-value, document, columnar, and graph formats, as well as multimedia and external files. NoSQL databases are purpose-built for specific data models and use flexible schemas, which makes them a good fit for modern applications. Some well-known examples are MongoDB, Neo4j, and HyperGraphDB.
✨ What is MongoDB ?
MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.
✨ What is the MongoDB Aggregation Framework ?
Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.
✨ What is the Aggregation Pipeline ?
MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
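To make the pipeline idea concrete, here is a minimal sketch in plain JavaScript that runs without a database. The helper names (`matchStage`, `projectStage`, `limitStage`, `runPipeline`) and the sample documents are made up for illustration and are not MongoDB APIs; they only mimic how each stage receives the previous stage's output.

```javascript
// Each stage is a function from an array of documents to a new array.
// These helpers are illustrative stand-ins, not MongoDB APIs.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const projectStage = (shape) => (docs) => docs.map(shape);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the documents through the stages in order:
// the output of one stage becomes the input of the next.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample documents (assumed shape).
const sampleDocs = [
  { CountryName: "France", Language: "French" },
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "Mexico", Language: "Spanish" },
];

const pipelineResult = runPipeline(sampleDocs, [
  matchStage((doc) => doc.Language === "Spanish"),
  projectStage((doc) => ({ country: doc.CountryName })),
  limitStage(1),
]);

console.log(pipelineResult); // [ { country: 'Spain' } ]
```

In MongoDB itself the stages are written as documents such as `{$match: ...}` or `{$group: ...}` and executed by the server; the sketch only shows the data flow.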
✨ What is the Map-Reduce Function ?
Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
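The two phases can be sketched in plain JavaScript, without MongoDB. Everything here is an assumption made for illustration: `simulateMapReduce` is a hypothetical helper, the sample documents are invented, and `emit` is passed in as an argument, whereas in MongoDB the map function reads the document from `this` and calls a built-in `emit`.

```javascript
// Hypothetical sample documents (assumed shape).
const countryDocs = [
  { CountryName: "France", Language: "French" },
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "Mexico", Language: "Spanish" },
];

// Illustrative in-memory map-reduce, not a MongoDB API.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Map phase: every document may emit (key, value) pairs.
  for (const doc of docs) mapFn(doc, emit);

  // Reduce phase: condense the values collected for each key.
  // Like MongoDB, skip the reduce call when a key has a single value.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

const languageCounts = simulateMapReduce(
  countryDocs,
  (doc, emit) => emit(doc.Language, 1),
  (key, values) => values.reduce((total, v) => total + v, 0)
);

console.log(languageCounts);
// [ { _id: 'French', value: 1 }, { _id: 'Spanish', value: 2 } ]
```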
Let’s begin with Task :🤩
✨ Task Description📄
🔅 Use the Aggregation Framework of MongoDB and create a Mapper and Reducer program.
Before we begin, we need to import the data into a MongoDB database.
mongoimport sample.json -d <database_name> -c <collection_name> --jsonArray
👉🏻 Let’s connect to the MongoDB shell :
// to see databases
show dbs
// to see collections
use <database_name>
show collections
👉🏻 First, let’s understand what our data contains and what our goal is.
The data lists countries and the languages spoken there. Our goal is to find out how many countries speak each language.
👉🏻 We will do this in two ways using the MongoDB Aggregation Framework :
- Aggregation Pipeline
- Map-Reduce Function
✨ Now let’s begin with the task…
👉🏻 Method 1: Aggregation Pipeline
db.countries.aggregate([{$group: {_id: {Language: "$Language"}, totalCountry: {$sum: 1}}}, {$sort: {totalCountry: 1}}])
// {$group: {_id: {Language: "$Language"}, ...}} --> group by Language
// totalCountry: {$sum: 1} --> count the countries associated with that language
// {$sort: {totalCountry: 1}} --> sort them in ascending order
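To see what the `$group` and `$sort` stages compute, here is a rough plain JavaScript equivalent that runs on an in-memory array. The sample documents are hypothetical and only assume the `CountryName` and `Language` fields used in this post; the real collection will contain different data.

```javascript
// Hypothetical sample documents (assumed shape).
const docsForGrouping = [
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "France", Language: "French" },
  { CountryName: "Mexico", Language: "Spanish" },
  { CountryName: "Japan", Language: "Japanese" },
  { CountryName: "Monaco", Language: "French" },
  { CountryName: "Argentina", Language: "Spanish" },
];

// Rough equivalent of {$group: {_id: {Language: "$Language"}, totalCountry: {$sum: 1}}}:
// add 1 to a running total for every document with the same Language.
const totalsByLanguage = new Map();
for (const doc of docsForGrouping) {
  totalsByLanguage.set(doc.Language, (totalsByLanguage.get(doc.Language) || 0) + 1);
}
const groupedByLanguage = [...totalsByLanguage].map(([language, total]) => ({
  _id: { Language: language },
  totalCountry: total,
}));

// Rough equivalent of {$sort: {totalCountry: 1}}: ascending by count.
groupedByLanguage.sort((a, b) => a.totalCountry - b.totalCountry);

console.log(groupedByLanguage);
// [ { _id: { Language: 'Japanese' }, totalCountry: 1 },
//   { _id: { Language: 'French' }, totalCountry: 2 },
//   { _id: { Language: 'Spanish' }, totalCountry: 3 } ]
```

Use `-1` instead of `1` in the `$sort` stage if you want the most widely spoken languages first.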
👉🏻 Method 2: Map Reduce Function
// Map-reduce functions
// Syntax :
var mapFunction = function() { … };
var reduceFunction = function(key, values) { … };
db.runCommand(
    {
        mapReduce: <input-collection>,
        map: mapFunction,
        reduce: reduceFunction,
        out: { merge: <output-collection> },
        query: <query>
    }
)
👉🏻 Declaring Map variable :
var mapFunc1 = function() {
    // emit one (Language, 1) pair per country document
    emit(this.Language, 1);
};
// the mapper emits a 1 for every country document, keyed by Language, so the data is grouped by language
👉🏻 Declaring Reduce variable :
var ReduceFunc1 = function(keyLang, valuesCount) {
    // add up the 1s emitted for this language
    return Array.sum(valuesCount);
};
// after grouping, we sum the values sent by the mapper to get the number of countries per language; summing (instead of returning the array length) stays correct even if MongoDB calls reduce more than once for the same key
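One detail worth knowing: MongoDB can call the reduce function more than once for the same key, passing earlier outputs back in as inputs. The plain JavaScript sketch below simulates that to show why emitting 1 per document and summing is safe, while returning the array length is not. `reduceInChunks` is a hypothetical helper written for this demonstration, and the chunk size is arbitrary.

```javascript
// Two candidate reducers for counting.
const reduceByLength = (key, values) => values.length;
const reduceBySum = (key, values) => values.reduce((total, v) => total + v, 0);

// Five emitted 1s for one key, e.g. five countries sharing a language.
const emittedOnes = [1, 1, 1, 1, 1];

// Simulate a re-reduce: reduce the values in chunks first,
// then reduce the partial results again.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
  }
  return reduceFn(key, partials);
}

const sumResult = reduceInChunks(reduceBySum, "Spanish", emittedOnes, 3);
const lengthResult = reduceInChunks(reduceByLength, "Spanish", emittedOnes, 3);

console.log(sumResult);    // 5 (partials 3 and 2, summed)
console.log(lengthResult); // 2 (only counts the two partial results)
```

The sum gives the same answer however the values are split up, which is the property a reduce function needs.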
👉🏻 Using Map Reduce Function :
db.countries.mapReduce(
    mapFunc1,
    ReduceFunc1,
    {out: "map_reduced"}
)
// run map-reduce and save the result in the map_reduced collection
👉🏻 Now let’s query the result :
db.map_reduced.find().sort( { value: 1 } )