In this post, I will be discussing the $group in the MongoDB Aggregation framework. I would take on some examples on $group and various accumulators (operators used along with $group
) like $push
, $avg
, $sum
, $min
, $max
and $addToSet
$group
$group
simply groups the input documents based on specified fields and for each distinct group, it outputs the result document. The response will always have a unique _id
.
Here is an illustration to better understand the concept.
You can also have computed fields in the $group
output. These values will be computed inside the group. If you want to compute a stat across the collection, then you can group by _id
: null. We would take an example later.
Let’s first take a simple example of grouping.
Suppose we have a Game database, where we have a collection for storing user info called users and there is another collection where we store game scores called game_scores. We are having the following sample document
Now if we wanted to find out all the unique users who played the game. How would we do code it?
Solution – $group
.
You see that in _id
param, we pass $user, because we only want unique users( grouping by user field only), $ denotes that the user field is from input documents.
On the output result, you can see there are 4 unique users who played the game.
Now that you understood the basic concept, let’s move on to accumulators in the $group
stage.
Accumulator operators in $group
Accumulator operators are essentially the functions that are applied to fields of input documents other than fields that are “grouped” under “_id”.
In other words, only fields that you can output in the $group
other than _id
, should be accumulated using the accumulator operator.
All the following sections of this post will give you a fair idea of accumulator operators in the $group
.
$push
$push
operator pushes the fields in the expression into an array. You will get more clarity with the example.
Let’s continue with the example above. What if we wanted to get all the scores of each unique gamer in the array.
You see that scores are the new field which is an array with all the $score
fields of grouped documents.
$avg
$avg
calculates the mathematical average of the field mentioned in accumulator expression.
It ignores the non-numerical values.
A Simple Example
What is the average score of each user in the game?
A little complex example 🙂
Suppose, we wanted to get the average game score of each user in each of the gameMode
.
What do we do?
Just add gameMode
in the $group
.
It’s not that tough 😉
Notice _id
has now changed to an object (this is because now we are grouping with 2 fields, not 1). Here is how the output looks like.
Just observe the first 3 documents in this image. You see the user field is repeated with different gameMode values. So now you have an average score in each difficulty level for each user.
PS: I have truncated the output in this example for simplicity 🙂
$sum
$sum
is very similar to $avg
. The only difference being $sum
calculates the sum (as the name suggests already)
Example for $sum
Suppose, we wanted to calculate the number of times the user has played in each "gameMode".
Pretty easy right? The same is possible using $count. But I wanted to demonstrate $sum
.
$min
$min
calculates the minimum value of non-grouped fields in a grouped set of documents.
Example
How to calculate the minimum score of each gamer?
$max
$max
calculates the maximum value of non-grouped fields in a grouped set of documents.
Example
Very similar to the previous one, how do we calculate the highest score for each gamer in each gameMode
.
$first
$first
returns the first expression result in the group of documents that share the same $group
by key.
This would not be a very meaningful operator if there is no $sort
stage before the $group
stage. I think you need an example to get more clarification.
Example Suppose, we want to find out the first score of each gamer in our database.
We will sort the game_scores in ascending order of playedOn field. Then we will group by the user and use the $first
operator. Here’s how.
If you don’t use $sort as the first stage of the pipeline, then $first will not return any meaningful result
$last
This is very similar to $first
and does the exact opposite.
As you might have already guessed, $last
returns the last expression result in the group of documents that share the same $group
by key.
Example, How to find the latest score of each gamer in our database.
An alternate solution could have been sorting records based on playedOn
date in decreasing order and then using $first
. You have the choice, but I wanted to demonstrate $last
.
$addToSet
$addToSet
= $push
+ only have unique values in the resulting array.
Basically what $addToSet
does is, it takes the group of documents with the same $group
by _id
. And then applies the expression given as an argument (to $addToSet
) on each of the documents in the group, and then returns the array with unique results.
Let’s take a quick example
From our game_score collection, what if we needed to find, which game modes did each user play.
There can be some users who only play in difficult mode, others might play only in easy and medium mode. Let’s find out with code.
Let me know if you still have any doubts about any of these.
Conclusion
In this post, I discussed some basic concepts around the $group
which is used in the MongoDB Aggregation Pipeline. I hope the examples were simple enough. Let me know if you have any doubts, Tweet me at @MohitSehgl
Be in touch, will see you in the next post. Happy Coding 🙂
Originally posted at appsyoda.com