You can perform grouping on project documents by shaping them. In the pipeline below:
- It transforms each document into the document below.
$regexFindcaptures matched text based on the regex, which looks for texts with numbers enclosed in opening and closing parentheses.
{
"_id": {
"$oid": "69aeba4dc8dadad3693b603d"
},
"match": {
"match": "(02)",
"idx": 0,
"captures": [
"02"
]
}
},
- From the first stage, each document is transformed again so we only get the first captured text.
$arrayElemAttakes the first element from$match.capturesarray.
{
"_id": {
"$oid": "69aebb27c5311e047ebcf18a"
},
"area_code": "02"
},
- From the previous stage, grouping is performed. Documents are collapsed by their
$area_codewhere each document represents an area code and containspeople_count, the sum of all documents that fall under an area code.
// test data
// (area code) network + local number, e.g. (02) 8123-4567
db.getCollection("persons").insertMany([
{ name: "Person1", contact_number: "(02) 8123-4567" },
{ name: "Person2", contact_number: "(02) 8234-5678" },
{ name: "Person3", contact_number: "(032) 7345-6789" },
{ name: "Person4", contact_number: "(032) 7456-7890" },
{ name: "Person5", contact_number: "(082) 8567-8901" },
{ name: "Person6", contact_number: "(082) 8678-9012" },
]);
// Count people per area code
db.getCollection("persons").aggregate([
{
$project: {
match: { $regexFind: { input: "$contact_number", regex: /^\((\d+)\)/ } },
},
},
{
$project: {
area_code: { $arrayElemAt: ["$match.captures", 0] },
},
},
{
$group: {
_id: "$area_code",
people_count: { $sum: 1 },
},
},
{ $sort: { _id: 1 } },
]);
Output:
[
{
"_id": "02",
"people_count": 2
},
{
"_id": "032",
"people_count": 2
},
{
"_id": "082",
"people_count": 2
}
]