skip to content
Alvin Lucillo

Group projected documents

/ 2 min read

You can perform grouping on project documents by shaping them. In the pipeline below:

  1. It transforms each document into the document below. $regexFind captures matched text based on the regex, which looks for texts with numbers enclosed in opening and closing parentheses.
  {
    "_id": {
      "$oid": "69aeba4dc8dadad3693b603d"
    },
    "match": {
      "match": "(02)",
      "idx": 0,
      "captures": [
        "02"
      ]
    }
  },
  1. From the first stage, each document is transformed again so we only get the first captured text. $arrayElemAt takes the first element from $match.captures array.
  {
    "_id": {
      "$oid": "69aebb27c5311e047ebcf18a"
    },
    "area_code": "02"
  },
  1. From the previous stage, grouping is performed. Documents are collapsed by their $area_code where each document represents an area code and contains people_count, the sum of all documents that fall under an area code.
// test data
// (area code) network + local number, e.g. (02) 8123-4567
db.getCollection("persons").insertMany([
	{ name: "Person1", contact_number: "(02) 8123-4567" },
	{ name: "Person2", contact_number: "(02) 8234-5678" },
	{ name: "Person3", contact_number: "(032) 7345-6789" },
	{ name: "Person4", contact_number: "(032) 7456-7890" },
	{ name: "Person5", contact_number: "(082) 8567-8901" },
	{ name: "Person6", contact_number: "(082) 8678-9012" },
]);

// Count people per area code
db.getCollection("persons").aggregate([
	{
		$project: {
			match: { $regexFind: { input: "$contact_number", regex: /^\((\d+)\)/ } },
		},
	},
	{
		$project: {
			area_code: { $arrayElemAt: ["$match.captures", 0] },
		},
	},
	{
		$group: {
			_id: "$area_code",
			people_count: { $sum: 1 },
		},
	},
	{ $sort: { _id: 1 } },
]);

Output:

[
	{
		"_id": "02",
		"people_count": 2
	},
	{
		"_id": "032",
		"people_count": 2
	},
	{
		"_id": "082",
		"people_count": 2
	}
]