Substring functions vary between programming languages. In MongoDB, we can extract a substring with $substrBytes, which takes three arguments: a string expression (typically a field of the document), the zero-based starting byte index, and the number of bytes to return. Note that $substrBytes slices the value by UTF-8 bytes rather than by characters, so if you are not careful you can run into an error with multi-byte characters: the operation fails when the starting index or the end of the slice lands in the middle of a character. In the upcoming entries, I will discuss multi-byte characters.
// test data
db.getCollection("unicode_demo").insertMany([
  { label: "with_emoji", text: "Hi☺!" }, // emoji is multi-byte
]);

db.getCollection("unicode_demo").aggregate([
  {
    $project: {
      label: 1,
      text: 1,
      substr_bytes: { $substrBytes: ["$text", 0, 2] },
    },
  },
]);
Output:
[
  {
    _id: ObjectId('69b2a1f53e54149a74d5346f'),
    label: 'with_emoji',
    text: 'Hi☺!',
    substr_bytes: 'Hi'
  }
]
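
The slice above works because bytes 0 and 1 hold the single-byte characters "H" and "i". To see where the risky offsets are, here is a minimal Node.js sketch that maps each character of the test string to its UTF-8 byte offset and size. The helpers byteLayout and isSafeSlice are hypothetical names made up for this illustration; they are not part of MongoDB or mongosh, and isSafeSlice only approximates the rule that a slice must start and end on a character boundary.

```javascript
// Hypothetical helper: list each character with its UTF-8 byte offset and size.
function byteLayout(text) {
  const layout = [];
  let offset = 0;
  for (const ch of text) { // for...of iterates by code point, not by UTF-16 unit
    const size = Buffer.byteLength(ch, "utf8");
    layout.push({ ch, offset, size });
    offset += size;
  }
  return layout;
}

// Hypothetical helper: true when both ends of the byte slice fall on a
// character boundary. An end past the string is clamped to the total length.
function isSafeSlice(text, start, length) {
  const total = Buffer.byteLength(text, "utf8");
  const boundaries = new Set([0]);
  for (const { offset, size } of byteLayout(text)) {
    boundaries.add(offset + size);
  }
  const end = Math.min(start + length, total);
  return boundaries.has(start) && boundaries.has(end);
}

const sample = "Hi☺!";
console.log(byteLayout(sample));
// H: offset 0, size 1
// i: offset 1, size 1
// ☺: offset 2, size 3
// !: offset 5, size 1

console.log(isSafeSlice(sample, 0, 2)); // true: ends right before ☺
console.log(isSafeSlice(sample, 0, 3)); // false: ends inside ☺
console.log(isSafeSlice(sample, 0, 5)); // true: ends right after ☺
```

With this layout, a length of 3 or 4 starting at index 0 would cut through the three bytes of ☺, which is the situation where $substrBytes is expected to raise an error instead of returning a partial character.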