I ran a small test to measure how much index storage is consumed depending on the width of the values included in an index. The Go program creates two indexes on index_size_repro.events and inserts 300K documents, each with a unique SHA256 and SHA512 value. By default, MongoDB also creates the _id_ index on the unique identifier. For context, an ObjectID is 12 bytes, a SHA256 digest is 32 bytes raw (64 characters as a hex string), and a SHA512 digest is 64 bytes raw (128 characters as a hex string).
db := client.Database("index_size_repro")
coll := db.Collection("events")
// Create one single-field ascending index per hash field.
_, err = coll.Indexes().CreateMany(ctx, []mongo.IndexModel{
	{
		Keys:    bson.D{{Key: "sha256", Value: 1}},
		Options: options.Index().SetName("sha256_1"),
	},
	{
		Keys:    bson.D{{Key: "sha512", Value: 1}},
		Options: options.Index().SetName("sha512_1"),
	},
})
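To make the value widths mentioned above concrete, here is a minimal standalone sketch that uses only the Go standard library. The helper name digestWidths and the sample payload are illustrative and not taken from the test program, and the sketch makes no claim about whether the test stored the hashes as raw bytes or as hex strings.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// digestWidths returns the raw byte length and the hex string length
// of the SHA-256 and SHA-512 digests of payload.
// (Hypothetical helper, for illustration only.)
func digestWidths(payload []byte) (raw256, hex256, raw512, hex512 int) {
	s256 := sha256.Sum256(payload) // [32]byte
	s512 := sha512.Sum512(payload) // [64]byte
	return len(s256), len(hex.EncodeToString(s256[:])),
		len(s512), len(hex.EncodeToString(s512[:]))
}

func main() {
	r256, h256, r512, h512 := digestWidths([]byte("event-1"))
	fmt.Printf("sha256: %d bytes raw, %d hex chars\n", r256, h256)
	fmt.Printf("sha512: %d bytes raw, %d hex chars\n", r512, h512)
}
```

Hex encoding doubles the width of each value, so the choice of representation alone changes how many bytes every index key has to carry.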
As the data grows, the total index size overtakes the collection data storage and ends up at almost twice its size (about 180% at 300K documents). Other notes:
- The default _id_ index is relatively small: only 2.3% of the total index storage and about 4% of the collection data storage (see the last row of the result table).
- The index that takes up the majority of the index storage is sha512, followed by sha256.
- This suggests that multiple single-field indexes on high-cardinality, wide values can push index storage well beyond the size of the data itself.
docs dataMB indexMB idMB sha256MB sha512MB idx/data% id/data% s256/data% s512/data%
10000 2.14 2.33 0.11 0.76 1.46 108.7 5.1 35.5 68.1
20000 7.25 6.91 0.31 2.25 4.34 95.2 4.3 31.1 59.9
30000 7.88 11.34 0.50 3.73 7.12 143.9 6.3 47.3 90.3
40000 12.98 16.12 0.68 5.26 10.18 124.1 5.3 40.5 78.4
50000 13.43 20.87 0.87 6.84 13.16 155.4 6.5 51.0 98.0
60000 18.53 25.21 0.87 8.36 15.99 136.1 4.7 45.1 86.3
70000 19.04 29.53 0.87 9.76 18.90 155.1 4.6 51.3 99.3
80000 23.65 33.97 1.06 11.21 21.70 143.6 4.5 47.4 91.8
90000 24.24 38.62 1.25 12.75 24.62 159.3 5.2 52.6 101.6
100000 24.37 42.94 1.25 14.21 27.48 176.2 5.1 58.3 112.8
110000 29.14 47.18 1.25 15.61 30.32 161.9 4.3 53.6 104.0
120000 29.75 51.74 1.44 17.13 33.18 173.9 4.8 57.6 111.5
130000 34.54 56.30 1.63 18.61 36.06 163.0 4.7 53.9 104.4
140000 35.05 60.61 1.63 20.03 38.95 172.9 4.6 57.1 111.1
150000 39.64 64.97 1.63 21.53 41.82 163.9 4.1 54.3 105.5
160000 40.23 69.55 1.82 23.02 44.71 172.9 4.5 57.2 111.1
170000 40.54 74.11 2.01 24.48 47.62 182.8 5.0 60.4 117.4
180000 45.27 78.39 2.01 25.93 50.45 173.2 4.4 57.3 111.5
190000 45.95 82.74 2.01 27.41 53.33 180.1 4.4 59.6 116.0
200000 50.79 86.98 2.20 28.84 55.94 171.3 4.3 56.8 110.1
210000 51.18 91.30 2.39 30.29 58.62 178.4 4.7 59.2 114.5
220000 55.71 96.40 2.39 31.78 62.23 173.0 4.3 57.0 111.7
230000 56.29 101.15 2.39 33.26 65.50 179.7 4.2 59.1 116.4
240000 56.64 105.20 2.58 34.71 67.91 185.8 4.6 61.3 119.9
250000 61.38 109.15 2.77 36.16 70.22 177.8 4.5 58.9 114.4
260000 62.04 113.41 2.77 37.63 73.01 182.8 4.5 60.6 117.7
270000 66.84 118.07 2.79 39.14 76.14 176.7 4.2 58.6 113.9
280000 67.25 123.00 2.99 40.70 79.30 182.9 4.4 60.5 117.9
290000 71.82 126.52 2.99 42.21 81.32 176.2 4.2 58.8 113.2
300000 72.43 130.42 3.03 43.72 83.67 180.1 4.2 60.4 115.5
Legend:
docs: inserted document count
dataMB: collection data storage
indexMB: all index storage
idMB: automatic _id_ index storage
sha256MB: secondary sha256 index storage
sha512MB: secondary sha512 index storage
idx/data%: all index storage as a percentage of collection data storage
id/data%: _id_ index storage as a percentage of collection data storage
s256/data%: sha256 index storage as a percentage of collection data storage
s512/data%: sha512 index storage as a percentage of collection data storage