Alvin Lucillo

Rebuilding indexes to reduce disk space


Yesterday, we saw that compact can help reduce the disk space occupied by index storage by releasing unused blocks to the operating system. Another way to save space is to rebuild the indexes, that is, to drop them and create them again. Over time, index files become fragmented, and that fragmentation holds on to disk space that a rebuild can release.

I reran the tool to recreate the database and collection. The resulting stats are shown below.

     docs     dataMB    indexMB       idMB   sha256MB   sha512MB  idx/data%   id/data% s256/data% s512/data%
   1000000     232.43     427.03      10.39     146.55     270.09      183.7        4.5       63.0      116.2
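The tool itself isn't shown here, so as a rough sketch of how figures like these could be derived, the hypothetical `indexStatsRow` helper below converts the byte counts reported by `db.events.stats()` (`count`, `storageSize`, `totalIndexSize`, `indexSizes`) into the table's columns. It assumes that `dataMB` comes from `storageSize`, that 1 MB means 1024 * 1024 bytes, and that the indexes are named `_id_`, `sha256_1` and `sha512_1`; the actual tool may measure things differently.

```javascript
// Hypothetical helper (not part of the original tool): maps the output of
// db.collection.stats(), whose sizes are in bytes, onto the table columns.
// Assumptions: dataMB is taken from storageSize, and 1 MB = 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;

function indexStatsRow(stats) {
  const toMB = (bytes) => bytes / BYTES_PER_MB;
  const pct = (part, whole) => (100 * part) / whole;

  const dataMB = toMB(stats.storageSize);
  const indexMB = toMB(stats.totalIndexSize);
  // indexSizes is keyed by index name; a missing index counts as 0 bytes.
  const idMB = toMB(stats.indexSizes["_id_"] ?? 0);
  const sha256MB = toMB(stats.indexSizes["sha256_1"] ?? 0);
  const sha512MB = toMB(stats.indexSizes["sha512_1"] ?? 0);

  return {
    docs: stats.count,
    dataMB,
    indexMB,
    idMB,
    sha256MB,
    sha512MB,
    "idx/data%": pct(indexMB, dataMB),
    "id/data%": pct(idMB, dataMB),
    "s256/data%": pct(sha256MB, dataMB),
    "s512/data%": pct(sha512MB, dataMB),
  };
}
```

In mongosh, it could be called as `indexStatsRow(db.events.stats())`.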

Here is how I recreated the indexes:

// list existing indexes
index_size_repro> db.events.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { sha256: 1 }, name: 'sha256_1' },
  { v: 2, key: { sha512: 1 }, name: 'sha512_1' }
]

// drop all indexes; note that the _id index is never dropped
index_size_repro> db.events.dropIndexes()
{
  nIndexesWas: 3,
  msg: 'non-_id indexes dropped for collection',
  ok: 1
}

// only the _id index remains
index_size_repro> db.events.getIndexes()
[ { v: 2, key: { _id: 1 }, name: '_id_' } ]

// recreate the two indexes dropped earlier
index_size_repro> db.events.createIndex({ sha256: 1 }, { name: "sha256_1" })
sha256_1
index_size_repro> db.events.createIndex({ sha512: 1 }, { name: "sha512_1" })
sha512_1
index_size_repro> db.events.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { sha256: 1 }, name: 'sha256_1' },
  { v: 2, key: { sha512: 1 }, name: 'sha512_1' }
]
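The manual session can be wrapped into a small mongosh function. `rebuildSecondaryIndexes` is a hypothetical helper, not a built-in: it reads the index specs with `getIndexes()`, skips `_id_`, and drops and recreates each remaining index with its original key pattern and options (such as `name` or `unique`). Unlike the session above, which dropped both secondary indexes at once, it rebuilds one index at a time, so only a single index is missing at any moment.

```javascript
// Hypothetical helper: rebuild every secondary index of a collection by
// dropping it and creating it again with the same key pattern and options.
function rebuildSecondaryIndexes(coll) {
  const rebuilt = [];
  // getIndexes() returns a snapshot array of specs like { v, key, name, ...options }.
  for (const spec of coll.getIndexes()) {
    if (spec.name === "_id_") continue; // the _id index cannot be dropped
    // Separate the key pattern from the options. v (index version) and ns
    // (only reported by older servers) are left out so the server uses its defaults.
    const { v, key, ns, ...options } = spec;
    coll.dropIndex(spec.name);
    coll.createIndex(key, options); // options still carries the original name
    rebuilt.push(spec.name);
  }
  return rebuilt;
}
```

For this collection, `rebuildSecondaryIndexes(db.events)` should return the names `sha256_1` and `sha512_1`. While an index is being rebuilt, queries that depend on it cannot use it, so a rebuild like this is best scheduled for a quiet period.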

After rerunning the stats, we can see that the data storage stays the same, as does the _id_ index, which was not rebuilt. Total index storage, however, drops from 427.03 MB to 228.21 MB, a reduction of around 46.6%.

     docs     dataMB    indexMB       idMB   sha256MB   sha512MB  idx/data%   id/data% s256/data% s512/data%
   1000000     232.43     228.21      10.39      74.26     143.55       98.2        4.5       32.0       61.8

Legend:

docs: inserted document count
dataMB: collection data storage
indexMB: all index storage
idMB: automatic _id_ index storage
sha256MB: secondary sha256 index storage
sha512MB: secondary sha512 index storage
idx/data%: all index storage as a percentage of collection data storage
id/data%: _id_ index storage as a percentage of collection data storage
s256/data%: sha256 index storage as a percentage of collection data storage
s512/data%: sha512 index storage as a percentage of collection data storage
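The 46.6% figure can be checked directly from the two tables, and the same arithmetic shows how much each individual index shrank. The before/after values below are copied from the tables; `percentReduction` is just a small illustrative function.

```javascript
// Before/after sizes in MB, copied from the two stats tables.
const sizesMB = {
  total: { before: 427.03, after: 228.21 },
  _id_: { before: 10.39, after: 10.39 },
  sha256: { before: 146.55, after: 74.26 },
  sha512: { before: 270.09, after: 143.55 },
};

// Share of the original size (in %) that the rebuild released.
function percentReduction(before, after) {
  return (100 * (before - after)) / before;
}

for (const [name, { before, after }] of Object.entries(sizesMB)) {
  console.log(`${name}: ${percentReduction(before, after).toFixed(1)}%`);
}
// total: 46.6%
// _id_: 0.0%
// sha256: 49.3%
// sha512: 46.9%
```

Both rebuilt secondary indexes shrank by roughly half, while the _id_ index, which was left untouched, did not change.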