One way to reduce the disk space consumed by data or indexes is to run the compact command, which releases unused blocks back to the operating system.
To test that, I created 1 million documents, which produced these stats:
docs     dataMB  indexMB  idMB   sha256MB  sha512MB  idx/data%  id/data%  s256/data%  s512/data%
1000000  230.79  421.25   10.48  145.28    265.48    182.5      4.5       63.0        115.0
I then ran compact, which reported about 157 MB freed:
index_size_repro> db.runCommand({compact: "events"})
{ bytesFreed: 165216256, ok: 1 }
Rerunning the stats shows that data storage is unchanged at 230.79 MB, while index storage dropped from 421.25 MB to 263.68 MB, a reduction of about 37%.
docs     dataMB  indexMB  idMB   sha256MB  sha512MB  idx/data%  id/data%  s256/data%  s512/data%
1000000  230.79  263.68   10.47  83.54     169.67    114.3      4.5       36.2        73.5
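As a quick sanity check, the bytesFreed value can be compared with the drop in indexMB between the two runs. This is only a small arithmetic sketch using the figures above. It assumes the MB columns mean 1024 * 1024 bytes, and the function names are made up for illustration.

```javascript
// Figures copied from the compact output and the two stats runs above.
const bytesFreed = 165216256;
const indexMBBefore = 421.25;
const indexMBAfter = 263.68;

// Assumption: the MB columns use 1 MB = 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;

// Convert a byte count to MB under the assumption above.
function bytesToMB(bytes) {
  return bytes / BYTES_PER_MB;
}

// Percentage reduction going from `before` to `after`.
function reductionPct(before, after) {
  return ((before - after) / before) * 100;
}

console.log(bytesToMB(bytesFreed).toFixed(2));                    // 157.56
console.log((indexMBBefore - indexMBAfter).toFixed(2));           // 157.57
console.log(reductionPct(indexMBBefore, indexMBAfter).toFixed(1)); // 37.4
```

The freed bytes and the index shrinkage agree to within rounding, which fits the observation that the space came from the indexes rather than from the collection data.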
Legend:
docs: inserted document count
dataMB: collection data storage
indexMB: all index storage
idMB: automatic _id_ index storage
sha256MB: secondary sha256 index storage
sha512MB: secondary sha512 index storage
idx/data%: all index storage as a percentage of collection data storage
id/data%: _id_ index storage as a percentage of collection data storage
s256/data%: sha256 index storage as a percentage of collection data storage
s512/data%: sha512 index storage as a percentage of collection data storage
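For anyone who wants to reproduce a row like the ones above, here is a rough sketch of how the columns in the legend could be derived from a collStats-shaped object. This is not necessarily how the numbers in this post were produced. It assumes dataMB comes from storageSize, that 1 MB is 1024 * 1024 bytes, and that the secondary indexes are named sha256_1 and sha512_1 (hypothetical names, pass your own if they differ).

```javascript
// Convert bytes to MB, assuming 1 MB = 1024 * 1024 bytes.
function toMB(bytes) {
  return bytes / (1024 * 1024);
}

// Build one stats row from an object shaped like collStats output
// (fields used: count, storageSize, totalIndexSize, indexSizes).
// sha256Index and sha512Index are hypothetical index names.
function statsRow(stats, sha256Index = "sha256_1", sha512Index = "sha512_1") {
  const dataMB = toMB(stats.storageSize); // assumption: data storage = storageSize
  const indexMB = toMB(stats.totalIndexSize);
  const idMB = toMB(stats.indexSizes["_id_"]);
  const sha256MB = toMB(stats.indexSizes[sha256Index]);
  const sha512MB = toMB(stats.indexSizes[sha512Index]);
  const pctOfData = (mb) => (mb / dataMB) * 100;

  return {
    docs: stats.count,
    dataMB,
    indexMB,
    idMB,
    sha256MB,
    sha512MB,
    "idx/data%": pctOfData(indexMB),
    "id/data%": pctOfData(idMB),
    "s256/data%": pctOfData(sha256MB),
    "s512/data%": pctOfData(sha512MB),
  };
}

// In mongosh this might be called as: statsRow(db.events.stats())
```

Running it once before and once after compact gives the two rows to compare.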
Reference: https://www.mongodb.com/docs/manual/reference/command/compact/