I got a report from an AOSP user that mkfs.erofs run time increases dramatically when the dedupe option is enabled. For example, creating an 8GB uncompressed erofs image went from 36 seconds to 27 minutes with dedupe enabled. After profiling mkfs.erofs on sample data, I observed that the increase in time was coming from erofs_blob_exit(), and further debugging showed that the real inefficiency was in hashmap_iter_first(), which always starts scanning for the first element from tablepos = 0, so each call rescans the buckets that earlier removals have already emptied.
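
To illustrate the cost pattern, here is a minimal sketch using a toy chained table. It is not the erofs-utils hashmap: toy_map, toy_find_bucket() and toy_drain() are hypothetical names made up for this example, and the probe counter exists only to make the difference visible. It frees all entries one by one, either restarting each search at bucket 0 or resuming from the last bucket found, and assumes the table is never resized while draining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy chained hash table. NOT the erofs-utils hashmap; all names are hypothetical. */
struct toy_entry {
	struct toy_entry *next;
};

struct toy_map {
	struct toy_entry **table;	/* one chain head per bucket */
	size_t tablesize;
	size_t count;			/* live entries */
	unsigned long probes;		/* buckets inspected while searching */
};

/* Return the index of the first non-empty bucket at or after 'start'. */
static size_t toy_find_bucket(struct toy_map *m, size_t start)
{
	size_t pos = start;

	while (pos < m->tablesize) {
		m->probes++;
		if (m->table[pos])
			break;
		pos++;
	}
	return pos;
}

/*
 * Fill a table with 'nentries' entries, free them one by one, and return
 * how many buckets were inspected. resume == 0 restarts every search at
 * bucket 0; resume != 0 continues from the last bucket found.
 */
unsigned long toy_drain(size_t tablesize, size_t nentries, int resume)
{
	struct toy_map m = { NULL, tablesize, 0, 0 };
	size_t pos = 0;

	assert(tablesize > 0);
	m.table = calloc(tablesize, sizeof(*m.table));
	assert(m.table);
	for (size_t i = 0; i < nentries; i++) {
		struct toy_entry *e = malloc(sizeof(*e));

		assert(e);
		e->next = m.table[i % tablesize];	/* spread entries evenly */
		m.table[i % tablesize] = e;
		m.count++;
	}

	while (m.count) {
		struct toy_entry *e;

		pos = toy_find_bucket(&m, resume ? pos : 0);
		assert(pos < tablesize);	/* count > 0, so a bucket must exist */
		e = m.table[pos];
		m.table[pos] = e->next;		/* unlink; the table is never resized */
		free(e);
		m.count--;
	}
	free(m.table);
	return m.probes;
}
```

With 1024 buckets holding one entry each, toy_drain(1024, 1024, 0) inspects 524800 buckets, while toy_drain(1024, 1024, 1) inspects 2047. The first grows quadratically with the table size and the second linearly.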

The following patches solve this by
- creating a helper function to disable hashmap shrinking
- using hashmap_iter_next() to avoid scanning from 0; as rehashing is disabled, it is guaranteed to go through all the elements even while doing hashmap_remove().

Test results now show order of magnitude improvements for larger filesystem sizes.

You can verify the improvements with the steps below:

$ mkdir fs_data
$ dd if=/dev/urandom of=fs_data/random_file.bin bs=1M count=8192
$ time mkfs.erofs --chunksize=4096 erofs_dedupe.img fs_data

fs_size  Before  After  Improvement
1G       23s     7s     3.2x
2G       81s     15s    5.4x
4G       272s    31s    8.77x
8G       1252s   61s    20.52x

Thanks,
Sandeep

Sandeep Dhavale (2):
  erofs-utils: lib: provide helper to disable hashmap shrinking
  erofs-utils: lib: improve freeing hashmap in erofs_blob_exit()

 include/erofs/hashmap.h | 4 ++++
 lib/blobchunk.c         | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

-- 
2.45.1.288.g0e0cd299f1-goog