Depending on the size of the filesystem being built, there can be a
huge number of elements in the hashmap. Currently we call
hashmap_iter_first() in a while loop to iterate over and free the
elements. While technically correct, this is inefficient in two
respects:
- As we are iterating over the elements only to remove them, we do not
  need the overhead of rehashing.
- Second, and the part which contributes most to the slowdown, is using
  hashmap_iter_first(): it restarts the scan from index 0 every time,
  throwing away the progress of the previous successful scan. For a
  sparsely populated hashmap this becomes O(n^2) in the worst case.

Let's fix this by disabling hashmap shrinking, which avoids the
rehashing, and by using hashmap_iter_next(), which is then guaranteed
to visit all the elements while they are being removed, avoiding the
performance pitfalls of hashmap_iter_first().

A test with random data shows the performance improvement:

	fs_size		Before		After
	1G		23s		7s
	2G		81s		15s
	4G		272s		31s
	8G		1252s		61s

Signed-off-by: Sandeep Dhavale <dhav...@google.com>
---
 lib/blobchunk.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/blobchunk.c b/lib/blobchunk.c
index 645bcc1..8082aa4 100644
--- a/lib/blobchunk.c
+++ b/lib/blobchunk.c
@@ -548,11 +548,17 @@ void erofs_blob_exit(void)
 	if (blobfile)
 		fclose(blobfile);
 
-	while ((e = hashmap_iter_first(&blob_hashmap, &iter))) {
+	/* Disable hashmap shrink, effectively disabling rehash.
+	 * This way we can iterate over the entire hashmap efficiently
+	 * and safely by using hashmap_iter_next() */
+	hashmap_disable_shrink(&blob_hashmap);
+	e = hashmap_iter_first(&blob_hashmap, &iter);
+	while (e) {
 		bc = container_of((struct hashmap_entry *)e,
 				  struct erofs_blobchunk, ent);
 		DBG_BUGON(hashmap_remove(&blob_hashmap, e) != e);
 		free(bc);
+		e = hashmap_iter_next(&iter);
 	}
 	DBG_BUGON(hashmap_free(&blob_hashmap));
 }
-- 
2.45.1.288.g0e0cd299f1-goog
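
P.S. For illustration only (not part of the patch): the quadratic
behavior is easy to reproduce outside of erofs-utils. Below is a
minimal standalone C sketch, using a plain array as a stand-in for a
sparse hash table rather than the erofs-utils hashmap itself; all names
in it (drain_rescan, drain_resume, fill, TABLE_SIZE) are made up for
the example. The "rescan" variant models hashmap_iter_first()
restarting from index 0 after every removal; the "resume" variant
models what hashmap_iter_next() enables once rehashing is off, visiting
each slot exactly once.

#include <stdio.h>
#include <stdlib.h>

/* Toy model of a sparse table: NULL means an empty slot.
 * NOT the erofs-utils hashmap; it only illustrates the scan cost. */
#define TABLE_SIZE 1024
static void *table[TABLE_SIZE];

/* hashmap_iter_first() style: restart the scan at index 0 for every
 * element removed, so earlier empty slots are probed over and over. */
static size_t drain_rescan(void)
{
	size_t steps = 0, i;

	for (;;) {
		for (i = 0; i < TABLE_SIZE; i++) {
			steps++;
			if (table[i])
				break;
		}
		if (i == TABLE_SIZE)
			break;			/* table drained */
		free(table[i]);
		table[i] = NULL;
	}
	return steps;
}

/* hashmap_iter_next() style: keep the scan position across removals,
 * so the table is drained in a single pass. */
static size_t drain_resume(void)
{
	size_t steps = 0, i;

	for (i = 0; i < TABLE_SIZE; i++) {
		steps++;
		if (table[i]) {
			free(table[i]);
			table[i] = NULL;
		}
	}
	return steps;
}

static void fill(void)
{
	size_t i;

	/* Populate every 8th slot to model a sparsely filled table. */
	for (i = 0; i < TABLE_SIZE; i += 8)
		table[i] = malloc(1);
}

int main(void)
{
	fill();
	printf("rescan: %zu slot probes\n", drain_rescan());
	fill();
	printf("resume: %zu slot probes\n", drain_resume());
	return 0;
}

On this toy input the rescan variant needs on the order of n^2/16 slot
probes (about 66k for n = 1024), while the resume variant needs exactly
n, which mirrors the shape of the fs_size timings above.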