The current cache algorithm always traverses the array from the beginning, so the average number of comparisons needed to perform a lookup is proportional to the size of the array.
By using a hash of the offset as the starting point, lookups are faster and independent of the array size. The hash is computed from the cluster number of the table, multiplied by 4, which makes it perform better when there are collisions. In my tests, using a cache with 2048 entries, this reduces the average number of comparisons per lookup from 430 to 2.5.

Signed-off-by: Alberto Garcia <be...@igalia.com>
---
 block/qcow2-cache.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index dffb887..3137ba8 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -248,6 +248,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     BDRVQcowState *s = bs->opaque;
     int i;
     int ret;
+    int lookup_index;
     uint64_t min_lru_counter = UINT64_MAX;
     int min_lru_index = -1;
 
@@ -255,7 +256,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
                           offset, read_from_disk);
 
     /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
+    i = lookup_index = (offset / s->cluster_size * 4) % c->size;
+    do {
         const Qcow2CachedTable *t = &c->entries[i];
         if (t->offset == offset) {
             goto found;
@@ -264,7 +266,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             min_lru_counter = t->lru_counter;
             min_lru_index = i;
         }
-    }
+        if (++i == c->size) {
+            i = 0;
+        }
+    } while (i != lookup_index);
 
     if (min_lru_index == -1) {
         /* This can't happen in current synchronous code, but leave the check
-- 
2.1.4
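For readers who want to try the probing scheme outside of QEMU, here is a minimal, self-contained sketch of the same idea. The names (cache_lookup, CacheEntry, NUM_ENTRIES, CLUSTER_SIZE) and the fixed table size are assumptions made for illustration, not the qcow2 code itself, and it omits the LRU bookkeeping that the real function uses for eviction:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the qcow2 cache structures. */
#define NUM_ENTRIES  2048
#define CLUSTER_SIZE 65536

typedef struct {
    uint64_t offset;   /* table offset on disk; 0 means "empty slot" */
} CacheEntry;

static CacheEntry entries[NUM_ENTRIES];

/* Start probing at a hash of the table's cluster number, multiplied by 4
 * as in the patch above, and wrap around until we either find the offset
 * or come back to the starting slot.  (The real code also tracks the
 * least-recently-used slot during this scan; omitted here.) */
static int cache_lookup(uint64_t offset, int *comparisons)
{
    int lookup_index = (offset / CLUSTER_SIZE * 4) % NUM_ENTRIES;
    int i = lookup_index;

    *comparisons = 0;
    do {
        (*comparisons)++;
        if (entries[i].offset == offset) {
            return i;              /* cache hit */
        }
        if (++i == NUM_ENTRIES) {
            i = 0;                 /* wrap around to the first slot */
        }
    } while (i != lookup_index);

    return -1;                     /* full scan, no match: cache miss */
}

int main(void)
{
    int comparisons;

    /* Simulate a cached table at some offset and look it up. */
    uint64_t offset = 123 * (uint64_t)CLUSTER_SIZE;
    entries[(offset / CLUSTER_SIZE * 4) % NUM_ENTRIES].offset = offset;

    int slot = cache_lookup(offset, &comparisons);
    printf("slot %d after %d comparison(s)\n", slot, comparisons);
    return 0;
}

With no collisions a hit is found in a single comparison, which is consistent with the average of 2.5 reported above for a 2048-entry cache; a miss still degenerates to a full scan of the array, exactly as in the patch.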