Hi Igniters,

Last week I was working with a group of Ignite users.  They are inserting
several million string keys in to a cache.  Each string key was
approximately 22-characters in length.  When I exported the partition
counts (via GG Visor) I was able to see an unusual periodicity in the
number of keys allocated to partitions.  I charted this in Excel [1].
After further investigation, it appears that there is a relationship
between the number of keys being inserted, the number of partitions
assigned to the cache and amount of apparent periodicity: a small number of
partitions will cause periodicity to appear with lower numbers of keys.

The RendezvousAffinityFunction#partition function performs a simple
calculation of key hashcode modulo partition-count:

U.safeAbs(key.hashCode() % parts)


Digging further I was led to the fact that this is how the Java HashMap
*used* to behave [2], but was upgraded around Java 1.4 to perform the
following:

key.hashCode() & (parts - 1)

which performs more efficiently.  It was then updated further to do the
following:

(h = key.hashCode()) ^ (h >>> 16);

with the bit-shift performed to

incorporate impact of the highest bits that would otherwise
never be used in index calculations because of table bounds


When using this function, rather than our
RendezvousAffinityFunction#partition implementation, I also saw a
significant decrease in the periodicity and a better distribution of keys
amongst partitions [3].

I would like to suggest that we adopt this modified hash function inside
RendezvousAffinityFunction.

Regards
Mike


[1]: https://i.imgur.com/0FtCZ2A.png
[2]:
https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
[3]: https://i.imgur.com/8ZuCSA3.png

Reply via email to