Anyway, we can't always use this optimization, because the bitmask trick only works when the partition count is a power of two.
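For concreteness, something along these lines could work. This is only a minimal sketch; isPowerOfTwo, spreadHash and partition are made-up helper names for illustration, not existing Ignite API, and the real code would use U.safeAbs rather than Math.abs:

    // Sketch: apply the bitmask optimization only when the partition
    // count is a power of two, otherwise fall back to modulo.
    public final class PartitionCalcSketch {
        /** A power of two has exactly one bit set, so n & (n - 1) == 0. */
        static boolean isPowerOfTwo(int n) {
            return n > 0 && (n & (n - 1)) == 0;
        }

        /** HashMap-style spreader: XOR the high bits into the low bits. */
        static int spreadHash(Object key) {
            int h = key.hashCode();
            return h ^ (h >>> 16);
        }

        static int partition(Object key, int parts) {
            if (isPowerOfTwo(parts))
                return spreadHash(key) & (parts - 1); // fast path: bitmask
            // |spreadHash % parts| < parts, so Math.abs is safe here.
            return Math.abs(spreadHash(key) % parts); // general fallback
        }
    }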
On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko
<valentin.kuliche...@gmail.com> wrote:
> In 99% of cases the number of partitions is a power of two, because it's the
> default value. Almost no one changes it. If this change actually provides
> better distribution, it absolutely makes sense to do it.
>
> Michael, can you create a Jira ticket and put your findings there?
>
> -Val
>
> On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <ag...@apache.org> wrote:
>
>> Michael,
>>
>> it makes sense only for cases when the partition count is a power of two.
>> The affinity function doesn't have this limitation.
>>
>> But, of course, we can check that the partition count is a power of two
>> and use the optimized hash code calculation.
>>
>>
>> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
>> <michael.gri...@gridgain.com> wrote:
>> > Hi Igniters,
>> >
>> > Last week I was working with a group of Ignite users. They are inserting
>> > several million string keys into a cache. Each string key was
>> > approximately 22 characters in length. When I exported the partition
>> > counts (via GG Visor) I was able to see an unusual periodicity in the
>> > number of keys allocated to partitions. I charted this in Excel [1].
>> > After further investigation, it appears that there is a relationship
>> > between the number of keys being inserted, the number of partitions
>> > assigned to the cache and the amount of apparent periodicity: a small
>> > number of partitions will cause periodicity to appear with lower
>> > numbers of keys.
>> >
>> > The RendezvousAffinityFunction#partition function performs a simple
>> > calculation of key hashcode modulo partition count:
>> >
>> > U.safeAbs(key.hashCode() % parts)
>> >
>> > Digging further, I was led to the fact that this is how the Java HashMap
>> > *used* to behave [2], but it was upgraded around Java 1.4 to perform the
>> > following:
>> >
>> > key.hashCode() & (parts - 1)
>> >
>> > which performs more efficiently. It was then updated further to do the
>> > following:
>> >
>> > (h = key.hashCode()) ^ (h >>> 16);
>> >
>> > with the bit shift performed to
>> >
>> > incorporate impact of the highest bits that would otherwise
>> > never be used in index calculations because of table bounds
>> >
>> > When using this function, rather than our
>> > RendezvousAffinityFunction#partition implementation, I also saw a
>> > significant decrease in the periodicity and a better distribution of
>> > keys amongst partitions [3].
>> >
>> > I would like to suggest that we adopt this modified hash function inside
>> > RendezvousAffinityFunction.
>> >
>> > Regards
>> > Mike
>> >
>> >
>> > [1]: https://i.imgur.com/0FtCZ2A.png
>> > [2]: https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
>> > [3]: https://i.imgur.com/8ZuCSA3.png
>>
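For anyone who wants to compare the two schemes on their own keys, a rough, self-contained harness along these lines could be used. The key format, key count and partition count below are made up for illustration and are not the user's actual data, and whether the skew Michael observed reproduces depends on the real key set:

    import java.util.Arrays;

    // Distribute synthetic string keys over a power-of-two partition count
    // using (a) plain hashCode % parts and (b) the HashMap-style spread hash
    // with a bitmask, then print the per-partition min/max for each scheme.
    public class PartitionSkewDemo {
        public static void main(String[] args) {
            int parts = 1024;      // illustrative power-of-two partition count
            int keys = 1_000_000;  // illustrative key count

            int[] modCounts = new int[parts];
            int[] maskCounts = new int[parts];

            for (int i = 0; i < keys; i++) {
                String key = "customer-order-" + i; // synthetic ~22-char keys

                int h = key.hashCode();
                modCounts[Math.abs(h % parts)]++;   // current-style calculation

                int spread = h ^ (h >>> 16);        // HashMap-style spreader
                maskCounts[spread & (parts - 1)]++; // proposed-style calculation
            }

            System.out.println("modulo  min/max keys per partition: "
                + Arrays.stream(modCounts).min().getAsInt() + " / "
                + Arrays.stream(modCounts).max().getAsInt());
            System.out.println("spread  min/max keys per partition: "
                + Arrays.stream(maskCounts).min().getAsInt() + " / "
                + Arrays.stream(maskCounts).max().getAsInt());
        }
    }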