Andrey, Absolutely, your point is correct. I'm talking about default behavior which must be as effective as possible. In case we do this optimization, I would also show a warning if number of partitions is not a power of two.
-Val On Wed, Mar 15, 2017 at 5:09 PM, Andrey Gura <ag...@apache.org> wrote: > Anyway, we can't always use this optimization because it will not work > for non power of two values. > > On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko > <valentin.kuliche...@gmail.com> wrote: > > In 99% of cases number of partition is a power of two, because it's the > > default value. Almost no one changes it. If this change actually provides > > better distribution, it absolutely makes sense to do it. > > > > Michael, can you create a Jira ticket and put you findings there? > > > > -Val > > > > On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <ag...@apache.org> wrote: > > > >> Michael, > >> > >> it makes sense only for cases when partitions count is power of two. > >> Affinity function doesn't have this limitation. > >> > >> Bu, of course, we can check, that partitions count is power of two and > >> use optimized hash code calculation. > >> > >> > >> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs > >> <michael.gri...@gridgain.com> wrote: > >> > Hi Igniters, > >> > > >> > Last week I was working with a group of Ignite users. They are > inserting > >> > several million string keys in to a cache. Each string key was > >> > approximately 22-characters in length. When I exported the partition > >> > counts (via GG Visor) I was able to see an unusual periodicity in the > >> > number of keys allocated to partitions. I charted this in Excel [1]. > >> > After further investigation, it appears that there is a relationship > >> > between the number of keys being inserted, the number of partitions > >> > assigned to the cache and amount of apparent periodicity: a small > number > >> of > >> > partitions will cause periodicity to appear with lower numbers of > keys. > >> > > >> > The RendezvousAffinityFunction#partition function performs a simple > >> > calculation of key hashcode modulo partition-count: > >> > > >> > U.safeAbs(key.hashCode() % parts) > >> > > >> > > >> > Digging further I was led to the fact that this is how the Java > HashMap > >> > *used* to behave [2], but was upgraded around Java 1.4 to perform the > >> > following: > >> > > >> > key.hashCode() & (parts - 1) > >> > > >> > which performs more efficiently. It was then updated further to do > the > >> > following: > >> > > >> > (h = key.hashCode()) ^ (h >>> 16); > >> > > >> > with the bit-shift performed to > >> > > >> > incorporate impact of the highest bits that would otherwise > >> > never be used in index calculations because of table bounds > >> > > >> > > >> > When using this function, rather than our > >> > RendezvousAffinityFunction#partition implementation, I also saw a > >> > significant decrease in the periodicity and a better distribution of > keys > >> > amongst partitions [3]. > >> > > >> > I would like to suggest that we adopt this modified hash function > inside > >> > RendezvousAffinityFunction. > >> > > >> > Regards > >> > Mike > >> > > >> > > >> > [1]: https://i.imgur.com/0FtCZ2A.png > >> > [2]: > >> > https://www.quora.com/Why-does-Java-use-a-mediocre- > >> hashCode-implementation-for-strings > >> > [3]: https://i.imgur.com/8ZuCSA3.png > >> >