Have we ever heard of somebody needing to set the partition count to a non-power-of-two number? Perhaps we could restrict the method so that it will only accept a power of two as the partition count?
-----Original Message----- From: Valentin Kulichenko [mailto:valentin.kuliche...@gmail.com] Sent: 15 March 2017 16:22 To: dev@ignite.apache.org Subject: Re: Distribution of keys to partitions Andrey, Absolutely, your point is correct. I'm talking about default behavior which must be as effective as possible. In case we do this optimization, I would also show a warning if number of partitions is not a power of two. -Val On Wed, Mar 15, 2017 at 5:09 PM, Andrey Gura <ag...@apache.org> wrote: > Anyway, we can't always use this optimization because it will not work > for non power of two values. > > On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko > <valentin.kuliche...@gmail.com> wrote: > > In 99% of cases number of partition is a power of two, because it's > > the default value. Almost no one changes it. If this change actually > > provides better distribution, it absolutely makes sense to do it. > > > > Michael, can you create a Jira ticket and put you findings there? > > > > -Val > > > > On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <ag...@apache.org> wrote: > > > >> Michael, > >> > >> it makes sense only for cases when partitions count is power of two. > >> Affinity function doesn't have this limitation. > >> > >> Bu, of course, we can check, that partitions count is power of two > >> and use optimized hash code calculation. > >> > >> > >> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs > >> <michael.gri...@gridgain.com> wrote: > >> > Hi Igniters, > >> > > >> > Last week I was working with a group of Ignite users. They are > inserting > >> > several million string keys in to a cache. Each string key was > >> > approximately 22-characters in length. When I exported the > >> > partition counts (via GG Visor) I was able to see an unusual > >> > periodicity in the number of keys allocated to partitions. I charted > >> > this in Excel [1]. > >> > After further investigation, it appears that there is a > >> > relationship between the number of keys being inserted, the > >> > number of partitions assigned to the cache and amount of apparent > >> > periodicity: a small > number > >> of > >> > partitions will cause periodicity to appear with lower numbers of > keys. > >> > > >> > The RendezvousAffinityFunction#partition function performs a > >> > simple calculation of key hashcode modulo partition-count: > >> > > >> > U.safeAbs(key.hashCode() % parts) > >> > > >> > > >> > Digging further I was led to the fact that this is how the Java > HashMap > >> > *used* to behave [2], but was upgraded around Java 1.4 to perform > >> > the > >> > following: > >> > > >> > key.hashCode() & (parts - 1) > >> > > >> > which performs more efficiently. It was then updated further to > >> > do > the > >> > following: > >> > > >> > (h = key.hashCode()) ^ (h >>> 16); > >> > > >> > with the bit-shift performed to > >> > > >> > incorporate impact of the highest bits that would otherwise never > >> > be used in index calculations because of table bounds > >> > > >> > > >> > When using this function, rather than our > >> > RendezvousAffinityFunction#partition implementation, I also saw a > >> > significant decrease in the periodicity and a better distribution > >> > of > keys > >> > amongst partitions [3]. > >> > > >> > I would like to suggest that we adopt this modified hash function > inside > >> > RendezvousAffinityFunction. > >> > > >> > Regards > >> > Mike > >> > > >> > > >> > [1]: https://i.imgur.com/0FtCZ2A.png > >> > [2]: > >> > https://www.quora.com/Why-does-Java-use-a-mediocre- > >> hashCode-implementation-for-strings > >> > [3]: https://i.imgur.com/8ZuCSA3.png > >> >