RE: Distribution of keys to partitions

Michael Griggs Wed, 15 Mar 2017 09:51:06 -0700

Have we ever heard of somebody needing to set the partition count to a 
non-power-of-two number?  Perhaps we could restrict the method so that it will 
only accept a power of two as the partition count?


-----Original Message-----
From: Valentin Kulichenko [mailto:valentin.kuliche...@gmail.com] 
Sent: 15 March 2017 16:22
To: dev@ignite.apache.org
Subject: Re: Distribution of keys to partitions

Andrey,

Absolutely, your point is correct. I'm talking about default behavior which 
must be as effective as possible. In case we do this optimization, I would also 
show a warning if number of partitions is not a power of two.

-Val

On Wed, Mar 15, 2017 at 5:09 PM, Andrey Gura <ag...@apache.org> wrote:

> Anyway, we can't always use this optimization because it will not work 
> for non power of two values.
>
> On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko 
> <valentin.kuliche...@gmail.com> wrote:
> > In 99% of cases number of partition is a power of two, because it's 
> > the default value. Almost no one changes it. If this change actually 
> > provides better distribution, it absolutely makes sense to do it.
> >
> > Michael, can you create a Jira ticket and put you findings there?
> >
> > -Val
> >
> > On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <ag...@apache.org> wrote:
> >
> >> Michael,
> >>
> >> it makes sense only for cases when partitions count is power of two.
> >> Affinity function doesn't have this limitation.
> >>
> >> Bu, of course, we can check, that partitions count is power of two 
> >> and use optimized hash code calculation.
> >>
> >>
> >> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs 
> >> <michael.gri...@gridgain.com> wrote:
> >> > Hi Igniters,
> >> >
> >> > Last week I was working with a group of Ignite users.  They are
> inserting
> >> > several million string keys in to a cache.  Each string key was 
> >> > approximately 22-characters in length.  When I exported the 
> >> > partition counts (via GG Visor) I was able to see an unusual 
> >> > periodicity in the number of keys allocated to partitions.  I charted 
> >> > this in Excel [1].
> >> > After further investigation, it appears that there is a 
> >> > relationship between the number of keys being inserted, the 
> >> > number of partitions assigned to the cache and amount of apparent 
> >> > periodicity: a small
> number
> >> of
> >> > partitions will cause periodicity to appear with lower numbers of
> keys.
> >> >
> >> > The RendezvousAffinityFunction#partition function performs a 
> >> > simple calculation of key hashcode modulo partition-count:
> >> >
> >> > U.safeAbs(key.hashCode() % parts)
> >> >
> >> >
> >> > Digging further I was led to the fact that this is how the Java
> HashMap
> >> > *used* to behave [2], but was upgraded around Java 1.4 to perform 
> >> > the
> >> > following:
> >> >
> >> > key.hashCode() & (parts - 1)
> >> >
> >> > which performs more efficiently.  It was then updated further to 
> >> > do
> the
> >> > following:
> >> >
> >> > (h = key.hashCode()) ^ (h >>> 16);
> >> >
> >> > with the bit-shift performed to
> >> >
> >> > incorporate impact of the highest bits that would otherwise never 
> >> > be used in index calculations because of table bounds
> >> >
> >> >
> >> > When using this function, rather than our 
> >> > RendezvousAffinityFunction#partition implementation, I also saw a 
> >> > significant decrease in the periodicity and a better distribution 
> >> > of
> keys
> >> > amongst partitions [3].
> >> >
> >> > I would like to suggest that we adopt this modified hash function
> inside
> >> > RendezvousAffinityFunction.
> >> >
> >> > Regards
> >> > Mike
> >> >
> >> >
> >> > [1]: https://i.imgur.com/0FtCZ2A.png
> >> > [2]:
> >> > https://www.quora.com/Why-does-Java-use-a-mediocre-
> >> hashCode-implementation-for-strings
> >> > [3]: https://i.imgur.com/8ZuCSA3.png
> >>
>

RE: Distribution of keys to partitions

Reply via email to