Re: Distribution of keys to partitions

Valentin Kulichenko Wed, 15 Mar 2017 09:29:04 -0700

Andrey,

Absolutely, your point is correct. I'm talking about default behavior which
must be as effective as possible. In case we do this optimization, I would
also show a warning if number of partitions is not a power of two.


-Val

On Wed, Mar 15, 2017 at 5:09 PM, Andrey Gura <[email protected]> wrote:

> Anyway, we can't always use this optimization because it will not work
> for non power of two values.
>
> On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko
> <[email protected]> wrote:
> > In 99% of cases number of partition is a power of two, because it's the
> > default value. Almost no one changes it. If this change actually provides
> > better distribution, it absolutely makes sense to do it.
> >
> > Michael, can you create a Jira ticket and put you findings there?
> >
> > -Val
> >
> > On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <[email protected]> wrote:
> >
> >> Michael,
> >>
> >> it makes sense only for cases when partitions count is power of two.
> >> Affinity function doesn't have this limitation.
> >>
> >> Bu, of course, we can check, that partitions count is power of two and
> >> use optimized hash code calculation.
> >>
> >>
> >> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
> >> <[email protected]> wrote:
> >> > Hi Igniters,
> >> >
> >> > Last week I was working with a group of Ignite users.  They are
> inserting
> >> > several million string keys in to a cache.  Each string key was
> >> > approximately 22-characters in length.  When I exported the partition
> >> > counts (via GG Visor) I was able to see an unusual periodicity in the
> >> > number of keys allocated to partitions.  I charted this in Excel [1].
> >> > After further investigation, it appears that there is a relationship
> >> > between the number of keys being inserted, the number of partitions
> >> > assigned to the cache and amount of apparent periodicity: a small
> number
> >> of
> >> > partitions will cause periodicity to appear with lower numbers of
> keys.
> >> >
> >> > The RendezvousAffinityFunction#partition function performs a simple
> >> > calculation of key hashcode modulo partition-count:
> >> >
> >> > U.safeAbs(key.hashCode() % parts)
> >> >
> >> >
> >> > Digging further I was led to the fact that this is how the Java
> HashMap
> >> > *used* to behave [2], but was upgraded around Java 1.4 to perform the
> >> > following:
> >> >
> >> > key.hashCode() & (parts - 1)
> >> >
> >> > which performs more efficiently.  It was then updated further to do
> the
> >> > following:
> >> >
> >> > (h = key.hashCode()) ^ (h >>> 16);
> >> >
> >> > with the bit-shift performed to
> >> >
> >> > incorporate impact of the highest bits that would otherwise
> >> > never be used in index calculations because of table bounds
> >> >
> >> >
> >> > When using this function, rather than our
> >> > RendezvousAffinityFunction#partition implementation, I also saw a
> >> > significant decrease in the periodicity and a better distribution of
> keys
> >> > amongst partitions [3].
> >> >
> >> > I would like to suggest that we adopt this modified hash function
> inside
> >> > RendezvousAffinityFunction.
> >> >
> >> > Regards
> >> > Mike
> >> >
> >> >
> >> > [1]: https://i.imgur.com/0FtCZ2A.png
> >> > [2]:
> >> > https://www.quora.com/Why-does-Java-use-a-mediocre-
> >> hashCode-implementation-for-strings
> >> > [3]: https://i.imgur.com/8ZuCSA3.png
> >>
>

Re: Distribution of keys to partitions

Reply via email to