Anyway, we can't always use this optimization, because the bitmask trick only works when the partition count is a power of two.
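For concreteness, something along these lines could work. This is only a minimal sketch; isPowerOfTwo, spreadHash and partition are made-up helper names for illustration, not existing Ignite API, and the real code would use U.safeAbs rather than Math.abs:

    // Sketch: apply the bitmask optimization only when the partition
    // count is a power of two, otherwise fall back to modulo.
    public final class PartitionCalcSketch {
        /** A power of two has exactly one bit set, so n & (n - 1) == 0. */
        static boolean isPowerOfTwo(int n) {
            return n > 0 && (n & (n - 1)) == 0;
        }

        /** HashMap-style spreader: XOR the high bits into the low bits. */
        static int spreadHash(Object key) {
            int h = key.hashCode();
            return h ^ (h >>> 16);
        }

        static int partition(Object key, int parts) {
            if (isPowerOfTwo(parts))
                return spreadHash(key) & (parts - 1); // fast path: bitmask
            // |spreadHash % parts| < parts, so Math.abs is safe here.
            return Math.abs(spreadHash(key) % parts); // general fallback
        }
    }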
On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko
<valentin.kuliche...@gmail.com> wrote:
> In 99% of cases the number of partitions is a power of two, because it's the
> default value. Almost no one changes it. If this change actually provides
> better distribution, it absolutely makes sense to do it.
>
> Michael, can you create a Jira ticket and put your findings there?
>
> -Val
>
> On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <ag...@apache.org> wrote:
>
>> Michael,
>>
>> it makes sense only for cases when the partition count is a power of two.
>> The affinity function doesn't have this limitation.
>>
>> But, of course, we can check that the partition count is a power of two
>> and use the optimized hash code calculation.
>>
>>
>> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
>> <michael.gri...@gridgain.com> wrote:
>> > Hi Igniters,
>> >
>> > Last week I was working with a group of Ignite users. They are inserting
>> > several million string keys into a cache. Each string key was
>> > approximately 22 characters in length. When I exported the partition
>> > counts (via GG Visor) I was able to see an unusual periodicity in the
>> > number of keys allocated to partitions. I charted this in Excel [1].
>> > After further investigation, it appears that there is a relationship
>> > between the number of keys being inserted, the number of partitions
>> > assigned to the cache and the amount of apparent periodicity: a small
>> > number of partitions will cause periodicity to appear with lower
>> > numbers of keys.
>> >
>> > The RendezvousAffinityFunction#partition function performs a simple
>> > calculation of key hashcode modulo partition count:
>> >
>> > U.safeAbs(key.hashCode() % parts)
>> >
>> > Digging further, I was led to the fact that this is how the Java HashMap
>> > *used* to behave [2], but it was upgraded around Java 1.4 to perform the
>> > following:
>> >
>> > key.hashCode() & (parts - 1)
>> >
>> > which performs more efficiently. It was then updated further to do the
>> > following:
>> >
>> > (h = key.hashCode()) ^ (h >>> 16);
>> >
>> > with the bit shift performed to
>> >
>> > incorporate impact of the highest bits that would otherwise
>> > never be used in index calculations because of table bounds
>> >
>> > When using this function, rather than our
>> > RendezvousAffinityFunction#partition implementation, I also saw a
>> > significant decrease in the periodicity and a better distribution of
>> > keys amongst partitions [3].
>> >
>> > I would like to suggest that we adopt this modified hash function inside
>> > RendezvousAffinityFunction.
>> >
>> > Regards
>> > Mike
>> >
>> >
>> > [1]: https://i.imgur.com/0FtCZ2A.png
>> > [2]: https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
>> > [3]: https://i.imgur.com/8ZuCSA3.png
>>
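For anyone who wants to compare the two schemes on their own keys, a rough, self-contained harness along these lines could be used. The key format, key count and partition count below are made up for illustration and are not the user's actual data, and whether the skew Michael observed reproduces depends on the real key set:

    import java.util.Arrays;

    // Distribute synthetic string keys over a power-of-two partition count
    // using (a) plain hashCode % parts and (b) the HashMap-style spread hash
    // with a bitmask, then print the per-partition min/max for each scheme.
    public class PartitionSkewDemo {
        public static void main(String[] args) {
            int parts = 1024;      // illustrative power-of-two partition count
            int keys = 1_000_000;  // illustrative key count

            int[] modCounts = new int[parts];
            int[] maskCounts = new int[parts];

            for (int i = 0; i < keys; i++) {
                String key = "customer-order-" + i; // synthetic ~22-char keys

                int h = key.hashCode();
                modCounts[Math.abs(h % parts)]++;   // current-style calculation

                int spread = h ^ (h >>> 16);        // HashMap-style spreader
                maskCounts[spread & (parts - 1)]++; // proposed-style calculation
            }

            System.out.println("modulo  min/max keys per partition: "
                + Arrays.stream(modCounts).min().getAsInt() + " / "
                + Arrays.stream(modCounts).max().getAsInt());
            System.out.println("spread  min/max keys per partition: "
                + Arrays.stream(maskCounts).min().getAsInt() + " / "
                + Arrays.stream(maskCounts).max().getAsInt());
        }
    }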