unbalanced token assignment with random partitioner

Chris Shorrock Mon, 17 May 2010 16:06:56 -0700

I have a feeling this issue may be more misunderstanding than anything else,
but after searching for an explanation in the wiki and elsewhere my
understanding of token assignments leads me to believe that unbalancing is
bound to occur.


As a relatively simple example, if we take a 2-node Cassandra setup with the
random partitioner (letting Cassandra assign the tokens), we end up with a
ring that looks like:

Address       Status     Load          Range                                      Ring
                                       69518187202527923173412511728767069233
10.10.249.11  Up         1023.44 MB    34433420789685454480210475042362028556     |<--|
10.10.249.12  Up         251.16 MB     69518187202527923173412511728767069233     |-->|


My understanding of how data is distributed is based on the following wiki
statement:

> Each Cassandra server [node] is assigned a unique Token that determines
> what keys it is the first replica for. If you sort all nodes' Tokens, the
> Range of keys each is responsible for is (PreviousToken, MyToken], that is,
> from the previous token (exclusive) to the node's token (inclusive). The
> machine with the lowest Token gets both all keys less than that token, and
> all keys greater than the largest Token; this is called a "wrapping Range."
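
To make the rule concrete, here is a minimal Python sketch of how I read the
wiki text. The function name and the toy tokens (100 and 200) are made up for
illustration; this is not Cassandra's actual code.

```python
import bisect

def first_replica(key_token, ring):
    """Return the node whose range (PreviousToken, MyToken] contains key_token.

    ring is a list of (token, node) pairs; this is a hypothetical helper,
    not part of Cassandra.
    """
    ordered = sorted(ring)
    tokens = [token for token, _ in ordered]
    # Index of the first node token >= key_token (upper bound is inclusive).
    i = bisect.bisect_left(tokens, key_token)
    # Keys above the largest token wrap around to the node with the lowest token.
    return ordered[i % len(ordered)][1]

toy_ring = [(100, "A"), (200, "B")]
print(first_replica(50, toy_ring))   # below the lowest token
print(first_replica(150, toy_ring))  # between the two tokens
print(first_replica(250, toy_ring))  # above the largest token, wraps
```

With these toy tokens, node A gets the keys up to 100 and everything above
200, while node B only gets the keys in (100, 200].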


This description implies that, in our example above, 10.10.249.11 would
serve keys 0 to 3.4E37 and 6.9E37 to 1.7E38 (the "wrapping Range"), while
10.10.249.12 serves only 3.4E37 to 6.9E37. So it seems that 10.10.249.11
would end up holding a disproportionate share of the data, which matches the
loads reported in the ring output (1023.44 MB vs. 251.16 MB).
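
The arithmetic behind this can be checked with a short Python sketch. It
assumes the random partitioner's token space runs from 0 to 2**127 (about
1.7E38) and that keys hash uniformly over it; the function name is mine, not
something from Cassandra.

```python
from fractions import Fraction

# Assumption: RandomPartitioner tokens span 0 .. 2**127 (about 1.7E38).
RING_SIZE = 2 ** 127

def ownership(tokens):
    """Map each node to the fraction of the ring in (PreviousToken, MyToken]."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    shares = {}
    for i, (node, token) in enumerate(ordered):
        # For i == 0 this picks the largest token, giving the wrapping range.
        prev_token = ordered[i - 1][1]
        # A single-node ring has width 0 here, so fall back to the full ring.
        width = (token - prev_token) % RING_SIZE or RING_SIZE
        shares[node] = Fraction(width, RING_SIZE)
    return shares

tokens = {
    "10.10.249.11": 34433420789685454480210475042362028556,
    "10.10.249.12": 69518187202527923173412511728767069233,
}
shares = ownership(tokens)
for node, share in shares.items():
    print(f"{node}: {float(share):.1%}")
```

Under those assumptions 10.10.249.11 owns roughly four fifths of the ring and
10.10.249.12 roughly one fifth, which is close to the ratio of the reported
loads.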

This issue would of course be mitigated as the cluster grows - but it seems
like the automatic selection of initial tokens isn't optimal.

Is this a configuration issue, a misunderstanding, a new version of math
I've developed, or?
