Yes. If you add a node when the existing node doesn't have enough data to pick a good token from the keys it holds, Cassandra gives the new node a random token. I created https://issues.apache.org/jira/browse/CASSANDRA-1112 to use the midpoint instead.
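A minimal sketch of the midpoint idea, in Python. It is not the patch attached to CASSANDRA-1112. It assumes that RandomPartitioner tokens can be treated as integers modulo 2**127, that "midpoint" means halfway through the widest existing range, and that range width is a reasonable stand-in for load. `midpoint_token` is a hypothetical helper name.

```python
RING = 2 ** 127  # assumption: RandomPartitioner tokens treated as integers mod 2**127


def midpoint_token(tokens: list[int]) -> int:
    """Hypothetical helper: the token halfway through the widest range on the ring.

    Each node owns (previous token, its token]; the lowest node's range wraps
    around from the largest token back through zero.
    """
    ring = sorted(tokens)
    best_start, best_width = 0, -1
    for i, end in enumerate(ring):
        start = ring[i - 1]  # for i == 0 this is the largest token (wrapping range)
        width = (end - start) % RING or RING  # a lone node owns the whole ring
        if width > best_width:
            best_start, best_width = start, width
    return (best_start + best_width // 2) % RING


# Tokens from the two-node ring in the quoted message.
existing = [
    34433420789685454480210475042362028556,  # 10.10.249.11
    69518187202527923173412511728767069233,  # 10.10.249.12
]
print(midpoint_token(existing))
```

For the ring in the quoted message, the widest range is the wrapping range owned by 10.10.249.11. The sketch places the new node's token halfway between 10.10.249.12's token and the top of the token space, so the new node takes over about half of the overloaded node's range. A random token gives no such guarantee.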
On Mon, May 17, 2010 at 4:06 PM, Chris Shorrock <ch...@shorrockin.com> wrote:
> I have a feeling this issue may be more misunderstanding than anything else,
> but after searching for an explanation in the wiki and elsewhere my
> understanding of token assignments leads me to believe that unbalancing is
> bound to occur.
>
> Given a relatively simple example, if we take a 2-node Cassandra setup with a
> random partitioner (letting Cassandra assign the tokens), we end up with a
> ring that looks like:
>
> Address       Status  Load        Range                                     Ring
>                                   69518187202527923173412511728767069233
> 10.10.249.11  Up      1023.44 MB  34433420789685454480210475042362028556    |<--|
> 10.10.249.12  Up      251.16 MB   69518187202527923173412511728767069233    |-->|
>
> Given my understanding of how data works based on the following wiki
> statement:
>
>> Each Cassandra server [node] is assigned a unique Token that determines
>> what keys it is the first replica for. If you sort all nodes' Tokens, the
>> Range of keys each is responsible for is (PreviousToken, MyToken], that is,
>> from the previous token (exclusive) to the node's token (inclusive). The
>> machine with the lowest Token gets both all keys less than that token, and
>> all keys greater than the largest Token; this is called a "wrapping Range."
>
> Given this description, this implies, in our example above, that 10.10.249.11
> would serve keys 0 to 3.4E37 and 6.9E37 to 1.7E38 (the "wrapping Range")
> while 10.10.249.12 serves 3.4E37 to 6.9E37. Given this it seems that
> 10.10.249.11 would end up serving an uneven amount of data.
>
> This issue would of course be mitigated as the cluster grows, but it seems
> like the automatic initial selection of token ranges isn't optimal.
>
> Is this a configuration issue, a misunderstanding, a new version of math
> I've developed, or?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
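The wiki rule quoted above, that each node owns (PreviousToken, MyToken] and the lowest node also takes the wrapping range, can be checked against the posted ring with a short Python sketch. It assumes that the RandomPartitioner token space is treated as integers modulo 2**127 (about 1.7E38, the upper bound used in the question). `primary_replica` and `ownership` are hypothetical helper names, not Cassandra APIs.

```python
from bisect import bisect_left

RING = 2 ** 127  # assumption: RandomPartitioner tokens treated as integers mod 2**127


def primary_replica(tokens: list[int], key_token: int) -> int:
    """Return the node token whose range (PreviousToken, MyToken] holds key_token."""
    ring = sorted(tokens)
    i = bisect_left(ring, key_token)  # index of the first node token >= key_token
    return ring[i % len(ring)]  # past the largest token: wrap to the lowest node


def ownership(tokens: list[int]) -> dict[int, float]:
    """Fraction of the token space each node is the first replica for."""
    ring = sorted(tokens)
    return {
        # ring[i - 1] is the largest token when i == 0, giving the wrapping range
        end: ((end - ring[i - 1]) % RING or RING) / RING
        for i, end in enumerate(ring)
    }


tokens = {
    "10.10.249.11": 34433420789685454480210475042362028556,
    "10.10.249.12": 69518187202527923173412511728767069233,
}
shares = ownership(list(tokens.values()))
for addr, tok in tokens.items():
    print(f"{addr}: {shares[tok]:.1%}")
# 10.10.249.11: 79.4%
# 10.10.249.12: 20.6%
```

The computed ownership is close to the roughly 80/20 split in the Load column (1023.44 MB vs 251.16 MB). This is consistent with the imbalance coming from where the second token landed rather than from the keys themselves.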