On Saturday, July 30, 2011, Rafael Almeida <almeida...@yahoo.com> wrote:
> Hello,
>
> I have computers that are better than others in my cluster. In special,
> there's one which is much better and I'd like to give it more load than
> the others. Is it possible? I'm using RandomPartitioner, should I use
> other? Should I select tokens in some particular way? How is load
> distribution implemented in RandomPartitioner with respect to tokens?
I'm answering myself this time. I think I've got things figured out, at least for RandomPartitioner.

The token space goes from 0 to 2^127 (RandomPartitioner hashes keys with MD5), so there are 2^127 possible tokens. The load a node receives is proportional to the number of tokens it is responsible for: a node responsible for 2^127 / 2 tokens handles half the load in the system, a node responsible for 2^127 / 3 tokens handles a third of it, and so on.

But you assign only one token in Cassandra's configuration file! True, and that single token defines a range: a node is responsible for every token from the previous node's token (exclusive) up to and including the value you wrote in initial_token in cassandra.yaml. An example makes this easier to see.

Let's say the token space goes from 0 to 100 and we have 4 nodes (smaller numbers keep things manageable), with the following initial_tokens:

node A = 0
node B = 20
node C = 70
node D = 90

Node B owns the range (0, 20], that is, 20 tokens (20/100 = 20% of the load). Node C owns (20, 70], that is, 50 tokens (50% of the load). Node D owns (70, 90], that is, 20 tokens (20% of the load). Node A's range wraps around the end of the ring: it owns (90, 0], that is, 10 tokens (10% of the load). See how that works?

Now suppose you set up initial_token like this instead:

node A = 10
node B = 20
node C = 70
node D = 90

At first I thought this would leave 10 tokens unhandled, but it doesn't: the ring wraps around, so node A simply owns (90, 10], which is 20 tokens, and no token is ever left without an owner. When I tested a configuration like this with two nodes, the cluster kept working and each node handled 50% of the load.

So, coming back to my original question: to give the more powerful machine more load, choose its initial_token so that the gap between the previous node's token and its own is proportionally larger than everyone else's. I've put a couple of small scripts at the end of this mail to check the arithmetic.

I hope I've been clear. Please correct me if I misunderstood something.
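PS: Here's a quick Python sketch I used to sanity-check the arithmetic above. It just applies the (previous_token, token] ownership rule described in this mail; the ownership() function and the toy 0..100 ring are mine for illustration, not anything from Cassandra itself.

    def ownership(tokens, ring_size):
        """tokens: dict of node -> initial_token. Returns node -> fraction of ring."""
        ring = sorted(tokens.items(), key=lambda kv: kv[1])
        shares = {}
        for i, (node, tok) in enumerate(ring):
            prev_tok = ring[i - 1][1]  # i == 0 wraps around to the last token
            shares[node] = ((tok - prev_tok) % ring_size) / ring_size
        return shares

    # The toy ring from the example above:
    print(ownership({'A': 0, 'B': 20, 'C': 70, 'D': 90}, ring_size=100))
    # -> {'A': 0.1, 'B': 0.2, 'C': 0.5, 'D': 0.2}

Running it prints the same 10/20/50/20 split as the example; for a real cluster you'd pass ring_size=2**127 and the tokens from cassandra.yaml.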
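Going the other way, this sketch picks initial_token values from relative capacity weights, assuming load really is proportional to range size as described above. The node names and weights are made up ("big" being twice as powerful as the other two); weighted_tokens() is my own helper, not a Cassandra tool.

    RING = 2 ** 127  # RandomPartitioner token space

    def weighted_tokens(weights):
        """weights: list of (node, weight) pairs. Returns node -> initial_token."""
        total = sum(w for _, w in weights)
        tokens, acc = {}, 0
        for node, w in weights:
            acc += w
            # a node owns the range *ending* at its token, so its token
            # is placed at the running total of the weights so far
            tokens[node] = acc * RING // total % RING
        return tokens

    # one machine twice as powerful as the other two
    print(weighted_tokens([('big', 2), ('small1', 1), ('small2', 1)]))

You can feed the result back into ownership() from the previous script to confirm that "big" ends up with half the ring and the other two with a quarter each.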