I am not entirely clear on what http://wiki.apache.org/cassandra/VirtualNodes/Balance#imbalance is saying with respect to random vs. manual token selection. Can or should I assume that I will get an even range distribution, or something close to it, with random token selection? For the sake of discussion, what is a reasonable default to start with for num_tokens, assuming the nodes are homogeneous? That wiki page mentions a default of 256, which I see commented out in cassandra.yaml; however, Config.num_tokens is set to 1. Maybe I missed where the default of 256 is used, but from some initial testing it looks like 1 token per node is being used. Using the defaults in cassandra.yaml, I see this in my logs:
WARN [main] 2012-10-31 12:06:48,591 StorageService.java (line 639) Generated random token [-8703249769453332665]. Random tokens will result in an unbalanced ring; see http://wiki.apache.org/cassandra/Operations

- John
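
To make the question more concrete, here is a rough simulation sketch of what I am asking about. It is not Cassandra's code. It assumes tokens are drawn uniformly from a signed 64-bit range (which is what the token in the log above looks like) and that each token owns the range from the previous token up to itself. The names `ownership` and `imbalance` are my own, and the node count of 6 is arbitrary.

```python
import random

# Assumption: tokens are signed 64-bit integers drawn uniformly at random.
TOKEN_MIN = -2**63
TOKEN_MAX = 2**63 - 1
RING_SIZE = 2**64


def ownership(num_nodes, num_tokens, rng):
    """Return the fraction of the ring owned by each node.

    Every node picks num_tokens random tokens; each token owns the
    range between the previous token on the ring and itself.
    """
    tokens = []
    for node in range(num_nodes):
        for _ in range(num_tokens):
            tokens.append((rng.randint(TOKEN_MIN, TOKEN_MAX), node))
    tokens.sort()

    # A single token owns the whole ring.
    if len(tokens) == 1:
        return [1.0]

    owned = [0] * num_nodes
    for i, (token, node) in enumerate(tokens):
        # For i == 0 this wraps around to the largest token on the ring.
        prev = tokens[i - 1][0]
        owned[node] += (token - prev) % RING_SIZE
    return [o / RING_SIZE for o in owned]


def imbalance(num_nodes, num_tokens, rng, trials=20):
    """Average ratio of the largest node share to the ideal share (1.0 is perfect)."""
    total = 0.0
    for _ in range(trials):
        shares = ownership(num_nodes, num_tokens, rng)
        total += max(shares) * num_nodes
    return total / trials


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for num_tokens in (1, 16, 256):
        ratio = imbalance(6, num_tokens, rng)
        print(f"num_tokens={num_tokens:>3}: largest share is {ratio:.2f}x the ideal")
```

If this model is roughly right, I would expect the ratio to move toward 1.0 as num_tokens grows, which is why I am asking what a sensible starting value is.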