> TLDR, based on availability concerns, skew concerns, operational
> concerns, and based on the fact that the new allocation algorithm can
> be configured fairly simply now, this is a proposal to go with 4 as the
> new default and the allocate_tokens_for_local_replication_factor set to
> 3.

I'm uncomfortable going with the default of `num_tokens: 4`. I would rather see a default of `num_tokens: 16`, based on the following…

 a) 4 num_tokens does not provide a good out-of-the-box experience.
 b) 4 num_tokens doesn't provide any significant streaming benefits over 16.
 c) edge-case availability doesn't trump (a) & (b).

For (a)…
The first node in each rack, up to RF racks, in each datacenter can't use the allocation strategy. With 4 num_tokens, 3 racks and RF=3, the first three nodes will be poorly balanced. If three poorly balanced nodes in a cluster is an issue (because the cluster is small enough), then 4 is the wrong default. From our own experience, we have had to bootstrap these nodes multiple times until they generate something ok. In practice 4 num_tokens (over 16) has caused more headaches with clients than gains.

Elaborating, 256 was originally chosen because the token randomness over that many always averaged out. With a default of `allocate_tokens_for_local_replication_factor: 3` this issue is largely solved, but you will still have those initial nodes with randomly generated tokens. Ref:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/tokenallocator/ReplicationAwareTokenAllocator.java#L80

And to be precise: tokens are randomly generated until there is a node in each rack, up to RF racks. So, if you have RF=3, you could in theory (or as a newbie) boot 100 nodes only in the first two racks, and they will all get random tokens regardless of the allocate_tokens_for_local_replication_factor setting.

For example, using 4 num_tokens, 3 racks and RF=3…
 - in a 6 node cluster, there's a total of 24 tokens, half of which are random,
 - in a 9 node cluster, there's a total of 36 tokens, a third of which are random,
 - etc

Following this logic I would not be willing to apply 4 unless you know there will be more than 36 nodes in each data centre, i.e. less than ~8% of your tokens are randomly generated.

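To make the arithmetic above easy to replay for other cluster sizes, here is a minimal sketch. It is a simplified model, not Cassandra code: it assumes exactly the first RF nodes in a datacenter (one per rack) got random tokens and every later node used the allocation algorithm. The function name `random_token_share` is made up for illustration.

```python
def random_token_share(nodes: int, num_tokens: int, rf: int = 3):
    """Return (total_tokens, random_tokens, random_fraction) for one datacenter.

    Simplified model (an assumption, not Cassandra's implementation): the first
    `rf` nodes, one per rack, received randomly generated tokens; all later
    nodes were placed by the allocation algorithm.
    """
    if nodes < rf:
        raise ValueError("model assumes at least one node in each of the RF racks")
    total_tokens = nodes * num_tokens
    random_tokens = rf * num_tokens
    return total_tokens, random_tokens, random_tokens / total_tokens


if __name__ == "__main__":
    # The cluster sizes discussed above, all with num_tokens=4 and RF=3.
    for nodes in (6, 9, 36):
        total, rnd, frac = random_token_share(nodes, num_tokens=4)
        print(f"{nodes:>3} nodes: {total:>4} tokens, {rnd} random ({frac:.1%})")
```

In this simple model the random fraction depends only on the node count and RF; `num_tokens` changes the total number of tokens on the ring (for instance, 9 nodes at 16 tokens gives the same 144 tokens as 36 nodes at 4).
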
Many clusters don't have that size, and imho that's why 4 is a bad default.

A default of 16 by the same logic only needs 9 nodes in each dc to overcome that degree of randomness.

The workaround to all this is having to manually define `initial_token: …` on those initial nodes. I'm really not keen on imposing that upon new users.

For (b)…
There have already been a number of improvements around streaming that solve much of whatever difference there is between 4 and 16 num_tokens. And 4 num_tokens means bigger token ranges, so it could well be disadvantageous due to over-streaming.

For (c)…
We are trying to optimise availability in situations where we can never guarantee availability. I understand it's a nice operational advantage to have in a shit-show, but it's not something you can design a system around and rely upon. There's also the question of availability vs the size of the token range that becomes unavailable.

regards,
Mick