I'm not sure I entirely agree with the docs there, as they don't quite
match my experiences, but it's going to depend a lot on your specific needs
and other parts of the configuration.
I think data distribution with a low num_tokens is generally considered to be
less of a problem with larger clusters.
Thanks for the response and details. I am just curious about the below
statement mentioned in the doc. I am pretty confident that my clusters are
going to grow to 100+ nodes (same DC or combining all DCs). I am just
concerned that the doc says it is *not recommended for clusters over 50
nodes*.
16
More tokens: better data distribution, more expensive repairs, higher
probability of a multi-host outage taking some data offline and affecting
availability.
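
To put a rough number on the "better data distribution" part, here is a small sketch you can run yourself. It assumes purely random token placement on a made-up 2**64 ring and ignores replication, racks, and Cassandra's own token allocation algorithm, so treat it as an illustration of the trend rather than a model of a real cluster. The function name, ring size, and node counts are my own choices for the example.

```python
import random

RING_SIZE = 2**64  # hypothetical ring size, chosen only for this sketch


def ownership_imbalance(num_nodes, num_tokens, seed=0):
    """Return (largest node ownership) / (mean node ownership).

    Assumption: every node picks num_tokens tokens uniformly at random.
    This is NOT Cassandra's allocation algorithm, and replication is ignored.
    A result of 1.0 means a perfectly even split.
    """
    rng = random.Random(seed)
    tokens = []
    for node in range(num_nodes):
        for _ in range(num_tokens):
            tokens.append((rng.randrange(RING_SIZE), node))
    tokens.sort()

    owned = [0] * num_nodes
    # Each token owns the range from the previous token up to itself;
    # the first token wraps around from the last token on the ring.
    prev = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - prev
        prev = token

    mean = RING_SIZE / num_nodes
    return max(owned) / mean


if __name__ == "__main__":
    for nodes in (12, 50, 100):
        for vnodes in (4, 16, 256):
            ratio = ownership_imbalance(nodes, vnodes)
            print(f"nodes={nodes:4d} num_tokens={vnodes:4d} max/mean={ratio:.2f}")
```

With random placement the max/mean ratio should drift toward 1.0 as num_tokens goes up, which is the distribution side of the tradeoff. The sketch says nothing about repair cost or availability, which is where the lower token counts win.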
I think with >100 nodes the repair times and availability improvements make
a strong case for 16 tokens even though it means you'll need mo