Thanks Jeff. You mean that with RF=2, num_tokens = 256, and fewer than 256
nodes, I should not worry about data distribution?

---- On Sat, 08 Sep 2018 21:30:28 +0430 Jeff Jirsa <jji...@gmail.com> wrote ----
Virtual nodes accomplish two primary goals:

1) They make it easier to gradually add or remove capacity, because the new
host's capacity is distributed around the ring in smaller increments.
2) They increase the number of sources for streaming, which speeds up
bootstrap and decommission.

Whether either of these actually holds depends on a number of factors, such
as your cluster size (for #1) and your replication factor (for #2).

If you have 4 hosts with 4 tokens per host and add a 5th host, you'll
probably add a neighbor near each existing host (#1) and stream from every
other host (#2), so that's great.

If you have 20 hosts and add a new host with 4 tokens, most of your existing
ranges won't change at all. You're nominally adding 5% of your cluster
capacity, but you won't see a 5% improvement, because you don't have enough
tokens to move 5% of your ranges. If you had 32 tokens, you'd probably see
that 5% improvement, because you'd likely add a new range near each of the
existing hosts.

Going down to 1 token would mean you'd probably need to manually move tokens
after each bootstrap to rebalance. That's fine; it just takes more operator
awareness.

I don't know how DSE decides which replication factor to use for its token
allocation logic; maybe it guesses, or takes the highest, or something
similar. Cassandra doesn't guess: we require you to be explicit, but we could
probably do better here.
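To make the token-count argument above concrete, here is a minimal Python simulation. It assumes purely random token selection (no allocation algorithm), models only primary ranges (so it ignores RF and says nothing about the streaming point, #2), and uses hypothetical helper names (`random_ring`, `primary_widths`, `add_one_host`). It is an illustration, not how Cassandra computes ownership. For each configuration it counts how many existing hosts lose part of a range to the newcomer and how much of the ring the new host ends up owning.

```python
import random

# Assumption: a 64-bit token space like Murmur3Partitioner's; only its size matters here.
RING = 2**64

def random_ring(num_hosts, tokens_per_host, rng):
    """Sorted (token, host) pairs with every token chosen uniformly at random."""
    ring = [(rng.randrange(RING), host)
            for host in range(num_hosts)
            for _ in range(tokens_per_host)]
    ring.sort()
    return ring

def primary_widths(ring):
    """Width of the token space each host owns as primary replica (ignores RF).

    The token at position i owns (previous token, token], wrapping at the ring end.
    """
    if len(ring) == 1:
        return {ring[0][1]: RING}
    widths = {}
    for i, (token, host) in enumerate(ring):
        prev = ring[i - 1][0]  # for i == 0 this wraps to the last token
        widths[host] = widths.get(host, 0) + (token - prev) % RING
    return widths

def add_one_host(num_hosts, tokens_per_host, seed=0):
    """Add host number `num_hosts` with the same token count; report what changes."""
    rng = random.Random(seed)
    before = random_ring(num_hosts, tokens_per_host, rng)
    new_tokens = [(rng.randrange(RING), num_hosts) for _ in range(tokens_per_host)]
    after = sorted(before + new_tokens)
    w_before, w_after = primary_widths(before), primary_widths(after)
    # An existing host is "relieved" if the newcomer took part of one of its ranges.
    relieved = sum(1 for h in range(num_hosts) if w_after[h] < w_before[h])
    return {
        "hosts_relieved": relieved,
        "new_host_share": w_after[num_hosts] / RING,
        "ideal_share": 1 / (num_hosts + 1),
    }

for hosts, tokens in [(4, 4), (20, 4), (20, 32)]:
    r = add_one_host(hosts, tokens)
    print(f"{hosts} hosts x {tokens} tokens: relieved {r['hosts_relieved']}/{hosts}, "
          f"new host owns {r['new_host_share']:.1%} (ideal {r['ideal_share']:.1%})")
```

Because each new token can split only one existing range, a new host with 4 tokens can relieve at most 4 of the 20 existing hosts, whatever the random seed; with 32 tokens the relief usually spreads across most of the cluster.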
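For the single-token case, a rebalance plan is simple arithmetic over the token range. The sketch below assumes Murmur3Partitioner (tokens from -2**63 to 2**63 - 1) and uses hypothetical helpers `balanced_tokens` and `rebalance_plan`. It is a naive plan that keeps each node in its slot order rather than minimizing data movement.

```python
# Assumption: Murmur3Partitioner, whose token range is -2**63 .. 2**63 - 1.
MIN_TOKEN = -(2**63)
TOKEN_SPACE = 2**64

def balanced_tokens(num_nodes):
    """Evenly spaced single tokens, one per node, across the Murmur3 range."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    step = TOKEN_SPACE // num_nodes
    return [MIN_TOKEN + i * step for i in range(num_nodes)]

def rebalance_plan(current_nodes):
    """Naive plan for going from current_nodes to current_nodes + 1 nodes.

    Keeps node i at slot i in the new layout, so every node except node 0 moves.
    Returns (moves, token_for_new_node), where moves lists
    (node_index, old_token, new_token).
    """
    old = balanced_tokens(current_nodes)
    new = balanced_tokens(current_nodes + 1)
    moves = [(i, old[i], new[i]) for i in range(current_nodes) if old[i] != new[i]]
    return moves, new[-1]

moves, new_node_token = rebalance_plan(4)
print("new node token:", new_node_token)
for node, old_token, new_token in moves:
    print(f"node {node}: move {old_token} -> {new_token}")
```

In this naive plan every existing node except the first changes token when one node is added, which is the manual work the 1-token option implies. The new node's token would typically go in `initial_token`, and an existing single-token node can be moved with `nodetool move <token>`.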