Thanks Jeff,

You mean that with RF=2, num_tokens = 256, and fewer than 256 nodes, I should
not worry about data distribution?

---- On Sat, 08 Sep 2018 21:30:28 +0430 Jeff Jirsa <jji...@gmail.com> wrote ----
Virtual nodes accomplish two primary goals:

1) They make it easier to gradually add or remove capacity in your cluster by
   distributing the new host capacity around the ring in smaller increments.
2) They increase the number of sources for streaming, which speeds up
   bootstrap and decommission.

Whether either of these actually holds depends on a number of factors, like
your cluster size (for #1) and your replication factor (for #2).

If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll probably
add a neighbor near each existing host (#1) and stream from every other host
(#2), so that’s great. If you have 20 hosts and add a new host with 4 tokens,
most of your existing ranges won’t change at all - you’re nominally adding 5%
of your cluster capacity but you won’t see a 5% improvement because you don’t
have enough tokens to move 5% of your ranges. If you had 32 tokens, you’d
probably actually see that 5% improvement, because you’d likely add a new
range near each of the existing ranges.

Going down to 1 token would mean you’d probably need to manually move tokens
after each bootstrap to rebalance, which is fine, it just takes more operator
awareness.

I don’t know how DSE calculates which replication factor to use for their
token allocation logic, maybe they guess or take the highest or something.
Cassandra doesn’t - we require you to be explicit, but we could probably do
better here.
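
The host-count and token-count examples above can be checked with a small simulation. The sketch below is a toy model, not Cassandra's token allocation code. It assumes tokens are picked uniformly at random, ignores replication factor (primary ranges only), and treats the token space as [0, 2^64). All names (`build_ring`, `add_host`, `simulate`) are made up for illustration. It estimates what fraction of existing hosts hand part of a range to a newly added host, and the share of the ring the new host ends up owning.

```python
import random
from bisect import bisect_left

RING = 2**64  # simplified token space [0, RING); an assumption for this sketch


def build_ring(num_hosts, tokens_per_host, rng):
    """Return a sorted list of (token, host) pairs using random tokens."""
    ring = [(rng.randrange(RING), host)
            for host in range(num_hosts)
            for _ in range(tokens_per_host)]
    ring.sort()
    return ring


def ownership(ring):
    """Fraction of the ring each host owns (primary ranges only, RF ignored)."""
    if len(ring) == 1:
        return {ring[0][1]: 1.0}
    owned = {}
    for i, (token, host) in enumerate(ring):
        prev_token = ring[i - 1][0]  # index -1 wraps to the last token
        width = (token - prev_token) % RING  # a token owns (prev_token, token]
        owned[host] = owned.get(host, 0.0) + width / RING
    return owned


def add_host(ring, new_host, tokens_per_host, rng):
    """Insert random tokens for new_host.

    Returns the new ring and the set of existing hosts that gave up part
    of a range (the owner of the next token clockwise from each new token).
    """
    new_ring = list(ring)
    donors = set()
    for _ in range(tokens_per_host):
        token = rng.randrange(RING)
        idx = bisect_left(new_ring, (token, -1))
        successor_host = new_ring[idx % len(new_ring)][1]
        if successor_host != new_host:
            donors.add(successor_host)
        new_ring.insert(idx, (token, new_host))
    return new_ring, donors


def simulate(num_hosts, tokens_per_host, trials=200, seed=1):
    """Average (fraction of existing hosts affected, new host's ring share)."""
    rng = random.Random(seed)
    donor_frac = 0.0
    new_share = 0.0
    for _ in range(trials):
        ring = build_ring(num_hosts, tokens_per_host, rng)
        new_ring, donors = add_host(ring, num_hosts, tokens_per_host, rng)
        donor_frac += len(donors) / num_hosts
        new_share += ownership(new_ring)[num_hosts]
    return donor_frac / trials, new_share / trials


if __name__ == "__main__":
    for hosts, tokens in [(4, 4), (20, 4), (20, 32)]:
        affected, share = simulate(hosts, tokens)
        print(f"{hosts} hosts, {tokens} tokens/host: "
              f"{affected:.0%} of existing hosts affected, "
              f"new host owns {share:.1%} of the ring")
```

In this model a new host with 4 tokens can take data from at most 4 existing hosts, so in a 20-host cluster at least 16 hosts see no relief. With 32 tokens per host, most existing hosts give up a slice. The new host's average share stays near 1/(N+1) in both cases, so what changes with the token count is how evenly that relief is spread.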
