Virtual nodes accomplish two primary goals:

1) it makes it easier to gradually add or remove capacity in your cluster by 
distributing the new host capacity around the ring in smaller increments

2) it increases the number of sources for streaming, which speeds up bootstrap 
and decommission

Whether either of these actually holds depends on a number of factors, 
like your cluster size (for #1) and your replication factor (for #2). If you 
have 4 hosts with 4 tokens each and add a 5th host, you'll probably add a 
neighbor near each existing host (#1) and stream from every other host (#2), so 
that's great. If you have 20 hosts and add a new host with 4 tokens, most of 
your existing ranges won't change at all: you're nominally adding 5% of your 
cluster capacity, but you won't see a 5% improvement, because with only 4 tokens 
the new host takes ranges from a handful of existing hosts and the rest see no 
change. If you had 32 tokens, you'd probably actually see that 5% improvement, 
because you'd likely add a new range near each of the existing hosts.
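
A rough way to see this is to simulate it. The sketch below is a hypothetical model, not Cassandra's allocator: it assumes purely random token placement on the Murmur3 range (-2^63 to 2^63 - 1), looks only at primary ranges (so it ignores replication), and the helpers `random_ring` and `add_host` are made-up names for illustration. It reports how much of the ring a new host ends up owning and how many existing hosts give up part of a range to it, which is a rough proxy for streaming sources at RF=1.

```python
import random

# Murmur3Partitioner token range: -2**63 .. 2**63 - 1
RING_MIN = -2**63
RING_SIZE = 2**64


def random_token(rng):
    return rng.randrange(RING_MIN, RING_MIN + RING_SIZE)


def random_ring(num_hosts, num_tokens, rng):
    """Hypothetical model: each host picks num_tokens random tokens.

    Returns a dict mapping token -> host id.
    """
    ring = {}
    for host in range(num_hosts):
        for _ in range(num_tokens):
            ring[random_token(rng)] = host
    return ring


def add_host(ring, new_host, num_tokens, rng):
    """Add new_host with random tokens (mutates ring).

    Returns (fraction of the ring the new host owns as primary,
             set of existing hosts that gave up part of a range).
    """
    for _ in range(num_tokens):
        ring[random_token(rng)] = new_host
    tokens = sorted(ring)
    owned = 0
    donors = set()
    for i, token in enumerate(tokens):
        if ring[token] != new_host:
            continue
        # A token owns the range (previous token, token]; i == 0 wraps around.
        owned += (token - tokens[i - 1]) % RING_SIZE
        # That range used to belong to the next existing host clockwise.
        j = (i + 1) % len(tokens)
        while ring[tokens[j]] == new_host:
            j = (j + 1) % len(tokens)
        donors.add(ring[tokens[j]])
    return owned / RING_SIZE, donors


if __name__ == "__main__":
    rng = random.Random(42)
    for num_tokens in (4, 32):
        ring = random_ring(20, num_tokens, rng)
        share, donors = add_host(ring, 20, num_tokens, rng)
        print(f"{num_tokens} tokens: new host owns {share:.1%}, "
              f"took ranges from {len(donors)} of 20 hosts")
```

With 4 tokens the new host can take primary ranges from at most 4 of the 20 hosts, while 32 tokens usually spread it across many more; the exact numbers vary with the random seed.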

Going down to 1 token would mean you'd probably need to manually move tokens 
after each bootstrap to rebalance, which is fine; it just takes more operator 
awareness.
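
With a single token per node, rebalancing means computing evenly spaced target tokens and moving nodes to them (for example with `nodetool move`). The sketch below assumes one token per node on the Murmur3 range and that nodes keep their relative ring order; `balanced_tokens` and `rebalance_plan` are hypothetical helpers, not part of Cassandra.

```python
# Murmur3Partitioner token range: -2**63 .. 2**63 - 1
RING_MIN = -2**63
RING_SIZE = 2**64


def balanced_tokens(num_nodes):
    """Evenly spaced tokens, one per node, starting at the bottom of the ring."""
    step = RING_SIZE // num_nodes
    return [RING_MIN + step * i for i in range(num_nodes)]


def rebalance_plan(current_tokens):
    """Return (current, target) pairs for every node whose token must change.

    Assumes one token per node and that nodes keep their relative ring
    order, so the i-th lowest token moves to the i-th balanced target.
    """
    targets = balanced_tokens(len(current_tokens))
    return [(cur, tgt)
            for cur, tgt in zip(sorted(current_tokens), targets)
            if cur != tgt]


if __name__ == "__main__":
    # A balanced 4-node ring plus a 5th node bootstrapped at 2**61.
    ring = balanced_tokens(4) + [2**61]
    for cur, tgt in rebalance_plan(ring):
        print(f"move node at {cur} to {tgt}")
```

In this example four of the five nodes have to move after a single bootstrap, which is the kind of extra operator work a 1-token setup implies.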

I don't know how DSE decides which replication factor to use in its token 
allocation logic; maybe it guesses, or takes the highest, or something. Cassandra 
doesn't guess: we require you to be explicit, but we could probably do better here.



> On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> 
> wrote:
> 
>> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, <j...@jonhaddad.com> wrote:
>> 256 tokens is a pretty terrible default setting, especially post 3.0. I 
>> recommend folks use 4 tokens for new clusters,
> 
> 
> I wonder why not set it all the way down to 1, then? What's the key 
> difference once you have so few vnodes?
> 
>> with some caveats.
> 
> 
> And those are?
> 
>> When you fire up a cluster, there's no way to make the initial tokens be 
>> distributed evenly; you'll get random ones. You'll want to set them 
>> explicitly using:
>> 
>> python -c 'print(",".join(str((2**64 // 4) * i - 2**63) for i in range(4)))'
>> 
>> After you fire up the first seed, create a keyspace using RF=3 (or whatever 
>> you're planning on using) and set allocate_tokens_for_keyspace to that 
>> keyspace in your config, and join the rest of the nodes.  That gives even 
>> distribution.
> 
> 
> Do you know whether the DSE-style option, which doesn't require a keyspace 
> to exist, also allocates evenly distributed tokens for the very first seed 
> node?
> 
> Thanks,
> --
> Alex
> 
