Your example only really applies if someone is running a 20-node cluster at
RF=1, something I've never seen, though I'm sure it exists somewhere.
Realistically, with RF=3, racks (or AWS availability zones), and 21 nodes,
you'll have 3 racks with 7 nodes per rack.  Adding a single node is an
unlikely operation; you'd probably add 3, one in each rack.  In that case
you'd spread each rack's load from 7 nodes to 8, roughly a 14% capacity
increase, and at that size it would help every node even with only 4 tokens.
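
To make the arithmetic above concrete, here is a rough sketch of the per-rack
numbers.  It assumes each rack holds a full replica (RF=3 across 3 racks) and
that load is spread perfectly evenly within a rack, which real token
allocation only approximates.  The function name rack_capacity_gain is made
up for illustration.

```python
def rack_capacity_gain(nodes_per_rack, added_per_rack):
    """Return (fractional capacity added to a rack, fractional drop in per-node load).

    Assumes load is spread perfectly evenly across the nodes of a rack.
    """
    capacity_gain = added_per_rack / nodes_per_rack
    load_drop = 1 - nodes_per_rack / (nodes_per_rack + added_per_rack)
    return capacity_gain, load_drop


total_nodes, racks = 21, 3
per_rack = total_nodes // racks           # 7 nodes in each rack
gain, drop = rack_capacity_gain(per_rack, 1)
print(f"{per_rack} -> {per_rack + 1} nodes per rack: "
      f"+{gain:.1%} capacity, -{drop:.1%} load per node")
```

Under those assumptions, going from 7 to 8 nodes adds 1/7 (about 14%) to each
rack's capacity, and each existing node ends up with 1/8 (12.5%) less load.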

Generally speaking I would expand cluster capacity by a percentage (say
15-20%), so you'd be looking at adding a handful of nodes to each rack,
which will still be beneficial when using only 4 tokens.

On Sat, Sep 8, 2018 at 1:00 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Virtual nodes accomplish two primary goals
>
> 1) it makes it easier to gradually add/remove capacity to your cluster by
> distributing the new host capacity around the ring in smaller increments
>
> 2) it increases the number of sources for streaming, which speeds up
> bootstrap and decommission
>
> Whether or not either of these actually is true depends on a number of
> factors, like your cluster size (for #1) and your replication factor (for
> #2). If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll
> probably add a neighbor near each existing host (#1) and stream from every
> other host (#2), so that’s great. If you have 20 hosts and add a new host
> with 4 tokens, most of your existing ranges won’t change at all - you’re
> nominally adding 5% of your cluster capacity but you won’t see a 5%
> improvement because you don’t have enough tokens to move 5% of your ranges.
> If you had 32 tokens, you’d probably actually see that 5% improvement,
> because you’d likely add a new range near each of the existing ranges.
>
> Going down to 1 token would mean you’d probably need to manually move
> tokens after each bootstrap to rebalance, which is fine, it just takes more
> operator awareness.
>
> I don’t know how DSE calculates which replication factor to use for their
> token allocation logic, maybe they guess or take the highest or something.
> Cassandra doesn’t - we require you to be explicit, but we could probably do
> better here.
>
>
>
> On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, <j...@jonhaddad.com> wrote:
>
>> 256 tokens is a pretty terrible default setting especially post 3.0.  I
>> recommend folks use 4 tokens for new clusters,
>>
>
> I wonder why not set it all the way down to 1, then? What's the key
> difference once you have so few vnodes?
>
>> with some caveats.
>>
>
> And those are?
>
>> When you fire up a cluster, there's no way to make the initial tokens be
>> distributed evenly; you'll get random ones.  You'll want to set them
>> explicitly using:
>>
>> python -c 'print([str(((2**64 // 4) * i) - 2**63) for i in range(4)])'
>>
>>
>> After you fire up the first seed, create a keyspace using RF=3 (or
>> whatever you're planning on using) and set allocate_tokens_for_keyspace to
>> that keyspace in your config, and join the rest of the nodes.  That gives
>> even distribution.
>>
>
> Do you possibly know if the DSE-style option which doesn't require a
> keyspace to be there also works to allocate evenly distributed tokens for
> the very first seed node?
>
> Thanks,
> --
> Alex
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
