Your example only really applies if someone is running a 20-node cluster at RF=1, something I've never seen, though I'm sure it exists somewhere. Realistically, with RF=3, racks (or AWS availability zones), and 21 nodes, you'll have 3 racks with 7 nodes per rack. Adding a single node is an unlikely operation; you'd probably add 3, one in each rack / AZ. In that case you'd spread each rack's load from 7 nodes to 8, roughly a 14% capacity increase, and at that size it would help every node even with only 4 tokens.
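To make the arithmetic for the 21-node, RF=3 case concrete, here is a minimal sketch. It assumes each of the 3 racks holds one full replica of the data and that load is spread perfectly evenly within a rack, which real token placement only approximates. The function name `rack_expansion` is made up for illustration.

```python
def rack_expansion(nodes_per_rack, added_per_rack):
    """Return (capacity_gain, per_node_load_reduction) as fractions for one rack.

    Assumes perfectly even load within the rack (an idealization).
    """
    new_size = nodes_per_rack + added_per_rack
    # Extra capacity relative to what the rack had before.
    capacity_gain = added_per_rack / nodes_per_rack
    # How much less data each node holds once load is spread over new_size nodes.
    load_reduction = 1 - nodes_per_rack / new_size
    return capacity_gain, load_reduction


gain, reduction = rack_expansion(7, 1)
print(f"capacity +{gain:.1%}, per-node load -{reduction:.1%}")
# prints: capacity +14.3%, per-node load -12.5%
```

So the 14% figure is the capacity added to each rack; the share of data held by each existing node drops by 12.5% in the ideal case.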
Generally speaking I would expand cluster capacity by a percentage (say 15-20%), so you'd be looking at adding a handful of nodes to each rack, which will still be beneficial when using only 4 tokens.

On Sat, Sep 8, 2018 at 1:00 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Virtual nodes accomplish two primary goals
>
> 1) it makes it easier to gradually add/remove capacity to your cluster by
> distributing the new host capacity around the ring in smaller increments
>
> 2) it increases the number of sources for streaming, which speeds up
> bootstrap and decommission
>
> Whether or not either of these actually is true depends on a number of
> factors, like your cluster size (for #1) and your replication factor (for
> #2). If you have 4 hosts and 4 tokens per host and add a 5th host, you’ll
> probably add a neighbor near each existing host (#1) and stream from every
> other host (#2), so that’s great. If you have 20 hosts and add a new host
> with 4 tokens, most of your existing ranges won’t change at all - you’re
> nominally adding 5% of your cluster capacity but you won’t see a 5%
> improvement because you don’t have enough tokens to move 5% of your ranges.
> If you had 32 tokens, you’d probably actually see that 5% improvement,
> because you’d likely add a new range near each of the existing ranges.
>
> Going down to 1 token would mean you’d probably need to manually move
> tokens after each bootstrap to rebalance, which is fine, it just takes more
> operator awareness.
>
> I don’t know how DSE calculates which replication factor to use for their
> token allocation logic, maybe they guess or take the highest or something.
> Cassandra doesn’t - we require you to be explicit, but we could probably do
> better here.
>
> On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin
> <oleksandr.shul...@zalando.de> wrote:
>
> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, <j...@jonhaddad.com> wrote:
>
>> 256 tokens is a pretty terrible default setting especially post 3.0. I
>> recommend folks use 4 tokens for new clusters,
>
> I wonder why not set it all the way down to 1 then? What's the key
> difference once you have so few vnodes?
>
>> with some caveats.
>
> And those are?
>
>> When you fire up a cluster, there's no way to make the initial tokens be
>> distributed evenly, you'll get random ones. You'll want to set them
>> explicitly using:
>>
>> python -c 'print([str(((2**64 // 4) * i) - 2**63) for i in range(4)])'
>>
>> After you fire up the first seed, create a keyspace using RF=3 (or
>> whatever you're planning on using) and set allocate_tokens_for_keyspace to
>> that keyspace in your config, and join the rest of the nodes. That gives
>> even distribution.
>
> Do you possibly know if the DSE-style option which doesn't require a
> keyspace to be there also works to allocate evenly distributed tokens for
> the very first seed node?
>
> Thanks,
> --
> Alex

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
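For anyone wanting to reuse the token calculation from the thread with a different token count, here is a minimal sketch of the same formula as a small Python 3 script. It assumes the Murmur3Partitioner token range of -2^63 to 2^63 - 1 and uses integer division (`//`) so the tokens stay exact integers. The function name `initial_tokens` is made up for illustration.

```python
def initial_tokens(num_tokens=4):
    """Evenly spaced tokens across the signed 64-bit range, starting at -2**63.

    Assumes the Murmur3Partitioner range; other partitioners use different ranges.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    step = 2**64 // num_tokens  # integer width of each slice of the ring
    return [i * step - 2**63 for i in range(num_tokens)]


tokens = initial_tokens(4)
# Comma-separated form, e.g. for pasting into the initial_token setting.
print(",".join(str(t) for t in tokens))
# prints: -9223372036854775808,-4611686018427387904,0,4611686018427387904
```

This only covers the first seed node, as described in the thread; the remaining nodes would get their tokens via allocate_tokens_for_keyspace.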