Re: [Discuss] num_tokens default in Cassandra 4.0

Mick Semb Wever Fri, 31 Jan 2020 04:41:42 -0800


> TLDR, based on availability concerns, skew concerns, operational 
> concerns, and based on the fact that the new allocation algorithm can 
> be configured fairly simply now, this is a proposal to go with 4 as the 
> new default and the allocate_tokens_for_local_replication_factor set to 
> 3.

I'm uncomfortable going with the default of `num_tokens: 4`.
I would rather see a default of `num_tokens: 16` based on the following…

a) 4 num_tokens does not provide a good out-of-the-box experience.
b) 4 num_tokens doesn't provide any significant streaming benefits over 16.
c) edge-case availability doesn't trump (a) & (b)

For (a)…
The first node in each rack, up to RF racks, in each datacenter can't use the
allocation strategy. With 4 num_tokens, 3 racks and RF=3, the first three nodes
will be poorly balanced. If three poorly balanced nodes in a cluster are an
issue (because the cluster is small enough), then 4 is the wrong default.
From our own experience, we have had to bootstrap these nodes multiple times
until they generate something ok. In practice 4 num_tokens (over 16) has
caused more headaches with clients than it has delivered gains.

Elaborating, 256 was originally chosen because the token randomness over that
many always averaged out. With a default of
`allocate_tokens_for_local_replication_factor: 3` this issue is largely solved,
but you will still have those initial nodes with randomly generated tokens.
Ref:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/tokenallocator/ReplicationAwareTokenAllocator.java#L80
And to be precise: tokens are randomly generated until there is a node in each
rack, up to RF racks. So, with RF=3, you could in theory (or as a newbie) boot
100 nodes in only the first two racks, and they would all get random tokens
regardless of the allocate_tokens_for_local_replication_factor setting.

For example, using 4 num_tokens, 3 racks and RF=3…
- in a 6 node cluster, there's a total of 24 tokens, half of which are random,
- in a 9 node cluster, there's a total of 36 tokens, a third of which are
random,
- etc
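
The arithmetic in that list can be sketched as follows. This is only an
illustration: the function name is made up, and it assumes exactly one node
per rack (up to RF racks) got random tokens while every later node used the
allocation algorithm.

```python
def random_token_fraction(nodes: int, num_tokens: int, rf: int = 3, racks: int = 3) -> float:
    """Fraction of a datacenter's tokens that were randomly generated.

    Assumption: the first node in each rack, up to RF racks, received random
    tokens, and all other nodes were placed by the allocation algorithm.
    """
    random_nodes = min(rf, racks, nodes)
    return (random_nodes * num_tokens) / (nodes * num_tokens)


for nodes in (6, 9, 36):
    total = nodes * 4
    share = random_token_fraction(nodes, num_tokens=4)
    print(f"{nodes} nodes: {total} tokens, {share:.1%} random")
```

For 6, 9 and 36 nodes this gives 50.0%, 33.3% and 8.3% respectively.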

Following this logic I would not be willing to use 4 unless you know there
will be more than 36 nodes in each datacenter, i.e. less than ~8% of your
tokens are randomly generated. Many clusters don't have that size, and imho
that's why 4 is a bad default.

A default of 16 by the same logic only needs 9 nodes in each datacenter to
overcome that degree of randomness.
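
One way to read "the same logic" (an assumption for this sketch, not something
spelled out above) is that 36 nodes at 4 tokens and 9 nodes at 16 tokens both
put the same total number of tokens on the ring in a datacenter. The helper
name below is hypothetical.

```python
def nodes_for_total_tokens(target_total: int, num_tokens: int) -> int:
    """Smallest node count whose tokens add up to at least target_total."""
    return -(-target_total // num_tokens)  # ceiling division


# Assumed target: the token total of a 36 node datacenter at num_tokens=4.
target = 36 * 4

for num_tokens in (4, 16):
    nodes = nodes_for_total_tokens(target, num_tokens)
    print(f"num_tokens={num_tokens}: {nodes} nodes reach {target} tokens")
```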

The workaround to all this is having to manually define `initial_token: …` on
those initial nodes. I'm really not keen on imposing that upon new users.
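
To give an idea of what that workaround involves, here is a minimal sketch
that spreads tokens evenly for the first three nodes. It assumes the
Murmur3Partitioner (token range -2^63 to 2^63-1) and interleaves the tokens
across the nodes. The function name is hypothetical and this is not how the
allocation algorithm itself picks tokens.

```python
MIN_TOKEN = -(2**63)   # lowest Murmur3Partitioner token (assumed partitioner)
RING_SIZE = 2**64      # number of positions on the Murmur3 ring


def evenly_spaced_initial_tokens(nodes: int, num_tokens: int) -> list[list[int]]:
    """Return one list of num_tokens evenly spaced tokens per node."""
    total = nodes * num_tokens
    step = RING_SIZE // total
    ring = [MIN_TOKEN + i * step for i in range(total)]
    # Interleave: node n takes every nodes-th token, starting at offset n.
    return [ring[n::nodes] for n in range(nodes)]


for n, tokens in enumerate(evenly_spaced_initial_tokens(nodes=3, num_tokens=4), start=1):
    print(f"# node {n}")
    print("initial_token: " + ",".join(str(t) for t in tokens))
```

Each printed line would go into the cassandra.yaml of the corresponding node,
where `initial_token` takes a comma-separated list when vnodes are in use.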

For (b)…
there have already been a number of improvements around streaming that remove
much of whatever difference there is between 4 and 16 num_tokens. And
4 num_tokens means bigger token ranges, so it could well be disadvantageous
due to over-streaming.

For (c)…
we are trying to optimise availability in situations where we can never
guarantee availability. I understand it's a nice operational advantage to have
in a shit-show, but it's not something you can design a system around and rely
upon. There's also the question of availability vs the size of the token-range
that becomes unavailable.

regards,
Mick

