I think lowering the number of tokens is a great idea! Similar to Jon,
when I have reduced the number of tokens for clients it has resulted in
an improvement in repair performance.

I am concerned that the proposed default value for num_tokens is too low.
If you set up a cluster using the proposed defaults, you will get a
balanced cluster. However, if you decommission nodes you will start to
see large imbalances, especially in small clusters (< 20 nodes). This is
because the allocate_tokens_for_local_replication_factor setting is only
applied during the bootstrap process; nothing redistributes tokens when a
node leaves the ring.
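
To put rough numbers on it (a simplified view that ignores replication):
in a 6-node cluster with num_tokens=4, each node owns ~16.7% of the ring.
Decommission one node and each of its 4 ranges is absorbed whole by a
neighbour, so at most 4 of the 5 survivors gain ~4.2% each (~20.8%
total), while any node that absorbs nothing stays at 16.7%, instead of
all five settling at an even 20%. Each further decommission compounds the
skew, and with only 4 tokens there is little averaging to smooth it out.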

I have recommended very low values for num_tokens to clients, but only
because it was very unlikely that they would reduce their cluster size,
and I warned them of the caveats of using a small value for num_tokens.

The proposed num_tokens default value is fine for devs and operators that
know what they are doing. However, the general Cassandra community will
be unaware of the potential issue with such a low value. We should
consider setting num_tokens to 16 - 32 as the default. This will at least
help reduce the severity of the imbalance when decommissioning a node,
whilst still providing the benefits of a low number of tokens. In
addition, we can add a comment to num_tokens noting that clusters over
100 nodes (per datacenter) should consider reducing it down to 4; a
sketch of what that could look like follows.
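
For illustration, the cassandra.yaml stanza might read something like the
following (the comment wording is only my suggestion, not settled text;
both settings already exist):

    # Lower values improve availability and repair times, but token
    # ownership becomes less even after decommissions. Clusters with more
    # than ~100 nodes per datacenter should consider reducing this to 4.
    num_tokens: 16
    allocate_tokens_for_local_replication_factor: 3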

Cheers,
Anthony

On Fri, 31 Jan 2020 at 01:58, Jon Haddad <j...@jonhaddad.com> wrote:

> Larger clusters are where high token counts do the most damage. That's
> why it's such a problem. You start out with a small cluster using 256
> tokens, and as you grow into the hundreds of nodes it becomes more and
> more unstable.
>
>
> On Thu, Jan 30, 2020, 8:19 AM onmstester onmstester
> <onmstes...@zoho.com.invalid> wrote:
>
> > Shouldn't we consider the cluster size when configuring num_tokens?
> >
> > For example, is it OK to use num_tokens=4 for a cluster of more than
> > 100 nodes?
> >
> >
> >
> > Another question that is not so relevant to this:
> >
> > When we use the token assignment algorithm (the new/non-random one)
> > for a specific keyspace, why should we set initial_token on all the
> > seeds? Isn't one seed enough, with all the other nodes just setting
> > the keyspace?
> >
> >
> >
> > Also, I do not understand why we should consider rack topology and the
> > number of racks when configuring num_tokens.
> >
> >
> >
> > Sent using https://www.zoho.com/mail/
> >
> >
> >
> >
> > ---- On Thu, 30 Jan 2020 04:33:57 +0330 Jeremy Hanna <
> > jeremy.hanna1...@gmail.com> wrote ----
> >
> >
> > The new default wouldn't be retroactively set for 3.x, but the same
> > principles apply.  The new algorithm is in 3.x, as is the
> > simplification of the configuration.  So there's no reason not to use
> > the same configuration on 3.x.
> >
> > > On Jan 30, 2020, at 4:34 AM, Chen-Becker, Derek
> > > <dchen...@amazon.com.INVALID> wrote:
> > >
> > > Does the same guidance apply to 3.x clusters? I read through the JIRA
> > > ticket linked below, along with tickets that it links to, but it's not
> > > clear that the new allocation algorithm is available in 3.x or if there
> > > are other reasons that this would be problematic.
> > >
> > > Thanks,
> > >
> > > Derek
> > >
> > > On 1/29/20, 9:54 AM, "Jon Haddad" <j...@jonhaddad.com> wrote:
> > >
> > >    I've put a lot of my previous clients on 4 tokens, all of which
> > >    have resulted in a major improvement.
> > >
> > >    I wouldn't use any more than 4 except under some pretty unusual
> > >    circumstances.
> > >
> > >    Jon
> > >
> > >    On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead <b...@instaclustr.com> wrote:
> > >
> > >> +1 to reducing the number of tokens as low as possible for
> > >> availability issues. 4 lgtm.
> > >>
> > >> On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi <djo...@apache.org> wrote:
> > >>
> > >>> Thanks for restarting this discussion Jeremy. I personally think 4
> > >>> is a good number as a default. I think whatever we pick, we should
> > >>> have enough documentation for operators to make sense of the new
> > >>> defaults in 4.0.
> > >>>
> > >>> Dinesh
> > >>>
> > >>>> On Jan 28, 2020, at 9:25 PM, Jeremy Hanna
> > >>>> <jeremy.hanna1...@gmail.com> wrote:
> > >>>>
> > >>>> I wanted to start a discussion about the default for num_tokens
> > >>>> that we'd like for people starting in Cassandra 4.0.  This is for
> > >>>> ticket CASSANDRA-13701
> > >>>> <https://issues.apache.org/jira/browse/CASSANDRA-13701>
> > >>>> (which has been duplicated a number of times, most recently by me).
> > >>>>
> > >>>> TL;DR: based on availability concerns, skew concerns, operational
> > >>>> concerns, and the fact that the new allocation algorithm can now be
> > >>>> configured fairly simply, this is a proposal to go with 4 as the
> > >>>> new default, with allocate_tokens_for_local_replication_factor set
> > >>>> to 3.  That gives a good experience out of the box for people and
> > >>>> is the most conservative option.  It does assume that racks and DCs
> > >>>> have been configured correctly.  We would, of course, go into some
> > >>>> detail in the NEWS.txt.
> > >>>>
> > >>>> Joey Lynch and Josh Snyder did an extensive analysis of
> > >>>> availability concerns with high num_tokens/virtual node counts in
> > >>>> their paper:
> > >>>> http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E
> > >>>> The problem worsens as clusters grow larger.  I won't quote the
> > >>>> paper here, but in order to have a conservative default, paired
> > >>>> with the accompanying new allocation algorithm, I think 4 makes
> > >>>> sense.
> > >>>>
> > >>>> The difficulty has always been that virtual nodes are beneficial
> > >>>> for operations, but 256 is too high for the purposes of repair and,
> > >>>> as Joey and Josh cover, for availability.  Going lower with the
> > >>>> original allocation algorithm produced allocation skew because of
> > >>>> its naive random distribution.  Enter CASSANDRA-7032
> > >>>> <https://issues.apache.org/jira/browse/CASSANDRA-7032> and the new
> > >>>> token allocation algorithm.  CASSANDRA-15260
> > >>>> <https://issues.apache.org/jira/browse/CASSANDRA-15260> makes the
> > >>>> new algorithm operationally simpler.
> > >>>>
> > >>>> One other item of note - since Joey and Josh's analysis, there
> > >>>> have been improvements in streaming and other considerations that
> > >>>> can reduce the probability of more than one node representing some
> > >>>> token range being unavailable, but it would still be good to be
> > >>>> conservative.
> > >>>>
> > >>>> Please chime in with any concerns with having num_tokens=4 and
> > >>>> allocate_tokens_for_local_replication_factor=3 and the accompanying
> > >>>> rationale, so we can improve the experience for all users.
> > >>>>
> > >>>> Other resources:
> > >>>>
> > >>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> > >>>> https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> > >>>> https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30
> > >>
> > >> --
> > >>
> > >> Ben Bromhead
> > >>
> > >> Instaclustr | www.instaclustr.com | @instaclustr
> > >> <http://twitter.com/instaclustr> | (650) 284 9692
> > >>
> > >
>
