"large/giant clusters and admins are the target audience for the value we select"
There are reasons aside from massive scale to pick Cassandra, but the primary technical reason Cassandra gets selected is to support scaling out to large clusters. Why pick a default that forces you to change the token count once you reach scale? It's still a ticking time bomb, although 16 won't be as bad as 256. Hmmmm. But 4 is a poor fit for small clusters and could scare off adoption. Ultimately, a well-written operations article on how to transition from 16 to 4, and at what point that is a good idea (i.e. before your cluster gets too big), should be a critical part of this.

On Fri, Jan 31, 2020 at 11:45 AM Michael Shuler <mich...@pbandjelly.org> wrote:

> On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
> > one corollary of the way the algorithm works (or more
> > precisely might not work) with multiple seeds or simultaneous
> > multi-node bootstraps or decommissions, is that a lot of dtests
> > start failing due to deterministic token conflicts. I wasn't
> > able to fix that by changing solely ccm and the dtests
>
> I appreciate all the detailed discussion. For a little historic context,
> since I brought up this topic in the contributors zoom meeting, unstable
> dtests was precisely the reason we moved the dtest configurations to
> 'num_tokens: 32'. That value has been used in CI dtest since something
> like 2014, when we found that this helped stabilize a large segment of
> flaky dtest failures. No real science there, other than "this hurts less."
>
> I have no real opinion on the suggestions of using 4 or 16, other than I
> believe most "default config using" new users are starting with smaller
> numbers of nodes. The small-but-growing users and veteran large cluster
> admins should be gaining more operational knowledge and be able to
> adjust their own config choices according to their needs (and good
> comment suggestions in the yaml). Whatever default config value is
> chosen for num_tokens, I think it should suit the new users with smaller
> clusters.
> The suggestion Mick makes that 16 makes a better choice for
> small numbers of nodes, well, that would seem to be the better choice
> for those users we are trying to help the most with the default.
>
> I fully agree that science, maths, and support/ops experience should
> guide the choice, but I don't believe that large/giant clusters and
> admins are the target audience for the value we select.
>
> --
> Kind regards,
> Michael