edit: 4 is bad at small cluster sizes and could scare off adoption

On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller <carl.muel...@smartthings.com>
wrote:

> "large/giant clusters and admins are the target audience for the value we
> select"
>
> There are reasons aside from massive scale to pick cassandra, but the
> primary reason cassandra is selected technically is to support horizontally
> scaling to large clusters.
>
> Why pick a value that forces you to switch token count once you reach
> scale? It's still a ticking time bomb, although 16 won't be as bad as 256.
>
> Hmmmm. But 4 is bad and could scare off adoption.
>
> Ultimately a well-written article on operations and how to transition from
> 16 --> 4 and at what point that is a good idea (aka not when your cluster
> is too big) should be a critical part of this.
>
> On Fri, Jan 31, 2020 at 11:45 AM Michael Shuler <mich...@pbandjelly.org>
> wrote:
>
>> On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
>> > one corollary of the way the algorithm works (or more
>> > precisely might not work) with multiple seeds or simultaneous
>> > multi-node bootstraps or decommissions, is that a lot of dtests
>> > start failing due to deterministic token conflicts. I wasn't
>> > able to fix that by changing solely ccm and the dtests
>> I appreciate all the detailed discussion. For a little historic context,
>> since I brought up this topic in the contributors Zoom meeting, unstable
>> dtests were precisely the reason we moved the dtest configurations to
>> 'num_tokens: 32'. That value has been used in CI dtest since something
>> like 2014, when we found that this helped stabilize a large segment of
>> flaky dtest failures. No real science there, other than "this hurts less."
>>
>> I have no real opinion on the suggestions of using 4 or 16, other than I
>> believe most "default config using" new users are starting with smaller
>> numbers of nodes. The small-but-growing users and veteran large cluster
>> admins should be gaining more operational knowledge and be able to
>> adjust their own config choices according to their needs (and good
>> comment suggestions in the yaml). Whatever default config value is
>> chosen for num_tokens, I think it should suit the new users with smaller
>> clusters. The suggestion Mick makes that 16 makes a better choice for
>> small numbers of nodes, well, that would seem to be the better choice
>> for those users we are trying to help the most with the default.
>>
>> I fully agree that science, maths, and support/ops experience should
>> guide the choice, but I don't believe that large/giant clusters and
>> admins are the target audience for the value we select.
>>
>> --
>> Kind regards,
>> Michael
