Re: Re-evaluate compaction defaults in 5.1/trunk

Dave Herrington Sun, 08 Dec 2024 07:33:30 -0800

…the analysis I describe would need to be weighted by table size.  I have
several representative production cluster tablestats analyses that show r:w
ratio by table, including table size.  I can check to see how this analysis
plays out on a few of these.
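
To make the size weighting concrete, here is a minimal sketch in Python. It assumes the per-table read count, write count and on-disk size have already been pulled out of the tablestats output. The `TableStats` record, the function name and the sample figures are all hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    # Hypothetical record; the fields are assumed to come from tablestats output.
    name: str
    reads: int
    writes: int
    size_bytes: int


def fraction_meeting_threshold(tables, threshold):
    """Return (fraction by table count, fraction by table size) of tables
    whose read:write ratio is at least `threshold`."""
    total_size = sum(t.size_bytes for t in tables)
    if not tables or total_size == 0:
        return 0.0, 0.0
    # Compare reads against threshold * writes so read-only tables
    # (writes == 0) do not cause a division by zero.
    qualifying = [
        t for t in tables if t.reads > 0 and t.reads >= threshold * t.writes
    ]
    by_count = len(qualifying) / len(tables)
    by_size = sum(t.size_bytes for t in qualifying) / total_size
    return by_count, by_size


# Made-up sample data, not taken from any real cluster.
sample = [
    TableStats("events", reads=100, writes=1000, size_bytes=800),
    TableStats("users", reads=900, writes=100, size_bytes=100),
    TableStats("sessions", reads=500, writes=200, size_bytes=100),
]

for threshold in (2.0, 4.0):
    by_count, by_size = fraction_meeting_threshold(sample, threshold)
    print(f"r:w >= {threshold}: {by_count:.0%} of tables, {by_size:.0%} of data")
```

In this made-up sample, two of the three tables clear a 2:1 threshold, yet they hold only a fifth of the data, which is why the unweighted and weighted answers can point in different directions.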


-Dave

David A. Herrington II
President and Chief Engineer
RhinoSource, Inc.

*Data Lake Architecture, Cloud Computing and Advanced Analytics.*

www.rhinosource.com


On Sun, Dec 8, 2024 at 7:22 AM Dave Herrington <[email protected]>
wrote:

> Paulo,
>
> I understand your perspective.
>
> Short of waiting for UCS to prove itself out, I guess it comes down to the
> assertion that a strong majority of Cassandra use cases would benefit from
> using LCS vs. STCS.
>
> The conventional wisdom is that workloads need to be read-heavy to make
> the extra resource consumption of LCS pay off.  4:1 read:write is the
> threshold I use to decide whether or not to use LCS.
>
> I think this ratio is important in this analysis.  Has this LCS “payoff”
> threshold changed to 2:1 or better, in favor of LCS?  This would be good to
> know.
>
> With an up-to-date threshold in hand, what is the fraction of Cassandra
> use cases that meet this updated threshold?
>
> For example, say this LCS payoff r:w ratio has improved to 2:1.  What
> percentage of Cassandra tables across all clusters currently in operation
> are 2:1 read-to-write or more?
>
> If the answer is a solid majority, I think this would justify the default
> change.
>
> -Dave
>
> David A. Herrington II
> President and Chief Engineer
> RhinoSource, Inc.
>
> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>
> www.rhinosource.com
>
>
> On Sun, Dec 8, 2024 at 5:43 AM Paulo Motta <[email protected]> wrote:
>
>> Hi Dave,
>>
>> I'm also in the field and my experience is different.
>>
>> I have seen new users shooting themselves in the foot with the default
>> compaction strategy STCS on a regular basis over the past few years and
>> have been recommending that they switch to LCS; they no longer encounter
>> issues after making this switch. I would like to generalize this
>> recommendation to prevent new users from having bad experiences and
>> abandoning the database.
>>
>> This is not a cost issue, it's an ease of use matter. STCS does not work
>> for mutable workloads and this is a massive functional limitation with the
>> database.
>>
>> I don't want people to download Cassandra 5.1 to try out transactions and
>> start facing issues due to bad STCS performance on mutable data.
>>
>> If you would like to optimize for cost, then you can read the docs or
>> hire a consultant to optimize the cost for you. Otherwise, the database
>> should work out of the box and this is provided by LCS. If LCS cannot keep
>> up, it means the cluster is underprovisioned and needs to be expanded;
>> it's not a functional issue but a capacity issue.
>>
>> Cheers,
>>
>> Paulo
>>
>> On Sun, Dec 8, 2024 at 1:26 AM Dave Herrington <[email protected]>
>> wrote:
>>
>>> Chiming in from the field, I think maintaining the familiar status quo
>>> until a panacea compaction strategy proves itself out (could that be UCS?)
>>> makes sense to me.  I feel it could be maddening to customers if LCS
>>> started showing up in schemas after an upgrade just because the default
>>> changed.  If UCS proves itself as the fits-all solution, then we’d be doing
>>> them a favor by making it the default. In time.
>>>
>>> -Dave
>>>
>>> David A. Herrington II
>>> President and Chief Engineer
>>> RhinoSource, Inc.
>>>
>>> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>>>
>>> www.rhinosource.com
>>>
>>>
>>> On Sat, Dec 7, 2024 at 7:32 PM Jeff Jirsa <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Dec 7, 2024, at 7:08 PM, Mick Semb Wever <[email protected]> wrote:
>>>>
>>>> Chiming in with my two cents…
>>>>
>>>>
>>>> When people have the luxury of working in environments where clusters
>>>>> are massively over provisioned, LCS as a default makes a lot of sense,
>>>>> because there's not much downside.  The use cases where you'd actually
>>>>> fall behind in compaction are pretty slim, so the negative impact isn't felt.
>>>>>
>>>>> Most people aren't doing this.  Putting LCS as the default
>>>>> significantly changes the performance profile of new clusters in a way
>>>>> that actively harms a portion of the community.
>>>>>
>>>>
>>>>
>>>> Haddad's statement here resonates above everything else that's been
>>>> said so far.  It is this particular audience that I'm thinking first about
>>>> not screwing over; everyone else is a step in front of them wrt knowing
>>>> what compaction is and making an informed decision about changing it.
>>>>
>>>>
>>>> “You have to over-provision (iops) to use LCS” isn’t that different
>>>> from “you have to over-provision (space) to use LCS” (by perhaps 50%).
>>>>
>>>> Both of them are sub-optimal and you’re trading off either extra space
>>>> or extra compute/ops.
>>>>
>>>>
>>>>
