Paulo,

I understand your perspective.

Short of waiting for UCS to prove itself out, I guess it comes down to the
assertion that a strong majority of Cassandra use cases would benefit from
using LCS vs. STCS.

The conventional wisdom is that workloads need to be read-heavy to make the
extra resource consumption of LCS pay off.  4:1 read:write is the threshold
I use to decide whether or not to use LCS.

I think this ratio is important in this analysis.  Has this LCS “payoff”
threshold changed to 2:1 or better, in favor of LCS?  This would be good to
know.

With an up-to-date threshold in hand, what fraction of Cassandra use
cases meets this updated threshold?

For example, say this LCS payoff r:w ratio has improved to 2:1.  What
percentage of Cassandra tables across all clusters currently in operation
are 2:1 read-to-write or more?

If the answer is a solid majority, I think this would justify the default
change.
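
As a rough illustration of the question above, here is a minimal sketch of how one might compute the fraction of tables at or above a given read:write threshold. The table names and counts are made up for illustration and are not measurements. In practice the per-table counts could come from something like the local read and write counts reported by nodetool tablestats, which is an assumption about the data source and not part of this sketch.

```python
from typing import Dict, Tuple


def read_write_ratio(reads: int, writes: int) -> float:
    """Reads per write; a table with no writes counts as infinitely read-heavy."""
    if writes == 0:
        return float("inf")
    return reads / writes


def fraction_meeting_threshold(
    tables: Dict[str, Tuple[int, int]], threshold: float
) -> float:
    """Fraction of tables whose read:write ratio is at least `threshold`."""
    if not tables:
        return 0.0
    qualifying = sum(
        1
        for reads, writes in tables.values()
        if read_write_ratio(reads, writes) >= threshold
    )
    return qualifying / len(tables)


# Hypothetical (reads, writes) counts per table, for illustration only.
sample_tables = {
    "ks.events": (1_000, 4_000),    # write-heavy, ratio 0.25
    "ks.users": (9_000, 1_000),     # read-heavy, ratio 9
    "ks.sessions": (3_000, 1_000),  # ratio 3
    "ks.audit": (500, 500),         # ratio 1
}

share_at_4_to_1 = fraction_meeting_threshold(sample_tables, 4.0)
share_at_2_to_1 = fraction_meeting_threshold(sample_tables, 2.0)
print(f"4:1 or better: {share_at_4_to_1:.0%}")
print(f"2:1 or better: {share_at_2_to_1:.0%}")
```

With these made-up numbers, lowering the threshold from 4:1 to 2:1 moves the qualifying share from one table in four to two in four. The real answer depends entirely on data gathered from production clusters.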

-Dave

David A. Herrington II
President and Chief Engineer
RhinoSource, Inc.

*Data Lake Architecture, Cloud Computing and Advanced Analytics.*

www.rhinosource.com


On Sun, Dec 8, 2024 at 5:43 AM Paulo Motta <pa...@apache.org> wrote:

> Hi Dave,
>
> I'm also in the field and my experience is different.
>
> I have seen new users shooting themselves in the foot with the default
> compaction strategy STCS on a regular basis over the past few years. I
> have been recommending that they switch to LCS, and they no longer encounter
> issues after making this switch. I would like to generalize this
> recommendation to prevent new users from having bad experiences and
> abandoning the database.
>
> This is not a cost issue, it's an ease of use matter. STCS does not work
> for mutable workloads and this is a massive functional limitation with the
> database.
>
> I don't want people to download Cassandra 5.1 to try out transactions and
> start facing issues due to bad STCS performance on mutable data.
>
> If you would like to optimize for cost, then you can read the docs or hire
> a consultant to optimize the cost for you. Otherwise, the database should
> work out of the box and this is provided by LCS. If LCS cannot keep up, it
> means the cluster is under-provisioned and needs to be expanded; it's not a
> functional issue but a capacity issue.
>
> Cheers,
>
> Paulo
>
> On Sun, Dec 8, 2024 at 1:26 AM Dave Herrington <he...@rhinosource.com>
> wrote:
>
>> Chiming in from the field, I think maintaining the familiar status quo
>> until a panacea compaction strategy proves itself out (could that be UCS?)
>> makes sense.  I feel it could be maddening to customers if LCS
>> started showing up in schemas after an upgrade just because the default
>> changed.  If UCS proves itself as the fits-all solution, then we’d be doing
>> them a favor by making it the default. In time.
>>
>> -Dave
>>
>> David A. Herrington II
>> President and Chief Engineer
>> RhinoSource, Inc.
>>
>> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>>
>> www.rhinosource.com
>>
>>
>> On Sat, Dec 7, 2024 at 7:32 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>>
>>>
>>> On Dec 7, 2024, at 7:08 PM, Mick Semb Wever <m...@apache.org> wrote:
>>>
>>> Chiming in with my two cents…
>>>
>>>
>>> When people have the luxury of working in environments where clusters
>>>> are massively over provisioned, LCS as a default makes a lot of sense,
>>>> because there's not much downside.  The use cases where you'd actually fall
>>>> behind in compaction are pretty slim, so the negative impact isn't felt.
>>>>
>>>> Most people aren't doing this.  Putting LCS as the default
>>>> significantly changes the performance profile of new clusters in a way that
>>>> actively harms a portion of the community.
>>>>
>>>
>>>
>>> Haddad's statement here resonates above everything else that's been said
>>> so far.  It is this particular audience that I'm thinking first about not
>>> screwing over, everyone else is a step in front of them wrt knowing what
>>> compaction is and making an informed decision into changing it.
>>>
>>>
>>> “You have to over-provision (iops) to use LCS” isn’t that different from
>>> “you have to over-provision (space) to use LCS” (by perhaps 50%).
>>>
>>> Both of them are sub-optimal and you’re trading off either extra space
>>> or extra compute/ops.
>>>
>>>
>>>
