…the analysis I describe would need to be weighted by table size. I have several representative production cluster tablestats analyses that show r:w ratio by table, including table size. I can check to see how this analysis plays out on a few of these.
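To make the weighting concrete, here is a minimal sketch of what I mean. Everything in it is an assumption for illustration: the `TableStats` record, its field names and the sample numbers are hypothetical, not taken from any real cluster or from the actual tablestats output format. It only shows how the share of tables meeting an r:w threshold can shift once each table is weighted by its size.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    # Hypothetical per-table record; field names are illustrative only.
    name: str
    reads: int
    writes: int
    size_gb: int


def read_write_ratio(t: TableStats) -> float:
    """Reads per write; a table with reads but no writes counts as infinitely read-heavy."""
    if t.writes == 0:
        return float("inf") if t.reads > 0 else 0.0
    return t.reads / t.writes


def unweighted_fraction(tables: list[TableStats], threshold: float) -> float:
    """Fraction of tables (by count) whose r:w ratio meets the threshold."""
    if not tables:
        return 0.0
    hits = sum(1 for t in tables if read_write_ratio(t) >= threshold)
    return hits / len(tables)


def size_weighted_fraction(tables: list[TableStats], threshold: float) -> float:
    """Fraction of total data volume held by tables whose r:w ratio meets the threshold."""
    total = sum(t.size_gb for t in tables)
    if total == 0:
        return 0.0
    hits = sum(t.size_gb for t in tables if read_write_ratio(t) >= threshold)
    return hits / total


# Made-up sample: one large write-heavy table, two small read-heavy ones.
sample = [
    TableStats("events", reads=1000, writes=9000, size_gb=800),
    TableStats("users", reads=8000, writes=1000, size_gb=150),
    TableStats("sessions", reads=3000, writes=1000, size_gb=50),
]

for threshold in (4.0, 2.0):
    print(
        f"r:w >= {threshold}: "
        f"{unweighted_fraction(sample, threshold):.2f} of tables, "
        f"{size_weighted_fraction(sample, threshold):.2f} of data by size"
    )
```

In this made-up sample, two of the three tables meet a 2:1 threshold by count, but they hold only a fifth of the data, which is why the unweighted and size-weighted answers can point in different directions.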
-Dave

David A. Herrington II
President and Chief Engineer
RhinoSource, Inc.

*Data Lake Architecture, Cloud Computing and Advanced Analytics.*

www.rhinosource.com


On Sun, Dec 8, 2024 at 7:22 AM Dave Herrington <he...@rhinosource.com> wrote:

> Paulo,
>
> I understand your perspective.
>
> Short of waiting for UCS to prove itself out, I guess it comes down to the
> assertion that a strong majority of Cassandra use cases would benefit from
> using LCS vs. STCS.
>
> The conventional wisdom is that workloads need to be read-heavy to make
> the extra resource consumption of LCS pay off. 4:1 read:write is the
> threshold I use to decide whether or not to use LCS.
>
> I think this ratio is important in this analysis. Has this LCS “payoff”
> threshold changed to 2:1 or better, in favor of LCS? This would be good to
> know.
>
> With an up-to-date threshold in hand, what is the fraction of Cassandra
> use cases that meet this updated threshold?
>
> For example, say this LCS payoff r:w ratio has improved to 2:1. What
> percentage of Cassandra tables across all clusters currently in operation
> are 2:1 read-to-write or more?
>
> If the answer is a solid majority, I think this would justify the default
> change.
>
> -Dave
>
> David A. Herrington II
> President and Chief Engineer
> RhinoSource, Inc.
>
> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>
> www.rhinosource.com
>
>
> On Sun, Dec 8, 2024 at 5:43 AM Paulo Motta <pa...@apache.org> wrote:
>
>> Hi Dave,
>>
>> I'm also in the field and my experience is different.
>>
>> I have seen new users shooting themselves in the foot with the default
>> compaction strategy STCS on a regular basis over the past few years and
>> have been recommending them to switch to LCS and they no longer encounter
>> issues after making this switch. I would like to generalize this
>> recommendation to prevent new users from having bad experiences and
>> abandoning the database.
>>
>> This is not a cost issue, it's an ease of use matter. STCS does not work
>> for mutable workloads and this is a massive functional limitation with the
>> database.
>>
>> I don't want people to download Cassandra 5.1 to try out transactions and
>> start facing issues due to bad STCS performance on mutable data.
>>
>> If you would like to optimize for cost, then you can read the docs or
>> hire a consultant to optimize the cost for you. Otherwise, the database
>> should work out of the box and this is provided by LCS. If LCS can not keep
>> up, it means the cluster is under provisioned and needs to be expanded,
>> it's not a functional issue but a capacity issue.
>>
>> Cheers,
>>
>> Paulo
>>
>> On Sun, Dec 8, 2024 at 1:26 AM Dave Herrington <he...@rhinosource.com>
>> wrote:
>>
>>> Chiming in from the field, I think maintaining the familiar status quo
>>> until a panacea compaction strategy proves itself out (could that be UCS?)
>>> makes sense to me. I feel it could be maddening to customers if LCS
>>> started showing up in schemas after an upgrade just because the default
>>> changed. If UCS proves itself as the fits-all solution, then we’d be doing
>>> them a favor by making the default. In time.
>>>
>>> -Dave
>>>
>>> David A. Herrington II
>>> President and Chief Engineer
>>> RhinoSource, Inc.
>>>
>>> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>>>
>>> www.rhinosource.com
>>>
>>>
>>> On Sat, Dec 7, 2024 at 7:32 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Dec 7, 2024, at 7:08 PM, Mick Semb Wever <m...@apache.org> wrote:
>>>>
>>>> Chiming in with my two cents…
>>>>
>>>>
>>>> When people have the luxury of working in environments where clusters
>>>>> are massively over provisioned, LCS as a default makes a lot of sense,
>>>>> because there's not much downside. The use cases where you'd actually fall
>>>>> behind in compaction are pretty slim, so the negative impact isn't felt.
>>>>>
>>>>> Most people aren't doing this. Putting LCS as the default
>>>>> significantly changes the performance profile of new clusters in a way that
>>>>> actively harms a portion of the community.
>>>>>
>>>>
>>>>
>>>> Haddad's statement here resonates above everything else that's been
>>>> said so far. It is this particular audience that I'm thinking first about
>>>> not screwing over, everyone else is a step in front of them wrt knowing
>>>> what compaction is and making an informed decision into changing it.
>>>>
>>>>
>>>> “You have to over-provision (iops) to use LCS” isn’t that different
>>>> from “you have to over-provision (space) to use LCS” (by perhaps 50%).
>>>>
>>>> Both of them are sub-optimal and you’re trading off either extra space
>>>> or extra compute/ops.
>>>>
>>>>
>>>>