Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-23 Thread Benedict Elliott Smith
If we’re debating the overall approach, I think we need to define what we want to achieve before we pursue any specific design. I think rate limiting is simply a proxy for cluster stability. I think implicitly we also all want to achieve client fairness. Rate limiting is one proposal for achiev

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-23 Thread Štefan Miklošovič
I know it is probably too soon to discuss the implementation details in depth, as it is hard to say precisely what it will look like, but I want to highlight, for example, this (1). Would some parts of that work touch this logic? There is also (2), which tries to solve a different but somewhat related pro

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-23 Thread Alex Petrov
> Are those three sufficient to protect against a client that unexpectedly > comes up with 100x a previous provisioned-for workload? Or 100 clients at > 100x concurrently? Given that can be 100x in terms of quantity (helped by > queueing and shedding), but also 100x in terms of *computational an

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jon Haddad
Oh, one last thing. If the client drivers were to implement a rate limiter based on each node's error rate, and had the ability to back off, paired with CASSANDRA-19534, I think the majority of severe cluster outages that people experience wo

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jon Haddad
Can you elaborate what “the bad” is here? Maybe a scenario would help. I’m trying to visualize what kind of workload would be running where you wouldn’t have timeouts or a deep queue yet a node is overloaded. What is “the bad” if requests aren’t timing out? How is a node overloaded if there isn’t

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jordan West
I agree with Josh. We need to be able to protect from a sudden burst of traffic. 19534 went a long way in that regard, at least with respect to minimizing the effects. The challenge with latency and queue depths can be that they trigger when the bad has already occurred. One other thing we are considering

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Josh McKenzie
Are those three sufficient to protect against a client that unexpectedly comes up with 100x a previous provisioned-for workload? Or 100 clients at 100x concurrently? Given that can be 100x in terms of quantity (helped by queueing and shedding), but also 100x in terms of *computational and disk i

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Alex Petrov
> Personally, I’m a bit skeptical that we will come up with a metric based > heuristic that works well in most scenarios and doesn’t require significant > knowledge and tuning. I think past implementations of the dynamic snitch are > good evidence of that. I am more optimistic on that front. I t

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-20 Thread Jordan West
+1 to Benedict’s (and others’) comments on pluggability and low overhead when disabled. The latter I think needs little justification. The reason I am big on the former is, in my opinion: decisions on approach need to be settled with numbers, not anecdotes or past experience (including my own). So I w

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-20 Thread Jon Haddad
Assuming the intent was to migrate the Google Doc to the CEP, I took another look. I think there are some ambitious ideas here, and I appreciate any effort to improve Cassandra's stability. I think CASSANDRA-19534 was a massive step in the rig

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-19 Thread Benedict Elliott Smith
I just want to flag here that this is a topic I have strong opinions on, but the CEP is not really specific or detailed enough to understand precisely how it will be implemented. So, if a patch is already being produced, most of my feedback is likely to be provided some time after a patch appear

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-19 Thread Patrick McFadin
The work has begun but we don't have a VOTE thread for this CEP. Can one get started? On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia wrote: > Sure, Caleb. I will include the work as part of CASSANDRA-19534 > in the CEP-41. > > Jaydeep > >

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-05-06 Thread Jaydeep Chovatia
Sure, Caleb. I will include the work as part of CASSANDRA-19534 in the CEP-41. Jaydeep On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe wrote: > FYI, there is some ongoing sort-of-related work going on in > CASSANDRA-19534

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-05-03 Thread Caleb Rackliffe
FYI, there is some ongoing sort-of-related work going on in CASSANDRA-19534 On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia wrote: > Just created an official CEP-41 >

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-04-10 Thread Jaydeep Chovatia
Just created an official CEP-41 incorporating the feedback from this discussion. Feel free to let me know if I may have missed some important feedback in this thread that is not captured

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-22 Thread Jaydeep Chovatia
Thanks, Josh. I will file an official CEP with all the details in a few days and update this thread with that CEP number. Thanks a lot everyone for providing valuable insights! Jaydeep On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie wrote: > Do folks think we should file an official CEP and take

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-22 Thread Josh McKenzie
> Do folks think we should file an official CEP and take it there? +1 here. Synthesizing your gdoc, Caleb's work, and the feedback from this thread into a draft seems like a solid next step. On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote: > I see a lot of great ideas being discussed or

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-07 Thread Jaydeep Chovatia
I see a lot of great ideas being discussed or proposed in the past to cover the most common rate limiter candidate use cases. Do folks think we should file an official CEP and take it there? Jaydeep On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe wrote: > I just remembered the other day that I h

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-02 Thread Caleb Rackliffe
I just remembered the other day that I had done a quick writeup on the state of compaction stress-related throttling in the project: https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing I'm sure most of it is old news to the people on this thread, but I

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-30 Thread Josh McKenzie
> 2.) We should make sure the links between the "known" root causes of > cascading failures and the mechanisms we introduce to avoid them remain very > strong. Seems to me that our historical strategy was to address individual known cases one-by-one rather than looking for a more holistic load-b

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-29 Thread Caleb Rackliffe
I almost forgot CASSANDRA-15817, which introduced reject_repair_compaction_threshold, which provides a mechanism to stop repairs while compaction is underwater. On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe wrote: Hey all, I'm a bit late to the discussion. I see that we've already discussed CASSANDRA

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-26 Thread Caleb Rackliffe
Hey all, I'm a bit late to the discussion. I see that we've already discussed CASSANDRA-15013 and CASSANDRA-16663 at least in passing. Having written the latter, I'd be the first to admi

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-22 Thread Jaydeep Chovatia
c and trying to cover as >>> many paths as possible. >>> >>> German, >>> >>> Sure, let's first continue the discussions here. If it turns out that >>> there is no widespread interest in the idea then we can do 1:1 and see how >>> we

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jon Haddad
her on a private fork, etc. >> >> Jaydeep >> >> On Wed, Jan 17, 2024 at 7:57 AM German Eichberger via dev < >> dev@cassandra.apache.org> wrote: >> >>> Jaydeep, >>> >>> I concur with Stefan that extensibility of this should be a

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jeff Jirsa
th other systems to signal a resource need which then could kick off things like scaling Super interested in this and we have been thinking about similar things internally 😉 Thanks, German From: Jaydeep Chovatia <chovatia.jayd...@gmail.com> Sent: Tuesday, January 16, 2024 1:16 PM To

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jon Haddad
add additional metrics (e.g. write queue >>depth) and decision logic >>- There should be a way to interact with other systems to signal a >>resource need which then could kick off things like scaling >> >> >> Super interested in this and we have been

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread Jaydeep Chovatia
t similar things > internally 😉 > > Thanks, > German > -- > *From:* Jaydeep Chovatia > *Sent:* Tuesday, January 16, 2024 1:16 PM > *To:* dev@cassandra.apache.org > *Subject:* [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in > Cass

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread German Eichberger via dev
Rate Limiter in Cassandra Hi Stefan, Please find my response below: 1) Currently, I am keeping the signals as an interface, so one can override with a

Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread Jaydeep Chovatia
Hi Stefan, Please find my response below: 1) Currently, I am keeping the signals as an interface, so one can override them with a different implementation, but point noted that even the interface APIs could also be made dynamic so one can define the APIs and their implementation, if they wish to override. 2)

Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread C. Scott Andreas
Jaydeep, thanks for reaching out and for sharing this proposal. I like the direction. Please also take a look at https://issues.apache.org/jira/browse/CASSANDRA-16663 , which adds coordinator-level rate limiting on request rate. This ticket introduced a lockless rate limiter patterned on an appr

Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread Jon Haddad
Server side rate limiting can be useful, but imo if we were to focus effort into a single place, time would be much better spent adding adaptive rate limiting to the drivers. Rate limiting at the driver level can be done based on 2 simple feedback mechanisms - error rate and latency. When a node

Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread Štefan Miklošovič
Hi Jaydeep, That seems quite interesting. A couple of points, though: 1) It would be nice if there were a way to "subscribe" to decisions your detection framework comes up with. Integration with e.g. the diagnostics subsystem would be beneficial. This should be pluggable - just coding up an interface to dump

[Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread Jaydeep Chovatia
Hi, Happy New Year! I would like to discuss the following idea: Open-source Cassandra (CASSANDRA-15013) has an elementary built-in memory rate limiter based on the incoming payload from user requests. This rate limiter activates if any inco