Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jon Haddad
Oh, one last thing. If the client drivers were to implement a rate limiter based on each node's error rate, and had the ability to back off, paired with CASSANDRA-19534, I think the majority of severe cluster outages that people experience …
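One way to read the client-side idea in this message is additive-increase/multiplicative-decrease (AIMD) on a per-node cap of in-flight requests. This is a minimal sketch under stated assumptions, not code from CASSANDRA-19534 or from any existing driver. The `NodeLimiter` class, its parameters, and the default thresholds are hypothetical, and an "error" is assumed to mean a timeout or overloaded response from that node.

```python
from collections import deque


class NodeLimiter:
    """Hypothetical client-side limiter for ONE node: caps in-flight requests
    and adjusts the cap with AIMD based on that node's recent error rate."""

    def __init__(self, max_limit=128, min_limit=1, window=100, error_threshold=0.05):
        self.max_limit = max_limit
        self.min_limit = min_limit
        self.error_threshold = error_threshold  # assumed: back off above 5% errors
        self.limit = float(max_limit)
        self.in_flight = 0
        # Sliding window of recent outcomes: True = success, False = timeout/overloaded.
        self.outcomes = deque(maxlen=window)

    def error_rate(self):
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def try_acquire(self):
        """Return True if a request may be sent to this node now."""
        if self.in_flight >= int(self.limit):
            return False  # back off: delay, or route to another replica
        self.in_flight += 1
        return True

    def release(self, ok):
        """Record the outcome of a request previously admitted by try_acquire()."""
        self.in_flight -= 1
        self.outcomes.append(ok)
        if not ok and self.error_rate() > self.error_threshold:
            # Multiplicative decrease while the node is timing out or shedding load.
            self.limit = max(self.min_limit, self.limit / 2)
        elif ok:
            # Additive increase while the node looks healthy.
            self.limit = min(self.max_limit, self.limit + 1)
```

A driver using this approach would keep one `NodeLimiter` per node. When `try_acquire()` returns False, it would either delay the request or try another replica, so an overloaded node sees less traffic before it starts failing outright.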

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jon Haddad
Can you elaborate on what “the bad” is here? Maybe a scenario would help. I’m trying to visualize what kind of workload would be running where you wouldn’t have timeouts or a deep queue yet a node is overloaded. What is “the bad” if requests aren’t timing out? How is a node overloaded if there isn’t …

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Jordan West
I agree with Josh. We need to be able to protect from a sudden burst of traffic. 19534 went a long way in that regard, at least wrt minimizing the effects. The challenge with latency and queue depths can be that they trigger when the bad has already occurred. One other thing we are considering …

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Josh McKenzie
Are those three sufficient to protect against a client that unexpectedly comes up with 100x a previous provisioned-for workload? Or 100 clients at 100x concurrently? Given that this can be 100x in terms of quantity (helped by queueing and shedding), but also 100x in terms of computational and disk …
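A classic way to bound both the number and the cost of admitted requests, before latency or queue depth degrade, is a token bucket. The sketch below is a hypothetical illustration, not a proposed Cassandra API. The `TokenBucket` name, the per-request `cost` weighting, and rejecting rather than queueing are all assumptions. Charging more tokens for expensive requests is one way to account for a workload that is 100x heavier per request, not just 100x more frequent.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: refills at `rate` tokens per second up to
    `burst`; each request spends `cost` tokens (heavier queries cost more)."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock          # injectable clock, handy for testing
        self.last = clock()

    def try_acquire(self, cost=1.0):
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # reject (e.g. with an overloaded error) instead of queueing
```

The bucket absorbs a short burst of up to `burst` tokens, then admits work only at `rate`. A sudden 100x spike is turned away at admission instead of being discovered later through timeouts.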

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-21 Thread Alex Petrov
> Personally, I’m a bit skeptical that we will come up with a metric based
> heuristic that works well in most scenarios and doesn’t require significant
> knowledge and tuning. I think past implementations of the dynamic snitch are
> good evidence of that.

I am more optimistic on that front. …