1. For Hard Rate Limits and Bursts

We can allow users to set quotas with different rate limiter implementations and corresponding burst values to meet their needs:
* Guava RateLimiter: designed for underutilization, with "storedPermits".
* NonBlockingRateLimiter: supports bursts as well.
* We could implement another RateLimiter (e.g., BurstableRateLimiter/BatchRateLimiter). This type doesn't try to smooth requests but allows bursts, which is especially suitable for batch, streaming, or OLAP scenarios. For example, if you set a quota of 1000 tokens per second and allow it to be exhausted within 1 millisecond, you'll need to wait 999 milliseconds to execute your next request.
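To make the burstable behavior concrete, here is a minimal token-bucket sketch. The class name BurstableRateLimiter is borrowed from the hypothetical name above; this is not an existing Cassandra or Guava API, just one possible shape. The whole one-second quota can be drained in a single burst, after which callers must wait for the bucket to refill:

```java
/**
 * Minimal sketch of a burstable (non-smoothing) token-bucket rate limiter.
 * Hypothetical illustration only; not an existing Cassandra/Guava class.
 */
class BurstableRateLimiter {
    private final double tokensPerSecond;
    private final double capacity;      // burst size = one second's quota here
    private double availableTokens;
    private long lastRefillNanos;

    BurstableRateLimiter(double tokensPerSecond) {
        this.tokensPerSecond = tokensPerSecond;
        this.capacity = tokensPerSecond;
        this.availableTokens = tokensPerSecond;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Refill tokens proportionally to elapsed time, capped at capacity. */
    private void refill(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
        availableTokens = Math.min(capacity, availableTokens + elapsedSeconds * tokensPerSecond);
        lastRefillNanos = nowNanos;
    }

    /** Try to take {@code permits} tokens at once; true if the burst is allowed now. */
    synchronized boolean tryAcquire(double permits) {
        refill(System.nanoTime());
        if (availableTokens >= permits) {
            availableTokens -= permits;
            return true;
        }
        return false;
    }

    /** How long a caller must wait until {@code permits} tokens are available. */
    synchronized long waitTimeMillis(double permits) {
        refill(System.nanoTime());
        double deficit = permits - availableTokens;
        if (deficit <= 0)
            return 0;
        return (long) Math.ceil(deficit / tokensPerSecond * 1000.0);
    }

    public static void main(String[] args) {
        BurstableRateLimiter limiter = new BurstableRateLimiter(1000.0);
        // The full 1000-token quota may be consumed immediately (no smoothing)...
        System.out.println("burst of 1000 allowed: " + limiter.tryAcquire(1000.0));
        // ...but the next full burst must wait roughly one second for the refill.
        System.out.println("wait for next burst: ~" + limiter.waitTimeMillis(1000.0) + " ms");
    }
}
```

Unlike Guava's smoothing behavior, nothing here spreads the permits across the second; the trade-off is exactly the one described above: a 1-millisecond burst followed by a ~999-millisecond wait.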
2. Fairness and QPS as Imperfect Metrics

This CEP cares not only about QPS (request count) but also about request size. Users can set bytes_per_second (i.e., bandwidth). The system calculates the incoming CQL message size but does not calculate how much data is actually read from or written to disk. The throttling in this CEP takes the main read/write requests into consideration, excluding heavy background operations like compaction. The focus of this CEP's quota is to address the primary pain points, solving 80% of the problems in normal operations. For example, consider a situation where a neighboring team triggers a Spark job that affects your online business, causing long-tail latency. Or you have a hot/large node in Cassandra and discover a hot/large partition; refer to the Usage section of this CEP. Quotas will save your day with a set of simple, operational, and cost-effective tools.

This CEP is just the first step in addressing these issues.

Step 2: Prioritized Load Shedding

All requests are not created equal. In a normal system state, we treat them equally. However, when the system is in a poor state or overloaded, we prioritize high-priority requests and discard low-priority ones [1].

Step 3: Adaptive Compaction/Streaming/Snapshot

Foreground requests (the main read/write operations) are first-class citizens in an OLTP system. Heavy background operations (e.g., compaction, streaming, snapshots) are secondary. These background tasks should observe the key metrics of foreground requests and aim to complete their work as quickly as possible, without negatively impacting the predictable performance of foreground requests.

Step 4: Resource Controller/Scheduler/Isolation for Multi-Tenant Systems

I have researched this topic for a while, reviewing how other databases and papers implement resource isolation (see References). The main approaches are as follows:
* Abstracting CPU, I/O, and network resources into Resource Units/Groups.
* Creating a custom database scheduler, as opposed to relying on the OS for CPU and I/O scheduling.

I'm not sure if this is the right path. Users/developers are still too young to know that life never gives anything for nothing, and that a price is always exacted for what fate bestows. Every software feature comes with a price tag. These implementations may introduce more complexity and instability. How much will users benefit from and appreciate them? The answer is blowin' in the wind.

References:
[1] Netflix: https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94
CockroachDB: https://www.cockroachlabs.com/blog/admission-control-in-cockroachdb/
TiDB: https://www.pingcap.com/blog/managing-resource-isolation-optimizing-performance-stability-tidb/
OceanBase: https://oceanbase.medium.com/how-to-realize-i-o-separation-in-a-distributed-sql-database-da2574099b1c
https://en.oceanbase.com/blog/2615023872
DB2 (Adaptive Workload Management): https://www.tridex.org/wp-content/uploads/wlm.pdf
https://www.ibm.com/docs/en/db2/11.5.x?topic=management-adaptive-workload-manager
https://www.youtube.com/watch?v=KJrD6Rs4ef
YARN (Scheduling Policies and Resource Types in YARN): https://www.youtube.com/watch?v=1M5bEwHj5Wc&t=1s
RisingWave: https://risingwave.com/blog/workload-isolation-in-risingwave/
ScyllaDB: https://www.scylladb.com/2019/05/23/workload-prioritization-running-oltp-and-olap-traffic-on-the-same-superhighway/
SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores

On 2026/03/09 18:51:35 Ariel Weisberg wrote:
> Hi,
>
> Thanks for working on this. This is an important piece of functionality to
> make Cassandra really viable for multi-tenancy.
>
> I took a quick look at the proposal. It seems to be for hard cap rate limits
> without supporting burst usage and consumption of idle resources. This
> reduces efficiency quite a bit because it doesn't balance between tenants at
> saturation, allowing them each to get some minimum with equal access to the
> remaining resources.
>
> The proposal also doesn't cover fairness and how we are ensuring that
> resources are fairly distributed when the rate limits are minimums (even if
> aspirational) rather than maximums and don't restrict tenants down to below
> the actual capacity of a node.
>
> QPS is not the only metric we need to track for fairness. CPU/IOPs and how
> much memory is used (and for how long!) are all factors, but for now maybe
> CPU/IOPs is the one to focus on. QPS still needs to be shared fairly because
> execution slots are not unlimited. A tenant with small expensive queries
> shouldn't be able to dominate available resources.
>
> QPS itself is also problematic because time in the execution slot matters
> just as much as how many times a second the client runs a query through an
> execution slot. Either way, it's the time the slot is unavailable to other
> tenants that matters most.
>
> There are also background operations like compaction to consider. A data model
> that is cheap to write but expensive to compact can impact other tenants.
>
> We don't need to put absolutely everything under the scope of this single
> CEP, but I think anything that's in it should be a good fit for Cassandra,
> and hard rate limits seem like something we should iterate on more.
>
> Ariel
>
> On Tue, Feb 24, 2026, at 4:47 AM, Justin Ling Mao wrote:
> > Hi everyone:
> >
> > I have created a JIRA ticket: **CASSANDRA-21158**, regarding a new feature:
> > implementing quota management for multi-tenant.
> > You can find the design document here:
> > **https://docs.google.com/document/d/1BGDjBsuVkuISbN8lqxoZUuGbx0qRhuNA8BAxF48a24k**
> > If you are interested, please join the discussion. Once we’ve had a
> > thorough discussion and if the community finds this feature valuable, I
> > will proceed to create a CEP (Cassandra Enhancement Proposal) and
> > subsequently submit a PR.
> >
> > Looking forward to your feedback!
> >
> > --------------------------------
> > Best regards
> > Justin Ling Mao
> > Beijing, China
