1. For Hard Rate Limits and Bursts

We can allow users to set quotas with different rate limiter implementations and corresponding burst values to meet their needs:
* Guava RateLimiter: designed for underutilization, with "storedPermits".
* NonBlockingRateLimiter: supports bursts as well.
* We could implement another RateLimiter (e.g., BurstableRateLimiter/BatchRateLimiter). This type doesn't try to smooth requests but allows bursts, which is especially suitable for batch, streaming, or OLAP scenarios. For example, if you set a quota of 1000 tokens per second and allow it to be exhausted within 1 millisecond, you'll need to wait 999 milliseconds to execute your next request.
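To make the burstable behavior concrete, here is a minimal token-bucket sketch. The class name BurstableRateLimiter is borrowed from the hypothetical name above; this is not an existing Cassandra or Guava API, just one possible shape. The whole one-second quota can be drained in a single burst, after which callers must wait for the bucket to refill:

```java
/**
 * Minimal sketch of a burstable (non-smoothing) token-bucket rate limiter.
 * Hypothetical illustration only; not an existing Cassandra/Guava class.
 */
class BurstableRateLimiter {
    private final double tokensPerSecond;
    private final double capacity;      // burst size = one second's quota here
    private double availableTokens;
    private long lastRefillNanos;

    BurstableRateLimiter(double tokensPerSecond) {
        this.tokensPerSecond = tokensPerSecond;
        this.capacity = tokensPerSecond;
        this.availableTokens = tokensPerSecond;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Refill tokens proportionally to elapsed time, capped at capacity. */
    private void refill(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
        availableTokens = Math.min(capacity, availableTokens + elapsedSeconds * tokensPerSecond);
        lastRefillNanos = nowNanos;
    }

    /** Try to take {@code permits} tokens at once; true if the burst is allowed now. */
    synchronized boolean tryAcquire(double permits) {
        refill(System.nanoTime());
        if (availableTokens >= permits) {
            availableTokens -= permits;
            return true;
        }
        return false;
    }

    /** How long a caller must wait until {@code permits} tokens are available. */
    synchronized long waitTimeMillis(double permits) {
        refill(System.nanoTime());
        double deficit = permits - availableTokens;
        if (deficit <= 0)
            return 0;
        return (long) Math.ceil(deficit / tokensPerSecond * 1000.0);
    }

    public static void main(String[] args) {
        BurstableRateLimiter limiter = new BurstableRateLimiter(1000.0);
        // The full 1000-token quota may be consumed immediately (no smoothing)...
        System.out.println("burst of 1000 allowed: " + limiter.tryAcquire(1000.0));
        // ...but the next full burst must wait roughly one second for the refill.
        System.out.println("wait for next burst: ~" + limiter.waitTimeMillis(1000.0) + " ms");
    }
}
```

Unlike Guava's smoothing behavior, nothing here spreads the permits across the second; the trade-off is exactly the one described above: a 1-millisecond burst followed by a ~999-millisecond wait.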
2. Fairness and QPS as Imperfect Metrics

This CEP cares not only about QPS (request count) but also about request size. Users can set bytes_per_second (i.e., bandwidth). The system calculates the incoming CQL message size but does not calculate how much data is actually read from or written to disk. The throttling in this CEP takes the main read/write requests into consideration, excluding heavy background operations like compaction. The focus of this CEP's quota is to address the primary pain points, solving 80% of the problems in normal operations. For example, consider a situation where a neighboring team triggers a Spark job that affects your online business, causing long-tail latency. Or you have a hot/large node in Cassandra and discover a hot/large partition; refer to the Usage section of this CEP. Quotas will save your day with a set of simple, operational, and cost-effective tools.

This CEP is just the first step in addressing these issues.

Step 2: Prioritized Load Shedding

All requests are not created equal. In a normal system state, we treat them equally. However, when the system is in a poor state or overloaded, we prioritize high-priority requests and discard low-priority ones [1].

Step 3: Adaptive Compaction/Streaming/Snapshot

Foreground requests (the main read/write operations) are first-class citizens in an OLTP system. Heavy background operations (e.g., compaction, streaming, snapshots) are secondary. These background tasks should observe the key metrics of foreground requests and aim to complete their work as quickly as possible, without negatively impacting the predictable performance of foreground requests.

Step 4: Resource Controller/Scheduler/Isolation for Multi-Tenant Systems

I have researched this topic for a while, reviewing how other databases and papers implement resource isolation (see References). The main approaches are as follows:
* Abstracting CPU, I/O, and network resources into Resource Units/Groups.
* Creating a custom database scheduler, as opposed to relying on the OS for CPU and I/O scheduling.

I'm not sure if this is the right path. Users/developers are still too young to know that life never gives anything for nothing, and that a price is always exacted for what fate bestows. Every software feature comes with a price tag. These implementations may introduce more complexity and instability. How much will users benefit from and appreciate them? The answer is blowin' in the wind.

References:
[1] Netflix: https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94
CockroachDB: https://www.cockroachlabs.com/blog/admission-control-in-cockroachdb/
TiDB: https://www.pingcap.com/blog/managing-resource-isolation-optimizing-performance-stability-tidb/
OceanBase: https://oceanbase.medium.com/how-to-realize-i-o-separation-in-a-distributed-sql-database-da2574099b1c
https://en.oceanbase.com/blog/2615023872
DB2 (Adaptive Workload Management): https://www.tridex.org/wp-content/uploads/wlm.pdf
https://www.ibm.com/docs/en/db2/11.5.x?topic=management-adaptive-workload-manager
https://www.youtube.com/watch?v=KJrD6Rs4ef
YARN (Scheduling Policies and Resource Types in YARN): https://www.youtube.com/watch?v=1M5bEwHj5Wc&t=1s
RisingWave: https://risingwave.com/blog/workload-isolation-in-risingwave/
ScyllaDB: https://www.scylladb.com/2019/05/23/workload-prioritization-running-oltp-and-olap-traffic-on-the-same-superhighway/
SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores

On 2026/03/09 18:51:35 Ariel Weisberg wrote:
> Hi,
>
> Thanks for working on this. This is an important piece of functionality to
> make Cassandra really viable for multi-tenancy.
>
> I took a quick look at the proposal. It seems to be for hard cap rate limits
> without supporting burst usage and consumption of idle resources. This
> reduces efficiency quite a bit because it doesn't balance between tenants at
> saturation, allowing them each to get some minimum with equal access to the
> remaining resources.
>
> The proposal also doesn't cover fairness and how we are ensuring that
> resources are fairly distributed when the rate limits are minimums (even if
> aspirational) rather than maximums and don't restrict tenants down to below
> the actual capacity of a node.
>
> QPS is not the only metric we need to track for fairness. CPU/IOPs and how
> much memory is used (and for how long!) are all factors, but for now maybe
> CPU/IOPs is the one to focus on. QPS still needs to be shared fairly because
> execution slots are not unlimited. A tenant with small expensive queries
> shouldn't be able to dominate available resources.
>
> QPS itself is also problematic because time in the execution slot matters
> just as much as how many times a second the client runs a query through an
> execution slot. Either way, it's the time the slot is unavailable to other
> tenants that matters most.
>
> There are also background operations like compaction to consider. A data model
> that is cheap to write but expensive to compact can impact other tenants.
>
> We don't need to put absolutely everything under the scope of this single
> CEP, but I think anything that's in it should be a good fit for Cassandra,
> and hard rate limits seem like something we should iterate on more.
>
> Ariel
>
> On Tue, Feb 24, 2026, at 4:47 AM, Justin Ling Mao wrote:
> > Hi everyone:
> >
> > I have created a JIRA ticket: **CASSANDRA-21158**, regarding a new feature:
> > implementing quota management for multi-tenant.
> > You can find the design document here:
> > **https://docs.google.com/document/d/1BGDjBsuVkuISbN8lqxoZUuGbx0qRhuNA8BAxF48a24k**
> > If you are interested, please join the discussion. Once we’ve had a
> > thorough discussion and if the community finds this feature valuable, I
> > will proceed to create a CEP (Cassandra Enhancement Proposal) and
> > subsequently submit a PR.
> >
> > Looking forward to your feedback!
> >
> > --------------------------------
> > Best regards
> > Justin Ling Mao
> > Beijing, China
