I have an implementation of a layered Bloom filter in [1] (note the layered
branch). This should handle the layered Bloom filter case and allow for
layers that:

1. Do not become overpopulated and thus yield too many false positives.
2. Expire and are removed automatically.

The layered Bloom filter also does not need another thread to remove the old
filters, as that work is amortized across all the inserts.

In addition, the number of layered Bloom filter instances can be reduced by
hashing the Kafka principal and the PID together into the single key that the
Bloom filter looks for.

[1] https://github.com/Claudenw/BloomFilters/tree/layered
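To make the idea concrete, here is a rough, minimal sketch of the approach
(illustrative only: it is not the API of the repository in [1], and the layer
size, hashing, and class names are just placeholders). The principal and PID
are hashed together into one key, inserts go into the newest layer, and stale
layers are dropped on the next insert rather than by a background thread.

import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/**
 * Rough sketch of a time-layered Bloom filter (not the implementation in [1]).
 * Each layer covers a fixed time slice, new entries go into the newest layer,
 * and layers older than the retention window are dropped on the next insert,
 * so no cleanup thread is needed.
 */
public class LayeredPidFilter {

    private static final int BITS_PER_LAYER = 1 << 16;   // placeholder sizing
    private static final int HASH_FUNCTIONS = 3;

    private static final class Layer {
        final long createdAtMs;
        final BitSet bits = new BitSet(BITS_PER_LAYER);
        Layer(long createdAtMs) { this.createdAtMs = createdAtMs; }
    }

    private final Deque<Layer> layers = new ArrayDeque<>();
    private final long layerLifeMs;   // how long a single layer accepts inserts
    private final long retentionMs;   // how long a layer is kept for lookups

    public LayeredPidFilter(long layerLifeMs, long retentionMs) {
        this.layerLifeMs = layerLifeMs;
        this.retentionMs = retentionMs;
    }

    /** Combine the principal and the producer id into one key to hash. */
    public static byte[] key(String principal, long producerId) {
        return (principal + ":" + producerId).getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true if the key was (probably) already seen in a live layer. */
    public synchronized boolean contains(byte[] key) {
        for (Layer layer : layers) {
            if (mightContain(layer.bits, key)) {
                return true;
            }
        }
        return false;
    }

    /** Adds the key, expiring stale layers first (amortized over inserts). */
    public synchronized void add(byte[] key, long nowMs) {
        // Drop layers that have aged past the retention window.
        while (!layers.isEmpty() && nowMs - layers.peekFirst().createdAtMs > retentionMs) {
            layers.removeFirst();
        }
        // Start a new layer if the newest one is too old (or none exists),
        // so a single layer never accumulates entries forever.
        Layer newest = layers.peekLast();
        if (newest == null || nowMs - newest.createdAtMs > layerLifeMs) {
            newest = new Layer(nowMs);
            layers.addLast(newest);
        }
        for (int i = 0; i < HASH_FUNCTIONS; i++) {
            newest.bits.set(indexFor(key, i));
        }
    }

    private static boolean mightContain(BitSet bits, byte[] key) {
        for (int i = 0; i < HASH_FUNCTIONS; i++) {
            if (!bits.get(indexFor(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Cheap double-hashing stand-in for a real hash such as Murmur3. */
    private static int indexFor(byte[] key, int i) {
        int h1 = 0, h2 = 0;
        for (byte b : key) {
            h1 = 31 * h1 + b;
            h2 = 37 * h2 + (b ^ 0x5f);
        }
        return Math.floorMod(h1 + i * h2, BITS_PER_LAYER);
    }
}

The produce-path check would then be along the lines of: if
contains(key(principal, pid)) is false, charge one new PID against the
principal's quota and call add(key(principal, pid), now).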
On Sun, Jun 18, 2023 at 10:18 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote:

> Hi Haruki, thanks for having a look at the KIP.
>
> > 1. Do you have any memory-footprint estimation for
> > TimeControlledBloomFilter?
> I don't have an estimate at the moment as I don't have a full
> implementation of this one yet. I can work on one if it's required.
>
> > * If I read the KIP correctly, TimeControlledBloomFilter will be
> > allocated per KafkaPrincipal so the size should be reasonably small
> > considering clusters which have a large number of users.
> The map stored in the cache has two dimensions: the vertical one is the
> KafkaPrincipal (producers only) and the horizontal one is the time windows.
> - Horizontally, we add PIDs to the TimeControlledBloomFilter only if the
> KafkaPrincipal hasn't hit the quota, and we control the Bloom filter by
> time so the oldest set expires once it's no longer needed.
> - Vertically is the tricky one if the cluster has an insane number of
> KafkaPrincipals used for producing. If the number of KafkaPrincipals is
> huge, we can control the memory used by the cache by throttling more
> aggressively, and I would argue that it will never be an insane number
> that could cause OOM.
>
> > * i.e. What false-positive rate do you plan to choose as the default?
> I am planning on using 0.1 as the default.
>
> > 2. What do you think about rotating windows on produce-requests arrival
> > instead of scheduler?
> >   * If we do rotation in scheduler threads, my concern is potential
> > scheduler threads occupation which could delay other background tasks.
> This is a valid concern. We can consider disposing of the oldest Bloom
> filter when we add a new PID to the TimeControlledBloomFilter. However, I
> would still need a scheduler to clean up any inactive KafkaPrincipal from
> the cache layer (i.e. ProducerIdQuotaManagerCache). Do you have the same
> concern about this one too?
>
> > 3. Why is the default producer.id.quota.window.size.seconds 1 hour?
> >   * Unlike other quota types (1 second)
> Mostly because 1 second doesn't make sense for this type of quota.
> Misconfigured or misbehaving producers usually don't allocate new PIDs on
> the leader every second but over a period of time.
>
> Thanks
>
> On Tue, Jun 6, 2023 at 5:21 PM Haruki Okada <ocadar...@gmail.com> wrote:
>
> > Hi, Omnia.
> >
> > Thanks for the KIP.
> > The feature indeed sounds helpful and the strategy of using a Bloom
> > filter looks good.
> >
> > I have three questions:
> >
> > 1. Do you have any memory-footprint estimation
> > for TimeControlledBloomFilter?
> >   * If I read the KIP correctly, TimeControlledBloomFilter will be
> > allocated per KafkaPrincipal so the size should be reasonably small
> > considering clusters which have a large number of users.
> >   * i.e. What false-positive rate do you plan to choose as the default?
> > 2. What do you think about rotating windows on produce-requests arrival
> > instead of scheduler?
> >   * If we do rotation in scheduler threads, my concern is potential
> > scheduler threads occupation which could delay other background tasks.
> > 3. Why is the default producer.id.quota.window.size.seconds 1 hour?
> >   * Unlike other quota types (1 second)
> >
> > Thanks,
> >
> > On Tue, Jun 6, 2023 at 23:55 Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote:
> >
> > > Hi everyone,
> > > I want to start the discussion of KIP-936 to throttle the number of
> > > active PIDs per KafkaPrincipal. The proposal is here:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs
> > >
> > > Thanks for your time and feedback.
> > > Omnia
> >
> >
> > --
> > ========================
> > Okada Haruki
> > ocadar...@gmail.com
> > ========================
>

--
LinkedIn: http://www.linkedin.com/in/claudewarren
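For a rough sense of the memory-footprint question raised in the thread: with
the 0.1 false-positive rate mentioned above, the standard Bloom filter sizing
formulas give on the order of 600 bytes per window per principal, so even
thousands of producing principals with a couple of live windows each stay in
the low megabytes. Note that the 1,000 distinct PIDs per principal per window
used below is only an assumed workload figure, not something from the KIP.

/** Back-of-the-envelope sizing for one per-principal, per-window Bloom filter. */
public class BloomFilterSizing {
    public static void main(String[] args) {
        double p = 0.1;   // target false-positive rate (from the thread)
        int n = 1_000;    // ASSUMED distinct PIDs per principal per window
        // Standard formulas: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
        double m = -n * Math.log(p) / (Math.log(2) * Math.log(2));
        long k = Math.round((m / n) * Math.log(2));
        System.out.printf("bits per window: %.0f (about %.0f bytes), hash functions: %d%n",
                m, m / 8, k);
        // Prints roughly: bits per window: 4793 (about 599 bytes), hash functions: 3
    }
}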