Hi Haruki, Thanks for having a look at the KIP.
> 1. Do you have any memory-footprint estimation for
> TimeControlledBloomFilter?
I don't have an estimate at the moment, as I don't have a full
implementation of this one yet. I can work on one if it's
required.

> * If I read the KIP correctly, TimeControlledBloomFilter will be
> allocated per KafkaPrincipal so the size should be reasonably small
> considering clusters which have a large number of users.
The Map stored in the cache has two dimensions: the vertical one is the
KafkaPrincipal (producers only) and the horizontal one is the time
windows.
- Horizontally, we add PIDs to the TimeControlledBloomFilter only if the
KafkaPrincipal hasn't hit the quota, and we control the bloom filter by
time so that the oldest set expires once it's no longer needed.
- Vertically is the tricky one, if the cluster has an insane number of
KafkaPrincipals used for producing. If the number of KafkaPrincipals is
huge, we can control the memory used by the cache by throttling more
aggressively, and I would argue that there is never going to be an insane
number that could cause OOM.
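
To make the horizontal dimension concrete, here is a minimal sketch of a time-sliced bloom filter for a single KafkaPrincipal. It is not the KIP's implementation: the class name, the `sliceMs`/`retentionMs` parameters, the fixed bit and hash counts, and the hash mixing are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.TreeMap;

/** Hypothetical sketch of a time-sliced Bloom filter for one KafkaPrincipal. */
final class TimeControlledBloomFilterSketch {
    private final int bits;          // bits per slice (assumed fixed)
    private final int hashes;        // hash functions per slice (assumed fixed)
    private final long sliceMs;      // width of one time slice (assumption)
    private final long retentionMs;  // how long a slice is kept after it starts (assumption)
    // Horizontal dimension: slice start time -> filter holding PIDs added in that slice.
    private final TreeMap<Long, BitSet> slices = new TreeMap<>();

    TimeControlledBloomFilterSketch(int bits, int hashes, long sliceMs, long retentionMs) {
        this.bits = bits;
        this.hashes = hashes;
        this.sliceMs = sliceMs;
        this.retentionMs = retentionMs;
    }

    /** Records a PID in the slice covering nowMs. */
    synchronized void add(long pid, long nowMs) {
        long start = Math.floorDiv(nowMs, sliceMs) * sliceMs;
        BitSet slice = slices.computeIfAbsent(start, s -> new BitSet(bits));
        for (int i = 0; i < hashes; i++) {
            slice.set(index(pid, i));
        }
    }

    /** True if any live slice may contain the PID (false positives possible, no false negatives). */
    synchronized boolean mightContain(long pid) {
        for (BitSet slice : slices.values()) {
            boolean hit = true;
            for (int i = 0; i < hashes && hit; i++) {
                hit = slice.get(index(pid, i));
            }
            if (hit) {
                return true;
            }
        }
        return false;
    }

    /** Drops every slice that started at or before nowMs - retentionMs. */
    synchronized void expire(long nowMs) {
        slices.headMap(nowMs - retentionMs, true).clear();
    }

    // Double hashing: derive the i-th bit position from two halves of one mixed 64-bit value.
    private int index(long pid, int i) {
        long h = mix(pid);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return Math.floorMod(h1 + i * h2, bits);
    }

    // 64-bit finalizer-style mixer, used here only to spread PID bits.
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }
}
```

The vertical dimension would then be a map from KafkaPrincipal to one such filter, so total memory grows with the number of producing principals times the number of live slices.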

> * i.e. What false-positive rate do you plan to choose as the default?
I am planning to use 0.1 as the default.
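
For a rough sense of scale (not a measurement of any implementation), the textbook bloom filter sizing formulas can be sketched as below. The class and method names are hypothetical, and the expected number of PIDs per principal is an input I am assuming, not something the KIP fixes.

```java
/** Textbook Bloom filter sizing; nothing here is taken from the KIP implementation. */
final class BloomSizing {
    private BloomSizing() {}

    /** Optimal number of bits m = -n * ln(p) / (ln 2)^2 for n entries at false-positive rate p. */
    static long optimalBits(long expectedEntries, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Optimal number of hash functions k = (m / n) * ln 2, but at least 1. */
    static int optimalHashes(long expectedEntries, long bits) {
        return Math.max(1, (int) Math.round((double) bits / expectedEntries * Math.log(2)));
    }
}
```

With a false-positive rate of 0.1 this works out to roughly 4.8 bits per expected PID per filter, so an assumed 1,000 PIDs would need about 600 bytes per filter before object overhead.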

> 2. What do you think about rotating windows on produce-requests arrival
> instead of scheduler?
> * If we do rotation in scheduler threads, my concern is potential
> scheduler threads occupation which could make other background tasks to
> delay
This is a valid concern. We can consider disposing of the oldest bloom when
we add a new PID to the TimeControlledBloomFilter. However, I would still
need a scheduler to clean up any inactive KafkaPrincipal from the cache
layer, i.e. `ProducerIdQuotaManagerCache`. Do you have the same concern
about this one too?
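
A minimal sketch of that split, assuming rotation happens on the produce path and only the removal of inactive principals stays on a scheduler. The class is hypothetical and is not `ProducerIdQuotaManagerCache` itself: principals are plain strings, a `Set` stands in for each bloom filter, the JDK `ScheduledExecutorService` stands in for the broker's scheduler, and `sliceMs`/`retentionMs` are made-up parameters.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the cache layer: lazy rotation on add, scheduled cleanup of idle principals. */
final class ProducerIdCacheSketch {
    private final long sliceMs;      // width of one time slice (assumption)
    private final long retentionMs;  // how long a slice is kept after it starts (assumption)
    // principal -> (slice start -> PIDs); a Set stands in for a Bloom filter.
    // Inner maps are only touched inside compute/computeIfPresent, which lock per key.
    private final ConcurrentHashMap<String, TreeMap<Long, Set<Long>>> cache = new ConcurrentHashMap<>();

    ProducerIdCacheSketch(long sliceMs, long retentionMs) {
        this.sliceMs = sliceMs;
        this.retentionMs = retentionMs;
    }

    /** Produce path: drop this principal's expired slices, then record the PID. */
    void track(String principal, long pid, long nowMs) {
        cache.compute(principal, (name, slices) -> {
            TreeMap<Long, Set<Long>> live = slices == null ? new TreeMap<>() : slices;
            live.headMap(nowMs - retentionMs, true).clear();
            long start = Math.floorDiv(nowMs, sliceMs) * sliceMs;
            live.computeIfAbsent(start, s -> new HashSet<>()).add(pid);
            return live;
        });
    }

    /** Background path: remove principals that have no live slice left. */
    void removeInactive(long nowMs) {
        for (String principal : cache.keySet()) {
            cache.computeIfPresent(principal, (name, slices) -> {
                slices.headMap(nowMs - retentionMs, true).clear();
                return slices.isEmpty() ? null : slices; // returning null removes the entry
            });
        }
    }

    /** Schedules the cleanup; the period is an arbitrary choice for illustration. */
    void scheduleCleanup(ScheduledExecutorService scheduler) {
        scheduler.scheduleAtFixedRate(
                () -> removeInactive(System.currentTimeMillis()),
                retentionMs, retentionMs, TimeUnit.MILLISECONDS);
    }

    int trackedPrincipals() {
        return cache.size();
    }
}
```

In this shape the scheduled task only walks the principal map once per period and does no per-PID work, which is the part of the scheduler load the question is about.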

> 3. Why the default producer.id.quota.window.size.seconds is 1 hour?
>  * Unlike other quota types (1 second)
Mostly because 1 second doesn't make sense for this type of quota.
Misconfigured or misbehaving producers usually don't allocate new PIDs on
the leader every second, but over a longer period of time.

Thanks

On Tue, Jun 6, 2023 at 5:21 PM Haruki Okada <ocadar...@gmail.com> wrote:

> Hi, Omnia.
>
> Thanks for the KIP.
> The feature sounds indeed helpful and the strategy to use bloom-filter
> looks good.
>
> I have three questions:
>
> 1. Do you have any memory-footprint estimation
> for TimeControlledBloomFilter?
>     * If I read the KIP correctly, TimeControlledBloomFilter will be
> allocated per KafkaPrincipal so the size should be reasonably small
> considering clusters which have a large number of users.
>     * i.e. What false-positive rate do you plan to choose as the default?
> 2. What do you think about rotating windows on produce-requests arrival
> instead of scheduler?
>     * If we do rotation in scheduler threads, my concern is potential
> scheduler threads occupation which could make other background tasks to
> delay
> 3. Why the default producer.id.quota.window.size.seconds is 1 hour?
>     * Unlike other quota types (1 second)
>
> Thanks,
>
> On Tue, Jun 6, 2023 at 23:55 Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote:
>
> > Hi everyone,
> > I want to start the discussion of the KIP-936 to throttle the number of
> > active PIDs per KafkaPrincipal. The proposal is here
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs
> >
> > Thanks for your time and feedback.
> > Omnia
> >
>
>
> --
> ========================
> Okada Haruki
> ocadar...@gmail.com
> ========================
>
