Bump this thread to see if there are any comments/thoughts.

Thanks.
Luke

On Mon, Sep 26, 2022 at 11:06 AM Luke Chen <show...@gmail.com> wrote:
> Hi devs,
>
> As stated in the motivation section in KIP-854
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry>:
>
> With idempotent producers becoming the default in Kafka, this means that
> unless otherwise specified, all new producers will be given producer IDs.
> Some (inefficient) applications may now create many non-transactional
> idempotent producers. Each of these producers will be assigned a producer
> ID and these IDs and their metadata are stored in the broker memory, which
> might cause brokers to run out of memory.
>
> Justine (in cc.), some other team members, and I are working on solutions
> for this issue, but none of them solves it completely without side
> effects. The open question is which one to sacrifice: "availability" or
> "idempotency guarantees". Some of these solutions sacrifice the
> availability of produce (1, 2, 5) and others sacrifice idempotency
> guarantees (3). It would be useful to know whether people generally have a
> preference one way or the other, or whether there are better solutions.
>
> Here are the proposals we came up with:
>
> 1. Limit the total number of active producer IDs allocated, refusing new
> allocations beyond the limit.
> -> This is the simplest solution. But since the OOM issue is usually
> caused by a rogue or misconfigured client, this solution might "punish"
> well-behaved clients by preventing them from sending messages.
>
> 2. Throttle the producer ID allocation rate.
> -> Same concern as solution #1.
>
> 3. Limit the number of active producer IDs by expiring old entries (sort
> of like an LRU cache).
> -> The idea here is that if we hit a misconfigured client, we will expire
> the older entries. The concern is that we risk losing idempotency
> guarantees, and currently we don't have a way to notify clients that
> their idempotency guarantees have been lost. Besides, the least recently
> used entries that get removed are not always from the "bad" clients.
>
> 4. Allow clients to "close" their producer ID usage.
> -> We can provide a way for a producer to "close" its producer ID usage.
> Currently, the only related API is INIT_PRODUCER_ID, which requests the
> allocation of one. After that, the broker keeps the producer ID metadata
> even after the producer is closed. With a close API (e.g.
> END_PRODUCER_ID), we can remove the entry on the broker side. On the
> client side, we can send it when the producer closes. The concern is that
> old clients (including non-Java clients) will still suffer from the OOM
> issue.
>
> 5. Limit/throttle producer IDs per principal.
> -> Although this idea confines the impact to a single principal, the same
> concern as solutions #1 and #2 still exists.
>
> Any thoughts/feedback are welcome.
>
> Thank you.
> Luke
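Option 3 is the one proposal that trades idempotency for availability, so a small model of it may help make that risk concrete. Below is a minimal, hypothetical Java sketch: the class and method names are made up for illustration and are not Kafka APIs, and it tracks only the last sequence number per producer ID, whereas the real broker-side producer state also involves epochs and per-batch metadata. It bounds the map with an access-ordered LinkedHashMap and evicts the least recently used producer ID once a cap is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of option 3: cap the number of producer IDs a broker
// tracks and evict the least recently used one when the cap is exceeded.
// ProducerStateCache, append() and Result are illustrative names, not Kafka APIs.
class ProducerStateCache {

    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    private final int maxProducerIds;
    private long evictions = 0;
    // producerId -> last sequence number appended for that producer.
    private final LinkedHashMap<Long, Integer> lastSequence;

    ProducerStateCache(int maxProducerIds) {
        this.maxProducerIds = maxProducerIds;
        // accessOrder = true: iteration runs from least to most recently used entry.
        this.lastSequence = new LinkedHashMap<Long, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Integer> eldest) {
                // Invoked after each new insertion; drop the LRU producer once over the cap.
                boolean evict = size() > ProducerStateCache.this.maxProducerIds;
                if (evict) {
                    evictions++;
                }
                return evict;
            }
        };
    }

    // Simplified duplicate check: only the next sequence number is accepted.
    Result append(long producerId, int sequence) {
        Integer last = lastSequence.get(producerId); // get() also refreshes LRU order
        if (last != null) {
            if (sequence <= last) {
                return Result.DUPLICATE;
            }
            if (sequence != last + 1) {
                return Result.OUT_OF_ORDER;
            }
        }
        // Unknown producer ID (new, or evicted earlier): there is nothing to
        // compare against, so this sketch accepts the batch. After an eviction,
        // a retried batch takes this path and is written a second time.
        lastSequence.put(producerId, sequence);
        return Result.APPENDED;
    }

    int trackedCount() {
        return lastSequence.size();
    }

    long evictions() {
        return evictions;
    }

    boolean isTracked(long producerId) {
        return lastSequence.containsKey(producerId); // does not change LRU order
    }

    public static void main(String[] args) {
        ProducerStateCache cache = new ProducerStateCache(2);
        cache.append(1L, 0); // producer 1 writes batch 0
        cache.append(2L, 0); // producer 2 writes batch 0
        cache.append(3L, 0); // cap exceeded: producer 1 (least recently used) is evicted
        // Producer 1 retries batch 0, e.g. after a timed-out ack. Its state is
        // gone, so the duplicate is appended, and producer 2 is evicted in turn.
        System.out.println(cache.append(1L, 0)); // APPENDED
        System.out.println(cache.append(3L, 0)); // DUPLICATE
    }
}
```

In the demo, producer 1's retry of batch 0 is appended a second time because its state was evicted, and that retry in turn evicts producer 2, which may well be a healthy client. Both concerns listed under option 3 show up in a few lines.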