Hi devs,

As stated in the motivation section in KIP-854 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry>:
With idempotent producers becoming the default in Kafka, unless otherwise specified, all new producers will be given producer IDs. Some (inefficient) applications may now create many non-transactional idempotent producers. Each of these producers will be assigned a producer ID, and these IDs and their metadata are stored in broker memory, which might cause brokers to run out of memory.

Justine (in cc) and I, along with some other team members, are working on solutions for this issue, but none of them solves it completely without side effects. In particular, we can't decide whether to sacrifice "availability" or "idempotency guarantees": some of the solutions sacrifice availability of produce (1, 2, 5) and others sacrifice idempotency guarantees (3). It would be useful to know if people generally have a preference one way or the other, or what other, better solutions there might be.

Here are the proposals we came up with:

1. Limit the total number of active producer IDs that can be allocated.
-> This is the simplest solution, but since the OOM issue is usually caused by a rogue or misconfigured client, it might "punish" good clients by preventing them from sending messages.

2. Throttle the producer ID allocation rate.
-> Same concern as solution #1.

3. Put a limit on the number of active producer IDs (sort of like an LRU cache).
-> The idea here is that if we hit a misconfigured client, we expire the older entries. The concern is that we risk losing idempotency guarantees, and currently we have no way to notify clients that they have lost them. Besides, the least recently used entries that get removed are not always from the "bad" clients. (A rough sketch of this idea is appended at the end of this mail.)

4. Allow clients to "close" their producer ID usage.
-> We can provide a way for a producer to "close" its producer ID usage. Currently, we only have INIT_PRODUCER_ID requests to allocate one; after that, we keep the producer ID metadata in the broker even after the producer is closed. With a "close" API (e.g. END_PRODUCER_ID), we could remove the entry on the broker side, and the client could send it when the producer closes. The concern is that old clients (including non-Java clients) would still suffer from the OOM issue.

5. Limit/throttle producer IDs per principal.
-> Although this lets us limit the impact to a given principal, the same concern as solutions #1 and #2 still exists.

Any thoughts/feedback are welcome. Thank you.
Luke
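
P.S. For concreteness, here is a very rough, purely illustrative sketch of what option 3 could look like on the broker side: an LRU-bounded map of producer ID state that silently evicts the oldest entry when a cap is exceeded. The names used here (BoundedProducerIdCache, ProducerIdEntry, MAX_ACTIVE_PRODUCER_IDS) are made up for this sketch and are not existing Kafka classes or configs.

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedProducerIdCache {

    // Placeholder for the per-producer state kept in broker memory
    // (epoch, last sequence, etc.) -- not the real ProducerStateManager entry.
    public static class ProducerIdEntry {
        final short epoch;
        final int lastSequence;
        ProducerIdEntry(short epoch, int lastSequence) {
            this.epoch = epoch;
            this.lastSequence = lastSequence;
        }
    }

    // Assumed cap on the number of producer ID entries kept in memory.
    private static final int MAX_ACTIVE_PRODUCER_IDS = 100_000;

    // accessOrder = true gives LRU ordering; removeEldestEntry drops the
    // least recently used entry once the cap is exceeded. The evicted
    // producer loses its idempotency guarantee without being notified,
    // which is exactly the trade-off described in option 3 above.
    private final Map<Long, ProducerIdEntry> entries =
        new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, ProducerIdEntry> eldest) {
                return size() > MAX_ACTIVE_PRODUCER_IDS;
            }
        };

    public synchronized void update(long producerId, short epoch, int lastSequence) {
        entries.put(producerId, new ProducerIdEntry(epoch, lastSequence));
    }

    public synchronized ProducerIdEntry get(long producerId) {
        // Lookup also refreshes the entry's LRU position.
        return entries.get(producerId);
    }
}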