I don't think a major release is a requirement for a default change for what it's worth. I do appreciate that there is a preference for not rocking the boat though. For a little bit of background here, the problem we have encountered in production since the idempotent producer became the default is OOM errors due to huge numbers of producerIds that had to be retained in the cache for 7 days. It is hard to prevent use cases from emerging where producers are used and discarded rapidly. We will be using a lower value for sure, but it would also be nice to reduce the likelihood for the community to see this problem. The benefit of the caching diminishes quickly over time since it is primarily meant to handle producer retry windows. I do not think there is much difference between 1 days and 7 days from an application perspective, but it is a huge difference for the broker's memory usage.
Best, Jason On Wed, Jul 27, 2022 at 2:57 AM Sagar <sagarmeansoc...@gmail.com> wrote: > Thanks Justine for the KIP. I think it might be better to document the > correlation between the new config and delivery.timeout.ms in the Public > Interfaces Description. > > Also, I agree with Luke that for now setting a default to -1 should be > good. We can look to switch to 1 day with major release. > > Thanks! > Sagar. > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <show...@gmail.com> wrote: > > > Hi Justine, > > > > Thanks for the KIP. > > I agree with you that we should try our best to keep backward > > compatibility, although our intention is to have lower producer id > > expiration timeout. > > So, I think we should keep default to -1 IMO. > > Maybe we change the default to 1 day in next major release (4.0)? > > > > Thank you. > > Luke > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan > > <jols...@confluent.io.invalid> > > wrote: > > > > > Thanks for taking a look Jason! > > > > > > I wondered if we wanted to have a smaller default but wasn't sure about > > the > > > compatibility story -- especially since there is the chance for > producer > > > IDs to expire silently. > > > I do think that 1 day is fairly reasonable. If I don't hear any > > conflicting > > > opinions I can go ahead and update the default. > > > > > > Justine > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson > > > <ja...@confluent.io.invalid> > > > wrote: > > > > > > > Hi Justine, > > > > > > > > Thanks for the KIP. Although I hate seeing new configurations, I > think > > > this > > > > is a good change. Combining these timeout behaviors into a single > > > > configuration was definitely a mistake, but we didn't anticipate the > > > > problem with the producer id cache. I do wonder if we can make the > > > default > > > > a bit lower to reduce the chances that users will hit the same memory > > > > issues we have seen. After decoupling this configuration from > > > > transactional.id.expiration.ms, the new timeout just needs to cover > > the > > > > longest duration that a producer might be retrying the same Produce > > > > request. 7 days seems too high. Although I think it could go a fair > > even > > > > lower, perhaps 1 day is a reasonable place to start? > > > > > > > > Thanks, > > > > Jason > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan > > > > <jols...@confluent.io.invalid> > > > > wrote: > > > > > > > > > Hey Bill, > > > > > Thanks! I was just going to say that hopefully > > > > > transactional.id.expiration.ms would also be over the delivery > > > timeout. > > > > :) > > > > > Thanks for the +1! > > > > > > > > > > Justine > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bbej...@gmail.com> > > wrote: > > > > > > > > > > > Hi Justine, > > > > > > > > > > > > I just took another look at the KIP, and I realize my > > > > question/suggestion > > > > > > about default values has already been addressed in the > > > `Compatibility` > > > > > > section. > > > > > > > > > > > > I'm +1 on the KIP. > > > > > > > > > > > > -Bill > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bbej...@gmail.com> > > > wrote: > > > > > > > > > > > > > Hi Justine, > > > > > > > > > > > > > > Thanks for the well written KIP, this looks like it will be a > > > useful > > > > > > > addition. > > > > > > > > > > > > > > Overall the KIP looks good to me, I have one question/comment. > > > > > > > > > > > > > > You mentioned that setting the `producer.id.expiration.ms` > less > > > than > > > > > the > > > > > > > delivery timeout could lead to duplicates, which makes sense. > To > > > > help > > > > > > > avoid this situation, do we want to consider a default value > that > > > is > > > > > the > > > > > > > same as the delivery timeout? > > > > > > > > > > > > > > Thanks again for the KIP. > > > > > > > > > > > > > > Bill > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan > > > > > > > <jols...@confluent.io.invalid> wrote: > > > > > > > > > > > > > >> Hey all! > > > > > > >> > > > > > > >> I'd like to start a discussion on my proposal to separate > > > time-based > > > > > > >> producer ID expiration from transactional ID expiration by > > > > > introducing a > > > > > > >> new configuration. > > > > > > >> > > > > > > >> The KIP Is pretty small and simple, but will be helpful in > > > > controlling > > > > > > >> memory usage in brokers -- especially now that by default > > > producers > > > > > are > > > > > > >> idempotent and create producer ID state. > > > > > > >> > > > > > > >> Please take a look and leave any comments you may have! > > > > > > >> > > > > > > >> KIP: > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097 > > > > > > >> > > > > > > >> Thanks! > > > > > > >> Justine > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >