Hello Luis,

Thanks for your thoughtful responses; here are my two cents:
1) I think having the new configs with per-topic granularity would not
introduce much memory overhead or logic complexity, as all you need is to
remember this in the topic metadata cache. If I've missed some details about
the complexity, could you elaborate a bit more?

2) I agree with you: the current `ConfigDef.Validator` only scopes over the
validated config value itself, and hence cannot depend on another config.

4) I think Jun's point is that since we need the latest message in the log
segment for timestamp tracking, we cannot actually delete it: with the
offset-based-only policy this is naturally guaranteed; but with the other
policies it is not guaranteed to never be compacted away, and hence we need
to special-case it and not delete it.

Guozhang

On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
wrote:

> Hi all,
>
> Since there has been rejuvenated interest in this KIP, it felt better to
> downgrade it back to [DISCUSSION], as we aren't really voting on it
> anymore.
> I'll try to address the currently pending questions on the following
> points, so please bear with me while we go through them all:
>
> 1) Configuration: Topic vs Global
>
> Here we all agree that having a configuration per-topic is the best
> option. However, this is not possible with the current design of the
> compaction solution. Yes, it is true that "some" properties that relate
> to compaction are configurable per-topic, but those are only the
> properties that act outside(!) of the actual compaction logic, such as
> configuring the start-compaction trigger with "ratio" or the compaction
> duration with "lag.ms".
> This logic can, of course, be re-designed to suit our wishes, but this is
> not a trivial effort, and if we have spent months arguing about the extra
> 8 bytes per record, for sure we would spend another few dozen months
> discussing the memory impact that this change to the properties would
> invariably have.
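[Editor's note: the per-topic overrides mentioned above ("ratio" and "lag.ms") can already be set with the stock tooling. A minimal sketch, assuming a ZooKeeper-backed ~2.0-era broker; the topic name is illustrative:]

```shell
# Topic-level overrides for the compaction triggers discussed above;
# these act outside(!) the compaction algorithm itself.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config min.cleanable.dirty.ratio=0.2,min.compaction.lag.ms=60000
```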
> As such, I will limit this KIP to ONLY have these configurations
> globally.
>
> 2) Configuration: Fail-fast vs Fallback
>
> Ideally, I would also like to prevent the application from starting if
> the configuration is somehow invalid.
> However (another 'however'), the way the configuration is handled
> prevents adding dependencies between configs, so we can't add logic that
> says "configuration X is invalid if configuration Y is so-and-such".
> Again, this could be re-designed to add this feature to the configuration
> logic, but that would be a big change just by itself, so this KIP is
> again limited to using ONLY what is already in place.
>
> 3) Documenting the memory impact on the KIP
>
> This is now added to the KIP, though the topic is more complicated than
> 'memory impact'. E.g.: this change doesn't translate into an actual
> memory impact; it just means that the compaction will potentially
> encompass fewer records per execution.
>
> 4) Documenting how we deal with the last message in the log
>
> I have 2 interpretations of this request: "the last message in the log"
> or "the last message with a shared key in the log".
> For the former: there is no change to the logic on how the last message
> is handled. Only the "tail" gets compacted, so the "head" (which includes
> the last message) still keeps it.
>
> 5) Documenting how the key deletion will be handled
>
> I'm having some trouble understanding this one; do you mean how keys are
> deleted in general, or?
>
> Cheers,
> Luis Cabral
>
> On Friday, August 24, 2018, 1:54:54 AM GMT+2, Jun Rao <j...@confluent.io>
> wrote:
>
> Hi, Luis,
>
> Thanks for the reply. A few more comments below.
>
> 1. About the topic-level configuration. It seems that it's useful for the
> new configs to be at the topic level. Currently, the following configs
> related to compaction are already at the topic level:
>
> min.cleanable.dirty.ratio
> min.compaction.lag.ms
> cleanup.policy
>
> 2. Have you documented the memory impact in the KIP?
>
> 3. Could you document how we deal with the last message in the log, which
> is potentially cleanable now?
>
> 4. Could you document how key deletion will be handled?
>
> 10. As for Jason's proposal on CompactionStrategy, it does make the
> feature more general. On the other hand, it would be useful not to
> require user-level code if the compaction value only comes from the
> header.
>
> 20. "If the 'header' strategy is chosen and compaction.strategy.header is
> not set, the KIP falls back to offset." I am wondering if it's better to
> just fail the configuration in that case.
>
> Jun
>
> On Thu, Aug 16, 2018 at 1:33 PM, Guozhang Wang <wangg...@gmail.com>
> wrote:
>
> > Regarding "broker-agnostic of headers": there are some KIPs from
> > Streams that use headers for internal purposes as well, e.g. KIP-258
> > and KIP-213 (I admit there may be a conflict with user space, but
> > practically I think it is very rare). So I think we are very likely
> > going to make Kafka internals "headers-aware" anyway.
> >
> > Regarding the general API: I think it is a good idea in general, but
> > it may still have limits: note that right now our KIP enforces the
> > header type to be long, and we had a very careful discussion about the
> > fall-back policy if the header does not have the specified key or if
> > the value is not long-typed; but if we enforce the long type in the
> > interface, it would require users trying to customize their compaction
> > logic (think: based on some value payload field) to transform their
> > fields to long as well. So I feel we can still go with the currently
> > proposed approach, and only consider this general API if we observe a
> > genuine requirement for it. By that time we can still extend the
> > config values of "log.cleaner.compaction.strategy" to "offset,
> > timestamp, header, myFuncName".
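[Editor's note: the fall-back rule discussed in this thread (compare records by a long-valued header, reverting to offset order when the header key is missing or not long-typed) can be sketched as below. This is an illustrative reconstruction, not the KIP's actual code: the class and method names are made up, and the 8-byte big-endian encoding of the header value is an assumption.]

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the compaction comparison rule discussed above:
// keep the record whose designated header carries the larger long value,
// falling back to offset order when the header is absent or malformed.
public class CompactionCompare {

    // Returns the header's long value, or null if it cannot serve as one
    // (missing header, or value is not exactly 8 bytes). Big-endian is an
    // assumption here, not something the thread specifies.
    static Long headerValue(byte[] headerValueBytes) {
        if (headerValueBytes == null || headerValueBytes.length != 8) {
            return null;
        }
        return ByteBuffer.wrap(headerValueBytes).getLong();
    }

    // True if the candidate record should replace the currently retained
    // record for the same key.
    static boolean shouldReplace(long currentOffset, byte[] currentHeader,
                                 long candidateOffset, byte[] candidateHeader) {
        Long current = headerValue(currentHeader);
        Long candidate = headerValue(candidateHeader);
        if (current == null || candidate == null) {
            // Fall-back policy: offset-based compaction, today's behaviour.
            return candidateOffset > currentOffset;
        }
        return candidate > current; // header ("version") based compaction
    }
}
```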
> > @Bertus
> >
> > Thanks for your feedback. I believe the proposed config is indeed for
> > both global (the whole broker) and per-topic; Luís can confirm whether
> > this is the case and update the wiki page to make it clear.
> >
> > Guozhang
> >
> > On Thu, Aug 16, 2018 at 9:09 AM, Bertus Greeff <
> > bgre...@microsoft.com.invalid> wrote:
> >
> > > I'm interested to know the status of this KIP. I see that the status
> > > is "Voting". How long does this normally take?
> > >
> > > We want to use Kafka, and this KIP provides exactly the log
> > > compaction logic that we want for many of our projects.
> > >
> > > One piece of feedback that I have is that
> > > log.cleaner.compaction.strategy and
> > > log.cleaner.compaction.strategy.header need to be per-topic. The
> > > text of the KIP makes it sound like the config is only available for
> > > all topics, but this makes no sense. Different topics will need
> > > different strategies and/or headers.
> > >
> > > From the KIP:
> > > "Provide the configuration for the individual topics: None of the
> > > configurations for log compaction are available at topic level, so
> > > adding it there is not a part of this KIP."
> > >
> > > On 2018/04/05 08:44:00, Luís Cabral <l...@yahoo.com.INVALID> wrote:
> > > > Hello all,
> > > > Starting a discussion for this feature.
> > > > KIP-280:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
> > > > Pull-4822: https://github.com/apache/kafka/pull/4822
> > > >
> > > > Kind Regards,
> > > > Luís
> >
> > --
> > -- Guozhang

--
-- Guozhang
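[Editor's note: putting the names from this thread together, a broker-side configuration under the KIP might look like the fragment below. The config keys and the strategy values (offset, timestamp, header) are taken from the messages above; the header key "version" is a made-up example, since the thread does not fix one.]

```properties
# Proposed (KIP-280): broker-wide compaction strategy.
# Values discussed in this thread: offset (today's behaviour), timestamp, header.
log.cleaner.compaction.strategy=header

# Only meaningful when the strategy is 'header': the record-header key whose
# long value decides which record survives. "version" is an illustrative name.
log.cleaner.compaction.strategy.header=version
```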