Luis, What is the status of this KIP?
I tend to agree, that introducing the feature only globally, might be less useful (I would assume that people want to turn it on, on a per-topic basis). As I am not familiar with the corresponding code, I cannot judge the complexity to add topic level configs, however, it seems to be worth to include it in the KIP. -Matthias On 9/21/18 1:59 PM, Bertus Greeff wrote: > Someone pointed out to me that my scenario is also resolved by using Kafka > transactions. Zombie fencing which is essentially my scenario was one of the > scenarios that transactions were designed to solve. I was going to use the > ideas of this KIP to solve it but using transactions seems even better > because out of order messages never even make it into the topic. They are > blocked by the broker. > > -----Original Message----- > From: Guozhang Wang <wangg...@gmail.com> > Sent: Saturday, September 1, 2018 11:33 AM > To: dev <dev@kafka.apache.org> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hello Luis, > > Thanks for your thoughtful responses, here are my two cents: > > 1) I think having the new configs with per-topic granularity would not > introduce much memory overhead or logic complexity, as all you need is to > remember this at the topic metadata cache. If I've missed some details about > the complexity, could you elaborate a bit more? > > 2) I agree with you: the current `ConfigDef.Validator` only scope on the > validated config value itself, and hence cannot be dependent on another > config. > > 4) I think Jun's point is that since we need the latest message in the log > segment for the timestamp tracking, we cannot delete it actually: with offset > based only policy, this is naturally guaranteed; but now with other policy, > it is not guaranteed to never be compacted away, and hence we need to > "enforce" to special-handle this case and not delete it. > > > > Guozhang > > > > Guozhang > > > On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral <luis_cab...@yahoo.com.invalid> > wrote: > >> Hi all, >> >> Since there has been a rejuvenated interest in this KIP, it felt >> better to downgrade it back down to [DISCUSSION], as we aren't really >> voting on it anymore. >> I'll try to address the currently pending questions on the following >> points, so please bear with me while we go through them all: >> >> 1) Configuration: Topic vs Global >> >> Here we all agree that having a configuration per-topic is the best >> option. However, this is not possible with the current design of the >> compaction solution. Yes, it is true that "some" properties that >> relate to compaction are configurable per-topic, but those are only >> the properties that act outside(!) of the actual compaction logic, >> such as configuring the start-compaction trigger with "ratio" or >> compaction duration with "lag.ms ". >> This logic can, of course, be re-designed to suit our wishes, but this >> is not a direct effort, and if we have spent months arguing about the >> extra 8 bytes per record, for sure we would spend another few dozen >> months discussing the memory impact that doing this change to the >> properties will invariably have. >> As such, I will limit this KIP to ONLY have these configurations globally. >> >> 2) Configuration: Fail-fast vs Fallback >> >> >> Ideally, I would also like to prevent the application to start if the >> configuration is somehow invalid. >> However (another 'however'), the way the configuration is handled >> prevents adding dependencies between them, so we can't add logic that >> says "configuration X is invalid if configuration Y is so-and-such". >> Again, this can be re-designed to add this feature to the >> configuration logic, but it would again be a big change just by >> itself, so this KIP is again limited to use ONLY what is already in place. >> >> 3) Documenting the memory impact on the KIP >> >> This is now added to the KIP, though this topic is more complicated >> than 'memory impact'. E.g.: this change doesn't translate to an actual >> memory impact, it just means that the compaction will potentially >> encompass less records per execution. >> >> 4) Documenting how we deal with the last message in the log >> >> I have 2 interpretations for this request: "the last message in the log" >> or "the last message with a shared key on the log" >> For the former: there is no change to the logic on how the last >> message is handled. Only the "tail" gets compacted, so the "head" >> (which includes the last message) still keeps the last message >> >> 5) Documenting how the key deletion will be handled >> >> I'm having some trouble understanding this one; do you mean how keys >> are deleted in general, or? >> >> Cheers, >> Luis Cabral >> >> On Friday, August 24, 2018, 1:54:54 AM GMT+2, Jun Rao >> <j...@confluent.io> >> wrote: >> >> Hi, Luis, >> >> Thanks for the reply. A few more comments below. >> >> 1. About the topic level configuration. It seems that it's useful for >> the new configs to be at the topic level. Currently, the following >> configs related to compaction are already at the topic level. >> >> min.cleanable.dirty.ratio >> min.compaction.lag.ms >> cleanup.policy >> >> 2. Have you documented the memory impact in the KIP? >> >> 3. Could you document how we deal with the last message in the log, >> which is potentially cleanable now? >> >> 4. Could you document how key deletion will be handled? >> >> 10. As for Jason's proposal on CompactionStrategy, it does make the >> feature more general. On the other hand, it will be useful not to >> require user-level code if the compaction value only comes from the header. >> >> 20. "If compaction.strategy.header is chosen and >> compaction.strategy.header is not set, the KIP falls back to offset." >> I am wondering if it's better to just fail the configuration in the case. >> >> Jun >> >> >> >> On Thu, Aug 16, 2018 at 1:33 PM, Guozhang Wang <wangg...@gmail.com> wrote: >> >>> Regarding "broker-agnostic of headers": there are some KIPs from >>> Streams >> to >>> use headers for internal purposes as well, e.g. KIP-258 and KIP-213 >>> (I admit there may be a conflict with user space, but practically I >>> think it is very rare). So I think we are very likely going to make >>> Kafka >> internals >>> to be "headers-aware" anyways. >>> >>> Regarding the general API: I think it is a good idea in general, but >>> it >> may >>> still have limits: note that right now our KIP enforce a header type >>> to >> be >>> long, and we have a very careful discussion about the fall-back >>> policy if header does not have the specified key or if the value is >>> not long-typed; but if we enforce long type version in the >>> interface, it would require users trying to customizing their >>> compaction logic (think: based on some value payload field) to >>> transform their fields to long as well. So I feel we can still go >>> with the current proposed approach, and only consider >> this >>> general API if we observe it does have a general usage requirement. >>> By >> that >>> time we can still extend the config values of "log.cleaner.compaction. >>> strategy" to "offset, timestamp, header, myFuncName". >>> >>> @Bertus >>> >>> Thanks for your feedback, I believe the proposed config is indeed >>> for >> both >>> global (for the whole broker) and per-topic, Luís can confirm if >>> this is the case, and update the wiki page to make it clear. >>> >>> >>> Guozhang >>> >>> >>> On Thu, Aug 16, 2018 at 9:09 AM, Bertus Greeff < >>> bgre...@microsoft.com.invalid> wrote: >>> >>>> I'm interested to know the status of this KIP. I see that the >>>> status >> is >>>> "Voting". How long does this normally take? >>>> >>>> We want to use Kafka and this KIP provides exactly the log >>>> compaction logic that we want for many of our projects. >>>> >>>> One piece of feedback that I have is that log.cleaner.compaction. >>> strategy >>>> and log.cleaner.compaction.strategy.header needs to be per topic. >>>> The text of the KIP makes it sound that the config is only >>>> available for >> all >>>> topics but this makes no sense. Different topics will need >>>> different strategies and/or headers. >>>> >>>> From the KIP: >>>> Provide the configuration for the individual topics None of the >>>> configurations for log compaction are available at topic level, so >>>> adding it there is not a part of this KIP >>>> >>>> >>>> >>>> On 2018/04/05 08:44:00, Luís Cabral <l...@yahoo.com.INVALID> wrote: >>>>> Hello all,> >>>>> Starting a discussion for this feature.> >>>>> KIP-280 : >>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2 >>>>> Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-&data >>>>> =02%7C01%7Cbgreeff%40microsoft.com%7C42ac1e21f70c4b0a243108d6103 >>>>> 959ed%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6367142358482 >>>>> 30609&sdata=gtulffYXEDbf%2BNt1b1P%2FkGeDQvmcf2PZ9nQlo4oi%2Bo >>>>> Q%3D&reserved=0 >>> 280% >>>> 3A+Enhanced+log+compactionPull-4822 : >>>> 3A+Enhanced+log+https://na01.safelinks.protection.outlook.com/?url >>>> 3A+Enhanced+log+=https%3A%2F%2Fgithub.com%2Fapache%2Fkafk&data >>>> 3A+Enhanced+log+=02%7C01%7Cbgreeff%40microsoft.com%7C42ac1e21f70c4 >>>> 3A+Enhanced+log+b0a243108d6103959ed%7C72f988bf86f141af91ab2d7cd011 >>>> 3A+Enhanced+log+db47%7C1%7C0%7C636714235848230609&sdata=s5lASk >>>> 3A+Enhanced+log+fvK%2F21wmK8vsbdlXIXINfFFWZCtXzBLWhdCiA%3D&res >>>> 3A+Enhanced+log+erved=0 >>>> a/pull/4822> >>>> >>>>> Kind Regards,Luís> >>>> >>> >>> >>> >>> -- >>> -- Guozhang >>> >> > > > > -- > -- Guozhang >
signature.asc
Description: OpenPGP digital signature