Someone pointed out to me that my scenario is also resolved by using Kafka transactions. Zombie fencing which is essentially my scenario was one of the scenarios that transactions were designed to solve. I was going to use the ideas of this KIP to solve it but using transactions seems even better because out of order messages never even make it into the topic. They are blocked by the broker.
-----Original Message----- From: Guozhang Wang <wangg...@gmail.com> Sent: Saturday, September 1, 2018 11:33 AM To: dev <dev@kafka.apache.org> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Hello Luis, Thanks for your thoughtful responses, here are my two cents: 1) I think having the new configs with per-topic granularity would not introduce much memory overhead or logic complexity, as all you need is to remember this at the topic metadata cache. If I've missed some details about the complexity, could you elaborate a bit more? 2) I agree with you: the current `ConfigDef.Validator` only scope on the validated config value itself, and hence cannot be dependent on another config. 4) I think Jun's point is that since we need the latest message in the log segment for the timestamp tracking, we cannot delete it actually: with offset based only policy, this is naturally guaranteed; but now with other policy, it is not guaranteed to never be compacted away, and hence we need to "enforce" to special-handle this case and not delete it. Guozhang Guozhang On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral <luis_cab...@yahoo.com.invalid> wrote: > Hi all, > > Since there has been a rejuvenated interest in this KIP, it felt > better to downgrade it back down to [DISCUSSION], as we aren't really > voting on it anymore. > I'll try to address the currently pending questions on the following > points, so please bear with me while we go through them all: > > 1) Configuration: Topic vs Global > > Here we all agree that having a configuration per-topic is the best > option. However, this is not possible with the current design of the > compaction solution. Yes, it is true that "some" properties that > relate to compaction are configurable per-topic, but those are only > the properties that act outside(!) of the actual compaction logic, > such as configuring the start-compaction trigger with "ratio" or > compaction duration with "lag.ms ". > This logic can, of course, be re-designed to suit our wishes, but this > is not a direct effort, and if we have spent months arguing about the > extra 8 bytes per record, for sure we would spend another few dozen > months discussing the memory impact that doing this change to the > properties will invariably have. > As such, I will limit this KIP to ONLY have these configurations globally. > > 2) Configuration: Fail-fast vs Fallback > > > Ideally, I would also like to prevent the application to start if the > configuration is somehow invalid. > However (another 'however'), the way the configuration is handled > prevents adding dependencies between them, so we can't add logic that > says "configuration X is invalid if configuration Y is so-and-such". > Again, this can be re-designed to add this feature to the > configuration logic, but it would again be a big change just by > itself, so this KIP is again limited to use ONLY what is already in place. > > 3) Documenting the memory impact on the KIP > > This is now added to the KIP, though this topic is more complicated > than 'memory impact'. E.g.: this change doesn't translate to an actual > memory impact, it just means that the compaction will potentially > encompass less records per execution. > > 4) Documenting how we deal with the last message in the log > > I have 2 interpretations for this request: "the last message in the log" > or "the last message with a shared key on the log" > For the former: there is no change to the logic on how the last > message is handled. Only the "tail" gets compacted, so the "head" > (which includes the last message) still keeps the last message > > 5) Documenting how the key deletion will be handled > > I'm having some trouble understanding this one; do you mean how keys > are deleted in general, or? > > Cheers, > Luis Cabral > > On Friday, August 24, 2018, 1:54:54 AM GMT+2, Jun Rao > <j...@confluent.io> > wrote: > > Hi, Luis, > > Thanks for the reply. A few more comments below. > > 1. About the topic level configuration. It seems that it's useful for > the new configs to be at the topic level. Currently, the following > configs related to compaction are already at the topic level. > > min.cleanable.dirty.ratio > min.compaction.lag.ms > cleanup.policy > > 2. Have you documented the memory impact in the KIP? > > 3. Could you document how we deal with the last message in the log, > which is potentially cleanable now? > > 4. Could you document how key deletion will be handled? > > 10. As for Jason's proposal on CompactionStrategy, it does make the > feature more general. On the other hand, it will be useful not to > require user-level code if the compaction value only comes from the header. > > 20. "If compaction.strategy.header is chosen and > compaction.strategy.header is not set, the KIP falls back to offset." > I am wondering if it's better to just fail the configuration in the case. > > Jun > > > > On Thu, Aug 16, 2018 at 1:33 PM, Guozhang Wang <wangg...@gmail.com> wrote: > > > Regarding "broker-agnostic of headers": there are some KIPs from > > Streams > to > > use headers for internal purposes as well, e.g. KIP-258 and KIP-213 > > (I admit there may be a conflict with user space, but practically I > > think it is very rare). So I think we are very likely going to make > > Kafka > internals > > to be "headers-aware" anyways. > > > > Regarding the general API: I think it is a good idea in general, but > > it > may > > still have limits: note that right now our KIP enforce a header type > > to > be > > long, and we have a very careful discussion about the fall-back > > policy if header does not have the specified key or if the value is > > not long-typed; but if we enforce long type version in the > > interface, it would require users trying to customizing their > > compaction logic (think: based on some value payload field) to > > transform their fields to long as well. So I feel we can still go > > with the current proposed approach, and only consider > this > > general API if we observe it does have a general usage requirement. > > By > that > > time we can still extend the config values of "log.cleaner.compaction. > > strategy" to "offset, timestamp, header, myFuncName". > > > > @Bertus > > > > Thanks for your feedback, I believe the proposed config is indeed > > for > both > > global (for the whole broker) and per-topic, Luís can confirm if > > this is the case, and update the wiki page to make it clear. > > > > > > Guozhang > > > > > > On Thu, Aug 16, 2018 at 9:09 AM, Bertus Greeff < > > bgre...@microsoft.com.invalid> wrote: > > > > > I'm interested to know the status of this KIP. I see that the > > > status > is > > > "Voting". How long does this normally take? > > > > > > We want to use Kafka and this KIP provides exactly the log > > > compaction logic that we want for many of our projects. > > > > > > One piece of feedback that I have is that log.cleaner.compaction. > > strategy > > > and log.cleaner.compaction.strategy.header needs to be per topic. > > > The text of the KIP makes it sound that the config is only > > > available for > all > > > topics but this makes no sense. Different topics will need > > > different strategies and/or headers. > > > > > > From the KIP: > > > Provide the configuration for the individual topics None of the > > > configurations for log compaction are available at topic level, so > > > adding it there is not a part of this KIP > > > > > > > > > > > > On 2018/04/05 08:44:00, Luís Cabral <l...@yahoo.com.INVALID> wrote: > > > > Hello all,> > > > > Starting a discussion for this feature.> > > > > KIP-280 : > > > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2 > > > > Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-&data > > > > =02%7C01%7Cbgreeff%40microsoft.com%7C42ac1e21f70c4b0a243108d6103 > > > > 959ed%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6367142358482 > > > > 30609&sdata=gtulffYXEDbf%2BNt1b1P%2FkGeDQvmcf2PZ9nQlo4oi%2Bo > > > > Q%3D&reserved=0 > > 280% > > > 3A+Enhanced+log+compactionPull-4822 : > > > 3A+Enhanced+log+https://na01.safelinks.protection.outlook.com/?url > > > 3A+Enhanced+log+=https%3A%2F%2Fgithub.com%2Fapache%2Fkafk&data > > > 3A+Enhanced+log+=02%7C01%7Cbgreeff%40microsoft.com%7C42ac1e21f70c4 > > > 3A+Enhanced+log+b0a243108d6103959ed%7C72f988bf86f141af91ab2d7cd011 > > > 3A+Enhanced+log+db47%7C1%7C0%7C636714235848230609&sdata=s5lASk > > > 3A+Enhanced+log+fvK%2F21wmK8vsbdlXIXINfFFWZCtXzBLWhdCiA%3D&res > > > 3A+Enhanced+log+erved=0 > > > a/pull/4822> > > > > > > > Kind Regards,Luís> > > > > > > > > > > > -- > > -- Guozhang > > > -- -- Guozhang