This KIP would definitely address a gap in the current functionality: today you can't have a tombstone with any associated content.
That said, I'd like to talk about use cases, to make sure that this is in fact useful. The KIP should be updated with whatever use cases we come up with.

First of all, an observation: When we speak about log compaction, we typically think of "the latest message for a key is retained". In that respect, a delete tombstone (i.e. a message with a null payload) is treated the same as any other Kafka message: the latest message is retained. It doesn't matter whether the latest message is null or has actual content. In all cases, the last message is retained.

The only way a delete tombstone is treated differently from other Kafka messages is that it automatically disappears after a while. The time of deletion is specified using delete.retention.ms.

So what we're really talking about is: do we want to support messages in a log-compacted topic that auto-delete themselves after a while?

In a thread from 2015, there was a discussion on first-class support of headers between Roger Hoover, Felix GV, Jun Rao, and me. See the thread at https://groups.google.com/d/msg/confluent-platform/8xPbjyUE_7E/yQ1AeCufL_gJ

In that thread, Jun raised a good question that I didn't have a good answer for at the time: If a message is going to auto-delete itself after a while, how important was the message? That is, what information did the message contain that was important *for a while* but not so important that it needed to be kept around forever?

Some use cases that I can think of:

1) Traceability. I would like to know who issued this delete tombstone. It might include the hostname and IP of the producer of the delete.

2) Timestamps. I would like to know when this delete was issued. This use case is already addressed by the per-message timestamps that came in 0.10.0.

3) Data provenance. I hope I'm using this phrase correctly, but what I mean is, where did this delete come from? What processing job emitted it? What input to the processing job caused this delete to be produced? For example, if a record in topic A was processed and caused a delete tombstone to be emitted to topic B, I might like the offset of the topic A message to be attached to the topic B message.

4) Distributed tracing for stream topologies. This might be a slight repeat of the above use cases. In the microservices world, we can generate call graphs of web services using tools like Zipkin/opentracing.io (http://opentracing.io/), or something homegrown like https://engineering.linkedin.com/distributed-service-call-graph/real-time-distributed-tracing-website-performance-and-efficiency. I can imagine that you might want to do something similar for stream processing topologies, where stream processing jobs carry and forward a globally unique identifier, and a distributed topology graph is generated.

5) Cases where processing a delete requires data that is not available in the message key. I'm not sure I have a good example of this, though. One hand-wavy example might be where I am publishing documents into Kafka, where the documentId is the message key and the text contents of the document are in the message body, and I have a consuming job that does some analytics on the message body. If that document gets deleted, then the consuming job might need the original message body in order to "delete" that message's impact from the analytics. But I'm not sure that is a great example. If the consumer was worried about that, it would probably keep the original message around, stored by primary key. Then all it would need from a delete message would be the primary key of the message.

Do people think these are valid use cases? What are other use cases that people can think of?

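
Coming back to the observation about compaction, a toy model in Python might make it concrete. This is only a sketch under simplifying assumptions, not how the broker's log cleaner is implemented: the Record class, the compact function and the sample keys are all made up for illustration, and expiry is judged from a per-record timestamp, whereas the real cleaner works on log segments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for a Kafka message."""
    offset: int
    key: str
    value: Optional[bytes]  # None models a delete tombstone
    timestamp_ms: int


def compact(log: List[Record], now_ms: int, delete_retention_ms: int) -> List[Record]:
    """Toy compaction pass: keep the latest record per key, drop expired tombstones."""
    latest = {}
    for record in log:  # the log is assumed to be ordered by offset
        latest[record.key] = record  # a tombstone wins like any other message

    survivors = []
    for record in latest.values():
        is_tombstone = record.value is None
        expired = now_ms - record.timestamp_ms > delete_retention_ms
        if is_tombstone and expired:
            continue  # the only special treatment: tombstones eventually vanish
        survivors.append(record)
    return sorted(survivors, key=lambda r: r.offset)


log = [
    Record(0, "doc-1", b"v1", 1_000),
    Record(1, "doc-2", b"v1", 2_000),
    Record(2, "doc-1", None, 3_000),  # tombstone for doc-1
]

# Shortly after the delete, the tombstone is still the retained "latest message".
soon = compact(log, now_ms=4_000, delete_retention_ms=10_000)

# Once the retention window has passed, the tombstone itself is gone.
later = compact(log, now_ms=20_000, delete_retention_ms=10_000)
```

In this model, anything attached to the tombstone (who issued it, where it came from) would only be visible during the window between `soon` and `later`, which is exactly why Jun's question about what is important "for a while" matters.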
-James

> On Oct 26, 2016, at 3:46 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>
> +1 @Joel.
> I think a clear migration plan of upgrading and downgrading of server and
> clients along with handling of issues that Joel mentioned, on the KIP would
> be really great.
>
> Thanks,
>
> Mayuresh
>
> On Wed, Oct 26, 2016 at 3:31 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
>> I'm not sure why it would be useful, but it should be theoretically
>> possible if the attribute bit alone is enough to mark a tombstone. OTOH, we
>> could consider that as invalid if we wish. These are relevant details that
>> I think should be added to the KIP.
>>
>> Also, in the few odd scenarios that I mentioned we should also consider
>> that fetches could be coming from other yet-to-be-upgraded brokers in a
>> cluster that is being upgraded. So we would probably want to continue to
>> support nulls as tombstones or down-convert in a way that we are sure works
>> with least surprise to fetchers.
>>
>> There is a slightly vague statement under "Compatibility, Deprecation, and
>> Migration Plan" that could benefit from more details: *Logic would base on
>> current behavior of null value or if tombstone flag set to true, as such
>> wouldn't impact any existing flows simply allow new producers to make use
>> of the feature*. It is unclear to me based on that whether you would
>> interpret null as a tombstone if the tombstone attribute bit is off.
>>
>> On Wed, Oct 26, 2016 at 3:10 PM, Xavier Léauté <xav...@confluent.io>
>> wrote:
>>
>>> Does this mean that starting with V4 requests we would allow storing null
>>> messages in compacted topics? The KIP should probably clarify the behavior
>>> for null messages where the tombstone flag is not set.
>>>
>>> On Wed, Oct 26, 2016 at 1:32 AM Magnus Edenhill <mag...@edenhill.se>
>>> wrote:
>>>
>>>> 2016-10-25 21:36 GMT+02:00 Nacho Solis <nso...@linkedin.com.invalid>:
>>>>
>>>>> I think you probably require a MagicByte bump if you expect correct
>>>>> behavior of the system as a whole.
>>>>>
>>>>> From a client perspective you want to make sure that when you deliver a
>>>>> message that the broker supports the feature you're expecting
>>>>> (compaction). So, depending on the behavior of the broker on encountering
>>>>> a previously undefined bit flag I would suggest making some change to make
>>>>> certain that flag-based compaction is supported. I'm going to guess that
>>>>> the MagicByte would do this.
>>>>>
>>>>
>>>> I don't believe this is needed since it is already attributed through the
>>>> request's API version.
>>>>
>>>> Producer:
>>>> * if a client sends ProduceRequest V4 then attributes.bit5 indicates a
>>>>   tombstone
>>>> * if a client sends ProduceRequest <V4 then attributes.bit5 is ignored
>>>>   and value==null indicates a tombstone
>>>> * in both cases the on-disk messages are stored with attributes.bit5 (I
>>>>   assume?)
>>>>
>>>> Consumer:
>>>> * if a client sends FetchRequest V4 messages are sendfile():ed directly
>>>>   from disk (with attributes.bit5)
>>>> * if a client sends FetchRequest <V4 messages are slowpathed and
>>>>   translated from attributes.bit5 to value=null as required.
>>>>
>>>>
>>>> That's my understanding anyway, please correct me if I'm wrong.
>>>>
>>>> /Magnus
>>>>
>>>>
>>>>
>>>>> On Tue, Oct 25, 2016 at 10:17 AM, Magnus Edenhill <mag...@edenhill.se>
>>>>> wrote:
>>>>>
>>>>>> It is safe to assume that a previously undefined attributes bit will be
>>>>>> unset in protocol requests from existing clients; if not, such a client is
>>>>>> already violating the protocol and needs to be fixed.
>>>>>>
>>>>>> So I don't see a need for a MagicByte bump; both broker and client have the
>>>>>> information they need to construct or parse the message according to request
>>>>>> version.
>>>>>>
>>>>>>
>>>>>> 2016-10-25 18:48 GMT+02:00 Michael Pearce <michael.pea...@ig.com>:
>>>>>>
>>>>>>> Hi Magnus,
>>>>>>>
>>>>>>> I was wondering if I even needed to change those also, as technically
>>>>>>> we’re just making use of an unused attribute bit, but I'm not 100% sure that
>>>>>>> it will always be false currently.
>>>>>>>
>>>>>>> If someone can say 100% it will already be set false with current and
>>>>>>> historic bitwise masking techniques used over time, we could do away
>>>>>>> with both, and simply just start to use it. Unfortunately I don’t have that
>>>>>>> historic knowledge so was hoping it would be flagged up in this discussion
>>>>>>> thread ☺
>>>>>>>
>>>>>>> Cheers
>>>>>>> Mike
>>>>>>>
>>>>>>> On 10/25/16, 5:36 PM, "Magnus Edenhill" <mag...@edenhill.se> wrote:
>>>>>>>
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> With the version bumps for Produce and Fetch requests, do you really need
>>>>>>> to bump MagicByte too?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Magnus
>>>>>>>
>>>>>>>
>>>>>>> 2016-10-25 18:09 GMT+02:00 Michael Pearce <michael.pea...@ig.com>:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I would like to discuss the following KIP proposal:
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag
>>>>>>>>
>>>>>>>> This is off the back of the discussion on KIP-82 / KIP meeting where it
>>>>>>>> was agreed to separate this issue and feature. See:
>>>>>>>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201610.mbox/%3cCAJS3ho8OcR==EcxsJ8OP99pD2hz=iiGecWsv-EZsBsNyDcKr=g...@mail.gmail.com%3e
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> The information contained in this email is strictly confidential and for
>>>>>>>> the use of the addressee only, unless otherwise indicated. If you are not
>>>>>>>> the intended recipient, please do not read, copy, use or disclose to others
>>>>>>>> this message or any attachment. Please also notify the sender by replying
>>>>>>>> to this email or by telephone (+44(020 7896 0011) and then delete the email
>>>>>>>> and any copies of it. Opinions, conclusion (etc) that do not relate to the
>>>>>>>> official business of this company shall be understood as neither given nor
>>>>>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>>>>>>> registered in England and Wales, company number 04008957) and IG Index
>>>>>>>> Limited (a company registered in England and Wales, company number
>>>>>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>>>>>>>> London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
>>>>>>>> Index Limited (register number 114059) are authorised and regulated by the
>>>>>>>> Financial Conduct Authority.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nacho (Ignacio) Solis
>>>>> Kafka
>>>>> nso...@linkedin.com
>>>>>
>>>>
>>>
>>
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125