@Mayuresh Yes exactly, it is a really nasty race condition.
This is why I look forward to being able to trash our custom workaround :)

Kostya
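To make the workaround concrete, here is a minimal sketch of such a
"tombstone echo" application on the plain Java clients. The topic name, the
marker-byte convention, and the isDeleteMarker() helper are all hypothetical;
the race discussed below is flagged in the comments.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class TombstoneEcho {

        // Hypothetical wrapper check: true when the wrapper carries headers
        // but a null payload, i.e. the producer intended a delete.
        static boolean isDeleteMarker(byte[] wrappedValue) {
            return wrappedValue != null && wrappedValue.length > 0
                    && wrappedValue[0] == 0x01; // made-up marker byte
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "tombstone-echo");

            KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(
                    props, new ByteArrayDeserializer(), new ByteArrayDeserializer());
            KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(
                    props, new ByteArraySerializer(), new ByteArraySerializer());

            consumer.subscribe(Collections.singletonList("some-compacted-topic"));
            while (true) {
                for (ConsumerRecord<byte[], byte[]> rec : consumer.poll(Duration.ofSeconds(1))) {
                    if (isDeleteMarker(rec.value())) {
                        // Echo a true tombstone (null value) so compaction can
                        // drop the key. The race: if the upstream producer
                        // writes a fresh value for the same key after the
                        // marker but before this echo lands, the echo deletes
                        // that newer value too.
                        producer.send(new ProducerRecord<>(rec.topic(), rec.key(), null));
                    }
                }
            }
        }
    }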
06.10.2016, 02:36, "Mayuresh Gharat" <gharatmayures...@gmail.com>:
> @Kostya
>
> Regarding "To get around this we have an awful *cough* solution whereby we
> have to send our message wrapper with the headers and null content, and
> then we have an application that has to consume from all the compacted
> topics and when it sees this message it produces back in a null payload
> record to make the broker compact it out."
>
> ---> This has a race condition, right?
>
> Suppose the producer produces a message with headers and null content at
> time T0 to Kafka.
>
> Then the producer, at time T0 + 1, sends another message with headers and
> actual content to Kafka.
>
> What we expect is that the application that is consuming and then producing
> the same message with a null payload should do so at time T0 + 0.5, so that
> the message at T0 + 1 is not deleted.
>
> But there is no guarantee here.
>
> If the null payload goes into Kafka at time T0 + 2, then you essentially
> lose the second message produced by the producer at time T0 + 1.
>
> Thanks,
>
> Mayuresh
>
> On Wed, Oct 5, 2016 at 6:13 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
>> @Nacho
>>
>> > - Brokers can't see the headers (part of the "V" black box)
>>
>> > (Also, it would be nice if we had a way to access the headers from the
>> > brokers, something that is not trivial at this time with the current
>> > broker architecture).
>>
>> I think this can be addressed with broker interceptors, which we touched on
>> in KIP-42
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors>.
>>
>> @Gwen
>>
>> You are right that the wrapper thingy “works”, but there are some drawbacks
>> that Nacho and Radai have covered in detail, and that I can add a few more
>> comments to.
>>
>> At LinkedIn, we *get by* without the proposed Kafka record headers by
>> dumping such metadata in one or two places:
>>
>> - Most of our applications use Avro, so for the most part we can use an
>>   explicit header field in the Avro schema. Topic owners are supposed to
>>   include this header in their schemas.
>> - A prefix to the payload that primarily contains the schema’s ID so we
>>   can deserialize the Avro. (We could use this for other use-cases as
>>   well - i.e., move some of the above into this prefix blob.)
>>
>> Dumping headers in the Avro schema pollutes the application’s data model
>> with data/service-infra-related fields that are unrelated to the underlying
>> topic, and forces the application to deserialize the entire blob whether or
>> not the headers are actually used. Conversely, from an infrastructure
>> perspective, we would really like to not touch any application data. Our
>> infiltration of the application’s schema is a major reason why many at
>> LinkedIn sometimes assume that we (Kafka folks) are the shepherds for all
>> things Avro :)
>>
>> Another drawback is that all this only works if everyone in the
>> organization is a good citizen and includes the header, and uses our
>> wrapper libraries - which is a good practice IMO, but may not always be
>> easy for open source projects that wish to directly use the Apache
>> producer/client. If we instead allow these headers to be inserted via
>> suitable interceptors outside the application payloads, it would remove
>> such issues of separation in the data model and choice of clients.
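To make the interceptor idea concrete: assuming a client with record-header
support (headers landed in the Java client well after this thread, via
KIP-82), a KIP-42-style producer interceptor could attach infra metadata
without touching the application payload. A minimal sketch, with made-up
header names:

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    // Attaches infra metadata as record headers, outside the application
    // payload. Header names here are made up for illustration.
    public class InfraHeadersInterceptor<K, V> implements ProducerInterceptor<K, V> {

        @Override
        public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record) {
            record.headers().add("infra.origin-cluster",
                    "cluster-a".getBytes(StandardCharsets.UTF_8));
            record.headers().add("infra.produce-time-ms",
                    Long.toString(System.currentTimeMillis())
                        .getBytes(StandardCharsets.UTF_8));
            return record;
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) {}

        @Override
        public void close() {}

        @Override
        public void configure(Map<String, ?> configs) {}
    }

The interceptor would be wired in through the producer's interceptor.classes
configuration, so an application using the stock Apache client gets the
headers without any wrapper library.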
>> Radai has enumerated a number of use-cases
>> <https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers>
>> and I’m sure the broader community will have a lot more to add. The feature
>> as such would enable an ecosystem of plugins from different vendors that
>> users can mix and match in their data pipelines without requiring any
>> specific payload formats or client libraries.
>>
>> Thanks,
>>
>> Joel
>>
>> > > On Wed, Oct 5, 2016 at 2:20 PM, Gwen Shapira <g...@confluent.io> wrote:
>> > >
>> > > > Since LinkedIn has some kind of wrapper thingy that adds the headers,
>> > > > where they could have added them to Apache Kafka - I'm very curious to
>> > > > hear what drove that decision and the pros/cons of managing the
>> > > > headers outside Kafka itself.
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125