@Mayuresh

Yes, exactly: it is a really nasty race condition.

This is why I look forward to being able to trash our custom workaround :)

Kostya


06.10.2016, 02:36, "Mayuresh Gharat" <gharatmayures...@gmail.com>:
> @Kostya
>
> Regarding "To get around this we have an awful *cough* solution whereby we
> have to send our message wrapper with the headers and null content, and
> then we have an application that has to consume from all the compacted
> topics and when it sees this message it produces back in a null payload
> record to make the broker compact it out."
>
>  ---> This has a race condition, right?
>
> Suppose the producer sends a message with headers and null content to
> Kafka at time T0.
>
> Then, at time T0 + 1, the producer sends another message with headers and
> actual content to Kafka.
>
> For the workaround to be safe, the application that consumes and then
> republishes the null-payload record would have to do so at time T0 + 0.5,
> so that the message produced at T0 + 1 is not deleted.
>
> But there is no such guarantee.
>
> If the null payload reaches Kafka at time T0 + 2, you essentially lose
> the second message, the one the producer sent at T0 + 1.
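>
> To make the interleaving concrete, here is a minimal sketch against the
> Java producer (the topic name, key, and wrapper helper are hypothetical;
> the produce calls are the standard kafka-clients API, and the point is
> that nothing orders the three sends):
>
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.Producer;
> import org.apache.kafka.clients.producer.ProducerRecord;
>
> public class CompactionRaceSketch {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "localhost:9092");
>         props.put("key.serializer",
>                 "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("value.serializer",
>                 "org.apache.kafka.common.serialization.ByteArraySerializer");
>         Producer<String, byte[]> producer = new KafkaProducer<>(props);
>
>         String key = "some-key";
>         // T0: wrapper with headers but null content. This is NOT a tombstone,
>         // because the wrapper itself is a non-null payload.
>         producer.send(new ProducerRecord<>("compacted-topic", key, wrap(null)));
>         // T0 + 1: a genuine new value for the same key.
>         producer.send(new ProducerRecord<>("compacted-topic", key,
>                 wrap("real content".getBytes())));
>         // T0 + 2: the cleanup app (a separate consumer/producer in reality)
>         // reacts to the T0 message and emits a real tombstone. Compaction now
>         // treats null as the latest value for the key, so the T0 + 1 record
>         // is eventually removed as well.
>         producer.send(new ProducerRecord<>("compacted-topic", key, null));
>         producer.close();
>     }
>
>     // Hypothetical stand-in for our wrapper encoding: headers followed by
>     // the (possibly absent) content.
>     private static byte[] wrap(byte[] content) {
>         byte[] headers = "infra-headers".getBytes();
>         byte[] body = content == null ? new byte[0] : content;
>         byte[] out = new byte[headers.length + body.length];
>         System.arraycopy(headers, 0, out, 0, headers.length);
>         System.arraycopy(body, 0, out, headers.length, body.length);
>         return out;
>     }
> }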
>
> Thanks,
>
> Mayuresh
>
> On Wed, Oct 5, 2016 at 6:13 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
>>  @Nacho
>>
>>  > - Brokers can't see the headers (part of the "V" black box).
>>  >
>>  > (Also, it would be nice if we had a way to access the headers from the
>>  > brokers, something that is not trivial at this time with the current
>>  > broker architecture.)
>>
>>  I think this can be addressed with broker interceptors, which we touched
>>  on in KIP-42
>>  <https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors>.
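>>
>>  For reference, the client-side interceptor API that did ship with KIP-42
>>  looks like the sketch below; there is no broker-side analogue today, but
>>  one would presumably be shaped similarly. The envelope encoding in
>>  onSend() is a made-up illustration, not a real Kafka or LinkedIn format:
>>
>>  import java.nio.ByteBuffer;
>>  import java.util.Map;
>>  import org.apache.kafka.clients.producer.ProducerInterceptor;
>>  import org.apache.kafka.clients.producer.ProducerRecord;
>>  import org.apache.kafka.clients.producer.RecordMetadata;
>>
>>  public class HeaderInjectingInterceptor
>>          implements ProducerInterceptor<byte[], byte[]> {
>>      private static final byte ENVELOPE_VERSION = 0x1; // assumed convention
>>
>>      @Override
>>      public ProducerRecord<byte[], byte[]> onSend(
>>              ProducerRecord<byte[], byte[]> record) {
>>          if (record.value() == null) {
>>              return record; // pass tombstones through untouched
>>          }
>>          byte[] headers = new byte[0]; // infra metadata would be encoded here
>>          ByteBuffer buf = ByteBuffer.allocate(
>>                  1 + 4 + headers.length + record.value().length);
>>          buf.put(ENVELOPE_VERSION).putInt(headers.length)
>>             .put(headers).put(record.value());
>>          return new ProducerRecord<>(record.topic(), record.partition(),
>>                  record.timestamp(), record.key(), buf.array());
>>      }
>>
>>      @Override
>>      public void onAcknowledgement(RecordMetadata metadata, Exception e) {}
>>
>>      @Override
>>      public void close() {}
>>
>>      @Override
>>      public void configure(Map<String, ?> configs) {}
>>  }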
>>
>>  @Gwen
>>
>>  You are right that the wrapper thingy “works”, but it has some drawbacks
>>  that Nacho and Radai have covered in detail; I can add a few more
>>  comments.
>>
>>  At LinkedIn, we *get by* without the proposed Kafka record headers by
>>  dumping such metadata in one or two places:
>>
>>     - Most of our applications use Avro, so for the most part we can use an
>>     explicit header field in the Avro schema. Topic owners are supposed to
>>     include this header in their schemas.
>>     - A prefix to the payload that primarily contains the schema’s ID so
>>     we can deserialize the Avro; see the sketch below. (We could use this
>>     for other use-cases as well - i.e., move some of the above into this
>>     prefix blob.)
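>>
>>  To make that prefix concrete, it is roughly a fixed-size blob in front of
>>  the Avro bytes, along the lines of this sketch (the field sizes and names
>>  are illustrative, not our exact wire format):
>>
>>  import java.nio.ByteBuffer;
>>
>>  // Illustrative envelope: [magic byte][schema id][avro payload].
>>  public final class PayloadPrefix {
>>      private static final byte MAGIC = 0x0;
>>
>>      public static byte[] encode(int schemaId, byte[] avroBytes) {
>>          ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroBytes.length);
>>          buf.put(MAGIC).putInt(schemaId).put(avroBytes);
>>          return buf.array();
>>      }
>>
>>      public static int schemaId(byte[] payload) {
>>          // Skip the magic byte and read the 4-byte schema id; the Avro
>>          // bytes follow and need not be deserialized to get this far.
>>          return ByteBuffer.wrap(payload, 1, 4).getInt();
>>      }
>>  }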
>>
>>  Dumping headers in the Avro schema pollutes the application’s data model
>>  with service/infra-related fields that are unrelated to the underlying
>>  topic, and forces the application to deserialize the entire blob whether
>>  or not the headers are actually used. Conversely, from an infrastructure
>>  perspective, we would really prefer not to touch any application data.
>>  Our infiltration of the application’s schema is a major reason why many
>>  at LinkedIn sometimes assume that we (Kafka folks) are the shepherds for
>>  all things Avro :)
>>
>>  Another drawback is that all of this works only if everyone in the
>>  organization is a good citizen, includes the header, and uses our
>>  wrapper libraries - which is a good practice IMO, but may not always be
>>  easy for open source projects that wish to use the Apache
>>  producer/client directly. If we instead allow these headers to be
>>  inserted via suitable interceptors, outside the application payload,
>>  both the data-model coupling and the constraint on client choice go
>>  away.
>>
>>  Radai has enumerated a number of use-cases
>>  <https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers>
>>  and I’m sure the broader community will have a lot more to add. The
>>  feature as such would enable an ecosystem of plugins from different
>>  vendors that users can mix and match in their data pipelines without
>>  requiring any specific payload format or client library.
>>
>>  Thanks,
>>
>>  Joel
>>
>>  > On Wed, Oct 5, 2016 at 2:20 PM, Gwen Shapira <g...@confluent.io> wrote:
>>  >
>>  > > Since LinkedIn has some kind of wrapper thingy that adds the headers,
>>  > > where they could have added them to Apache Kafka - I'm very curious to
>>  > > hear what drove that decision and the pros/cons of managing the
>>  > > headers outside Kafka itself.
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
