Hi All,

We have updated a sample version of an implementation of the KIP with Integer, byte[] headers to aid discussion. See: https://github.com/michaelandrepearce/kafka/tree/headers
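For discussion purposes, one way to picture an Integer-keyed header set (a hypothetical sketch only, not the API in the linked branch) is a sorted map of raw bytes. This also shows what key ordering buys a broker that only cares about a low key range:

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only -- not the proposed KIP API. Headers are
// modelled as an int-keyed map of raw bytes. A sorted map iterates entries
// in key order regardless of insertion order, which is what would let a
// broker stop scanning once it passes the key range it cares about.
public class OrderedHeadersSketch {
    public static void main(String[] args) {
        SortedMap<Integer, byte[]> headers = new TreeMap<>();
        headers.put(40000, "app-trace-id".getBytes(StandardCharsets.UTF_8));
        headers.put(5, "infra-origin-cluster".getBytes(StandardCharsets.UTF_8));

        // Despite insertion order, iteration starts at the lowest key.
        System.out.println(headers.firstKey()); // prints 5
    }
}
```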
On the KIP there are a few outstanding points that I would like to have discussed and get some majority consensus on, to get the KIP more complete. I believe there are some differing opinions on these:

1) Should headers be ordered by the int key?
2) Key allocation – should the example be the actual proposal?

My opinions on the two: 1) Yes. 2) Yes.

===========================

On other notes:

@Kostya,

Thanks for the feedback. It's good seeing further people suffering the same issues that I haven't already seen on Stack Overflow etc., giving more credence that this KIP has real user benefit and addresses many problems faced.

Indeed, we have the same problem at IG with your issue on send: needing to send headers with a null payload on a compacted topic. We feel your pain here.

• We handle this slightly differently, but also with some nasty race/failure conditions still: currently our producer wrapper detects this and creates two records, one with our message wrapper containing only headers, followed by one with a null value. We have issues if one record succeeds to send but the other doesn't, as such the key never compacts/clears out.
• I agree this is a very bad position to be in and proves that message-wrapper solutions don't work, and it is one of the reasons why we want to add headers in this KIP ☺.

I think Radai answered your question about why Integer keys over String-based ones. I agree with Radai that ordering keys, whilst an optimization, does bring some benefit broker side by not needing to read through all the headers.
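The timing hazard with the two-record workaround can be modelled without a broker. The sketch below is illustrative only: it treats a compacted topic as a latest-value-per-key map, with a null value acting as a tombstone:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of log compaction, not real Kafka code: compaction
// eventually keeps only the most recent value for each key, and a null
// value is a tombstone that deletes the key. If the tombstone lands after
// a newer real record, that record is lost.
public class CompactionRaceSketch {
    public static void main(String[] args) {
        Map<String, String> latestPerKey = new HashMap<>();

        latestPerKey.put("order-1", "wrapper: headers only");      // T0
        latestPerKey.put("order-1", "wrapper: headers + payload"); // T0 + 1
        latestPerKey.put("order-1", null);                         // T0 + 2: late tombstone

        // The real record from T0 + 1 is gone; the late tombstone wins.
        System.out.println(latestPerKey.get("order-1")); // prints null
    }
}
```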
Cheers,
Mike

On 10/6/16, 2:36 AM, "Mayuresh Gharat" <gharatmayures...@gmail.com> wrote:

@Kostya

Regarding "To get around this we have an awful *cough* solution whereby we have to send our message wrapper with the headers and null content, and then we have an application that has to consume from all the compacted topics and when it sees this message it produces back in a null payload record to make the broker compact it out."

---> This has a race condition, right?

Suppose the producer produces a message with headers and null content at time T0 to Kafka. Then the producer, at time T0 + 1, sends another message with headers and actual content to Kafka. What we expect is that the application that is consuming and then producing the same message with a null payload should do so at time T0 + 0.5, so that the message at T0 + 1 is not deleted. But there is no guarantee here. If the null payload goes into Kafka at time T0 + 2, then essentially you lose the second message produced by the producer at T0 + 1.

Thanks,
Mayuresh

On Wed, Oct 5, 2016 at 6:13 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> @Nacho
>
> > > - Brokers can't see the headers (part of the "V" black box)
> >
> > (Also, it would be nice if we had a way to access the headers from the
> > brokers, something that is not trivial at this time with the current
> > broker architecture).
>
> I think this can be addressed with broker interceptors, which we touched on
> in KIP-42
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors>.
>
> @Gwen
>
> You are right that the wrapper thingy "works", but there are some drawbacks
> that Nacho and Radai have covered in detail that I can add a few more
> comments to.
>
> At LinkedIn, we *get by* without the proposed Kafka record headers by
> dumping such metadata in one or two places:
>
> - Most of our applications use Avro, so for the most part we can use an
>   explicit header field in the Avro schema.
>   Topic owners are supposed to include this header in their schemas.
> - A prefix to the payload that primarily contains the schema's ID so we
>   can deserialize the Avro. (We could use this for other use-cases as
>   well - i.e., move some of the above into this prefix blob.)
>
> Dumping headers in the Avro schema pollutes the application's data model
> with data/service-infra-related fields that are unrelated to the underlying
> topic; and forces the application to deserialize the entire blob whether or
> not the headers are actually used. Conversely, from an infrastructure
> perspective, we would really like to not touch any application data. Our
> infiltration of the application's schema is a major reason why many at
> LinkedIn sometimes assume that we (Kafka folks) are the shepherds for all
> things Avro :)
>
> Another drawback is that all this only works if everyone in the
> organization is a good citizen and includes the header, and uses our
> wrapper libraries - which is a good practice IMO - but may not always be
> easy for open source projects that wish to directly use the Apache
> producer/client. If instead we allow these headers to be inserted via
> suitable interceptors outside the application payloads, it would remove such
> issues of separation in the data model and choice of clients.
>
> Radai has enumerated a number of use-cases
> <https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers>
> and I'm sure the broader community will have a lot more to add. The feature
> as such would enable an ecosystem of plugins from different vendors that
> users can mix and match in their data pipelines without requiring any
> specific payload formats or client libraries.
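[Editor's aside: the "prefix blob" approach above can be sketched as follows. The layout is an assumption for illustration (a fixed 4-byte big-endian schema ID before the Avro bytes); LinkedIn's actual wire format may differ.]

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hedged sketch of a schema-ID prefix: [4-byte schema id][payload bytes].
// The consumer reads the id first, looks up the matching Avro schema, and
// deserializes the body only if it actually needs it.
public class SchemaIdPrefixSketch {
    static byte[] encode(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(schemaId)
                .put(payload)
                .array();
    }

    static int readSchemaId(byte[] message) {
        return ByteBuffer.wrap(message).getInt();
    }

    static byte[] readPayload(byte[] message) {
        return Arrays.copyOfRange(message, 4, message.length);
    }

    public static void main(String[] args) {
        byte[] avroBytes = "avro-encoded-record".getBytes(StandardCharsets.UTF_8);
        byte[] message = encode(123, avroBytes);
        System.out.println(readSchemaId(message)); // prints 123
    }
}
```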
> Thanks,
>
> Joel
>
> On Wed, Oct 5, 2016 at 2:20 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > Since LinkedIn has some kind of wrapper thingy that adds the headers,
> > where they could have added them to Apache Kafka - I'm very curious to
> > hear what drove that decision and the pros/cons of managing the
> > headers outside Kafka itself.

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125