Re: [DISCUSS] KIP-82 - Add Record Headers

James Cheng Wed, 02 Nov 2016 16:26:11 -0700

> On Nov 2, 2016, at 2:33 AM, Michael Pearce <michael.pea...@ig.com> wrote:
> 
> Thanks James for taking the time out.
> 
> My comments on each solution you commented on are below. (I note you didn’t 
> comment on the 3rd at all, which is the current proposal in the KIP.)
> 1) 
>     a. This forces all clients to have distinct knowledge of platform-level 
> implementation details.
>     b. It enforces a single serialization technology for all apps' payloads 
> and platform headers.
>         i. What if apps need a different serialization? E.g. an app team 
> needs to use XML for legacy-system reasons, but we force them at the 
> platform level to use Avro because of our headers.
>     c. If we were to have a common Kafka solution, this would force everyone 
> onto a single serialization solution; I think this is something we don’t 
> want to do?
>     d. This doesn’t deal with large payloads; as you’ve mentioned HTTP in the 
> second solution, think of MIME multipart.
>     e. End-to-end encryption: if apps need end-to-end encryption, then 
> platform tooling cannot read the header information without decrypting the 
> message, which defeats the purpose of having end-to-end encryption.
> 2) 
>     a. A container is the solution we currently use (we don’t use MIME, but 
> it looks like a reasonable choice if you don’t care about size, or if your 
> payloads are big enough that the overhead is small).
>         i. I think if we don’t go with adding the headers to the message and 
> offset, having a commonly agreed container format is the next best offering.
>     b. The TiVo-specific HTTP/MIME-style message is indeed a good solution in 
> our view:
>         i. Deals with separating headers and payload
>         ii. Allows multipart messaging


How exactly would this work? Or maybe that's out of scope for this email.

>         iii. Allows the payload to be encrypted while the headers are not
>         iv. Platform tooling doesn’t care about the payload and can quickly 
> read headers
>         v. Well-established and well-known container solution
>     c. HTTP/MIME-style headers (string keys) have a large byte overhead, 
> though
>         i. See Nacho’s and Radai’s previous points on this
>     d. If we agree on, say, MIME as the container format, how does a platform 
> team add the headers it needs without forcing all teams to be aware of them? 
> Or is this actually OK?
>         i. Would we make a new container-aware Kafka consumer and producer 
> API?

I don't think we need to change the existing consumer/producer. I think this is 
simply a new serialization format. If a platform team wanted to use this, they 
would create a serializer/deserializer that performs this serialization. It 
would be an implementation of 
org.apache.kafka.common.serialization.Serializer/Deserializer. They would have 
to get the entire org to move over to it. They might also wrap the 
producer/consumer library to use this serializer, in order to have a 
centralized place to add headers. I see this as similar to what Confluent has 
done with io.confluent.kafka.serializers.KafkaAvroSerializer.
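A minimal sketch of that idea, in Python for brevity. The real Java interface is `org.apache.kafka.common.serialization.Serializer`; the hypothetical `PlatformContainerSerializer` below only mirrors the shape of its `serialize(topic, data)` method and imports no Kafka library. It assumes the TiVo-style text container described further down this thread, and it assumes the header length counts the header lines plus the terminating blank line.

```python
import json


class PlatformContainerSerializer:
    """Hypothetical platform-owned serializer: apps supply their own payload
    serializer, and the platform prepends its headers in one central place."""

    def __init__(self, tag, payload_serializer, platform_headers):
        self.tag = tag                                # container type, e.g. "JS/1"
        self.payload_serializer = payload_serializer  # app's choice: JSON, Avro, XML...
        self.platform_headers = dict(platform_headers)

    def serialize(self, topic, data):
        # 1) The app's serializer turns the object into payload bytes first.
        payload = self.payload_serializer(data)
        # 2) The platform adds its headers without the app knowing about them.
        lines = "".join(f"{k}: {v}\r\n" for k, v in self.platform_headers.items())
        header_section = (lines + "\r\n").encode("ascii")
        first_line = f"{self.tag} {len(header_section)} {len(payload)}\r\n"
        return first_line.encode("ascii") + header_section + payload


# Usage: the app only picks its payload format; headers come from the platform.
serializer = PlatformContainerSerializer(
    "JS/1",
    lambda obj: json.dumps(obj).encode("utf-8"),
    {"Host": "host.domain.com", "Service": "PaymentProcessor"},
)
body = serializer.serialize("payments", {"Field1": "value"})
```

Because the payload length goes on the first line, the payload has to be fully serialized before the container can be written, which is the write-order constraint listed under the cons later in this thread.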

I'm pretty sure LinkedIn has wrappers as well as serializers/deserializers that 
implement their existing solution. LinkedIn might even be able to change their 
implementation to do this the container way, and it might be transparent to 
their producers/consumers. Maybe.

>     e. How would this work with the likes of Kafka Streams, where as a 
> platform team we want to add some metadata needed on every message, but we 
> don’t want to recode these frameworks?

Same answer as above. I think this is just a serialization format. You would 
use Kafka Streams, but would provide your own serializer/deserializer. Same 
thing applies to Kafka Connect.

-James

> 
> 
> On 10/29/16, 8:09 AM, "James Cheng" <wushuja...@gmail.com> wrote:
> 
>    Let me talk about the container format that we are using here at TiVo to 
> add headers to our Kafka messages.
> 
>    Just some quick terminology, so that I don't confuse everyone.
>    I'm going to use "message body" to refer to the thing returned by 
> ConsumerRecord.value()
>    And I'm going to use "payload" to refer to your data after it has been 
> serialized into bytes.
> 
>    To recap, during the KIP call, we talked about 3 ways to have headers in 
> Kafka messages:
>    1) The message body is your payload, which has headers within it.
>    2) The message body is a container, which has headers in it as well as 
> your payload.
>    3) Extend Kafka to hold headers outside of the message body. The message 
> body holds your payload.
> 
>    1) The message body is your payload, which has headers in it
>    -----------------------
>    Here's an example of what this may look like, if it were rendered in JSON:
> 
>    {
>        "headers" : {
>            "Host" : "host.domain.com",
>            "Service" : "PaymentProcessor",
>            "Timestamp" : "2016-10-28 12:45:56"
>        },
>        "Field1" : "value",
>        "Field2" : "value"
>    }
> 
>    In this scenario, headers are really not anything special. They are a part 
> of your payload. They may have been auto-included by some mechanism in all of 
> your schemas, but they are really just part of your payload. I believe 
> LinkedIn uses this mechanism. The "headers" field is a reserved word in all 
> schemas, and is somehow auto-inserted into all schemas. The headers schema 
> contains a couple of fields like "host" and "service" and "timestamp". If 
> LinkedIn decides that a new field needs to be added for company-wide 
> infrastructure purposes, then they will add it to the schema of "headers", 
> and because "headers" is included everywhere, all schemas will get 
> updated as well.
> 
>    Because they are simply part of your payload, you need to deserialize your 
> payload in order to read the headers.
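A small Python illustration of approach 1, using the JSON rendering above. The function name is hypothetical; the point is only that "headers" is an ordinary field, so the reader has to decode the entire body with the payload's own format before it can see any header.

```python
import json


def read_headers_embedded(message_body: bytes) -> dict:
    # Approach 1: "headers" is just another field of the payload, so the whole
    # body has to be decoded before any header can be read.
    record = json.loads(message_body.decode("utf-8"))
    return record["headers"]


body = json.dumps({
    "headers": {
        "Host": "host.domain.com",
        "Service": "PaymentProcessor",
        "Timestamp": "2016-10-28 12:45:56",
    },
    "Field1": "value",
    "Field2": "value",
}).encode("utf-8")

headers = read_headers_embedded(body)
```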
> 
>    3) Extend Kafka to hold headers outside of the message body. The message 
> body holds your payload.
>    -------------
>    This is what this KIP is discussing. I will let others talk about this.
> 
>    2) The message body is a container, which has headers in it, as well as 
> your payload.
>    --------------
>    At TiVo, we have standardized on a container format that looks very 
> similar to HTTP. Let me jump straight to an example:
> 
>    ----- example below ----
>    JS/1 123 1024
>    Host: host.domain.com
>    Service: SomethingProcessor
>    Timestamp: 2016-10-28 12:45:56
>    ObjectTypeInPayload: MyObjectV1
> 
>    {
>        "Field1" : "value",
>        "Field2" : "value"
>    }
>    ----- example above ----
> 
>    Ignore the first line for now. Lines 2-5 are headers. Then there is a 
> blank line, and then after that is your payload.  The field 
> "ObjectTypeInPayload" describes what schema applies to the payload. In order 
> to decode your payload, you read the field "ObjectTypeInPayload" and use that 
> to decide how to decode the payload.
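The read path just described might look like the following Python sketch. It assumes `\r\n` line endings (as in the more precise format description further down) and a hypothetical `DECODERS` table mapping `ObjectTypeInPayload` values to payload decoders. It ignores the two length fields and simply splits at the first blank line.

```python
import json

# Hypothetical mapping from ObjectTypeInPayload to a payload decoder.
DECODERS = {
    "MyObjectV1": lambda payload: json.loads(payload.decode("utf-8")),
}


def parse_js1(message_body: bytes):
    head, _, payload = message_body.partition(b"\r\n\r\n")
    first_line, *header_lines = head.decode("ascii").split("\r\n")
    tag = first_line.split(" ")[0]
    if tag != "JS/1":
        raise ValueError(f"unexpected container type: {tag}")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(": ")  # split on the first ": " only
        headers[name] = value
    # A header, not the payload, tells us how to decode the payload.
    decode = DECODERS[headers["ObjectTypeInPayload"]]
    return headers, decode(payload)


body = (b"JS/1 123 1024\r\n"
        b"Host: host.domain.com\r\n"
        b"Service: SomethingProcessor\r\n"
        b"Timestamp: 2016-10-28 12:45:56\r\n"
        b"ObjectTypeInPayload: MyObjectV1\r\n"
        b"\r\n"
        b'{"Field1": "value", "Field2": "value"}')
headers, obj = parse_js1(body)
```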
> 
>    Okay, let's talk about details.
>    The first line is "JS/1 123 1024". 
>    * The JS/1 thing is the "schema" of the container. JS is the container 
> type, 1 is the version number of the JS container. This particular version of 
> the JS container means those 4 specific headers are present, and that the 
> payload is encoded in JSON.
>    * The 123 is the length in bytes of the header section. (This particular 
> example probably isn't exactly 123 bytes)
>    * The 1024 is the length in bytes of the payload. (This particular example 
> probably isn't exactly 1024 bytes)
>    The 123 and 1024 allow the deserializer to quickly jump to the different 
> sections. I don't know how necessary they are. It's possible they are an 
> over-optimization. They are kind of a holdover from a previous wire format 
> here at TiVo where we were pipelining messages over TCP as one continuous 
> bytestream (NOT using Kafka), and we needed to be able to know where one 
> object ended and another started, and also be able to skip messages that we 
> didn't care about.
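As a rough illustration of why the two lengths help on a continuous bytestream, the Python sketch below walks a buffer of back-to-back containers and jumps over each one by arithmetic alone. It assumes the header length counts the header lines plus the terminating blank line, which the description above does not pin down; `frame` and `iter_containers` are hypothetical helpers.

```python
def frame(tag: str, headers: dict, payload: bytes) -> bytes:
    # Build one container; the header length counts header lines + blank line.
    section = ("".join(f"{k}: {v}\r\n" for k, v in headers.items()) + "\r\n").encode("ascii")
    return f"{tag} {len(section)} {len(payload)}\r\n".encode("ascii") + section + payload


def iter_containers(stream: bytes):
    """Yield (tag, header_section, payload) for containers laid end to end."""
    pos = 0
    while pos < len(stream):
        eol = stream.index(b"\r\n", pos)
        tag, hdr_len, payload_len = stream[pos:eol].decode("ascii").split(" ")
        hdr_start = eol + 2
        payload_start = hdr_start + int(hdr_len)
        payload_end = payload_start + int(payload_len)
        yield tag, stream[hdr_start:payload_start], stream[payload_start:payload_end]
        pos = payload_end  # jump straight to the next container


stream = (frame("JS/1", {"Service": "A"}, b'{"x": 1}')
          + frame("AV/1", {"Service": "B"}, b"\xff\x01\x02"))
# A consumer that only cares about AV/1 never parses the JS/1 headers or payload.
av_payloads = [p for tag, _, p in iter_containers(stream) if tag == "AV/1"]
```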
> 
>    Let me show another made up example of this container format being used:
> 
>    ---- example below ----
>    AV/1 123 1024
>    Host: host.domain.com
>    Service: SomethingProcessor
>    Timestamp: 2016-10-28 12:45:56
> 
>    0xFF
>    BYTESOFDATA
>    ---- example above ----
> 
>    This container is of type AV/1. This means that the payload is a magic 
> byte followed by a stream of bytes. The magic byte is a schema registry ID, 
> which is used to look up the schema, which is then used to decode the rest of 
> the bytes in the payload.
> 
>    Notice that this is a different use of the same container syntax. In this 
> case, the schema ID was a byte in the payload. In the JS/1 case, the schema 
> ID was stored in a header.
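A minimal Python sketch of the AV/1 read path, where the schema ID comes from the first payload byte and no header has to be consulted. The in-memory `SCHEMA_REGISTRY` dict and its ASCII decoder are hypothetical stand-ins for a real schema registry lookup and Avro decoding.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a schema registry: one-byte ID -> decoder for the rest.
SCHEMA_REGISTRY: Dict[int, Callable[[bytes], object]] = {
    0xFF: lambda data: data.decode("ascii"),
}


def decode_av1_payload(payload: bytes) -> Tuple[int, object]:
    schema_id = payload[0]  # indexing bytes yields an int: the "magic byte"
    decoder = SCHEMA_REGISTRY[schema_id]
    return schema_id, decoder(payload[1:])


schema_id, value = decode_av1_payload(b"\xffBYTESOFDATA")
```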
> 
>    Here is a more precise description of the container format:
>    ---- container format below ----
>    <tag> <headers length> <payload length>\r\n
>    header: value\r\n
>    header: value\r\n
>    \r\n
>    payload
>    ---- container format above ----
> 
>    As I mentioned above, the headers length and payload length might not be 
> necessary. You can also simply scan the message body until the first 
> occurrence of \r\n\r\n.
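The two ways of locating the payload, side by side in Python. Both functions are hypothetical sketches; the length-based one assumes the header length counts the header lines plus the terminating blank line, which the format description leaves open.

```python
def split_by_lengths(body: bytes):
    """Use the lengths on the first line to slice out headers and payload."""
    eol = body.index(b"\r\n")
    _tag, hdr_len, payload_len = body[:eol].decode("ascii").split(" ")
    payload_start = eol + 2 + int(hdr_len)
    return body[eol + 2:payload_start], body[payload_start:payload_start + int(payload_len)]


def split_by_scan(body: bytes):
    """Ignore the lengths: the header section ends at the first empty line."""
    eol = body.index(b"\r\n")
    sep = body.index(b"\r\n\r\n", eol)
    return body[eol + 2:sep + 4], body[sep + 4:]


body = b"AV/1 25 12\r\nHost: host.domain.com\r\n\r\n\xffBYTESOFDATA"
by_lengths = split_by_lengths(body)
by_scan = split_by_scan(body)
```

The scan needs no lengths but must read every header byte; the lengths allow direct slicing, although a production version would also want to check that the body is not shorter than the declared sizes.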
> 
>    Let's talk about pros/cons.
> 
>    Pros:
>    * Headers do not affect the payload. An addition of a header does not 
> affect the schema of the payload.
>    * Payload serialization can be different for different use cases. This 
> container format can carry a payload that is Avro or JSON or Thrift or 
> whatever. The payload is just a stream of bytes.
>    * Headers can be read without deserializing the payload
>    * Headers can have a schema. In the JS/1 case, I use "JS/1" to mean that 
> "There are 4 required fields. Host is a string, Service is a string, 
> Timestamp is a time in ISO(something) format, ObjectTypeInPayload is a 
> String, and the payload is in JSON"
>    * Plaintext headers with a relatively simple syntax are pretty easy to 
> parse in any programming language.
> 
>    Cons:
>    * Double serialization upon writes. In order to create the message body, 
> you first have to create your payload (which means you serialize your object 
> into an array of bytes) and then tack headers onto the front of it. And if 
> you do the optimization where you store the length of the payload, you 
> actually have to do it in this order, which means you have to encode the 
> payload first and store the whole thing in memory before creating your 
> message body.
>    * Double deserialization upon reads. You *might* need to read the headers 
> so that you can figure out how to read the payload. It depends on how you use 
> the container. In the JS/1 case, I had to read the ObjectTypeInPayload field 
> in order to deserialize the payload. However, in the AV/1 case, you did NOT 
> have to read any of the headers in order to deserialize the payload.
>    * What if I want my header values to be complex types? What if I wanted to 
> store a header where the value was an array? Do I start relying on stuff 
> like comma-separated strings to indicate arrays? What if I wanted to store a 
> header where the value was binary bytes? Do I insist that headers all must be 
> ASCII encoded? I realize this conflicts with what I said above about headers 
> being easy to parse. Maybe they are actually more complex than I realized.
>    * Size overhead of the container format and headers: If I have a 10-byte 
> payload, but my container is 512 bytes of ASCII-encoded strings, is it worth 
> it?
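To put a number on the size question, this Python sketch counts the bytes the text container adds around a payload, using the four JS/1 headers from the example above. The helper name is hypothetical, and it assumes the header length counts the header lines plus the blank line.

```python
def container_overhead(tag: str, headers: dict, payload_len: int) -> int:
    """Bytes the text container adds on top of a payload of payload_len bytes."""
    section = ("".join(f"{k}: {v}\r\n" for k, v in headers.items()) + "\r\n").encode("ascii")
    first_line = f"{tag} {len(section)} {payload_len}\r\n".encode("ascii")
    return len(first_line) + len(section)


js1_headers = {
    "Host": "host.domain.com",
    "Service": "SomethingProcessor",
    "Timestamp": "2016-10-28 12:45:56",
    "ObjectTypeInPayload": "MyObjectV1",
}
overhead = container_overhead("JS/1", js1_headers, payload_len=10)
ratio = overhead / 10
```

For these headers the sketch comes to 132 bytes of container around a 10-byte payload, roughly 13 times the payload size.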
> 
>    Alternatives:
>    * I can imagine doing something similar to the above, but using Avro as 
> the serialization format for the container. The Avro schemas would look 
> something like the following (apologies if I got these wrong; I actually 
> haven't used Avro):
> 
>    {
>        "type": "record", 
>        "name": "JS",
>        "fields" : [
>            {"name": "Host", "type" : "string"},
>            {"name": "Service", "type" : "string"},
>            {"name": "Timestamp", "type" : "double"},
>            {"name": "ObjectTypeInPayload", "type" : "string"},
>            {"name": "payload", "type": "bytes"}
>        ]
>    }
> 
>    {
>        "type": "record", 
>        "name": "AV",
>        "fields" : [
>            {"name": "Host", "type" : "string"},
>            {"name": "Service", "type" : "string"},
>            {"name": "Timestamp", "type" : "double"},
>            {"name": "payload", "type": "bytes"}
>        ]
>    }
> 
>    You would use Avro to deserialize the container, and then potentially use 
> a different deserializer for the payload. Using Avro would potentially reduce 
> the overhead of the container format, and let you use complex types in your 
> headers. However, this would mean people would still have to use Avro for 
> deserializing a Kafka message body.
> 
>    Our experience using this at TiVo:
>    * We haven't run into any problems so far.
>    * We are not yet running Kafka in production, so we don't yet have a lot 
> of traffic running through our brokers.
>    * Even when we go to production, we expect that the amount of data that we 
> have will be relatively small compared to most companies. So we're hoping 
> that the overhead of the container format will be okay for our use cases.
> 
>    Phew, okay, that's enough for now. Let's discuss.
> 
>    -James
> 
>> On Oct 27, 2016, at 12:19 AM, James Cheng <wushuja...@gmail.com> wrote:
>> 
>> 
>>> On Oct 25, 2016, at 10:23 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> In case you hadn't noticed, regarding the compaction issue for non-null 
>>> values, I have created a separate KIP-87; if you could all contribute to its 
>>> discussion, it would be much appreciated.
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag
>>> 
>>> Secondly, focusing back on KIP-82: one of the actions agreed on the KIP 
>>> call was for others in the meeting to propose additional alternative 
>>> solutions, on top of those already detailed in the KIP wiki and the 
>>> subsequently linked wiki pages.
>>> 
>>> I haven't seen any activity on this. Does this mean there aren't any 
>>> further proposals, and everyone in hindsight actually thinks the currently 
>>> proposed solution in the KIP is the front-runner? (I assume this isn't the 
>>> case; I just want to nudge everyone.)
>>> 
>> 
>> I have been meaning to respond, but I haven't had the time. In the next 
>> couple days, I will try to write up the container format that TiVo is using, 
>> and we can discuss it.
>> 
>> -James
>> 
>>> Also, I'm copying across the KIP call thread to keep everything in one 
>>> thread and avoid the discussion diverging into multiple threads.
>>> 
>>> Cheers
>>> Mike
>>> 
>>> ________________________________________
>>> From: Mayuresh Gharat <gharatmayures...@gmail.com>
>>> Sent: Monday, October 24, 2016 6:17 PM
>>> To: dev@kafka.apache.org
>>> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>>> 
>>> I agree with Nacho.
>>> +1 for the KIP.
>>> 
>>> Thanks,
>>> 
>>> Mayuresh
>>> 
>>> On Fri, Oct 21, 2016 at 11:46 AM, Nacho Solis <nso...@linkedin.com.invalid>
>>> wrote:
>>> 
>>>> I think a separate KIP is a good idea as well.  Note however that potential
>>>> decisions in this KIP could affect the other KIP.
>>>> 
>>>> Nacho
>>>> 
>>>> On Fri, Oct 21, 2016 at 10:23 AM, Jun Rao <j...@confluent.io> wrote:
>>>> 
>>>>> Michael,
>>>>> 
>>>>> Yes, doing a separate KIP to address the null payload issue for compacted
>>>>> topics is a good idea.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> On Fri, Oct 21, 2016 at 12:57 AM, Michael Pearce <michael.pea...@ig.com>
>>>>> wrote:
>>>>> 
>>>>>> I had noted that, whatever the solution, it was agreed that having
>>>>>> compaction based on a null payload isn't elegant.
>>>>>> 
>>>>>> Shall we raise another KIP to, as discussed, propose using an attribute
>>>>>> bit for a delete/compaction flag, as well as or instead of the null
>>>>>> value, and update the compaction logic to look at that delete/compaction
>>>>>> attribute?
>>>>>> 
>>>>>> I believe this is less contentious, so at least we could get that done,
>>>>>> alleviating some concerns whilst the below gets discussed further?
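To illustrate the proposal, here is a toy Python model of log compaction that honours an explicit delete flag in the record attributes instead of relying on a null value. `TOMBSTONE_BIT` and `Record` are hypothetical; this is not Kafka's on-disk format or log cleaner, and real compaction keeps tombstones around for a configurable period before removing them, whereas this toy drops them immediately.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical attribute bit; the actual bit assignment would be decided by the KIP.
TOMBSTONE_BIT = 0x10


@dataclass
class Record:
    offset: int
    key: bytes
    value: Optional[bytes]
    attributes: int = 0

    @property
    def is_tombstone(self) -> bool:
        # Deletion is signalled by the flag, not by value being None.
        return bool(self.attributes & TOMBSTONE_BIT)


def compact(log: List[Record]) -> List[Record]:
    """Keep the latest record per key, dropping keys whose latest record is a tombstone."""
    latest = {}
    for rec in log:  # later offsets overwrite earlier ones for the same key
        latest[rec.key] = rec
    live = [rec for rec in latest.values() if not rec.is_tombstone]
    return sorted(live, key=lambda rec: rec.offset)


log = [
    Record(0, b"a", b"v1"),
    Record(1, b"b", b"v1"),
    Record(2, b"a", b"headers-only", attributes=TOMBSTONE_BIT),  # delete with a non-null value
    Record(3, b"b", b"v2"),
]
survivors = compact(log)
```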
>>>>>> 
>>>>>> ________________________________________
>>>>>> From: Jun Rao <j...@confluent.io>
>>>>>> Sent: Wednesday, October 19, 2016 8:56:52 PM
>>>>>> To: dev@kafka.apache.org
>>>>>> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>>>>>> 
>>>>>> The following are the notes from today's KIP discussion.
>>>>>> 
>>>>>> 
>>>>>> - KIP-82 - add record header: We agreed that there are use cases for
>>>>>> third-party vendors building tools around Kafka. We haven't reached a
>>>>>> conclusion on whether the added complexity justifies the use cases. We
>>>>>> will follow up on the mailing list with use cases, the container formats
>>>>>> people have been using, and details on the proposal.
>>>>>> 
>>>>>> 
>>>>>> The video will be uploaded soon to
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Jun
>>>>>> 
>>>>>> On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao <j...@confluent.io> wrote:
>>>>>> 
>>>>>>> Hi, everyone,
>>>>>>> 
>>>>>>> We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am PST.
>>>>>>> If you plan to attend but haven't received an invite, please let me know.
>>>>>>> The following is the tentative agenda.
>>>>>>> 
>>>>>>> Agenda:
>>>>>>> KIP-82: add record header
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Jun
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Nacho (Ignacio) Solis
>>>> Kafka
>>>> nso...@linkedin.com
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> -Regards,
>>> Mayuresh R. Gharat
>>> (862) 250-7125
>>> 
>>> 
>>> ________________________________________
>>> From: Michael Pearce <michael.pea...@ig.com>
>>> Sent: Monday, October 17, 2016 7:48 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>> 
>>> Hi Jun,
>>> 
>>> Sounds good.
>>> 
>>> Look forward to the invite.
>>> 
>>> Cheers,
>>> Mike
>>> ________________________________________
>>> From: Jun Rao <j...@confluent.io>
>>> Sent: Monday, October 17, 2016 5:55:57 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>> 
>>> Hi, Michael,
>>> 
>>> We do have online KIP discussion meetings from time to time. How about we
>>> discuss this KIP on Wed (Oct 19) at 11:00am PST? I will send out an invite (we
>>> typically do the meeting through Zoom and will post the video recording to
>>> the Kafka wiki).
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> On Wed, Oct 12, 2016 at 1:22 AM, Michael Pearce <michael.pea...@ig.com>
>>> wrote:
>>> 
>>>> @Jay and Dana
>>>> 
>>>> We have had a few internal discussions about how we might address this if we
>>>> had a common Apache Kafka message wrapper for headers that could be used
>>>> client-side only, and how to address the compaction issue.
>>>> I have detailed this solution separately and linked from the main KIP-82
>>>> wiki.
>>>> 
>>>> Here’s a direct link:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Headers+Value+Message+Wrapper
>>>> 
>>>> We feel, though, that this solution still doesn’t manage to address all the
>>>> use cases being mentioned, and it also has some compatibility drawbacks,
>>>> e.g. backward/forward compatibility, especially across different language
>>>> clients. Also, with this solution we still need to address the compaction
>>>> issue / tombstones, so we would still need to make server-side changes and
>>>> as many message/record version changes.
>>>> 
>>>> We believe the proposed solution in KIP-82 does address all these needs,
>>>> is cleaner still, and has more benefits.
>>>> Please have a read and comment. Also, if you have any improvements to the
>>>> proposed KIP-82 or an alternative solution/option, your input is
>>>> appreciated.
>>>> 
>>>> @All
>>>> As Joel has mentioned, to get this moving along and be able to discuss more
>>>> fluidly, it would be great if we could organize a virtual meetup online,
>>>> e.g. webex or something.
>>>> I am aware that the majority are based in America; I am in the UK.
>>>> @Kostya I assume you’re in Eastern Europe or Russia based on your email
>>>> address (please correct this assumption); I hope the time difference isn’t
>>>> so great that the time below wouldn't suit you if you wish to join.
>>>> 
>>>> Can I propose that we try to meet up online next Wednesday, 19th October, at
>>>> 18:30 BST / 10:30 PST / 20:30 MSK?
>>>> 
>>>> Would this date/time suit the majority?
>>>> Also, what is the preferred method? I can host via an Adobe Connect-style
>>>> webex (which my company uses), but it isn’t the best IMHO, so I'm more than
>>>> happy for someone to suggest a better alternative.
>>>> 
>>>> Best,
>>>> Mike
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 10/8/16, 7:26 AM, "Michael Pearce" <michael.pea...@ig.com> wrote:
>>>> 
>>>>>> I agree with the critique of compaction not having a value. I think
>>>> we should consider fixing that directly.
>>>> 
>>>>> Agree that the compaction issue is troubling: compacted "null"
>>>> deletes
>>>>  are incompatible w/ headers that must be packed into the message
>>>>  value. Are there any alternatives on compaction delete semantics that
>>>>  could address this? The KIP wiki discussion I think mostly assumes
>>>>  that compaction-delete is what it is and can't be changed/fixed.
>>>> 
>>>>  This KIP is about dealing with quite a few use cases and issues; please
>>>>  see both the KIP use cases detailed by myself and the additional use cases
>>>>  wiki added by LinkedIn, linked from the main KIP.
>>>> 
>>>>  Compaction is something that headers happily address, but it most
>>>>  definitely isn't the sole reason or use case for them; headers solve many
>>>>  issues and use cases. Hence their elegance and simplicity, and why they're
>>>>  so common and so successful in transport mechanisms such as HTTP, TCP, and
>>>>  JMS, as stated.
>>>> 
>>>>  ________________________________________
>>>>  From: Dana Powers <dana.pow...@gmail.com>
>>>>  Sent: Friday, October 7, 2016 11:09 PM
>>>>  To: dev@kafka.apache.org
>>>>  Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>>> 
>>>>> I agree with the critique of compaction not having a value. I think
>>>> we should consider fixing that directly.
>>>> 
>>>>  Agree that the compaction issue is troubling: compacted "null" deletes
>>>>  are incompatible w/ headers that must be packed into the message
>>>>  value. Are there any alternatives on compaction delete semantics that
>>>>  could address this? The KIP wiki discussion I think mostly assumes
>>>>  that compaction-delete is what it is and can't be changed/fixed.
>>>> 
>>>>  -Dana
>>>> 
>>>>  On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce <michael.pea...@ig.com>
>>>> wrote:
>>>>> 
>>>>> Hi Jay,
>>>>> 
>>>>> Thanks for the comments and feedback.
>>>>> 
>>>>> I think it's quite clear that if a problem keeps arising, then it needs
>>>>> resolving and addressing properly.
>>>>> 
>>>>> Fair enough: at LinkedIn, and historically for the very first use cases,
>>>>> addressing this may not have been a big priority. But as Kafka is now
>>>>> Apache open source and being picked up by many, including my company, it is
>>>>> clear and evident that this is a requirement and an issue that now needs
>>>>> to be addressed.
>>>>> 
>>>>> The fact that almost every transport mechanism, including the networking
>>>>> layers in the enterprises I've worked in, has always had headers clearly
>>>>> shows, I think, their need and their success in a transport mechanism.
>>>>> 
>>>>> I understand some of the concerns regarding the impact on others who don't
>>>>> need it.
>>>>> 
>>>>> What we are proposing is a flexible solution that adds no overhead at the
>>>>> storage or network traffic layers if you choose not to use headers, but
>>>>> does enable those who need or want them to use them.
>>>>> 
>>>>> 
>>>>> On your response to 1): there is nothing saying that they should be put in
>>>>> any faster or without diligence, and the same KIP process can still apply
>>>>> for adding Kafka-scope headers; having headers just makes them easier to
>>>>> add, without constant message and record changes. Timestamp is a clear,
>>>>> real example of what should actually be in a header (along with other
>>>>> fields), but as it was, the whole message/record object needed to be
>>>>> changed to add it, as it will be for any further headers deemed necessary
>>>>> by Kafka.
>>>>> 
>>>>> On your response to 2): why, within my company, as a platform designer,
>>>>> should I enforce that all teams use the same serialization for their
>>>>> payloads? What I do need is for some core cross-cutting concerns and
>>>>> information to be addressed at my platform level, without imposing them
>>>>> onto my development teams. This is the same argument for why byte[] is the
>>>>> exposed value and key type: as a messaging platform, you don't want to
>>>>> impose that on my company.
>>>>> 
>>>>> On your response to 3): actually, this isn't true. There are many
>>>>> third-party tools we need to hook into our messaging flows, and they only
>>>>> build onto standardised interfaces, as obviously the cost of a custom
>>>>> implementation for every company would be very high.
>>>>> APM tooling is a clear case in point: every enterprise-level APM tool on
>>>>> the market is able to stitch together transaction flows end to end across
>>>>> a platform over HTTP and JMS, because they can inject some "magic" data in
>>>>> a uniform/standardised way; for the two mentioned, they put this into the
>>>>> headers. In its current form, they cannot do this with Kafka. Providing a
>>>>> standardised interface will, I believe, actually benefit the project, as
>>>>> commercial companies like these will now be able to plug in their tooling
>>>>> uniformly, making it attractive and possible.
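The APM pattern described above, sketched in Python under the assumption that records carry a simple string-keyed header map (the exact shape of KIP-82's headers was still under discussion in this thread). The header name `trace-id` and both helper functions are hypothetical; real APM agents define their own header names.

```python
import uuid
from typing import Dict, Optional

TRACE_HEADER = "trace-id"  # hypothetical header name chosen by an APM vendor


def inject_trace(headers: Dict[str, str], trace_id: Optional[str] = None) -> Dict[str, str]:
    # Producer side: add the tool's correlation data without touching the payload.
    out = dict(headers)
    out[TRACE_HEADER] = trace_id or uuid.uuid4().hex
    return out


def extract_trace(headers: Dict[str, str]) -> Optional[str]:
    # Consumer side: read it back without decoding (or decrypting) the payload.
    return headers.get(TRACE_HEADER)


payload = b"\x00\x01opaque-or-encrypted-bytes"  # never inspected by the tool
sent_headers = inject_trace({}, trace_id="abc123")
received_trace = extract_trace(sent_headers)
```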
>>>>> 
>>>>> Some of your other concerns, as Joel mentions, are more implementation
>>>>> details that I think should be agreed upon, but I think they can be
>>>>> addressed.
>>>>> 
>>>>> E.g. regarding your concern about the hashmap: it is entirely possible for
>>>>> a record not to have a hashmap unless it actually has a header (just as we
>>>>> have managed to do in the serialized message), which addresses any concern
>>>>> about in-memory record size for those using Kafka without headers.
>>>>> 
>>>>> On your second-to-last comment about every team choosing their own format:
>>>>> actually, we do want this a little. As first mentioned, no, we don't want a
>>>>> free-for-all, but some freedom to choose different serializations, which
>>>>> have different benefits and drawbacks across our business. I can enumerate
>>>>> these if needed. One of the use cases for headers provided by LinkedIn on
>>>>> top of my KIP even shows where headers could be beneficial here, as a
>>>>> header could be used to detail which data format the message is serialized
>>>>> in, allowing me to consume different formats.
>>>>> 
>>>>> Also, we have some systems we need to integrate with whose binary payloads
>>>>> are pretty near impossible to wrap or touch, or that we're not allowed to
>>>>> touch (historic systems, or inter/intra-corporate).
>>>>> 
>>>>> Headers really give us a solution that provides a pluggable platform, and
>>>>> standardisation that allows users to build platforms that adapt to their
>>>>> needs.
>>>>> 
>>>>> 
>>>>> Cheers
>>>>> Mike
>>>>> 
>>>>> 
>>>>> ________________________________________
>>>>> From: Jay Kreps <j...@confluent.io>
>>>>> Sent: Friday, October 7, 2016 4:45 PM
>>>>> To: dev@kafka.apache.org
>>>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>>>> 
>>>>> Hey guys,
>>>>> 
>>>>> This discussion has come up a number of times and we've always passed.
>>>>> 
>>>>> One of the things that has helped keep Kafka simple is not adding in new
>>>>> abstractions and concepts except when the proposal is really elegant and
>>>>> makes things simpler.
>>>>> 
>>>>> Consider three use cases for headers:
>>>>> 
>>>>> 1. Kafka-scope: We want to add a feature to Kafka that needs a
>>>>> particular field.
>>>>> 2. Company-scope: You want to add a header to be shared by everyone in
>>>>> your company.
>>>>> 3. World-wide scope: You are building a third party tool and want to add
>>>>> some kind of header.
>>>>> 
>>>>> For the case of (1) you should not use headers; you should just add a field
>>>>> to the record format. Having a second way of encoding things doesn't make
>>>>> sense. Occasionally people have complained that adding to the record format
>>>>> is hard and it would be nice to just shove lots of things in quickly. I
>>>>> think a better solution would be to make it easy to add to the record
>>>>> format, and I think we've made progress on that. I also think we should be
>>>>> insanely focused on the simplicity of the abstraction and not adding in new
>>>>> thingies often; we thought about time for years before adding a timestamp,
>>>>> and I guarantee you we would have goofed it up if we'd gone with the
>>>>> earlier proposals. These things end up being long-term commitments, so it's
>>>>> really worth being thoughtful.
>>>>> 
>>>>> For case (2) just use the body of the message. You don't need a globally
>>>>> agreed-on definition of headers; just standardize on a header you want to
>>>>> include in the value in your company. Since this is just used by code in
>>>>> your company, having a more standard header format doesn't really help you.
>>>>> In fact, by using something like Avro you can define exactly the types you
>>>>> want, the required header fields, etc.
>>>>> 
>>>>> The only case where headers help is (3). This is a bit of a niche case, and I
>>>>> think it is easily solved by just making the reading and writing of the given
>>>>> required fields pluggable to work with the header you have.
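A sketch of what "pluggable" reading of required fields could look like, in Python: a tool asks an accessor for a logical field such as the host, and each company plugs in an accessor for the header layout it already has. `HeaderAccessor`, both implementations, and `host_of` are hypothetical; the JSON variant borrows the field name "header" from the LinkedIn description below, and the other variant reads a TiVo-style text container.

```python
import json
from typing import Optional, Protocol


class HeaderAccessor(Protocol):
    def get(self, message_body: bytes, field: str) -> Optional[str]: ...


class EmbeddedJsonHeaders:
    """Headers live inside a JSON value under a "header" field."""

    def get(self, message_body: bytes, field: str) -> Optional[str]:
        return json.loads(message_body).get("header", {}).get(field)


class TextContainerHeaders:
    """Headers live in a TiVo-style text container ahead of the payload."""

    def get(self, message_body: bytes, field: str) -> Optional[str]:
        head = message_body.split(b"\r\n\r\n", 1)[0].decode("ascii")
        for line in head.split("\r\n")[1:]:  # skip the "<tag> <len> <len>" line
            name, _, value = line.partition(": ")
            if name.lower() == field.lower():  # HTTP-style case-insensitive names
                return value
        return None


def host_of(accessor: HeaderAccessor, body: bytes) -> Optional[str]:
    # Third-party tool code, written once against the accessor, not a format.
    return accessor.get(body, "host")


json_body = json.dumps({"header": {"host": "h1"}, "Field1": "value"}).encode("utf-8")
container_body = b"JS/1 12 2\r\nHost: h2\r\n\r\n{}"
```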
>>>>> 
>>>>> A couple of specific problems with this proposal:
>>>>> 
>>>>> 1. A global registry of numeric keys is super super ugly. This seems
>>>>> silly compared to the Avro (or whatever) header solution, which gives more
>>>>> compact encoding, rich types, etc.
>>>>> 2. Using byte arrays for header values means they aren't really
>>>>> interoperable for case (3). E.g. I can't make a UI that displays headers,
>>>>> or allow you to set them in config. To work with third party headers, the
>>>>> only case I think this really helps, you need the union of all
>>>>> serialization schemes people have used for any tool.
>>>>> 3. For case (2) and (3) your key numbers are going to collide like
>>>>> crazy. I don't think a global registry of magic numbers maintained either
>>>>> by word of mouth or by checking in changes to Kafka source is the right
>>>>> thing to do.
>>>>> 4. We are introducing a new serialization primitive which makes fields
>>>>> disappear conditional on the contents of other fields. This breaks the
>>>>> whole serialization/schema system we have today.
>>>>> 5. We're adding a hashmap to each record.
>>>>> 6. This proposes making the ProducerRecord and ConsumerRecord mutable
>>>>> and adding setters and getters (which we try to avoid).
>>>>> 
>>>>> For context on LinkedIn: I set up the system there, but it may have changed
>>>>> since I left. The header is maintained with the record schemas in the Avro
>>>>> schema registry and is required for all records. Essentially all messages
>>>>> must have a field named "header" of type EventHeader, which is itself a
>>>>> record schema with a handful of fields (time, host, etc). The header
>>>>> follows the same compatibility rules as other Avro fields, so it can be
>>>>> evolved in a compatible way gradually across apps. Avro is typed and
>>>>> doesn't require deserializing the full record to read the header. The
>>>>> header information (timestamp, host, etc.) is important and needs to
>>>>> propagate into other systems like Hadoop, which don't have a concept of
>>>>> headers for records, so I doubt it could move out of the value in any case.
>>>>> Not allowing teams to choose a data format other than Avro was considered a
>>>>> feature, not a bug, since the whole point was to be able to share data,
>>>>> which doesn't work if every team chooses their own format.
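A minimal sketch of the header-in-the-value approach described above, assuming a hypothetical EventHeader with just a timestamp and a host. It uses java.io's DataOutputStream and DataInputStream as a stand-in for Avro, so it shows only the layout idea (header fields first, readable without decoding the payload), not Avro's encoding or LinkedIn's actual schema. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class HeaderInValue {
    // Hypothetical company-wide header, loosely modelled on the EventHeader
    // described above (only two of its "handful of fields" are shown).
    public static final class EventHeader {
        public final long timeMs;
        public final String host;

        public EventHeader(long timeMs, String host) {
            this.timeMs = timeMs;
            this.host = host;
        }
    }

    // Layout: [8-byte time][2-byte length + modified UTF-8 host][payload bytes]
    static byte[] encode(EventHeader header, byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(header.timeMs);
            out.writeUTF(header.host);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads only the header fields and stops; the payload bytes are never parsed.
    static EventHeader readHeader(byte[] value) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(value))) {
            return new EventHeader(in.readLong(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Skips past the header fields and returns the remaining bytes (Java 9+).
    static byte[] readPayload(byte[] value) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(value))) {
            in.readLong();
            in.readUTF();
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "{\"page\":\"/home\"}".getBytes(StandardCharsets.UTF_8);
        byte[] value = encode(new EventHeader(1000L, "app-host-1"), payload);
        EventHeader header = readHeader(value);
        System.out.println(header.timeMs + " " + header.host); // prints "1000 app-host-1"
        System.out.println(new String(readPayload(value), StandardCharsets.UTF_8));
    }
}
```

Because the header sits at the front of the value, a consumer that only needs the time or host reads those fields and stops. The trade-off raised earlier in this thread still applies: every producer and consumer must agree on the same envelope and serialization.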
>>>>> 
>>>>> I agree with the critique of compaction not having a value. I think we
>>>>> should consider fixing that directly.
>>>>> 
>>>>> -Jay
>>>>> 
>>>>> On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> 
>>>>>> I would like to discuss the following KIP proposal:
>>>>>> 
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I have some initial drafts of roughly the changes that would be needed.
>>>>>> This is nowhere near finalized, and I look forward to the discussion,
>>>>>> especially as there are some bits I'm personally in two minds about.
>>>>>> 
>>>>>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Here is a link to an alternative option mentioned in the KIP, but one I
>>>>>> would personally discard (disadvantages mentioned in the KIP):
>>>>>> 
>>>>>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Mike
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> The information contained in this email is strictly confidential and for
>>>>>> the use of the addressee only, unless otherwise indicated. If you are not
>>>>>> the intended recipient, please do not read, copy, use or disclose to others
>>>>>> this message or any attachment. Please also notify the sender by replying
>>>>>> to this email or by telephone (+44(020 7896 0011) and then delete the email
>>>>>> and any copies of it. Opinions, conclusion (etc) that do not relate to the
>>>>>> official business of this company shall be understood as neither given nor
>>>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>>>>> registered in England and Wales, company number 04008957) and IG Index
>>>>>> Limited (a company registered in England and Wales, company number
>>>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>>>>>> London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
>>>>>> Index Limited (register number 114059) are authorised and regulated by the
>>>>>> Financial Conduct Authority.
>>>>>> 
