> I agree with the critique of compaction deletes not having a value. I think we should consider fixing that directly.
Agree that the compaction issue is troubling: compacted "null" deletes are incompatible with headers that must be packed into the message value. Are there any alternatives on compaction delete semantics that could address this? The KIP wiki discussion, I think, mostly assumes that compaction-delete is what it is and can't be changed/fixed.

-Dana
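To make the conflict concrete: on a compacted topic the broker treats a record with a null value as a tombstone, so a delete carries no value at all and any headers packed into the value have nowhere to live. Below is a minimal sketch with the standard Java producer; the envelope wrapping (wrapValue) is a hypothetical illustration, not a Kafka API.

    import java.nio.ByteBuffer;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TombstoneVsValueHeaders {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                byte[] key = "order-42".getBytes();

                // Normal record: headers ride inside the value via some envelope
                // format (wrapValue is hypothetical, not a Kafka API).
                byte[] value = wrapValue("trace-id=abc".getBytes(), "payload".getBytes());
                producer.send(new ProducerRecord<>("orders", key, value));

                // Compacted delete: the value must be null for the broker to treat
                // the record as a tombstone, so value-embedded headers are lost here.
                producer.send(new ProducerRecord<>("orders", key, null));
            }
        }

        // Hypothetical envelope: length-prefix the header bytes before the payload.
        private static byte[] wrapValue(byte[] headerBytes, byte[] payload) {
            ByteBuffer buf = ByteBuffer.allocate(4 + headerBytes.length + payload.length);
            buf.putInt(headerBytes.length).put(headerBytes).put(payload);
            return buf.array();
        }
    }

The second send is exactly what compaction understands as a delete, and it is also exactly where an in-value header scheme has nothing left to attach to.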
On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Hi Jay,
>
> Thanks for the comments and feedback.
>
> I think it's quite clear that if a problem keeps arising then it needs resolving and addressing properly.
>
> Fair enough, at LinkedIn and historically for the very first use cases addressing this may not have been a big priority. But as Kafka is now Apache open source and being picked up by many, including my company, it is clear and evident that this is a requirement and an issue that now needs to be addressed.
>
> The fact that almost every transport mechanism, including the networking layers in the enterprises I've worked in, has always had headers I think clearly shows their need and success for a transport mechanism.
>
> I understand some concerns with regard to the impact on others not needing it. What we are proposing is a flexible solution that adds no overhead on the storage or network layers if you choose not to use headers, but does enable those who need or want them to use them.
>
> On your response to 1): there is nothing saying that a field should be added any faster or with less diligence, and the same KIP process can still apply for adding Kafka-scope headers; having headers just makes them easier to add, without constant message and record format changes. Timestamp is a clear, real example of what should have been in a header (along with other fields), but as it was, the whole message/record object needed to be changed to add it, as it will be for any further fields deemed needed by Kafka.
>
> On your response to 2): why, as a platform designer within my company, should I enforce that all teams use the same serialization for their payloads? What I do need is some core cross-cutting concerns and information addressed at the platform level, without imposing on my development teams. This is the same argument for why byte[] is the exposed value and key: as a messaging platform you don't want to impose that on my company.
>
> On your response to 3): actually this isn't true. There are many third-party tools we need to hook into our messaging flows, and they only build onto standardised interfaces, as obviously the cost of a custom implementation for every company would be very high. APM tooling is a clear case in point: every enterprise-level APM tool on the market can stitch transaction flow end-to-end across a platform over HTTP or JMS, because they can inject some "magic" data in a uniform, standardised way; for the two transports mentioned they stitch this into the headers. In its current form they cannot do this with Kafka. Providing a standardised interface will, I believe, actually benefit the project, as commercial companies like these will then be able to plug in their tooling uniformly, making it attractive and possible.
>
> Some of your other concerns, as Joel mentions, are more implementation details that should be agreed upon, but I think they can be addressed. E.g. re your concern on the hashmap: it is more than possible to avoid every record having a hashmap unless it actually has a header (just as we have managed to do in the serialized message), if there is a concern about the in-memory record size for those using Kafka without headers [see the sketch below this quoted message].
>
> On your second-to-last comment about every team choosing their own format: actually we do want this a little. As mentioned at the very start, we don't want a free-for-all, but some freedom, as different serializations have different benefits and drawbacks across our business. I can enumerate these if needed. One of the use cases for headers provided by LinkedIn on top of my KIP even shows where headers could be beneficial here, as a header could be used to indicate which data format the message is serialized in, allowing me to consume different formats.
>
> Also, we have some systems to integrate whose binary payloads are pretty near impossible to wrap or touch, or we're not allowed to touch them (historic systems, or inter/intra-corporate).
>
> Headers really give us a solution: a pluggable platform and a standardisation that allows users to build platforms that adapt to their needs.
>
> Cheers
> Mike
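On the hashmap concern in the quoted message above: one way to avoid a per-record allocation is to leave the map null until a header is actually set, so records without headers carry only a null reference. This is a rough sketch of that idea only; the class, the method names, and the use of int keys are illustrative and not the API proposed in the KIP.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical record wrapper, purely to illustrate lazy header allocation;
    // not the ProducerRecord/ConsumerRecord change under discussion.
    public class LazyHeaderRecord {
        private final byte[] key;
        private final byte[] value;
        private Map<Integer, byte[]> headers; // stays null until a header is added

        public LazyHeaderRecord(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }

        public void setHeader(int headerKey, byte[] headerValue) {
            if (headers == null) {
                headers = new HashMap<>(4); // allocated only on first use
            }
            headers.put(headerKey, headerValue);
        }

        public byte[] header(int headerKey) {
            return headers == null ? null : headers.get(headerKey);
        }

        public boolean hasHeaders() {
            return headers != null && !headers.isEmpty();
        }

        public byte[] key() {
            return key;
        }

        public byte[] value() {
            return value;
        }
    }

Records without headers then pay only for a single null reference, which is the point Michael makes above.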
> ________________________________________
> From: Jay Kreps <j...@confluent.io>
> Sent: Friday, October 7, 2016 4:45 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Hey guys,
>
> This discussion has come up a number of times and we've always passed.
>
> One of the things that has helped keep Kafka simple is not adding in new abstractions and concepts except when the proposal is really elegant and makes things simpler.
>
> Consider three use cases for headers:
>
>    1. Kafka-scope: We want to add a feature to Kafka that needs a particular field.
>    2. Company-scope: You want to add a header to be shared by everyone in your company.
>    3. World-wide scope: You are building a third-party tool and want to add some kind of header.
>
> For the case of (1) you should not use headers, you should just add a field to the record format. Having a second way of encoding things doesn't make sense. Occasionally people have complained that adding to the record format is hard and it would be nice to just shove lots of things in quickly. I think a better solution would be to make it easy to add to the record format, and I think we've made progress on that. I also think we should be insanely focused on the simplicity of the abstraction and not adding in new thingies often---we thought about time for years before adding a timestamp, and I guarantee you we would have goofed it up if we'd gone with the earlier proposals. These things end up being long-term commitments, so it's really worth being thoughtful.
>
> For case (2), just use the body of the message. You don't need a globally agreed-on definition of headers, just standardize on a header you want to include in the value in your company. Since this is just used by code in your company, having a more standard header format doesn't really help you. In fact, by using something like Avro you can define exactly the types you want, the required header fields, etc.
>
> The only case that headers help is (3). This is a bit of a niche case and I think it is easily solved just by making the reading and writing of the given required fields pluggable to work with the header you have [a rough sketch of such a plug-in appears at the end of this thread excerpt].
>
> A couple of specific problems with this proposal:
>
>    1. A global registry of numeric keys is super super ugly. This seems silly compared to the Avro (or whatever) header solution, which gives more compact encoding, rich types, etc.
>    2. Using byte arrays for header values means they aren't really interoperable for case (3). E.g. I can't make a UI that displays headers, or allow you to set them in config. To work with third-party headers (the only case I think this really helps), you need the union of all serialization schemes people have used for any tool.
>    3. For cases (2) and (3) your key numbers are going to collide like crazy. I don't think a global registry of magic numbers maintained either by word of mouth or by checking changes into the Kafka source is the right thing to do.
>    4. We are introducing a new serialization primitive which makes fields disappear conditional on the contents of other fields. This breaks the whole serialization/schema system we have today.
>    5. We're adding a hashmap to each record.
>    6. This proposes making the ProducerRecord and ConsumerRecord mutable and adding setters and getters (which we try to avoid).
>
> For context on LinkedIn: I set up the system there, but it may have changed since I left. The header is maintained with the record schemas in the Avro schema registry and is required for all records. Essentially all messages must have a field named "header" of type EventHeader, which is itself a record schema with a handful of fields (time, host, etc). The header follows the same compatibility rules as other Avro fields, so it can be evolved in a compatible way gradually across apps. Avro is typed and doesn't require deserializing the full record to read the header. The header information (timestamp, host, etc) is important and needs to propagate into other systems like Hadoop which don't have a concept of headers for records, so I doubt it could move out of the value in any case. Not allowing teams to choose a data format other than Avro was considered a feature, not a bug, since the whole point was to be able to share data, which doesn't work if every team chooses their own format.
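To make the Avro-embedded header described above concrete, here is a rough sketch of an event schema with a required "header" field of record type EventHeader. Only that structure and the time/host fields come from the thread; the event name, the remaining fields, and the default are invented for illustration.

    import org.apache.avro.Schema;

    public class EventHeaderSchemaSketch {
        // Assumed shape only: every event schema embeds a required "header" record.
        private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"PageViewEvent\",\"fields\":["
          + "  {\"name\":\"header\",\"type\":{"
          + "    \"type\":\"record\",\"name\":\"EventHeader\",\"fields\":["
          + "      {\"name\":\"time\",\"type\":\"long\"},"
          + "      {\"name\":\"host\",\"type\":\"string\"},"
          + "      {\"name\":\"service\",\"type\":[\"null\",\"string\"],\"default\":null}"
          + "  ]}},"
          + "  {\"name\":\"page\",\"type\":\"string\"},"
          + "  {\"name\":\"memberId\",\"type\":\"long\"}"
          + "]}";

        public static void main(String[] args) {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
            // Inspect the embedded header schema (time, host, plus an assumed optional field).
            System.out.println(schema.getField("header").schema().getFields());
        }
    }

Because "header" is just another Avro field, it evolves under the normal Avro compatibility rules Jay mentions, and any consumer that can read the envelope can read the header.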
> I agree with the critique of compaction deletes not having a value. I think we should consider fixing that directly.
>
> -Jay
>
> On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>
>> Hi All,
>>
>> I would like to discuss the following KIP proposal:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>>
>> I have some initial drafts of roughly the changes that would be needed. This is nowhere near finalized, and I look forward to the discussion, especially as there are some bits I'm personally in two minds about.
>>
>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>>
>> Here is a link to an alternative option mentioned in the KIP, but one I would personally discard (disadvantages mentioned in the KIP):
>>
>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>>
>> Thanks
>>
>> Mike
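As a closing illustration of the pluggable approach Jay suggests for case (3) above: a third-party tool could accept a small codec that knows how a given deployment embeds its required fields in the message value. The interface and its names below are entirely hypothetical, not part of Kafka or the KIP.

    import java.util.Map;

    // Hypothetical plug-in point a third-party tool could expose so it can read and
    // write whatever in-value header convention a particular deployment already uses.
    public interface ValueHeaderCodec {

        // Extract the embedded headers from the raw message value.
        Map<String, byte[]> readHeaders(byte[] value);

        // Return a new value with the given headers embedded alongside the payload.
        byte[] writeHeaders(Map<String, byte[]> headers, byte[] payload);
    }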