Hi Jay,

Thanks for the comments and feedback.
I think it's quite clear that if a problem keeps arising, it needs to be resolved and addressed properly. Fair enough that at LinkedIn, and historically for the very first use cases, this may not have been a big priority. But now that Kafka is an Apache open source project being picked up by many organisations, including my company, it is evident that this requirement needs to be addressed. The fact that almost every transport mechanism I've worked with in the enterprise, including the networking layers, has headers shows, I think, both their need and their success as part of a transport mechanism.

I understand the concern about the impact on users who don't need headers. What we are proposing is a flexible solution that adds no overhead on the storage or network layers if you choose not to use headers, but does enable those who need or want them to use them.

On your response to 1): nothing says Kafka-scope headers should be added any faster or with less diligence; the same KIP process can still apply. Having headers just makes them easier to add without repeated changes to the message and record formats. Timestamp is a real example of something that arguably belongs in a header (along with other fields), yet the whole message/record object had to change to add it, as it will for any further fields Kafka deems necessary.

On your response to 2): why, as a platform designer within my company, should I force all teams to use the same serialization for their payloads? What I do need is to address some core cross-cutting concerns at the platform level without imposing on my development teams. This is the same argument for why byte[] is the exposed value and key: as a messaging platform, Kafka doesn't want to impose a format on my company.

On your response to 3): actually this isn't true. There are many third-party tools we need to hook into our messaging flows, and they only build onto standardised interfaces, because the cost of a custom implementation for every company would be very high. APM tooling is a clear case in point: every enterprise-level APM tool on the market can stitch a transaction flow together end to end across a platform over HTTP or JMS, because they can inject some "magic" data in a uniform, standardised place; for those two transports, they stitch it into the headers. In its current form, they cannot do this with Kafka. Providing a standardised interface will, I believe, benefit the project, because commercial vendors like these will be able to plug in their tooling uniformly, making that both attractive and possible. (A rough sketch of what such a tool's interceptor could look like is below.)

Some of your other concerns, as Joel mentions, are implementation details that should be agreed upon, but I think they can be addressed. For example, on your concern about the hashmap: it is entirely possible for a record not to carry a hashmap unless it actually has headers (just as we have managed on the serialized message), so there need be no in-memory record size cost for those using Kafka without headers.

On your second-to-last comment about every team choosing their own format: actually we do want this, a little. As I said at the very start, we don't want a free-for-all, but we do want some freedom, because different serializations have different benefits and drawbacks across our business. I can enumerate these if needed.
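To illustrate the APM point, here is a minimal sketch of the kind of interceptor such a vendor could ship. It assumes the KIP lands with a per-record header collection reachable as record.headers().add(key, value); note the KIP as drafted discusses numeric keys, so the string key and the "x-trace-id" name here are purely illustrative:

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Sketch only: assumes the header API proposed in KIP-82 ends up looking
// roughly like record.headers().add(key, value). The header key is hypothetical.
public class TracingInterceptor implements ProducerInterceptor<byte[], byte[]> {

    @Override
    public ProducerRecord<byte[], byte[]> onSend(ProducerRecord<byte[], byte[]> record) {
        // Stamp a correlation/trace id without touching the payload bytes.
        record.headers().add("x-trace-id",
            UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

Because the interceptor only ever touches headers, it never needs to know how any team serializes its payloads, which is exactly why a standardised place for this data matters to the tooling vendors.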
In fact, one of the header use cases LinkedIn provided on top of my KIP shows where this could be beneficial too: a header could be used to declare which data format a message's payload is serialized in, allowing a consumer to handle different formats. We also have some systems we need to integrate where it is pretty much impossible to wrap or touch their binary payloads, or where we're simply not allowed to touch them (historic systems, or inter/intra-corporate integrations). Headers really give us a solution here: a pluggable platform, with enough standardisation that users can build platforms that adapt to their needs. A rough sketch of a format-aware consumer is below.
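To make that use case concrete, here is a minimal sketch, again assuming headers end up exposed on ConsumerRecord roughly as the KIP proposes (record.headers()). The "format" header name and the default are illustrative, and each team registers its own Deserializer, so nothing here forces a single company-wide format:

import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.serialization.Deserializer;

// Sketch only: routes a record to a deserializer based on a hypothetical
// "format" header, without the platform ever interpreting the payload bytes.
public class FormatAwareDecoder {

    private final Map<String, Deserializer<Object>> deserializersByFormat;

    public FormatAwareDecoder(Map<String, Deserializer<Object>> deserializersByFormat) {
        this.deserializersByFormat = deserializersByFormat;
    }

    public Object decode(ConsumerRecord<byte[], byte[]> record) {
        Header format = record.headers().lastHeader("format");   // e.g. "avro", "json", "protobuf"
        String key = (format == null)
            ? "avro"                                              // assumed team default
            : new String(format.value(), StandardCharsets.UTF_8);

        Deserializer<Object> deserializer = deserializersByFormat.get(key);
        if (deserializer == null) {
            throw new IllegalArgumentException("No deserializer registered for format: " + key);
        }
        return deserializer.deserialize(record.topic(), record.value());
    }
}

The platform layer routes on the header; the payload bytes stay opaque, exactly as they are today with byte[] keys and values.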
Cheers
Mike

________________________________________
From: Jay Kreps <j...@confluent.io>
Sent: Friday, October 7, 2016 4:45 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Hey guys,

This discussion has come up a number of times and we've always passed. One of the things that has helped keep Kafka simple is not adding in new abstractions and concepts except when the proposal is really elegant and makes things simpler.

Consider three use cases for headers:

1. Kafka-scope: We want to add a feature to Kafka that needs a particular field.
2. Company-scope: You want to add a header to be shared by everyone in your company.
3. World-wide scope: You are building a third party tool and want to add some kind of header.

For the case of (1) you should not use headers, you should just add a field to the record format. Having a second way of encoding things doesn't make sense. Occasionally people have complained that adding to the record format is hard and it would be nice to just shove lots of things in quickly. I think a better solution would be to make it easy to add to the record format, and I think we've made progress on that. I also think we should be insanely focused on the simplicity of the abstraction and not adding in new thingies often; we thought about time for years before adding a timestamp and I guarantee you we would have goofed it up if we'd gone with the earlier proposals. These things end up being long term commitments so it's really worth being thoughtful.

For case (2) just use the body of the message. You don't need a globally agreed on definition of headers, just standardize on a header you want to include in the value in your company. Since this is just used by code in your company, having a more standard header format doesn't really help you. In fact, by using something like Avro you can define exactly the types you want, the required header fields, etc.

The only case that headers help is (3). This is a bit of a niche case and I think it is easily solved by making the reading and writing of the given required fields pluggable to work with the header you have.

A couple of specific problems with this proposal:

1. A global registry of numeric keys is super super ugly. This seems silly compared to the Avro (or whatever) header solution, which gives more compact encoding, rich types, etc.
2. Using byte arrays for header values means they aren't really interoperable for case (3). E.g. I can't make a UI that displays headers, or allow you to set them in config. To work with third party headers, the only case I think this really helps, you need the union of all serialization schemes people have used for any tool.
3. For cases (2) and (3) your key numbers are going to collide like crazy. I don't think a global registry of magic numbers maintained either by word of mouth or by checking in changes to Kafka source is the right thing to do.
4. We are introducing a new serialization primitive which makes fields disappear conditional on the contents of other fields. This breaks the whole serialization/schema system we have today.
5. We're adding a hashmap to each record.
6. This proposes making the ProducerRecord and ConsumerRecord mutable and adding setters and getters (which we try to avoid).

For context on LinkedIn: I set up the system there, but it may have changed since I left. The header is maintained with the record schemas in the Avro schema registry and is required for all records. Essentially all messages must have a field named "header" of type EventHeader, which is itself a record schema with a handful of fields (time, host, etc.). The header follows the same compatibility rules as other Avro fields, so it can be evolved in a compatible way gradually across apps. Avro is typed and doesn't require deserializing the full record to read the header. The header information (timestamp, host, etc.) is important and needs to propagate into other systems like Hadoop which don't have a concept of headers for records, so I doubt it could move out of the value in any case. Not allowing teams to choose a data format other than Avro was considered a feature, not a bug, since the whole point was to be able to share data, which doesn't work if every team chooses their own format.

I agree with the critique of compaction not having a value. I think we should consider fixing that directly.

-Jay

On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com> wrote:

> Hi All,
>
> I would like to discuss the following KIP proposal:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>
> I have some initial drafts of roughly the changes that would be needed.
> This is nowhere near finalized and I look forward to the discussion,
> especially as I'm personally in two minds about some bits.
>
> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>
> Here is a link to an alternative option mentioned in the KIP, but one I
> would personally discard (disadvantages mentioned in the KIP):
>
> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>
> Thanks
>
> Mike
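P.S. For anyone following the thread who hasn't seen the embedded-header approach Jay describes, here is roughly what it looks like in Avro terms, built with Avro's SchemaBuilder. The time and host fields come from Jay's description; anything else is my guess, not LinkedIn's actual schema:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class EventHeaderSketch {

    public static void main(String[] args) {
        // Roughly the shape of the embedded EventHeader Jay describes:
        // a required "header" record carried inside every event's value.
        Schema header = SchemaBuilder.record("EventHeader")
            .fields()
            .requiredLong("time")      // event timestamp
            .requiredString("host")    // originating host
            .optionalString("guid")    // hypothetical extra field, purely illustrative
            .endRecord();

        Schema pageView = SchemaBuilder.record("PageViewEvent")
            .fields()
            .name("header").type(header).noDefault()   // header required on every record
            .requiredString("page")                    // illustrative payload field
            .endRecord();

        System.out.println(pageView.toString(true));
    }
}

The difference we are debating is that this header lives inside the value and is only reachable by parties that share the schema, whereas record-level headers would sit outside the payload where brokers, interceptors and third-party tools can reach them.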