Hey guys,

This discussion has come up a number of times and we've always passed.

One of the things that has helped keep Kafka simple is not adding in new
abstractions and concepts except when the proposal is really elegant and
makes things simpler.

Consider three use cases for headers:

   1. Kafka-scope: We want to add a feature to Kafka that needs a
   particular field.
   2. Company-scope: You want to add a header to be shared by everyone in
   your company.
   3. World-wide scope: You are building a third party tool and want to add
   some kind of header.

For case (1) you should not use headers; you should just add a field
to the record format. Having a second way of encoding things doesn't make
sense. Occasionally people have complained that adding to the record format
is hard and it would be nice to just shove lots of things in quickly. I
think a better solution would be to make it easy to add to the record
format, and I think we've made progress on that. I also think we should be
insanely focused on the simplicity of the abstraction and not adding in new
thingies often---we thought about time for years before adding a timestamp
and I guarantee you we would have goofed it up if we'd gone with the
earlier proposals. These things end up being long term commitments so it's
really worth being thoughtful.

For case (2) just use the body of the message. You don't need a globally
agreed-upon definition of headers; just standardize on a header you want to
include in the value in your company. Since this is only used by code in
your company, having a more standard header format doesn't really help you.
In fact by using something like Avro you can define exactly the types you
want, the required header fields, etc.

The only case where headers help is (3). This is a bit of a niche case and I
think it is easily solved by just making the reading and writing of the given
required fields pluggable to work with the header you have.
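
A minimal sketch of what "pluggable" could look like for case (3), assuming
the third-party tool needs a single field (a trace id here). Every name in
it (TraceIdAccessor, TracingTool, OrderEvent) is hypothetical and not part
of Kafka; it only illustrates the tool depending on an accessor that each
company supplies for its own value format.

```java
// Hypothetical accessor for the one field a third-party tool needs (a trace id).
interface TraceIdAccessor<V> {
    String read(V value);               // extract the trace id from the value
    V write(V value, String traceId);   // return a copy of the value carrying the trace id
}

// Hypothetical company-specific value type that keeps the trace id in its own header field.
final class OrderEvent {
    final String traceId;
    final String payload;

    OrderEvent(String traceId, String payload) {
        this.traceId = traceId;
        this.payload = payload;
    }
}

// The tool depends only on the accessor, never on the concrete value format.
final class TracingTool<V> {
    private final TraceIdAccessor<V> accessor;

    TracingTool(TraceIdAccessor<V> accessor) {
        this.accessor = accessor;
    }

    V stamp(V value, String traceId) {
        return accessor.write(value, traceId);
    }

    String traceOf(V value) {
        return accessor.read(value);
    }
}

final class PluggableFieldsDemo {
    // The plug-in a company would supply for its own value type.
    static final TraceIdAccessor<OrderEvent> ORDER_ACCESSOR = new TraceIdAccessor<OrderEvent>() {
        @Override
        public String read(OrderEvent value) {
            return value.traceId;
        }

        @Override
        public OrderEvent write(OrderEvent value, String traceId) {
            return new OrderEvent(traceId, value.payload);
        }
    };

    public static void main(String[] args) {
        TracingTool<OrderEvent> tool = new TracingTool<>(ORDER_ACCESSOR);
        OrderEvent stamped = tool.stamp(new OrderEvent(null, "order-1"), "trace-42");
        System.out.println(tool.traceOf(stamped));
    }
}
```

The tool never sees how the trace id is stored, so a company that already
keeps a header inside its values only has to write the accessor.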

A couple of specific problems with this proposal:

   1. A global registry of numeric keys is super super ugly. This seems
   silly compared to the Avro (or whatever) header solution which gives more
   compact encoding, rich types, etc.
   2. Using byte arrays for header values means they aren't really
   interoperable for case (3). E.g. I can't make a UI that displays headers,
   or allow you to set them in config. To work with third party headers, the
   only case I think this really helps, you need the union of all
   serialization schemes people have used for any tool.
   3. For cases (2) and (3) your key numbers are going to collide like
   crazy. I don't think a global registry of magic numbers maintained either
   by word of mouth or checking in changes to Kafka source is the right thing
   to do.
   4. We are introducing a new serialization primitive which makes fields
   disappear conditional on the contents of other fields. This breaks the
   whole serialization/schema system we have today.
   5. We're adding a hashmap to each record.
   6. This proposes making the ProducerRecord and ConsumerRecord mutable
   and adding setters and getters (which we try to avoid).
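
To illustrate points 2 and 3, here is a small sketch that models headers as
a map from numeric key to byte array. That shape is an assumption based on
the description above, not the KIP's actual API, and both "tools" are
hypothetical. Two tools independently pick key 1 and use different
encodings, so a reader has no way to tell what the bytes mean.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class HeaderCollisionDemo {
    // Hypothetical tool A: stores a host name as UTF-8 text.
    static byte[] toolAEncode(String host) {
        return host.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical tool B: stores a timestamp as an 8-byte big-endian long.
    static byte[] toolBEncode(long timestamp) {
        return ByteBuffer.allocate(8).putLong(timestamp).array();
    }

    static long toolBDecode(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    public static void main(String[] args) {
        // Stand-in for per-record headers: numeric key -> opaque bytes.
        Map<Integer, byte[]> headers = new HashMap<>();

        // Both tools chose key 1 without knowing about each other.
        headers.put(1, toolAEncode("host-abc"));

        // Tool B reads key 1 and decodes it with its own scheme.
        // The eight bytes of "host-abc" parse as a long without any error,
        // so the collision goes unnoticed and the value is meaningless.
        long notATimestamp = toolBDecode(headers.get(1));
        System.out.println("tool B read: " + notATimestamp);
    }
}
```

Nothing fails here, which is the problem: the bytes carry no type, so a
generic UI or config layer cannot know which decoder applies to a key.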

For context on LinkedIn: I set up the system there, but it may have changed
since I left. The header is maintained with the record schemas in the Avro
schema registry and is required for all records. Essentially all messages
must have a field named "header" of type EventHeader which is itself a
record schema with a handful of fields (time, host, etc.). The header
follows the same compatibility rules as other Avro fields, so it can be
evolved in a compatible way gradually across apps. Avro is typed and
doesn't require deserializing the full record to read the header. The
header information (timestamp, host, etc.) is important and needs to
propagate into other systems like Hadoop which don't have a concept of
headers for records, so I doubt it could move out of the value in any case.
Not allowing teams to choose a data format other than Avro was considered a
feature, not a bug, since the whole point was to be able to share data,
which doesn't work if every team chooses their own format.
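
As a rough illustration of that layout, the sketch below uses plain Java
classes as stand-ins for the Avro record schemas. Only the field name
"header", the type name EventHeader, and the time and host fields come from
the description above; the field types, the PageViewEvent record, and the
describe helper are assumptions made for the example.

```java
import java.util.Objects;

// Stand-in for the shared EventHeader record schema (field types are assumed).
final class EventHeader {
    final long time;    // event time, assumed here to be epoch milliseconds
    final String host;  // host that produced the event

    EventHeader(long time, String host) {
        this.time = time;
        this.host = Objects.requireNonNull(host, "host is required");
    }
}

// Hypothetical event type: every record carries a required "header" field in its value.
final class PageViewEvent {
    final EventHeader header;
    final String url;

    PageViewEvent(EventHeader header, String url) {
        this.header = Objects.requireNonNull(header, "header is required");
        this.url = url;
    }
}

final class EventHeaderDemo {
    // Generic code (for example a loader into another system) needs only the header,
    // not the event-specific fields.
    static String describe(EventHeader header) {
        return header.host + "@" + header.time;
    }

    public static void main(String[] args) {
        PageViewEvent event =
            new PageViewEvent(new EventHeader(1474570260000L, "web-01"), "/index.html");
        System.out.println(describe(event.header));
    }
}
```

Because the header lives in the value, it travels with the record into
systems that have no notion of record headers.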

I agree with the critique of compaction not having a value. I think we
should consider fixing that directly.

-Jay

On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com>
wrote:

> Hi All,
>
>
> I would like to discuss the following KIP proposal:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>
>
>
> I have some initial drafts of roughly the changes that would be needed.
> This is nowhere near finalized, and I look forward to the discussion,
> especially as some bits I'm personally in two minds about.
>
> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>
>
>
> Here is a link to an alternative option mentioned in the KIP, but one I
> would personally discard (disadvantages mentioned in the KIP)
>
> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>
>
> Thanks
>
> Mike
