Hey guys,

This discussion has come up a number of times and we've always passed. One of the things that has helped keep Kafka simple is not adding in new abstractions and concepts except when the proposal is really elegant and makes things simpler.

Consider three use cases for headers:

1. Kafka-scope: We want to add a feature to Kafka that needs a particular field.
2. Company-scope: You want to add a header to be shared by everyone in your company.
3. World-wide scope: You are building a third party tool and want to add some kind of header.

For the case of (1) you should not use headers; you should just add a field to the record format. Having a second way of encoding things doesn't make sense. Occasionally people have complained that adding to the record format is hard and it would be nice to just shove lots of things in quickly. I think a better solution would be to make it easy to add to the record format, and I think we've made progress on that. I also think we should be insanely focused on the simplicity of the abstraction and not adding in new thingies often. We thought about time for years before adding a timestamp, and I guarantee you we would have goofed it up if we'd gone with the earlier proposals. These things end up being long term commitments so it's really worth being thoughtful.

For case (2) just use the body of the message. You don't need a globally agreed on definition of headers, just standardize on a header you want to include in the value in your company. Since this is just used by code in your company, having a more standard header format doesn't really help you. In fact by using something like Avro you can define exactly the types you want, the required header fields, etc.

The only case where headers help is (3). This is a bit of a niche case and I think it is easily solved by just making the reading and writing of the given required fields pluggable to work with the header you have.

A couple of specific problems with this proposal:

1. A global registry of numeric keys is super super ugly.
   This seems silly compared to the Avro (or whatever) header solution which gives more compact encoding, rich types, etc.
2. Using byte arrays for header values means they aren't really interoperable for case (3). E.g. I can't make a UI that displays headers, or allow you to set them in config. To work with third party headers, the only case I think this really helps, you need the union of all serialization schemes people have used for any tool.
3. For cases (2) and (3) your key numbers are going to collide like crazy. I don't think a global registry of magic numbers maintained either by word of mouth or by checking in changes to Kafka source is the right thing to do.
4. We are introducing a new serialization primitive which makes fields disappear conditional on the contents of other fields. This breaks the whole serialization/schema system we have today.
5. We're adding a hashmap to each record.
6. This proposes making the ProducerRecord and ConsumerRecord mutable and adding setters and getters (which we try to avoid).

For context on LinkedIn: I set up the system there, but it may have changed since I left. The header is maintained with the record schemas in the Avro schema registry and is required for all records. Essentially all messages must have a field named "header" of type EventHeader, which is itself a record schema with a handful of fields (time, host, etc.). The header follows the same compatibility rules as other Avro fields, so it can be evolved in a compatible way gradually across apps. Avro is typed and doesn't require deserializing the full record to read the header. The header information (timestamp, host, etc.) is important and needs to propagate into other systems like Hadoop which don't have a concept of headers for records, so I doubt it could move out of the value in any case.
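
A minimal sketch of that arrangement, for illustration only: the two schemas below are written as Avro-style JSON held in plain Python dicts. Everything beyond the field name "header", the type name EventHeader, and the time/host fields is an assumption made up for this example (the `PageViewEvent` schema, the `pageUrl` field, the field types, and the `has_required_header` helper). The check is a stand-in for the kind of rule a registry could enforce, not a description of how it was actually enforced.

```python
import json

# Hypothetical shared header schema; field names and types are illustrative.
EVENT_HEADER_SCHEMA = {
    "type": "record",
    "name": "EventHeader",
    "fields": [
        {"name": "time", "type": "long"},
        {"name": "host", "type": "string"},
    ],
}

# Hypothetical event schema that carries the shared header inside the value.
PAGE_VIEW_SCHEMA = {
    "type": "record",
    "name": "PageViewEvent",
    "fields": [
        {"name": "header", "type": EVENT_HEADER_SCHEMA},
        {"name": "pageUrl", "type": "string"},
    ],
}


def has_required_header(schema: dict) -> bool:
    """Return True if the record schema has a top-level 'header' field
    whose type is the EventHeader record."""
    for field in schema.get("fields", []):
        if field.get("name") != "header":
            continue
        field_type = field.get("type")
        # The type may be an inline record definition or a reference by name.
        if isinstance(field_type, dict):
            type_name = field_type.get("name")
        else:
            type_name = field_type
        return type_name == "EventHeader"
    return False


if __name__ == "__main__":
    print(json.dumps(PAGE_VIEW_SCHEMA, indent=2))
    print(has_required_header(PAGE_VIEW_SCHEMA))
```

Because the header is an ordinary field of the value, it evolves under the same schema compatibility rules as every other field and travels with the record into systems that have no notion of per-record headers.
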
Not allowing teams to choose a data format other than Avro was considered a feature, not a bug, since the whole point was to be able to share data, which doesn't work if every team chooses their own format.

I agree with the critique of compaction not having a value. I think we should consider fixing that directly.

-Jay

On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com> wrote:

> Hi All,
>
> I would like to discuss the following KIP proposal:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>
> I have some initial drafts of roughly the changes that would be needed.
> This is nowhere near finalized and I look forward to the discussion, especially as
> some bits I'm personally in two minds about.
>
> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>
> Here is a link to an alternative option mentioned in the KIP, but one I
> personally would discard (disadvantages mentioned in the KIP):
>
> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>
> Thanks
>
> Mike