> I agree with the critique of compaction deletes not having a value. I think we should consider fixing that directly.
Agree that the compaction issue is troubling: compacted "null" deletes are incompatible with headers that must be packed into the message value. Are there any alternatives on compaction delete semantics that could address this? The KIP wiki discussion, I think, mostly assumes that compaction-delete is what it is and can't be changed/fixed.

-Dana
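To make the conflict concrete: on a compacted topic the broker treats a record with a null value as a tombstone, so a delete carries no value at all and any headers packed into the value have nowhere to live. Below is a minimal sketch with the standard Java producer; the envelope wrapping (wrapValue) is a hypothetical illustration, not a Kafka API.

    import java.nio.ByteBuffer;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TombstoneVsValueHeaders {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                byte[] key = "order-42".getBytes();

                // Normal record: headers ride inside the value via some envelope
                // format (wrapValue is hypothetical, not a Kafka API).
                byte[] value = wrapValue("trace-id=abc".getBytes(), "payload".getBytes());
                producer.send(new ProducerRecord<>("orders", key, value));

                // Compacted delete: the value must be null for the broker to treat
                // the record as a tombstone, so value-embedded headers are lost here.
                producer.send(new ProducerRecord<>("orders", key, null));
            }
        }

        // Hypothetical envelope: length-prefix the header bytes before the payload.
        private static byte[] wrapValue(byte[] headerBytes, byte[] payload) {
            ByteBuffer buf = ByteBuffer.allocate(4 + headerBytes.length + payload.length);
            buf.putInt(headerBytes.length).put(headerBytes).put(payload);
            return buf.array();
        }
    }

The second send is exactly what compaction understands as a delete, and it is also exactly where an in-value header scheme has nothing left to attach to.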
On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Hi Jay,
>
> Thanks for the comments and feedback.
>
> I think it's quite clear that if a problem keeps arising then it needs resolving and addressing properly.
>
> Fair enough, at LinkedIn and historically for the very first use cases addressing this may not have been a big priority. But as Kafka is now Apache open source and being picked up by many, including my company, it is clear and evident that this is a requirement and an issue that now needs to be addressed.
>
> The fact that almost every transport mechanism, including the networking layers in the enterprises I've worked in, has always had headers I think clearly shows their need and success for a transport mechanism.
>
> I understand some concerns with regard to the impact on others not needing it. What we are proposing is a flexible solution that adds no overhead on the storage or network layers if you choose not to use headers, but does enable those who need or want them to use them.
>
> On your response to 1): there is nothing saying that a field should be added any faster or with less diligence, and the same KIP process can still apply for adding Kafka-scope headers; having headers just makes them easier to add, without constant message and record format changes. Timestamp is a clear, real example of what should have been in a header (along with other fields), but as it was, the whole message/record object needed to be changed to add it, as it will be for any further fields deemed needed by Kafka.
>
> On your response to 2): why, as a platform designer within my company, should I enforce that all teams use the same serialization for their payloads? What I do need is some core cross-cutting concerns and information addressed at the platform level, without imposing on my development teams. This is the same argument for why byte[] is the exposed value and key: as a messaging platform you don't want to impose that on my company.
>
> On your response to 3): actually this isn't true. There are many third-party tools we need to hook into our messaging flows, and they only build onto standardised interfaces, as obviously the cost of a custom implementation for every company would be very high. APM tooling is a clear case in point: every enterprise-level APM tool on the market can stitch transaction flow end-to-end across a platform over HTTP or JMS, because they can inject some "magic" data in a uniform, standardised way; for the two transports mentioned they stitch this into the headers. In its current form they cannot do this with Kafka. Providing a standardised interface will, I believe, actually benefit the project, as commercial companies like these will then be able to plug in their tooling uniformly, making it attractive and possible.
>
> Some of your other concerns, as Joel mentions, are more implementation details that should be agreed upon, but I think they can be addressed. E.g. re your concern on the hashmap: it is more than possible to avoid every record having a hashmap unless it actually has a header (just as we have managed to do in the serialized message), if there is a concern about the in-memory record size for those using Kafka without headers [see the sketch below this quoted message].
>
> On your second-to-last comment about every team choosing their own format: actually we do want this a little. As mentioned at the very start, we don't want a free-for-all, but some freedom, as different serializations have different benefits and drawbacks across our business. I can enumerate these if needed. One of the use cases for headers provided by LinkedIn on top of my KIP even shows where headers could be beneficial here, as a header could be used to indicate which data format the message is serialized in, allowing me to consume different formats.
>
> Also, we have some systems to integrate whose binary payloads are pretty near impossible to wrap or touch, or we're not allowed to touch them (historic systems, or inter/intra-corporate).
>
> Headers really give us a solution: a pluggable platform and a standardisation that allows users to build platforms that adapt to their needs.
>
> Cheers
> Mike
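On the hashmap concern in the quoted message above: one way to avoid a per-record allocation is to leave the map null until a header is actually set, so records without headers carry only a null reference. This is a rough sketch of that idea only; the class, the method names, and the use of int keys are illustrative and not the API proposed in the KIP.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical record wrapper, purely to illustrate lazy header allocation;
    // not the ProducerRecord/ConsumerRecord change under discussion.
    public class LazyHeaderRecord {
        private final byte[] key;
        private final byte[] value;
        private Map<Integer, byte[]> headers; // stays null until a header is added

        public LazyHeaderRecord(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }

        public void setHeader(int headerKey, byte[] headerValue) {
            if (headers == null) {
                headers = new HashMap<>(4); // allocated only on first use
            }
            headers.put(headerKey, headerValue);
        }

        public byte[] header(int headerKey) {
            return headers == null ? null : headers.get(headerKey);
        }

        public boolean hasHeaders() {
            return headers != null && !headers.isEmpty();
        }

        public byte[] key() {
            return key;
        }

        public byte[] value() {
            return value;
        }
    }

Records without headers then pay only for a single null reference, which is the point Michael makes above.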
> ________________________________________
> From: Jay Kreps <j...@confluent.io>
> Sent: Friday, October 7, 2016 4:45 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Hey guys,
>
> This discussion has come up a number of times and we've always passed.
>
> One of the things that has helped keep Kafka simple is not adding in new abstractions and concepts except when the proposal is really elegant and makes things simpler.
>
> Consider three use cases for headers:
>
>    1. Kafka-scope: We want to add a feature to Kafka that needs a particular field.
>    2. Company-scope: You want to add a header to be shared by everyone in your company.
>    3. World-wide scope: You are building a third-party tool and want to add some kind of header.
>
> For the case of (1) you should not use headers, you should just add a field to the record format. Having a second way of encoding things doesn't make sense. Occasionally people have complained that adding to the record format is hard and it would be nice to just shove lots of things in quickly. I think a better solution would be to make it easy to add to the record format, and I think we've made progress on that. I also think we should be insanely focused on the simplicity of the abstraction and not adding in new thingies often---we thought about time for years before adding a timestamp, and I guarantee you we would have goofed it up if we'd gone with the earlier proposals. These things end up being long-term commitments, so it's really worth being thoughtful.
>
> For case (2), just use the body of the message. You don't need a globally agreed-on definition of headers, just standardize on a header you want to include in the value in your company. Since this is just used by code in your company, having a more standard header format doesn't really help you. In fact, by using something like Avro you can define exactly the types you want, the required header fields, etc.
>
> The only case that headers help is (3). This is a bit of a niche case and I think it is easily solved just by making the reading and writing of the given required fields pluggable to work with the header you have [a rough sketch of such a plug-in appears at the end of this thread excerpt].
>
> A couple of specific problems with this proposal:
>
>    1. A global registry of numeric keys is super super ugly. This seems silly compared to the Avro (or whatever) header solution, which gives more compact encoding, rich types, etc.
>    2. Using byte arrays for header values means they aren't really interoperable for case (3). E.g. I can't make a UI that displays headers, or allow you to set them in config. To work with third-party headers (the only case I think this really helps), you need the union of all serialization schemes people have used for any tool.
>    3. For cases (2) and (3) your key numbers are going to collide like crazy. I don't think a global registry of magic numbers maintained either by word of mouth or by checking changes into the Kafka source is the right thing to do.
>    4. We are introducing a new serialization primitive which makes fields disappear conditional on the contents of other fields. This breaks the whole serialization/schema system we have today.
>    5. We're adding a hashmap to each record.
>    6. This proposes making the ProducerRecord and ConsumerRecord mutable and adding setters and getters (which we try to avoid).
>
> For context on LinkedIn: I set up the system there, but it may have changed since I left. The header is maintained with the record schemas in the Avro schema registry and is required for all records. Essentially all messages must have a field named "header" of type EventHeader, which is itself a record schema with a handful of fields (time, host, etc). The header follows the same compatibility rules as other Avro fields, so it can be evolved in a compatible way gradually across apps. Avro is typed and doesn't require deserializing the full record to read the header. The header information (timestamp, host, etc) is important and needs to propagate into other systems like Hadoop which don't have a concept of headers for records, so I doubt it could move out of the value in any case. Not allowing teams to choose a data format other than Avro was considered a feature, not a bug, since the whole point was to be able to share data, which doesn't work if every team chooses their own format.
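To make the Avro-embedded header described above concrete, here is a rough sketch of an event schema with a required "header" field of record type EventHeader. Only that structure and the time/host fields come from the thread; the event name, the remaining fields, and the default are invented for illustration.

    import org.apache.avro.Schema;

    public class EventHeaderSchemaSketch {
        // Assumed shape only: every event schema embeds a required "header" record.
        private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"PageViewEvent\",\"fields\":["
          + "  {\"name\":\"header\",\"type\":{"
          + "    \"type\":\"record\",\"name\":\"EventHeader\",\"fields\":["
          + "      {\"name\":\"time\",\"type\":\"long\"},"
          + "      {\"name\":\"host\",\"type\":\"string\"},"
          + "      {\"name\":\"service\",\"type\":[\"null\",\"string\"],\"default\":null}"
          + "  ]}},"
          + "  {\"name\":\"page\",\"type\":\"string\"},"
          + "  {\"name\":\"memberId\",\"type\":\"long\"}"
          + "]}";

        public static void main(String[] args) {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
            // Inspect the embedded header schema (time, host, plus an assumed optional field).
            System.out.println(schema.getField("header").schema().getFields());
        }
    }

Because "header" is just another Avro field, it evolves under the normal Avro compatibility rules Jay mentions, and any consumer that can read the envelope can read the header.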
> I agree with the critique of compaction deletes not having a value. I think we should consider fixing that directly.
>
> -Jay
>
> On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>
>> Hi All,
>>
>> I would like to discuss the following KIP proposal:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>>
>> I have some initial drafts of roughly the changes that would be needed. This is nowhere near finalized, and I look forward to the discussion, especially as there are some bits I'm personally in two minds about.
>>
>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>>
>> Here is a link to an alternative option mentioned in the KIP, but one I would personally discard (disadvantages mentioned in the KIP):
>>
>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>>
>> Thanks
>>
>> Mike
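As a closing illustration of the pluggable approach Jay suggests for case (3) above: a third-party tool could accept a small codec that knows how a given deployment embeds its required fields in the message value. The interface and its names below are entirely hypothetical, not part of Kafka or the KIP.

    import java.util.Map;

    // Hypothetical plug-in point a third-party tool could expose so it can read and
    // write whatever in-value header convention a particular deployment already uses.
    public interface ValueHeaderCodec {

        // Extract the embedded headers from the raw message value.
        Map<String, byte[]> readHeaders(byte[] value);

        // Return a new value with the given headers embedded alongside the payload.
        byte[] writeHeaders(Map<String, byte[]> headers, byte[] payload);
    }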