both 5a and 5c would involve a wire format change, so any arguments about needing an upgrade path, bumping the protocol version, etc. apply equally to both. so the "cost" (in terms of the impact of a wire format change) is the same.
5c, to me, means doing all the work (more exactly, incurring all the cost) but getting very few of the benefits. a universal, agreed-upon structure for headers (specifically their keys) is, in my opinion, a basic requirement to reap the full benefits of headers - an active ecosystem of composable, re-usable, 3rd-party extensions to kafka.

as for what exactly those keys are (int vs string) - since using ints is such a giant sticking point, and given that kafka usually operates with batching and compression and does not achieve high enough iops for the key type to make a noticeable difference in CPU consumption, I'm willing to go with string keys just to get that out of the way.

On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <michael.pea...@ig.com> wrote:

> +1 on this slimmer version of our proposal
>
> I def think we can reduce the Id space from the proposed int32 (4 bytes) down to int16 (2 bytes); it saves on space, and as headers we wouldn't expect the number of headers being used concurrently to be that high.
>
> I would wonder if we should make the value byte array length still int32 though, as this is the standard max array length in Java. Saying that, it is a header and I guess limiting the size is sensible and would work for all the use cases we have in mind, so happy with limiting this.
>
> Do people generally concur on Magnus's slimmer version? Anyone see any issues if we moved from int32 to int16?
>
> Re configurable ids per plugin over a global registry: this also would work for us. As such, if this has better consensus over the proposed global registry I'd be happy to change that.
>
> I was already sold on ints over strings for keys ;)
>
> Cheers
> Mike
>
> ________________________________________
> From: Magnus Edenhill <mag...@edenhill.se>
> Sent: Monday, November 7, 2016 10:10:21 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Hi,
>
> I'm +1 for adding generic message headers, but I do share the concerns previously aired on this thread and during the KIP meeting.
>
> So let me propose a slimmer alternative that does not require any sort of global header registry, does not affect broker performance or operations, and adds as little overhead as possible.
>
>
> Message
> ------------
> The protocol Message type is extended with a Headers array consisting of Tags, where a Tag is defined as:
>    int16 Id
>    int16 Len           // binary_data length
>    binary_data[Len]    // opaque binary data
>
>
> Ids
> ---
> The Id space is not centrally managed, so whenever an application needs to add headers, or use an eco-system plugin that does, its Id allocation will need to be manually configured.
> This moves the allocation concern from the global space down to organization level and avoids the risk for id conflicts.
> Example pseudo-config for some app:
>     sometrackerplugin.tag.sourcev3.id=1000
>     dbthing.tag.tablename.id=1001
>     myschemareg.tag.schemaname.id=1002
>     myschemareg.tag.schemaversion.id=1003
>
>
> Each header-writing or header-reading plugin must provide means (typically through configuration) to specify the tag for each header it uses. Defaults should be avoided.
> A consumer silently ignores tags it does not have a mapping for (since the binary_data can't be parsed without knowing what it is).
>
> Id range 0..999 is reserved for future use by the broker and must not be used by plugins.
>
>
>
> Broker
> ---------
> The broker does not process the tags (other than the standard protocol syntax verification), it simply stores and forwards them as opaque data.
>
> Standard message translation (removal of Headers) kicks in for older clients.
>
>
> Why not string ids?
> -------------------------
> String ids might seem like a good idea, but:
>  * does not really solve uniqueness
>  * consumes a lot of space (2 byte string length + string, per header) to be meaningful
>  * doesn't really say anything about how to parse the tag's data, so it is in effect useless on its own.
>
>
> Regards,
> Magnus
>
>
> 2016-11-07 18:32 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:
>
> > Hi Roger,
> >
> > Thanks for the support.
> >
> > I think the key thing is to have a common key space to make an ecosystem; there does have to be some level of contract for people to play nicely.
> >
> > Having map<String, byte[]> or, as currently proposed in the KIP, a numerical key space of map<int, byte[]> is a level of contract that most people would expect.
> >
> > I think the example in a previous comment someone else made, linking to the AWS blog and the implemented API, where originally they didn't have a header space but now they do, and where keys are uniform but the value can be string, int, anything, is a good example.
> >
> > Having a custom MetadataSerializer is something we had played with, but we discounted the idea: if you wanted everyone to work the same way in the ecosystem, having this also customizable makes it a bit harder. Think about making the whole message record custom serializable; this would be fairly tricky (though not impossible) to make work nicely. Having the value customizable is, we thought, a reasonable tradeoff here of flexibility over contract of interaction between different parties.
> >
> > Is there a particular case or benefit of having serialization customizable that you have in mind?
> >
> > Saying this, it is obviously something that could be implemented if there is a need. If we did go this avenue I think a defaulted serializer implementation should exist so that, per the 80:20 rule, people can just have the broker and clients get default behavior.
> >
> > Cheers
> > Mike
> >
> > On 11/6/16, 5:25 PM, "radai" <radai.rosenbl...@gmail.com> wrote:
> >
> > making header _key_ serialization configurable potentially undermines the broad usefulness of the feature (any point along the path must be able to read the header keys. the values may be whatever and require more intimate knowledge of the code that produced specific headers, but keys should be universally readable).
> >
> > it would also make it hard to write really portable plugins - say i wrote a large message splitter/combiner - if i rely on key "largeMessage" and values of the form "1/20", someone who uses (contrived example) Map<Byte[], Double> wouldn't be able to re-use my code.
> >
> > not the end of the world within an organization, but problematic if you want to enable an ecosystem
> >
> > On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
> >
> > > As others have laid out, I see strong reasons for a common message metadata structure for the Kafka ecosystem. In particular, I've seen that even within a single organization, infrastructure teams often own the message metadata while application teams own the application-level data format. Allowing metadata and content to have different structure and evolve separately is very helpful for this. Also, I think there's a lot of value to having a common metadata structure shared across the Kafka ecosystem so that tools which leverage metadata can more easily be shared across organizations and integrated together.
> > >
> > > The question is, where does the metadata structure belong?
> > > Here's my take:
> > >
> > > We change the Kafka wire and on-disk format from a (key, value) model to a (key, metadata, value) model where all three are byte arrays from the broker's point of view. The primary reason for this is that it provides a backward compatible migration path forward. Producers can start populating metadata fields before all consumers understand the metadata structure. For people who already have custom envelope structures, they can populate their existing structure and the new structure for a while as they make the transition.
> > >
> > > We could stop there and let the clients plug in a KeySerializer, MetadataSerializer, and ValueSerializer, but I think it would also be useful to have a default MetadataSerializer that implements a key-value model similar to AMQP or HTTP headers. Or we could go even further and prescribe a Map<String, byte[]> or Map<String, String> data model for headers in the clients (while still allowing custom serialization of the header data model).
> > >
> > > I think this would address Radai's concerns:
> > > 1. All client code would not need to be updated to know about the container.
> > > 2. Middleware friendly clients would have a standard header data model to work with.
> > > 3. KIP is required both b/c of broker changes and because of client API changes.
> > >
> > > Cheers,
> > >
> > > Roger
> > >
> > >
> > > On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenbl...@gmail.com> wrote:
> > >
> > > > my biggest issues with a "standard" wrapper format:
> > > >
> > > > 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be updated to know about the container, because any old naive code trying to directly deserialize its own payload would keel over and die (it needs to know to deserialize a container, and then dig in there for its payload).
> > > > 2. in order to write middleware-friendly clients that utilize such a container one would basically have to write their own producer/consumer API on top of the open source kafka one.
> > > > 3. if you were going to go with a wrapper format you really don't need to bother with a kip (just open source your own client stack from #2 above so others could stop re-inventing it)
> > > >
> > > > On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushuja...@gmail.com> wrote:
> > > >
> > > > > How exactly would this work? Or maybe that's out of scope for this email.
> > > > >
> >
> > The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA.
> > Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.
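
For reference, a minimal sketch in Java of the Tag layout from Magnus's proposal quoted above (int16 Id, int16 Len, then Len bytes of opaque data). Everything here is hypothetical and for illustration only: the HeaderTags class and its methods are not part of any Kafka API, the int16 tag count in front of the array is an assumption (the proposal does not say how the Headers array itself is framed), and big-endian byte order is assumed because it is ByteBuffer's default.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Kafka API: illustrates the proposed Tag layout
// (int16 Id, int16 Len, binary_data[Len]).
final class HeaderTags {
    // The proposal reserves ids 0..999 for the broker.
    static final int FIRST_PLUGIN_ID = 1000;

    // Writes the tags back to back. The leading int16 tag count is an
    // assumption; the proposal does not define the array framing.
    static byte[] encode(Map<Integer, byte[]> tags) {
        if (tags.size() > Short.MAX_VALUE) {
            throw new IllegalArgumentException("too many tags");
        }
        int size = 2; // assumed int16 tag count
        for (byte[] data : tags.values()) {
            size += 4 + data.length; // int16 Id + int16 Len + data
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        buf.putShort((short) tags.size());
        for (Map.Entry<Integer, byte[]> e : tags.entrySet()) {
            int id = e.getKey();
            byte[] data = e.getValue();
            if (id < FIRST_PLUGIN_ID || id > Short.MAX_VALUE) {
                throw new IllegalArgumentException("id outside plugin range: " + id);
            }
            if (data.length > Short.MAX_VALUE) {
                throw new IllegalArgumentException("value too long for int16 Len");
            }
            buf.putShort((short) id);
            buf.putShort((short) data.length);
            buf.put(data);
        }
        return buf.array();
    }

    // Reads the tags back in order; values stay opaque byte arrays.
    static Map<Integer, byte[]> decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int count = buf.getShort();
        Map<Integer, byte[]> tags = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int id = buf.getShort();
            int len = buf.getShort();
            byte[] data = new byte[len];
            buf.get(data);
            tags.put(id, data);
        }
        return tags;
    }
}
```

With this layout each header costs 4 bytes of overhead plus its value, so a 6-byte value under id 1001 takes 10 bytes on the wire. A string-keyed variant would instead pay a 2-byte length plus the key bytes per header, which is the space concern raised under "Why not string ids?".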