For what it is worth, I also agree. As a user:
1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option
14.11.2016, 21:15, "Ignacio Solis" <iso...@igso.net>:

1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <michael.pea...@ig.com> wrote:

Hi Roger,

The KIP details/examples describe the original proposal for key spacing, not the new namespace idea mentioned in the discussion.

We will need to update the KIP once we agree that this is a better approach (which seems to be the case, if I have understood the general feeling in the conversation).

Re the variable ints: we did think about this at a very early stage. I think the added complexity isn't worth the saving. If we want to reduce overhead and size, I'd rather go with int16 (2 byte) keys, as that keeps it simple.

On the note of no headers: as per the KIP, we use an attribute bit to denote whether headers are present, so there is currently zero overhead if headers are not used.

As radai mentions, I think it would be good first to get clarity on whether we now have general consensus that (1) headers are worthwhile and useful, and (2) we want them as a top level entity.

Just to state the obvious, I believe (1) headers are worthwhile and (2) agree on a top level entity.

Cheers
Mike
________________________________________
From: Roger Hoover <roger.hoo...@gmail.com>
Sent: Wednesday, November 9, 2016 9:10:47 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Sorry for going a little into the weeds, but thanks for the replies regarding varint.

Agreed that a prefix and {int, int} can be the same. It doesn't look like that's what the KIP is saying in the "Open" section. The example shows 2100001 for New Relic and 210002 for App Dynamics, implying that the New Relic organization will have only a single header id to work with. Or is 2100001 a prefix?
The main point of a namespace or prefix is to reduce the overhead of config mapping or registration, depending on how namespaces/prefixes are managed.

Would love to hear more feedback on the higher-level questions though...

Cheers,

Roger

On Wed, Nov 9, 2016 at 11:38 AM, radai <radai.rosenbl...@gmail.com> wrote:

I think this discussion is getting a bit into the weeds on technical implementation details.
I'd like to step back a minute and try to establish where we are in the larger picture:

(re-wording nacho's last paragraph)
1. are we all in agreement that headers are a worthwhile and useful addition to have? this was contested early on
2. are we all in agreement on headers as a top level entity vs headers squirreled away in V?

are there still concerns around these two points (#jay? #jun?)?

(and now back to our normal programming ...)

varints are nice. having said that, they add complexity (see https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java as the 1st google result) and would require anyone writing other clients (C? Python? Go? Bash? ;-) ) to get/implement the same, for relatively little gain (int vs string is an order of magnitude, this isn't).

int namespacing vs {int, int} namespacing are basically the same thing - you're just namespacing an int64 and giving people whole 2^32 ranges at a time. the part i like about this is letting people have a large swath of numbers with one registration, so they don't have to come back for every single plugin/header they want to "reserve".
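To put rough numbers on the varint trade-off weighed here, below is a minimal sketch of protobuf-style unsigned base-128 varints. This is the scheme on the protocol buffers encoding page linked elsewhere in this thread: 7 data bits per byte, with the high bit used as a "more bytes follow" flag. The function names are hypothetical, and this is not Kafka client code.

```python
from typing import Tuple


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("unsigned varint requires value >= 0")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_uvarint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


# Compare varint size with fixed int16 (2 bytes) / int32 (4 bytes) fields.
for n in (0, 127, 128, 1000, 16383, 16384):
    print(n, len(encode_uvarint(n)), "byte(s) as varint")
```

Ids below 128 fit in one byte. Ids from 128 to 16383 take two bytes, the same as a fixed int16. Every client (Java, C, Python, Go, ...) would also need its own copy of the decode loop, which is the complexity cost raised in the message above.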
On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <roger.hoo...@gmail.com> wrote:

Since some of the debate has been about overhead + performance, I'm wondering if we have considered a varint encoding (https://developers.google.com/protocol-buffers/docs/encoding#varints) for the header length field (int32 in the proposal) and for header ids? If you don't use headers, the overhead would be a single byte, and each header id < 128 would also need only a single byte?

On Wed, Nov 9, 2016 at 6:43 AM, radai <radai.rosenbl...@gmail.com> wrote:

@magnus - and very dangerous (you're essentially downloading and executing arbitrary code off the internet on your servers ... a bad idea without a sandbox, and even with one)

as for it being a purely administrative task - i disagree.

i wish it were, really, because then my earlier point on the complexity of the remapping process would be invalid. but at linkedin, for example, we (the team i'm in) run kafka as a service. we don't really know what our users (developing applications that use kafka) are up to at any given moment. it is very possible (given the existence of headers and a corresponding plugin ecosystem) for some application to "equip" their producers and consumers with the required plugin without us knowing. i don't mean to imply that's bad, i just want to make the point that it's not that simple to keep it in sync across a large-enough organization.
On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <mag...@edenhill.se> wrote:

I think there is a piece missing in the Strings discussion, where pro-Stringers reason that by providing unique string identifiers for each header, everything will just magically work for all parts of the stream pipeline.

But the strings don't mean anything by themselves. While we could probably envision some auto plugin loader that downloads, compiles, links and runs plugins on demand as soon as they're seen by a consumer, I don't really see a use case for something so dynamic (and fragile) in practice.

In the real world an application will be configured with a set of plugins to either add (producer) or read (consumer) headers.
This is an administrative task based on what features a client needs/provides, and it results in some sort of configuration to enable and configure the desired plugins.

Since this needs to be kept somewhat in sync across an organisation (there is no point in having producers add headers no consumers will read, and vice versa), the added complexity of assigning an id namespace for each plugin as it is being configured should be tolerable.

/Magnus

2016-11-09 13:06 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:

Just following/catching up on what seems to be an active night :)

@Radai sorry if it may seem obvious, but what does MD stand for?

My take on String vs Int:

I will state first that I am pro Int (16 or 32).
Playing devil's advocate, though, I do see a big plus in the argument for String keys: integrating into an existing ecosystem.

As many other systems use String based headers (Flume, JMS), String keys make it much easier to incorporate/integrate with them.

With Int based headers, how could we provide a way/guidance to make this integration, and transition flows over to Kafka, simple and easy?

* tough luck buddy, you're on your own
* simply hash the string into an int code and hope for no collisions (how to convert back though?)
* http2 style, as mentioned by nacho.

cheers,
Mike

________________________________________
From: radai <radai.rosenbl...@gmail.com>
Sent: Wednesday, November 9, 2016 8:12 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

thinking about it some more, the best way to transmit the header remapping data to consumers would be to put it in the MD response payload, so maybe it should be discussed now.

On Wed, Nov 9, 2016 at 12:09 AM, radai <radai.rosenbl...@gmail.com> wrote:

I'm not opposed to the idea of namespace mapping. all I'm saying is that it's not part of the "mvp" and, since it requires no wire format change, it can always be added later.
also, it's not as simple as just configuring MM to do the transform: let's say i've implemented large message support as {666,1}, and on some mirror target cluster it's been remapped to {999,1}. the consumer plugin code would also need to be told to look for the large message "part X of Y" header under {999,1}. doable, but tricky.

On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <g...@confluent.io> wrote:

While you can do whatever you want with a namespace and your code, what I'd expect is for each app to make its namespace configurable...

So if I accidentally used 666 for my HR department, and still want to run RadaiApp, I can config "namespace=42" for RadaiApp and everything will look normal.

This means you only need to sync usage inside your own organization. Still hard, but somewhat easier than syncing with the entire world.

On Tue, Nov 8, 2016 at 10:07 PM, radai <radai.rosenbl...@gmail.com> wrote:

and we can start with {namespace, id} and no re-mapping support, and always add it later on if/when collisions actually happen (i don't think they'd be a problem).

every interested party (so orgs or individuals) could then register a prefix (0 = reserved, 1 = confluent ...
666 = me :-) ) and do whatever they want with the 2nd ID - so once linkedin registers, say, 3, linkedin devs are free to use {3, *} with a reasonable expectation not to collide with anything else. further partitioning of that * becomes linkedin's problem, but the "upstream registration" of a namespace only has to happen once.

On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <wushuja...@gmail.com> wrote:

> On Nov 8, 2016, at 5:54 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> Thank you so much for this clear and fair summary of the arguments.
>
> I'm in favor of ints. Not a deal-breaker, but in favor.
>
> Even more in favor of Magnus's decentralized suggestion with Roger's tweak: add a namespace for headers. This will allow each app to just use whatever IDs it wants internally, and then let the admin deploying the app figure out an available namespace ID for the app to live in. So io.confluent.schema-registry can be namespace 0x01 on my deployment and 0x57 on yours, and the poor guys developing the app don't need to worry about that.
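The {namespace, id} scheme discussed here can be sketched as follows. It packs a 32-bit namespace and a 32-bit id into one 64-bit key, which is radai's "namespacing an int64". A hypothetical deployment table then maps app names to namespace numbers, as in Gwen's 0x01 / 0x57 and "namespace=42" examples. `DEPLOYMENT_NAMESPACES`, `header_key` and `remap` are illustrative names only, not part of any Kafka API or of the KIP.

```python
def pack_key(namespace: int, local_id: int) -> int:
    """Pack a {namespace, id} pair into one unsigned 64-bit header key."""
    if not (0 <= namespace < 2**32 and 0 <= local_id < 2**32):
        raise ValueError("namespace and id must each fit in 32 bits")
    return (namespace << 32) | local_id


def unpack_key(key: int) -> tuple:
    """Split a 64-bit header key back into (namespace, id)."""
    return key >> 32, key & 0xFFFFFFFF


# Hypothetical per-deployment config: the admin, not the app developer,
# decides which namespace each app lives in on this cluster.
DEPLOYMENT_NAMESPACES = {"io.confluent.schema-registry": 0x01, "RadaiApp": 42}


def header_key(app: str, local_id: int) -> int:
    """App code only knows its own local ids; the namespace comes from config."""
    return pack_key(DEPLOYMENT_NAMESPACES[app], local_id)


def remap(key: int, mapping: dict) -> int:
    """Rewrite a key's namespace, e.g. {666,1} -> {999,1} when mirroring."""
    ns, local_id = unpack_key(key)
    return pack_key(mapping.get(ns, ns), local_id)


print(unpack_key(header_key("RadaiApp", 1)))
print(unpack_key(remap(pack_key(666, 1), {666: 999})))
```

The packing itself is trivial. As James notes in his reply, the hard part is that consumers on a mirror target must agree on the same mapping, and nothing in the key format enforces that.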
Gwen, if I understand your example right, an application deployer might decide to use 0x01 in one deployment, which means that once the message is written to the broker, it will be saved on the broker with that specific namespace (0x01).

If you were to mirror that message into another cluster, the 0x01 would accompany the message, right? What if the deployers of the same app in the other cluster use 0x57? They won't understand each other?

I'm not sure that's an avoidable problem. I think it simply means that in order to share data, you also have to have a shared (agreed upon) understanding of what the namespaces mean. Which I think makes sense, because the alternative (sharing *nothing* at all) would mean there would be no way to understand each other.

-James

> Gwen

On Tue, Nov 8, 2016 at 4:23 PM, radai <radai.rosenbl...@gmail.com> wrote:

+1 for sean's document.
it covers pretty much all the trade-offs and provides concrete figures to argue about :-)
(nit-picking: it used the same xkcd twice; also, trove has been superseded for purposes of high performance collections, look at https://github.com/leventov/Koloboke)

so to sum up the string vs int debate:

performance - you can do 140k ops/sec _per thread_ with string headers. you could do 2-3x better with ints. there's no arguing the relative diff between the two, there's only the question of whether or not _the rest of kafka_ operates fast enough to care. if we want to make choices solely based on performance, we need ints. if we are willing to settle/compromise for a nicer (to some) API, then strings are good enough for the current state of affairs.

message size - with batching and compression it comes down to a ~5% difference (internal testing, not in the doc; maybe it would help to add it if this becomes a point of contention?). this means it won't really affect kafka in "throughput mode" (large, compressed batches).
in "low latency" mode (meaning less/no batching and compression) the difference can be extreme (it'll easily be an order of magnitude with small payloads like stock ticks and header keys of the form "com.acme.infraTeam.kafka.hiMom.auditPlugin"). we have a few such topics at linkedin where the actual payloads are ~2 ints and are eclipsed by our in-house audit "header", which is why we liked ints to begin with.

"ease of use" - strings would probably still require _some_ degree of partitioning by convention (imagine if everyone used the key "infra"...), but it's very intuitive for java devs to do anyway (reverse-domain is ingrained into java developers at a young age :-) ). also, most java devs find Map<String, whatever> more intuitive than Map<Integer, whatever> - probably because of other text-based protocols like http. ints would require a number registry. if you think number registries are hard, just look at the wiki page for KIPs (specifically the number for the next available KIP) and think again - we are probably talking about the same volume of requests.
also, this would only be "required" (good citizenship, more like) if you want to publish your plugin for others to use. within your org do whatever you want - just know that if you use [some "reserved" range] and a future kafka update breaks it, it's your problem. RTFM.

personally I'm in favor of ints.

having said that (and like nacho), I will settle if int vs string remains the only obstacle to this.

On Tue, Nov 8, 2016 at 3:53 PM, Nacho Solis <nso...@linkedin.com.invalid> wrote:

I think it's well known I've been pushing for ints (and I could switch to 16 bit shorts if pressed).

- efficient (space)
- efficient (processing)
- easily partitionable

However, if the only thing keeping us from adopting headers is the use of strings vs ints as keys, then I would cave in and accept strings. If we do so, I would like to limit string keys to 128 bytes in length.
This way 1) I could use a 3 letter string if I wanted (effectively using 4 total bytes), and 2) it limits the overall impact of possible keys (I don't really want people to send a 16K header string key).

Nacho

On Tue, Nov 8, 2016 at 3:35 PM, Gwen Shapira <g...@confluent.io> wrote:

Forgot to mention: Thank you for quantifying the trade-off - it is helpful and important regardless of what we end up deciding.

On Tue, Nov 8, 2016 at 3:12 PM, Sean McCauliff <smccaul...@linkedin.com.invalid> wrote:

> On Tue, Nov 8, 2016 at 2:15 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> Since Kafka specifically targets high-throughput, low-latency use-cases, I don't think we should trade them off that easily.

I don't find these kinds of design goals really helpful unless they're quantified in some way, because it's always possible to argue against something as either being not performant or just an implementation detail.

This is a single threaded benchmark, so all the measurements are per thread.
For 1M messages/s/thread, if header keys are ints and you had even a single header key, value pair, then it's still about 2^-2 (0.25) microseconds, which means you only have another 0.75 microseconds to do everything else you want to do with a message (1M messages/s means 1 microsecond per message). With string header keys there are still 0.5 microseconds to process a message.

> I love strings as much as the next guy (we had them in Flume), but I was convinced by Magnus/Michael/Radai that strings don't actually have strong benefits as opposed to ints (you'll need a string registry anyway - otherwise, how will you know what the "profile_id" header refers to?) and I want to keep closer to our original design goals for Kafka.

"confluent.profile_id"

> If someone likes strings in the headers and doesn't do millions of messages a sec, they probably have lots of other systems they can use instead.
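Sean's per-message time budget can be redone as a few lines of arithmetic. This sketch only restates his figures: about 0.25 µs of header handling per message with one int-keyed header, and about 0.5 µs with string keys, on a single thread. The helper name is hypothetical.

```python
def remaining_budget_us(messages_per_sec: float, header_cost_us: float) -> float:
    """Microseconds left per message after header handling, on one thread."""
    budget_us = 1_000_000 / messages_per_sec   # 1M msg/s -> 1 us per message
    return budget_us - header_cost_us


# Per-message header costs implied by the figures above (single thread).
INT_KEY_COST_US = 2 ** -2     # about 0.25 us with one int-keyed header
STRING_KEY_COST_US = 0.5      # string keys leave about 0.5 us

for rate in (1_000_000, 100_000):
    for label, cost in (("int keys", INT_KEY_COST_US),
                        ("string keys", STRING_KEY_COST_US)):
        left = remaining_budget_us(rate, cost)
        print(f"{rate} msg/s, {label}: {left:.2f} us left per message")
```

At 100k messages/s per thread, the same costs leave 9.75 µs and 9.5 µs per message. This is why the int vs string difference only starts to dominate as a consumer approaches 1M messages/s.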
None of them will scale like Kafka. Horizontal scaling is still good.

On Tue, Nov 8, 2016 at 1:22 PM, Sean McCauliff <smccaul...@linkedin.com.invalid> wrote:

+1 for String keys.

I've been doing some benchmarking, and it seems like the speedup from using integer keys is about 2-5x, depending on the length of the strings and which collections are being used. The overall amount of time spent parsing a set of header key, value pairs probably does not matter unless you are getting close to 1M messages per consumer, in which case you probably shouldn't use headers. There is also the option to use very short strings; some are even shorter than integers.

Partitioning the string key space will be easier than partitioning an integer key space. We won't need a global registry. Kafka internally can reserve some prefix like "_" as its namespace.
Everyone else can use their company or project name as a namespace prefix, and life should be good.

Here's the link to some of the benchmarking info:
https://docs.google.com/document/d/1tfT-6SZdnKOLyWGDH82kS30PnUkmgb7nPLdw6p65pAI/edit?usp=sharing

--
Sean McCauliff
Staff Software Engineer
Kafka

smccaul...@linkedin.com
linkedin.com/in/sean-mccauliff-b563192

On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <michael.pea...@ig.com> wrote:

+1 on this slimmer version of our proposal

I definitely think we can reduce the Id space from the proposed int32 (4 bytes) down to int16 (2 bytes). It saves on space, and as these are headers, we wouldn't expect the number of headers used concurrently to be that high.
I wonder whether we should still make the value byte array length int32, though, as that is the standard max array length in Java. That said, it is a header, and I guess limiting the size is sensible and would work for all the use cases we have in mind, so I'm happy with limiting this.

Do people generally concur on Magnus's slimmer version? Does anyone see any issues if we moved from int32 to int16?

Re configurable ids per plugin instead of a global registry: that would also work for us. If this has better consensus than the proposed global registry, I'd be happy to change that.
I was already sold on ints over strings for keys ;)

Cheers
Mike

________________________________________
From: Magnus Edenhill <mag...@edenhill.se>
Sent: Monday, November 7, 2016 10:10:21 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Hi,

I'm +1 for adding generic message headers, but I do share the concerns previously aired on this thread and during the KIP meeting.

So let me propose a slimmer alternative that does not require any sort of global header registry, does not affect broker performance or operations, and adds as little overhead as possible.
Message
------------
The protocol Message type is extended with a Headers array consisting of Tags, where a Tag is defined as:
    int16 Id
    int16 Len           // binary_data length
    binary_data[Len]    // opaque binary data

Ids
---
The Id space is not centrally managed, so whenever an application needs to add headers, or use an ecosystem plugin that does, its Id allocation will need to be manually configured.
This moves the allocation concern from the global space down to the organization level and avoids the risk of id conflicts.
Example pseudo-config for some app:
    sometrackerplugin.tag.sourcev3.id=1000
    dbthing.tag.tablename.id=1001
    myschemareg.tag.schemaname.id=1002
    myschemareg.tag.schemaversion.id=1003

Each header-writing or header-reading plugin must provide a means (typically through configuration) to specify the tag for each header it uses. Defaults should be avoided.
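Magnus's pseudo-config could be loaded into a per-app tag table along these lines. The sketch assumes the `<plugin>.tag.<header>.id=<n>` line format from his example. It also applies two rules from his proposal: the 0..999 range reserved for the broker, and the int16 width of Id. The parser and its duplicate-id check are hypothetical illustrations, not an existing Kafka or plugin API.

```python
import re

# Assumed line format, e.g. "myschemareg.tag.schemaname.id=1002".
TAG_LINE = re.compile(r"^(?P<plugin>[\w-]+)\.tag\.(?P<header>[\w-]+)\.id=(?P<id>\d+)$")
RESERVED_MAX = 999        # ids 0..999 are reserved for the broker
INT16_MAX = 2**15 - 1     # Id is an int16 on the wire


def parse_tag_config(text: str) -> dict:
    """Map (plugin, header) -> tag id, rejecting reserved or duplicate ids."""
    mapping = {}
    owner_of_id = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = TAG_LINE.match(line)
        if m is None:
            raise ValueError(f"not a tag line: {line!r}")
        tag_id = int(m["id"])
        if tag_id <= RESERVED_MAX or tag_id > INT16_MAX:
            raise ValueError(f"id {tag_id} outside {RESERVED_MAX + 1}..{INT16_MAX}")
        key = (m["plugin"], m["header"])
        if tag_id in owner_of_id:
            raise ValueError(f"id {tag_id} already used by {owner_of_id[tag_id]}")
        owner_of_id[tag_id] = key
        mapping[key] = tag_id
    return mapping


config = """
sometrackerplugin.tag.sourcev3.id=1000
dbthing.tag.tablename.id=1001
myschemareg.tag.schemaname.id=1002
myschemareg.tag.schemaversion.id=1003
"""
print(parse_tag_config(config))
```

Catching id clashes at configuration time is what moves the conflict risk from the global space down to the organization, as the proposal intends.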
A consumer silently ignores tags it does not have a mapping for (since the binary_data can't be parsed without knowing what it is).

Id range 0..999 is reserved for future use by the broker and must not be used by plugins.


Broker
---------
The broker does not process the tags (other than the standard protocol syntax verification); it simply stores and forwards them as opaque data.

Standard message translation (removal of Headers) kicks in for older clients.


Why not string ids?
-------------------------
String ids might seem like a good idea, but:
 * does not really solve uniqueness
 * consumes a lot of space (2-byte string length + string, per header) to be meaningful
 * doesn't really say anything about how to parse the tag's data, so it is in effect useless on its own.
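A minimal sketch of the Tag layout (int16 Id, int16 Len, binary_data[Len]) together with the consumer rule that unmapped tags are skipped. It assumes big-endian integers and an int32 element count in front of the array, following the usual Kafka protocol conventions; the proposal does not spell out either detail, and `encode_tags`/`decode_tags` are hypothetical helpers.

```python
import struct


def encode_tags(tags):
    """Encode [(tag_id, data), ...] as an int32 count followed by the Tags."""
    out = [struct.pack(">i", len(tags))]
    for tag_id, data in tags:
        if len(data) > 32767:
            raise ValueError("binary_data too long for an int16 Len")
        out.append(struct.pack(">hh", tag_id, len(data)))  # Id, Len
        out.append(data)                                    # opaque binary data
    return b"".join(out)


def decode_tags(buf, known_ids):
    """Return {tag_id: data} for ids this consumer has a mapping for.

    Tags with unknown ids are skipped silently: Len lets the reader step
    over data it has no way to parse.
    """
    (count,) = struct.unpack_from(">i", buf, 0)
    offset = 4
    result = {}
    for _ in range(count):
        tag_id, length = struct.unpack_from(">hh", buf, offset)
        offset += 4
        if length < 0 or offset + length > len(buf):
            raise ValueError("malformed or truncated tag")
        if tag_id in known_ids:
            result[tag_id] = bytes(buf[offset:offset + length])
        offset += length
    return result


blob = encode_tags([(1002, b"user"), (1003, b"\x00\x07"), (4242, b"???")])
print(len(blob))                        # 25
print(decode_tags(blob, {1002, 1003}))  # {1002: b'user', 1003: b'\x00\x07'}
```

Each tag costs 4 bytes of framing plus its data. That is the overhead being weighed against string ids: an int16 id takes 2 bytes, while a string id such as `schemaversion` would take 2 + 13 = 15 bytes in every header.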

Regards,
Magnus


2016-11-07 18:32 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:

Hi Roger,

Thanks for the support.

I think the key thing is to have a common key space to make an ecosystem; there does have to be some level of contract for people to play nicely.

Having map<String, byte[]> or, as currently proposed in the KIP, a numerical key space of map<int, byte[]> is a level of contract that most people would expect.

I think the example someone else made in a previous comment, linking to the AWS blog and the implemented API (originally they didn't have a header space but now they do, where keys are uniform but the value can be a string, int, anything), is a good example.

Having a custom MetadataSerializer is something we had played with, but we discounted the idea: if you want everyone in the ecosystem to work the same way, making this customizable as well makes it a bit harder. Think about making the whole message record custom serializable; this would make it fairly tricky (though not impossible) to make it work nicely. Having the value customizable is, we thought, a reasonable tradeoff here of flexibility over the contract of interaction between different parties.

Is there a particular case or benefit of having serialization customizable that you have in mind?

Saying this, it is obviously something that could be implemented if there is a need.
If we did go this avenue, I think a default serializer implementation should exist so that, per the 80:20 rule, people can just have the broker and clients get default behavior.

Cheers
Mike

On 11/6/16, 5:25 PM, "radai" <radai.rosenbl...@gmail.com> wrote:

making header _key_ serialization configurable potentially undermines the broad usefulness of the feature (any point along the path must be able to read the header keys. the values may be whatever and require more intimate knowledge of the code that produced specific headers, but keys should be universally readable).
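To make the argument concrete, here is a sketch of a single fixed header layout in which keys are always length-prefixed UTF-8 strings and values stay opaque bytes, so an intermediary can list the keys without knowing how any value was produced. The layout and the helper names are assumptions for illustration; this is neither the KIP-82 wire format nor a Kafka client API.

```python
import struct


def serialize_headers(headers):
    """dict[str, bytes] -> bytes, using one fixed (assumed) layout:
    int32 count; per header: int16 key length, UTF-8 key,
    int32 value length, value bytes."""
    parts = [struct.pack(">i", len(headers))]
    for key, value in headers.items():
        k = key.encode("utf-8")
        parts.append(struct.pack(">h", len(k)) + k)
        parts.append(struct.pack(">i", len(value)) + value)
    return b"".join(parts)


def header_keys(buf):
    """List the keys without interpreting any value, which is all a
    generic hop (proxy, mirror, tracer) needs to be able to do."""
    (count,) = struct.unpack_from(">i", buf, 0)
    offset, keys = 4, []
    for _ in range(count):
        (klen,) = struct.unpack_from(">h", buf, offset)
        offset += 2
        keys.append(buf[offset:offset + klen].decode("utf-8"))
        offset += klen
        (vlen,) = struct.unpack_from(">i", buf, offset)
        offset += 4 + vlen  # skip the opaque value entirely
    return keys


blob = serialize_headers({"largeMessage": b"1/20", "traceId": b"\x01\x02"})
print(header_keys(blob))  # ['largeMessage', 'traceId']
```

If the key encoding were pluggable instead, `header_keys` could not be written once and shared, which is the portability problem described in the thread.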

it would also make it hard to write really portable plugins - say i wrote a large message splitter/combiner - if i rely on key "largeMessage" and values of the form "1/20", someone who uses (contrived example) Map<Byte[], Double> wouldn't be able to re-use my code.

not the end of the world within an organization, but problematic if you want to enable an ecosystem

On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

As others have laid out, I see strong reasons for a common message metadata structure for the Kafka ecosystem. In particular, I've seen that even within a single organization, infrastructure teams often own the message metadata while application teams own the application-level data format.
Allowing metadata and content to have different structure and evolve separately is very helpful for this. Also, I think there's a lot of value to having a common metadata structure shared across the Kafka ecosystem so that tools which leverage metadata can more easily be shared across organizations and integrated together.

The question is, where does the metadata structure belong? Here's my take:

We change the Kafka wire and on-disk format from a (key, value) model to a (key, metadata, value) model where all three are byte arrays from the broker's point of view. The primary reason for this is that it provides a backward-compatible migration path forward. Producers can start populating metadata fields before all consumers understand the metadata structure.
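A minimal sketch of the (key, metadata, value) model, assuming a hypothetical `Record` type and a `down_convert` step that stands in for the broker's translation for older clients. It illustrates the migration path: because metadata travels in its own field, existing consumer code that decodes only the value keeps working unchanged.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape: three opaque byte fields as far as the
    broker is concerned."""
    key: Optional[bytes]
    metadata: Optional[bytes]
    value: Optional[bytes]


def down_convert(record: Record) -> Tuple[Optional[bytes], Optional[bytes]]:
    """What an older client would receive: the metadata field is dropped."""
    return record.key, record.value


def legacy_deserialize(value: bytes) -> str:
    """Existing consumer code that decodes its own payload directly."""
    return value.decode("utf-8")


rec = Record(key=b"order-17", metadata=b"\x00\x01trace", value=b"hello")
_, value = down_convert(rec)
print(legacy_deserialize(value))  # hello
```

A metadata-aware consumer would additionally read `rec.metadata`; the value bytes it sees are identical either way, unlike a wrapper format where the payload itself changes shape.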
For people who already have custom envelope structures, they can populate their existing structure and the new structure for a while as they make the transition.

We could stop there and let the clients plug in a KeySerializer, MetadataSerializer, and ValueSerializer, but I think it would also be useful to have a default MetadataSerializer that implements a key-value model similar to AMQP or HTTP headers. Or we could go even further and prescribe a Map<String, byte[]> or Map<String, String> data model for headers in the clients (while still allowing custom serialization of the header data model).

I think this would address Radai's concerns:
1. All client code would not need to be updated to know about the container.
2. Middleware-friendly clients would have a standard header data model to work with.
3. A KIP is required both because of broker changes and because of client API changes.

Cheers,

Roger


On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenbl...@gmail.com> wrote:

my biggest issues with a "standard" wrapper format:

1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be updated to know about the container, because any old naive code trying to directly deserialize its own payload would keel over and die (it needs to know to deserialize a container, and then dig in there for its payload).
2. in order to write middleware-friendly clients that utilize such a container, one would basically have to write their own producer/consumer API on top of the open source kafka one.
3. if you were going to go with a wrapper format, you really don't need to bother with a kip (just open source your own client stack from #2 above so others could stop re-inventing it)

On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushuja...@gmail.com> wrote:

How exactly would this work? Or maybe that's out of scope for this email.


The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated.
If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.

--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

--
Nacho (Ignacio) Solis
Kafka
nso...@linkedin.com

--
Nacho - Ignacio Solis - iso...@igso.net