Re: [DISCUSS] KIP-82 - Add Record Headers

K Burstev Fri, 18 Nov 2016 01:13:02 -0800

For what it is worth, I also agree. As a user:

 1) Yes - Headers are worthwhile
 2) Yes - Headers should be a top level option


14.11.2016, 21:15, "Ignacio Solis" <[email protected]>:
> 1) Yes - Headers are worthwhile
> 2) Yes - Headers should be a top level option
>
> On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <[email protected]>
> wrote:
>
>>  Hi Roger,
>>
>>  The KIP details and examples cover the original proposal for key spacing,
>>  not the namespace idea newly raised in this discussion.
>>
>>  We will need to update the kip, when we get agreement this is a better
>>  approach (which seems to be the case if I have understood the general
>>  feeling in the conversation)
>>
>>  Re the variable ints, at very early stage we did think about this. I think
>>  the added complexity for the saving isn't worth it. I'd rather go with, if
>>  we want to reduce overheads and size int16 (2bytes) keys as it keeps it
>>  simple.
>>
>>  On the note of no headers: as per the KIP, we use an attribute bit to
>>  denote whether headers are present, so there is currently zero overhead
>>  if headers are not used.
>>
>>  I think, as radai mentions, it would be good first to get clarity on
>>  whether we now have general consensus that (1) headers are worthwhile and
>>  useful, and (2) we want them as a top level entity.
>>
>>  Just to state the obvious i believe (1) headers are worthwhile and (2)
>>  agree as a top level entity.
>>
>>  Cheers
>>  Mike
>>  ________________________________________
>>  From: Roger Hoover <[email protected]>
>>  Sent: Wednesday, November 9, 2016 9:10:47 PM
>>  To: [email protected]
>>  Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>
>>  Sorry for going a little in the weeds but thanks for the replies regarding
>>  varint.
>>
>>  Agreed that a prefix and {int, int} can be the same. It doesn't look like
>>  that's what the KIP is saying in the "Open" section. The example shows
>>  2100001
>>  for New Relic and 210002 for App Dynamics implying that the New Relic
>>  organization will have only a single header id to work with. Or is 2100001
>>  a prefix? The main point of a namespace or prefix is to reduce the
>>  overhead of config mapping or registration depending on how
>>  namespaces/prefixes are managed.
>>
>>  Would love to hear more feedback on the higher-level questions though...
>>
>>  Cheers,
>>
>>  Roger
>>
>>  On Wed, Nov 9, 2016 at 11:38 AM, radai <[email protected]> wrote:
>>
>>  > I think this discussion is getting a bit into the weeds on technical
>>  > implementation details.
>>  > I'd like to step back a minute and try and establish where we are in the
>>  > larger picture:
>>  >
>>  > (re-wording nacho's last paragraph)
>>  > 1. are we all in agreement that headers are a worthwhile and useful
>>  > addition to have? this was contested early on
>>  > 2. are we all in agreement on headers as top level entity vs headers
>>  > squirreled-away in V?
>>  >
>>  > if there are still concerns around these #2 points (#jay? #jun?)?
>>  >
>>  > (and now back to our normal programming ...)
>>  >
>>  > varints are nice. having said that, its adding complexity (see
>>  > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
>>  > as 1st google result) and would require anyone writing other clients (C?
>>  > Python? Go? Bash? ;-) ) to get/implement the same, and for relatively
>>  > little gain (int vs string is order of magnitude, this isnt).
>>  >
>>  > int namespacing vs {int, int} namespacing are basically the same thing -
>>  > youre just namespacing an int64 and giving people whole 2^32 ranges at a
>>  > time. the part i like about this is letting people have a large swath of
>>  > numbers with one registration so they dont have to come back for every
>>  > single plugin/header they want to "reserve".
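
A minimal sketch of the equivalence described above: an {int, int} pair is just a 64-bit key whose upper 32 bits act as the registered namespace. The names `pack_header_key` and `unpack_header_key` are hypothetical and appear nowhere in the KIP, and the key is treated as unsigned purely for illustration.

```python
from typing import Tuple

UINT32_MAX = (1 << 32) - 1  # largest value that fits in one 32-bit half


def pack_header_key(namespace: int, header_id: int) -> int:
    """Combine {namespace, id} into one 64-bit key (namespace in the high bits)."""
    if not 0 <= namespace <= UINT32_MAX:
        raise ValueError("namespace must fit in 32 bits")
    if not 0 <= header_id <= UINT32_MAX:
        raise ValueError("header_id must fit in 32 bits")
    return (namespace << 32) | header_id


def unpack_header_key(key: int) -> Tuple[int, int]:
    """Split a 64-bit key back into its (namespace, id) halves."""
    return key >> 32, key & UINT32_MAX


# One registration (namespace 3) covers the whole range {3, *}.
first_key, last_key = pack_header_key(3, 0), pack_header_key(3, UINT32_MAX)
```

Every key between `first_key` and `last_key` belongs to namespace 3, which is why a single registration is enough for all 2^32 ids under it.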
>>  >
>>  >
>>  > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <[email protected]>
>>  > wrote:
>>  >
>>  > > Since some of the debate has been about overhead + performance, I'm
>>  > > wondering if we have considered a varint encoding (
>>  > > https://developers.google.com/protocol-buffers/docs/encoding#varints)
>>  > for
>>  > > the header length field (int32 in the proposal) and for header ids? If
>>  > you
>>  > > don't use headers, the overhead would be a single byte and for each
>>  > header
>>  > > id < 128 would also need only a single byte?
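
A rough sketch of the base-128 varint scheme linked above, to make the "single byte for values below 128" claim concrete. The function names are illustrative only, and this covers non-negative values only; it is not taken from any Kafka client.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

With this encoding a zero header length and any header id below 128 each take one byte, while an id such as 1000 takes two.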
>>  > >
>>  > >
>>  > >
>>  > > On Wed, Nov 9, 2016 at 6:43 AM, radai <[email protected]>
>>  > wrote:
>>  > >
>>  > > > @magnus - and very dangerous (youre essentially downloading and
>>  > executing
>>  > > > arbitrary code off the internet on your servers ... bad idea without
>>  a
>>  > > > sandbox, even with)
>>  > > >
>>  > > > as for it being a purely administrative task - i disagree.
>>  > > >
>>  > > > i wish it would, really, because then my earlier point on the
>>  > complexity
>>  > > of
>>  > > > the remapping process would be invalid, but at linkedin, for example,
>>  > we
>>  > > > (the team im in) run kafka as a service. we dont really know what our
>>  > > users
>>  > > > (developing applications that use kafka) are up to at any given
>>  moment.
>>  > > it
>>  > > > is very possible (given the existence of headers and a corresponding
>>  > > plugin
>>  > > > ecosystem) for some application to "equip" their producers and
>>  > consumers
>>  > > > with the required plugin without us knowing. i dont mean to imply
>>  thats
>>  > > > bad, i just want to make the point that its not as simple as keeping it
>>  in
>>  > > > sync across a large-enough organization.
>>  > > >
>>  > > >
>>  > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <[email protected]>
>>  > > > wrote:
>>  > > >
>>  > > > > I think there is a piece missing in the Strings discussion, where
>>  > > > > pro-Stringers
>>  > > > > reason that by providing unique string identifiers for each header
>>  > > > > everything will just
>>  > > > > magically work for all parts of the stream pipeline.
>>  > > > >
>>  > > > > But the strings dont mean anything by themselves, and while we
>>  could
>>  > > > > probably envision
>>  > > > > some auto plugin loader that downloads, compiles, links and runs
>>  > > plugins
>>  > > > > on-demand
>>  > > > > as soon as they're seen by a consumer, I dont really see a use-case
>>  > for
>>  > > > > something
>>  > > > > so dynamic (and fragile) in practice.
>>  > > > >
>>  > > > > In the real world an application will be configured with a set of
>>  > > plugins
>>  > > > > to either add (producer)
>>  > > > > or read (consumer) headers.
>>  > > > > This is an administrative task based on what features a client
>>  > > > > needs/provides and results in
>>  > > > > some sort of configuration to enable and configure the desired
>>  > plugins.
>>  > > > >
>>  > > > > Since this needs to be kept somewhat in sync across an organisation
>>  > > > (there
>>  > > > > is no point in having producers
>>  > > > > add headers no consumers will read, and vice versa), the added
>>  > > complexity
>>  > > > > of assigning an id namespace
>>  > > > > for each plugin as it is being configured should be tolerable.
>>  > > > >
>>  > > > >
>>  > > > > /Magnus
>>  > > > >
>>  > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <[email protected]>:
>>  > > > >
>>  > > > > > Just following/catching up on what seems to be an active night :)
>>  > > > > >
>>  > > > > > @Radai sorry if it may seem obvious but what does MD stand for?
>>  > > > > >
>>  > > > > > My take on String vs Int:
>>  > > > > >
>>  > > > > > I will state first I am pro Int (16 or 32).
>>  > > > > >
>>  > > > > > I do though, playing devil's advocate, see a big plus with the
>>  > argument
>>  > > of
>>  > > > > > String keys, this is around integrating into an existing
>>  > eco-system.
>>  > > > > >
>>  > > > > > As many other systems use String based headers (Flume, JMS) it
>>  > makes
>>  > > > it
>>  > > > > > much easier for these to be incorporated/integrated into.
>>  > > > > >
>>  > > > > > How with Int based headers could we provide a way/guidance to
>>  make
>>  > > this
>>  > > > > > integration simple / easy with transition flows over to kafka?
>>  > > > > >
>>  > > > > > * tough luck buddy you're on your own
>>  > > > > > * simply hash the string into int code and hope for no collisions
>>  > > (how
>>  > > > to
>>  > > > > > convert back though?)
>>  > > > > > * http2 style as mentioned by nacho.
>>  > > > > >
>>  > > > > > cheers,
>>  > > > > > Mike
>>  > > > > >
>>  > > > > >
>>  > > > > > ________________________________________
>>  > > > > > From: radai <[email protected]>
>>  > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
>>  > > > > > To: [email protected]
>>  > > > > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>  > > > > >
>>  > > > > > thinking about it some more, the best way to transmit the header
>>  > > > > remapping
>>  > > > > > data to consumers would be to put it in the MD response payload,
>>  so
>>  > > > maybe
>>  > > > > > it should be discussed now.
>>  > > > > >
>>  > > > > >
>>  > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <
>>  [email protected]
>>  > >
>>  > > > > wrote:
>>  > > > > >
>>  > > > > > > im not opposed to the idea of namespace mapping. all im saying
>>  is
>>  > > > that
>>  > > > > > its
>>  > > > > > > not part of the "mvp" and, since it requires no wire format
>>  > change,
>>  > > > can
>>  > > > > > > always be added later.
>>  > > > > > > also, its not as simple as just configuring MM to do the
>>  > transform:
>>  > > > > lets
>>  > > > > > > say i've implemented large message support as {666,1} and on
>>  some
>>  > > > > mirror
>>  > > > > > > target cluster its been remapped to {999,1}. the consumer
>>  plugin
>>  > > code
>>  > > > > > would
>>  > > > > > > also need to be told to look for the large message "part X of
>>  Y"
>>  > > > header
>>  > > > > > > under {999,1}. doable, but tricky.
>>  > > > > > >
>>  > > > > > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <
>>  [email protected]
>>  > >
>>  > > > > wrote:
>>  > > > > > >
>>  > > > > > >> While you can do whatever you want with a namespace and your
>>  > code,
>>  > > > > > >> what I'd expect is for each app to make namespaces configurable...
>>  > > > > > >>
>>  > > > > > >> So if I accidentally used 666 for my HR department, and still
>>  > want
>>  > > > to
>>  > > > > > >> run RadaiApp, I can config "namespace=42" for RadaiApp and
>>  > > > everything
>>  > > > > > >> will look normal.
>>  > > > > > >>
>>  > > > > > >> This means you only need to sync usage inside your own
>>  > > organization.
>>  > > > > > >> Still hard, but somewhat easier than syncing with the entire
>>  > > world.
>>  > > > > > >>
>>  > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM, radai <
>>  > > [email protected]>
>>  > > > > > >> wrote:
>>  > > > > > >> > and we can start with {namespace, id} and no re-mapping
>>  > support
>>  > > > and
>>  > > > > > >> always
>>  > > > > > >> > add it later on if/when collisions actually happen (i dont
>>  > think
>>  > > > > > they'd
>>  > > > > > >> be
>>  > > > > > >> > a problem).
>>  > > > > > >> >
>>  > > > > > >> > every interested party (so orgs or individuals) could then
>>  > > > register
>>  > > > > a
>>  > > > > > >> > prefix (0 = reserved, 1 = confluent ... 666 = me :-) ) and
>>  do
>>  > > > > whatever
>>  > > > > > >> with
>>  > > > > > >> > the 2nd ID - so once linkedin registers, say 3, then
>>  linkedin
>>  > > devs
>>  > > > > are
>>  > > > > > >> free
>>  > > > > > >> > to use {3, *} with a reasonable expectation not to collide
>>  with
>>  > > > > > anything
>>  > > > > > >> > else. further partitioning of that * becomes linkedin's
>>  > problem,
>>  > > > but
>>  > > > > > the
>>  > > > > > >> > "upstream registration" of a namespace only has to happen
>>  > once.
>>  > > > > > >> >
>>  > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <
>>  > > [email protected]
>>  > > > >
>>  > > > > > >> wrote:
>>  > > > > > >> >
>>  > > > > > >> >>
>>  > > > > > >> >>
>>  > > > > > >> >>
>>  > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen Shapira <
>>  > [email protected]>
>>  > > > > > wrote:
>>  > > > > > >> >> >
>>  > > > > > >> >> > Thank you so much for this clear and fair summary of the
>>  > > > > arguments.
>>  > > > > > >> >> >
>>  > > > > > >> >> > I'm in favor of ints. Not a deal-breaker, but in favor.
>>  > > > > > >> >> >
>>  > > > > > >> >> > Even more in favor of Magnus's decentralized suggestion
>>  > with
>>  > > > > > Roger's
>>  > > > > > >> >> > tweak: add a namespace for headers. This will allow each
>>  > app
>>  > > to
>>  > > > > > just
>>  > > > > > >> >> > use whatever IDs it wants internally, and then let the
>>  > admin
>>  > > > > > >> deploying
>>  > > > > > >> >> > the app figure out an available namespace ID for the app
>>  to
>>  > > > live
>>  > > > > > in.
>>  > > > > > >> >> > So io.confluent.schema-registry can be namespace 0x01 on
>>  my
>>  > > > > > >> deployment
>>  > > > > > >> >> > and 0x57 on yours, and the poor guys developing the app
>>  > don't
>>  > > > > need
>>  > > > > > to
>>  > > > > > >> >> > worry about that.
>>  > > > > > >> >> >
>>  > > > > > >> >>
>>  > > > > > >> >> Gwen, if I understand your example right, an application
>>  > > deployer
>>  > > > > > might
>>  > > > > > >> >> decide to use 0x01 in one deployment, and that means that
>>  > once
>>  > > > the
>>  > > > > > >> message
>>  > > > > > >> >> is written into the broker, it will be saved on the broker
>>  > with
>>  > > > > that
>>  > > > > > >> >> specific namespace (0x01).
>>  > > > > > >> >>
>>  > > > > > >> >> If you were to mirror that message into another cluster,
>>  the
>>  > > 0x01
>>  > > > > > would
>>  > > > > > >> >> accompany the message, right? What if the deployers of the
>>  > same
>>  > > > app
>>  > > > > > in
>>  > > > > > >> the
>>  > > > > > >> >> other cluster use 0x57? They won't understand each other?
>>  > > > > > >> >>
>>  > > > > > >> >> I'm not sure that's an avoidable problem. I think it simply
>>  > > means
>>  > > > > > that
>>  > > > > > >> in
>>  > > > > > >> >> order to share data, you have to also have a shared (agreed
>>  > > upon)
>>  > > > > > >> >> understanding of what the namespaces mean. Which I think
>>  > makes
>>  > > > > sense,
>>  > > > > > >> >> because the alternate (sharing *nothing* at all) would mean
>>  > > that
>>  > > > > > there
>>  > > > > > >> >> would be no way to understand each other.
>>  > > > > > >> >>
>>  > > > > > >> >> -James
>>  > > > > > >> >>
>>  > > > > > >> >> > Gwen
>>  > > > > > >> >> >
>>  > > > > > >> >> > On Tue, Nov 8, 2016 at 4:23 PM, radai <
>>  > > > > [email protected]>
>>  > > > > > >> >> wrote:
>>  > > > > > >> >> >> +1 for sean's document. it covers pretty much all the
>>  > > > trade-offs
>>  > > > > > and
>>  > > > > > >> >> >> provides concrete figures to argue about :-)
>>  > > > > > >> >> >> (nit-picking - used the same xkcd twice, also trove has
>>  > been
>>  > > > > > >> superseded
>>  > > > > > >> >> for
>>  > > > > > >> >> >> purposes of high performance collections: look at
>>  > > > > > >> >> >> https://github.com/leventov/Koloboke)
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> so to sum up the string vs int debate:
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> performance - you can do 140k ops/sec _per thread_ with
>>  > > string
>>  > > > > > >> headers.
>>  > > > > > >> >> you
>>  > > > > > >> >> >> could do x2-3 better with ints. there's no arguing the
>>  > > > relative
>>  > > > > > diff
>>  > > > > > >> >> >> between the two, there's only the question of whether or
>>  > not
>>  > > > > _the
>>  > > > > > >> rest
>>  > > > > > >> >> of
>>  > > > > > >> >> >> kafka_ operates fast enough to care. if we want to make
>>  > > > choices
>>  > > > > > >> solely
>>  > > > > > >> >> >> based on performance we need ints. if we are willing to
>>  > > > > > >> >> settle/compromise
>>  > > > > > >> >> >> for a nicer (to some) API then strings are good enough
>>  for
>>  > > the
>>  > > > > > >> current
>>  > > > > > >> >> >> state of affairs.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> message size - with batching and compression it comes
>>  down
>>  > > to
>>  > > > a
>>  > > > > > ~5%
>>  > > > > > >> >> >> difference (internal testing, not in the doc. maybe
>>  would
>>  > > help
>>  > > > > > >> adding if
>>  > > > > > >> >> >> this becomes a point of contention?). this means it wont
>>  > > > really
>>  > > > > > >> affect
>>  > > > > > >> >> >> kafka in "throughput mode" (large, compressed batches).
>>  in
>>  > > > "low
>>  > > > > > >> latency"
>>  > > > > > >> >> >> mode (meaning less/no batching and compression) the
>>  > > difference
>>  > > > > can
>>  > > > > > >> be
>>  > > > > > >> >> >> extreme (it'll easily be an order of magnitude with
>>  small
>>  > > > > payloads
>>  > > > > > >> like
>>  > > > > > >> >> >> stock ticks and header keys of the form
>>  > > > > > >> >> >> "com.acme.infraTeam.kafka.hiMom.auditPlugin"). we have
>>  a
>>  > > few
>>  > > > > such
>>  > > > > > >> >> topics at
>>  > > > > > >> >> >> linkedin where actual payloads are ~2 ints and are
>>  > eclipsed
>>  > > by
>>  > > > > our
>>  > > > > > >> >> in-house
>>  > > > > > >> >> >> audit "header" which is why we liked ints to begin with.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> "ease of use" - strings would probably still require
>>  > _some_
>>  > > > > degree
>>  > > > > > >> of
>>  > > > > > >> >> >> partitioning by convention (imagine if everyone used the
>>  > key
>>  > > > > > >> "infra"...)
>>  > > > > > >> >> >> but its very intuitive for java devs to do anyway
>>  > > > > (reverse-domain
>>  > > > > > is
>>  > > > > > >> >> >> ingrained into java developers at a young age :-) ).
>>  also
>>  > > most
>>  > > > > > java
>>  > > > > > >> devs
>>  > > > > > >> >> >> find Map<String, whatever> more intuitive than
>>  > Map<Integer,
>>  > > > > > >> whatever> -
>>  > > > > > >> >> >> probably because of other text-based protocols like
>>  http.
>>  > > ints
>>  > > > > > would
>>  > > > > > >> >> >> require a number registry. if you think number
>>  registries
>>  > > are
>>  > > > > hard
>>  > > > > > >> just
>>  > > > > > >> >> >> look at the wiki page for KIPs (specifically the number
>>  > for
>>  > > > next
>>  > > > > > >> >> available
>>  > > > > > >> >> >> KIP) and think again - we are probably talking about the
>>  > > same
>>  > > > > > >> volume of
>>  > > > > > >> >> >> requests. also this would only be "required" (good
>>  > > > citizenship,
>>  > > > > > more
>>  > > > > > >> >> like)
>>  > > > > > >> >> >> if you want to publish your plugin for others to use.
>>  > within
>>  > > > > your
>>  > > > > > >> org do
>>  > > > > > >> >> >> whatever you want - just know that if you use [some
>>  > > "reserved"
>>  > > > > > >> range]
>>  > > > > > >> >> and a
>>  > > > > > >> >> >> future kafka update breaks it its your problem. RTFM.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> personally im in favor of ints.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> having said that (and like nacho) I will settle if int
>>  vs
>>  > > > string
>>  > > > > > >> remains
>>  > > > > > >> >> >> the only obstacle to this.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> On Tue, Nov 8, 2016 at 3:53 PM, Nacho Solis
>>  > > > > > >> <[email protected]
>>  > > > > > >> >> >
>>  > > > > > >> >> >> wrote:
>>  > > > > > >> >> >>
>>  > > > > > >> >> >>> I think it's well known I've been pushing for ints
>>  (and I
>>  > > > could
>>  > > > > > >> switch
>>  > > > > > >> >> to
>>  > > > > > >> >> >>> 16 bit shorts if pressed).
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> - efficient (space)
>>  > > > > > >> >> >>> - efficient (processing)
>>  > > > > > >> >> >>> - easily partitionable
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> However, if the only thing that is keeping us from
>>  > adopting
>>  > > > > > >> headers is
>>  > > > > > >> >> the
>>  > > > > > >> >> >>> use of strings vs ints as keys, then I would cave in
>>  and
>>  > > > accept
>>  > > > > > >> >> strings. If
>>  > > > > > >> >> >>> we do so, I would like to limit string keys to 128
>>  bytes
>>  > in
>>  > > > > > length.
>>  > > > > > >> >> This
>>  > > > > > >> >> >>> way 1) I could use a 3 letter string if I wanted
>>  > > (effectively
>>  > > > > > >> using 4
>>  > > > > > >> >> total
>>  > > > > > >> >> >>> bytes), 2) limit overall impact of possible keys (don't
>>  > > > really
>>  > > > > > want
>>  > > > > > >> >> people
>>  > > > > > >> >> >>> to send a 16K header string key).
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> Nacho
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> On Tue, Nov 8, 2016 at 3:35 PM, Gwen Shapira <
>>  > > > > [email protected]>
>>  > > > > > >> >> wrote:
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>> Forgot to mention: Thank you for quantifying the
>>  > > trade-off -
>>  > > > > it
>>  > > > > > is
>>  > > > > > >> >> >>>> helpful and important regardless of what we end up
>>  > > deciding.
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>> On Tue, Nov 8, 2016 at 3:12 PM, Sean McCauliff
>>  > > > > > >> >> >>>> <[email protected]> wrote:
>>  > > > > > >> >> >>>>> On Tue, Nov 8, 2016 at 2:15 PM, Gwen Shapira <
>>  > > > > > [email protected]>
>>  > > > > > >> >> >>> wrote:
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>> Since Kafka specifically targets high-throughput,
>>  > > > > low-latency
>>  > > > > > >> >> >>>>>> use-cases, I don't think we should trade them off
>>  that
>>  > > > > easily.
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> I find these kind of design goals not to be really
>>  > > helpful
>>  > > > > > unless
>>  > > > > > >> >> it's
>>  > > > > > >> >> >>>>> quantified in some way. Because it's always possible
>>  to
>>  > > > argue
>>  > > > > > >> against
>>  > > > > > >> >> >>>>> something as either being not performant or just an
>>  > > > > > >> implementation
>>  > > > > > >> >> >>>> detail.
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> These are single-threaded benchmarks so all the
>>  > > > measurements
>>  > > > > > are
>>  > > > > > >> per
>>  > > > > > >> >> >>>>> thread.
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> For 1M messages/s/thread if header keys are int and
>>  > you
>>  > > > had
>>  > > > > > >> even a
>>  > > > > > >> >> >>>> single
>>  > > > > > >> >> >>>>> header key, value pair then it's still about 2^-2
>>  > > > > microseconds
>>  > > > > > >> which
>>  > > > > > >> >> >>>> means
>>  > > > > > >> >> >>>>> you only have another 0.75 microseconds to do
>>  > everything
>>  > > > else
>>  > > > > > you
>>  > > > > > >> >> want
>>  > > > > > >> >> >>> to
>>  > > > > > >> >> >>>>> do with a message (1M messages/s means 1 micro second
>>  > per
>>  > > > > > >> message).
>>  > > > > > >> >> >>> With
>>  > > > > > >> >> >>>>> string header keys there is still 0.5 micro seconds
>>  to
>>  > > > > process
>>  > > > > > a
>>  > > > > > >> >> >>> message.
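
The arithmetic above can be reproduced with a few lines. This is only a restatement of the figures quoted in this message (1M messages/s/thread, about 0.25 microseconds of header parsing with int keys, and 0.5 microseconds left over with string keys); the function name is made up for illustration.

```python
def remaining_budget_us(messages_per_second: float, header_cost_us: float) -> float:
    """Microseconds left per message after paying for header parsing."""
    per_message_us = 1_000_000 / messages_per_second  # total time budget per message
    return per_message_us - header_cost_us


INT_HEADER_COST_US = 0.25  # the "2^-2 microseconds" figure quoted above
STRING_HEADER_COST_US = 0.5  # implied by "0.5 micro seconds" remaining out of 1

int_budget = remaining_budget_us(1_000_000, INT_HEADER_COST_US)
string_budget = remaining_budget_us(1_000_000, STRING_HEADER_COST_US)
```

At lower rates the same header cost matters far less: at 100k messages/s/thread the per-message budget is 10 microseconds, so either key type leaves most of it untouched.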
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> I love strings as much as the next guy (we had them
>>  in
>>  > > > > Flume),
>>  > > > > > >> but I
>>  > > > > > >> >> >>>>>> was convinced by Magnus/Michael/Radai that strings
>>  > don't
>>  > > > > > >> actually
>>  > > > > > >> >> have
>>  > > > > > >> >> >>>>>> strong benefits as opposed to ints (you'll need a
>>  > string
>>  > > > > > >> registry
>>  > > > > > >> >> >>>>>> anyway - otherwise, how will you know what the
>>  > > > > > "profile_id"
>>  > > > > > >> >> >>>>>> header refers to?) and I want to keep closer to our
>>  > > > original
>>  > > > > > >> design
>>  > > > > > >> >> >>>>>> goals for Kafka.
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> "confluent.profile_id"
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>> If someone likes strings in the headers and doesn't
>>  do
>>  > > > > > millions
>>  > > > > > >> of
>>  > > > > > >> >> >>>>>> messages a sec, they probably have lots of other
>>  > systems
>>  > > > > they
>>  > > > > > >> can
>>  > > > > > >> >> use
>>  > > > > > >> >> >>>>>> instead.
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> None of them will scale like Kafka. Horizontal
>>  scaling
>>  > > is
>>  > > > > > still
>>  > > > > > >> >> good.
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>> On Tue, Nov 8, 2016 at 1:22 PM, Sean McCauliff
>>  > > > > > >> >> >>>>>> <[email protected]> wrote:
>>  > > > > > >> >> >>>>>>> +1 for String keys.
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> I've been doing some benchmarking and it seems like
>>  > the
>>  > > > > > speedup
>>  > > > > > >> for
>>  > > > > > >> >> >>>> using
>>  > > > > > >> >> >>>>>>> integer keys is about 2-5x depending on the length
>>  of
>>  > > the
>>  > > > > > >> strings
>>  > > > > > >> >> and
>>  > > > > > >> >> >>>> what
>>  > > > > > >> >> >>>>>>> collections are being used. The overall amount of
>>  > time
>>  > > > > spent
>>  > > > > > >> >> >>> parsing
>>  > > > > > >> >> >>>> a
>>  > > > > > >> >> >>>>>> set
>>  > > > > > >> >> >>>>>>> of header key, value pairs probably does not matter
>>  > > > unless
>>  > > > > > you
>>  > > > > > >> are
>>  > > > > > >> >> >>>>>> getting
>>  > > > > > >> >> >>>>>>> close to 1M messages per consumer. In which case
>>  > > > probably
>>  > > > > > >> don't
>>  > > > > > >> >> use
>>  > > > > > >> >> >>>>>>> headers. There is also the option to use very
>>  short
>>  > > > > strings;
>>  > > > > > >> some
>>  > > > > > >> >> >>>> that
>>  > > > > > >> >> >>>>>> are
>>  > > > > > >> >> >>>>>>> even shorter than integers.
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> Partitioning the string key space will be easier
>>  than
>>  > > > > > >> partitioning
>>  > > > > > >> >> >>> an
>>  > > > > > >> >> >>>>>>> integer key space. We won't need a global registry.
>>  > > > Kafka
>>  > > > > > >> >> >>> internally
>>  > > > > > >> >> >>>> can
>>  > > > > > >> >> >>>>>>> reserve some prefix like "_" as its namespace.
>>  > > Everyone
>>  > > > > else
>>  > > > > > >> can
>>  > > > > > >> >> >>> use
>>  > > > > > >> >> >>>>>> their
>>  > > > > > >> >> >>>>>>> company or project name as namespace prefix and
>>  life
>>  > > > should
>>  > > > > > be
>>  > > > > > >> >> good.
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> Here's the link to some of the benchmarking info:
>>  > > > > > >> >> >>>>>>> https://docs.google.com/document/d/1tfT-6SZdnKOLyWGDH82kS30PnUkmgb7nPLdw6p65pAI/edit?usp=sharing
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> --
>>  > > > > > >> >> >>>>>>> Sean McCauliff
>>  > > > > > >> >> >>>>>>> Staff Software Engineer
>>  > > > > > >> >> >>>>>>> Kafka
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> [email protected]
>>  > > > > > >> >> >>>>>>> linkedin.com/in/sean-mccauliff-b563192
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <
>>  > > > > > >> >> >>>> [email protected]>
>>  > > > > > >> >> >>>>>>> wrote:
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>>> +1 on this slimmer version of our proposal
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I def think the Id space we can reduce from the
>>  > > proposed
>>  > > > > > >> >> >>>> int32(4bytes)
>>  > > > > > >> >> >>>>>>>> down to int16(2bytes) it saves on space and as
>>  > headers
>>  > > > we
>>  > > > > > >> wouldn't
>>  > > > > > >> >> >>>>>> expect
>>  > > > > > >> >> >>>>>>>> the number of headers being used concurrently
>>  > > > > > >> >> >>>> to be
>>  > > that
>>  > > > > > high.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I would wonder if we should make the value byte
>>  > array
>>  > > > > length
>>  > > > > > >> still
>>  > > > > > >> >> >>>> int32
>>  > > > > > >> >> >>>>>>>> though as This is the standard Max array length in
>>  > > Java
>>  > > > > > saying
>>  > > > > > >> >> that
>>  > > > > > >> >> >>>> it
>>  > > > > > >> >> >>>>>> is a
>>  > > > > > >> >> >>>>>>>> header and I guess limiting the size is sensible
>>  and
>>  > > > would
>>  > > > > > >> work
>>  > > > > > >> >> for
>>  > > > > > >> >> >>>> all
>>  > > > > > >> >> >>>>>> the
>>  > > > > > >> >> >>>>>>>> use cases we have in mind so happy with limiting
>>  > this.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Do people generally concur on Magnus's slimmer
>>  > > version?
>>  > > > > > >> Anyone see
>>  > > > > > >> >> >>>> any
>>  > > > > > >> >> >>>>>>>> issues if we moved from int32 to int16?
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Re configurable ids per plugin over a global
>>  > registry
>>  > > > also
>>  > > > > > >> would
>>  > > > > > >> >> >>> work
>>  > > > > > >> >> >>>>>> for
>>  > > > > > >> >> >>>>>>>> us. As such if this has better consensus over the
>>  > > > > proposed
>>  > > > > > >> global
>>  > > > > > >> >> >>>>>> registry
>>  > > > > > >> >> >>>>>>>> I'd be happy to change that.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I was already sold on ints over strings for keys
>>  ;)
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Cheers
>>  > > > > > >> >> >>>>>>>> Mike
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> ________________________________________
>>  > > > > > >> >> >>>>>>>> From: Magnus Edenhill <[email protected]>
>>  > > > > > >> >> >>>>>>>> Sent: Monday, November 7, 2016 10:10:21 PM
>>  > > > > > >> >> >>>>>>>> To: [email protected]
>>  > > > > > >> >> >>>>>>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Hi,
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I'm +1 for adding generic message headers, but I
>>  do
>>  > > > share
>>  > > > > > the
>>  > > > > > >> >> >>>> concerns
>>  > > > > > >> >> >>>>>>>> previously aired on this thread and during the KIP
>>  > > > > meeting.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> So let me propose a slimmer alternative that does
>>  > not
>>  > > > > > require
>>  > > > > > >> any
>>  > > > > > >> >> >>>> sort
>>  > > > > > >> >> >>>>>> of
>>  > > > > > >> >> >>>>>>>> global header registry, does not affect broker
>>  > > > performance
>>  > > > > > or
>>  > > > > > >> >> >>>>>> operations,
>>  > > > > > >> >> >>>>>>>> and adds as little overhead as possible.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Message
>>  > > > > > >> >> >>>>>>>> ------------
>>  > > > > > >> >> >>>>>>>> The protocol Message type is extended with a
>>  Headers
>>  > > > array
>>  > > > > > >> >> consisting
>>  > > > > > >> >> >>>> of
>>  > > > > > >> >> >>>>>>>> Tags, where a Tag is defined as:
>>  > > > > > >> >> >>>>>>>> int16 Id
>>  > > > > > >> >> >>>>>>>> int16 Len // binary_data length
>>  > > > > > >> >> >>>>>>>> binary_data[Len] // opaque binary data
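
A minimal sketch of how a Tag as defined above (int16 Id, int16 Len, then Len bytes of opaque data) might be written and read. Big-endian byte order follows the existing Kafka wire protocol. The framing of the enclosing Headers array is not specified in this message, so the decoder below simply assumes tags are concatenated back to back; that assumption and the function names are illustrative only.

```python
import struct
from typing import List, Tuple

TAG_HEADER = struct.Struct(">hh")  # int16 Id, int16 Len, big-endian
INT16_MAX = 0x7FFF


def encode_tag(tag_id: int, data: bytes) -> bytes:
    """Serialize one Tag: Id, Len, then the opaque payload."""
    if not 0 <= tag_id <= INT16_MAX:
        raise ValueError("tag id must fit in a non-negative int16")
    if len(data) > INT16_MAX:
        raise ValueError("payload too long for an int16 length")
    return TAG_HEADER.pack(tag_id, len(data)) + data


def decode_tags(buf: bytes) -> List[Tuple[int, bytes]]:
    """Parse back-to-back Tags (assumed framing) into (id, data) pairs."""
    tags = []
    offset = 0
    while offset < len(buf):
        tag_id, length = TAG_HEADER.unpack_from(buf, offset)
        offset += TAG_HEADER.size
        if length < 0 or offset + length > len(buf):
            raise ValueError("truncated or malformed tag")
        tags.append((tag_id, buf[offset:offset + length]))
        offset += length
    return tags
```

Each tag costs four bytes of overhead on top of its payload, which is the per-header cost being weighed against varints and string keys elsewhere in the thread.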
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Ids
>>  > > > > > >> >> >>>>>>>> ---
>>  > > > > > >> >> >>>>>>>> The Id space is not centrally managed, so whenever
>>  > an
>>  > > > > > >> application
>>  > > > > > >> >> >>>> needs
>>  > > > > > >> >> >>>>>> to
>>  > > > > > >> >> >>>>>>>> add headers, or use an eco-system plugin that
>>  does,
>>  > > its
>>  > > > Id
>>  > > > > > >> >> >>> allocation
>>  > > > > > >> >> >>>>>> will
>>  > > > > > >> >> >>>>>>>> need to be manually configured.
>>  > > > > > >> >> >>>>>>>> This moves the allocation concern from the global
>>  > > space
>>  > > > > down
>>  > > > > > >> to
>>  > > > > > >> >> >>>>>>>> organization level and avoids the risk of id
>>  > > conflicts.
>>  > > > > > >> >> >>>>>>>> Example pseudo-config for some app:
>>  > > > > > >> >> >>>>>>>> sometrackerplugin.tag.sourcev3.id=1000
>>  > > > > > >> >> >>>>>>>> dbthing.tag.tablename.id=1001
>>  > > > > > >> >> >>>>>>>> myschemareg.tag.schemaname.id=1002
>>  > > > > > >> >> >>>>>>>> myschemareg.tag.schemaversion.id=1003
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Each header-writing or header-reading plugin must
>>  > > > provide
>>  > > > > > >> means
>>  > > > > > >> >> >>>>>> (typically
>>  > > > > > >> >> >>>>>>>> through configuration) to specify the tag for each
>>  > > > header
>>  > > > > it
>>  > > > > > >> uses.
>>  > > > > > >> >> >>>>>> Defaults
>>  > > > > > >> >> >>>>>>>> should be avoided.
>>  > > > > > >> >> >>>>>>>> A consumer silently ignores tags it does not have
>>  a
>>  > > > > mapping
>>  > > > > > >> for
>>  > > > > > >> >> >>>> (since
>>  > > > > > >> >> >>>>>> the
>>  > > > > > >> >> >>>>>>>> binary_data can't be parsed without knowing what
>>  it
>>  > > is).
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Id range 0..999 is reserved for future use by the
>>  > > broker
>>  > > > > and
>>  > > > > > >> must
>>  > > > > > >> >> >>>> not be
>>  > > > > > >> >> >>>>>>>> used by plugins.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
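A minimal sketch of the Ids rules above, assuming the pseudo-config has already been loaded into a dict: ids are taken from configuration, the range 0..999 is rejected as reserved, and a consumer skips tags it has no mapping for. The function names and the choice to use the config key itself as the header name are assumptions for illustration only.

```python
RESERVED_MAX = 999   # ids 0..999 are reserved for the broker in the proposal
INT16_MAX = 32767    # Id is an int16 in the proposed Tag layout


def load_tag_ids(config):
    """Build an {id: config_key} mapping, rejecting reserved or duplicate ids."""
    mapping = {}
    for key, value in config.items():
        tag_id = int(value)
        if not RESERVED_MAX < tag_id <= INT16_MAX:
            raise ValueError("id %d for %s is outside the plugin range" % (tag_id, key))
        if tag_id in mapping:
            raise ValueError("id %d is configured twice" % tag_id)
        mapping[tag_id] = key
    return mapping


def known_headers(tags, mapping):
    """Keep only the tags this consumer has a mapping for; ignore the rest."""
    return {mapping[tag_id]: data for tag_id, data in tags if tag_id in mapping}
```

With the example config from the message, a tag with id 1000 would be surfaced under its configured name, while a tag with an unmapped id such as 4242 would be dropped without error.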
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Broker
>>  > > > > > >> >> >>>>>>>> ---------
>>  > > > > > >> >> >>>>>>>> The broker does not process the tags (other than
>>  the
>>  > > > > > standard
>>  > > > > > >> >> >>>> protocol
>>  > > > > > >> >> >>>>>>>> syntax verification), it simply stores and
>>  forwards
>>  > > them
>>  > > > > as
>>  > > > > > >> opaque
>>  > > > > > >> >> >>>> data.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Standard message translation (removal of Headers)
>>  > > kicks
>>  > > > in
>>  > > > > > for
>>  > > > > > >> >> >>> older
>>  > > > > > >> >> >>>>>>>> clients.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Why not string ids?
>>  > > > > > >> >> >>>>>>>> -------------------------
>>  > > > > > >> >> >>>>>>>> String ids might seem like a good idea, but:
>>  > > > > > >> >> >>>>>>>> * does not really solve uniqueness
>>  > > > > > >> >> >>>>>>>> * consumes a lot of space (2 byte string length +
>>  > > > string,
>>  > > > > > per
>>  > > > > > >> >> >>>> header)
>>  > > > > > >> >> >>>>>> to
>>  > > > > > >> >> >>>>>>>> be meaningful
>>  > > > > > >> >> >>>>>>>> * doesn't really say anything about how to parse the
>>  tag's
>>  > > > data,
>>  > > > > > so
>>  > > > > > >> it
>>  > > > > > >> >> >>> is
>>  > > > > > >> >> >>>> in
>>  > > > > > >> >> >>>>>>>> effect useless on its own.
>>  > > > > > >> >> >>>>>>>>
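To put the space argument above in numbers, here is a small sketch comparing per-header key overhead for an int16 id against a length-prefixed string id (2-byte length plus the UTF-8 bytes, as described in the message). The example key name is made up for illustration.

```python
def int_key_overhead():
    """Bytes used by an int16 Id."""
    return 2


def string_key_overhead(name):
    """Bytes used by a string id: 2-byte length prefix plus the UTF-8 bytes."""
    return 2 + len(name.encode("utf-8"))


# Hypothetical key name, chosen only to illustrate the difference.
example_key = "myschemareg.schemaname"
extra_bytes_per_header = string_key_overhead(example_key) - int_key_overhead()
```

For this 22-character key the string form costs 24 bytes per header against 2 for the int16 id, and that difference is paid on every message that carries the header.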
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Regards,
>>  > > > > > >> >> >>>>>>>> Magnus
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> 2016-11-07 18:32 GMT+01:00 Michael Pearce <
>>  > > > > > >> [email protected]
>>  > > > > > >> >> >:
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Hi Roger,
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Thanks for the support.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> I think the key thing is to have a common key
>>  space
>>  > > to
>>  > > > > make
>>  > > > > > >> an
>>  > > > > > >> >> >>>>>> ecosystem,
>>  > > > > > >> >> >>>>>>>>> there does have to be some level of contract for
>>  > > people
>>  > > > > to
>>  > > > > > >> play
>>  > > > > > >> >> >>>>>> nicely.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Having map<String, byte[]> or as per current
>>  > proposed
>>  > > > in
>>  > > > > > kip
>>  > > > > > >> of
>>  > > > > > >> >> >>>>>> having a
>>  > > > > > >> >> >>>>>>>>> numerical key space of map<int, byte[]> is a
>>  level
>>  > > of
>>  > > > > the
>>  > > > > > >> >> >>> contract
>>  > > > > > >> >> >>>>>> that
>>  > > > > > >> >> >>>>>>>>> most people would expect.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> I think the example in a previous comment someone
>>  > > else
>>  > > > > made
>>  > > > > > >> >> >>>> linking to
>>  > > > > > >> >> >>>>>>>> AWS
>>  > > > > > >> >> >>>>>>>>> blog and also implemented api where originally
>>  they
>>  > > > > didn't
>>  > > > > > >> have a
>>  > > > > > >> >> >>>>>> header
>>  > > > > > >> >> >>>>>>>>> space but now they do, where keys are uniform but
>>  > the
>>  > > > > value
>>  > > > > > >> can
>>  > > > > > >> >> >>> be
>>  > > > > > >> >> >>>>>>>> string,
>>  > > > > > >> >> >>>>>>>>> int, anything is a good example.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Having a custom MetadataSerializer is something
>>  we
>>  > > had
>>  > > > > > played
>>  > > > > > >> >> >>> with,
>>  > > > > > >> >> >>>>>> but
>>  > > > > > >> >> >>>>>>>>> discounted the idea, as if you wanted everyone to
>>  > > work
>>  > > > > the
>>  > > > > > >> same
>>  > > > > > >> >> >>>> way in
>>  > > > > > >> >> >>>>>>>> the
>>  > > > > > >> >> >>>>>>>>> ecosystem, having to have this also customizable
>>  > > makes
>>  > > > > it a
>>  > > > > > >> bit
>>  > > > > > >> >> >>>>>> harder.
>>  > > > > > >> >> >>>>>>>>> Think about making the whole message record
>>  custom
>>  > > > > > >> serializable,
>>  > > > > > >> >> >>>> this
>>  > > > > > >> >> >>>>>>>> would
>>  > > > > > >> >> >>>>>>>>> make it fairly tricky (though it would not be
>>  > > > impossible)
>>  > > > > > to
>>  > > > > > >> have
>>  > > > > > >> >> >>>> made
>>  > > > > > >> >> >>>>>>>> work
>>  > > > > > >> >> >>>>>>>>> nicely. Having the value customizable we thought
>>  > is a
>>  > > > > > >> reasonable
>>  > > > > > >> >> >>>>>> tradeoff
>>  > > > > > >> >> >>>>>>>>> here of flexibility over contract of interaction
>>  > > > between
>>  > > > > > >> >> >>> different
>>  > > > > > >> >> >>>>>>>> parties.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Is there a particular case or benefit of having
>>  > > > > > serialization
>>  > > > > > >> >> >>>>>>>> customizable
>>  > > > > > >> >> >>>>>>>>> that you have in mind?
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Saying this it is obviously something that could
>>  be
>>  > > > > > >> implemented,
>>  > > > > > >> >> >>> if
>>  > > > > > >> >> >>>>>> there
>>  > > > > > >> >> >>>>>>>>> is a need. If we did go this avenue I think a
>>  > > defaulted
>>  > > > > > >> >> >>> serializer
>>  > > > > > >> >> >>>>>>>>> implementation should exist so for the 80:20
>>  rule,
>>  > > > people
>>  > > > > > can
>>  > > > > > >> >> >>> just
>>  > > > > > >> >> >>>>>> have
>>  > > > > > >> >> >>>>>>>> the
>>  > > > > > >> >> >>>>>>>>> broker and clients get default behavior.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Cheers
>>  > > > > > >> >> >>>>>>>>> Mike
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> On 11/6/16, 5:25 PM, "radai" <
>>  > > > [email protected]
>>  > > > > >
>>  > > > > > >> wrote:
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> making header _key_ serialization configurable
>>  > > > > > potentially
>>  > > > > > >> >> >>>>>> undermines
>>  > > > > > >> >> >>>>>>>>> the
>>  > > > > > >> >> >>>>>>>>> broad usefulness of the feature (any point
>>  along
>>  > > the
>>  > > > > > path
>>  > > > > > >> >> >>> must
>>  > > > > > >> >> >>>> be
>>  > > > > > >> >> >>>>>>>> able
>>  > > > > > >> >> >>>>>>>>> to
>>  > > > > > >> >> >>>>>>>>> read the header keys. the values may be
>>  whatever
>>  > > and
>>  > > > > > >> require
>>  > > > > > >> >> >>>> more
>>  > > > > > >> >> >>>>>>>>> intimate
>>  > > > > > >> >> >>>>>>>>> knowledge of the code that produced specific
>>  > > > headers,
>>  > > > > > but
>>  > > > > > >> >> >>> keys
>>  > > > > > >> >> >>>>>> should
>>  > > > > > >> >> >>>>>>>>> be
>>  > > > > > >> >> >>>>>>>>> universally readable).
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> it would also make it hard to write really
>>  > > portable
>>  > > > > > >> plugins -
>>  > > > > > >> >> >>>> say
>>  > > > > > >> >> >>>>>> i
>>  > > > > > >> >> >>>>>>>>> wrote a
>>  > > > > > >> >> >>>>>>>>> large message splitter/combiner - if i rely on
>>  > key
>>  > > > > > >> >> >>>> "largeMessage"
>>  > > > > > >> >> >>>>>> and
>>  > > > > > >> >> >>>>>>>>> values of the form "1/20" someone who uses
>>  > > > (contrived
>>  > > > > > >> >> >>> example)
>>  > > > > > >> >> >>>>>>>>> Map<Byte[],
>>  > > > > > >> >> >>>>>>>>> Double> wouldn't be able to re-use my code.
>>  > > > > > >> >> >>>>>>>>>
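The splitter/combiner example above can be made concrete with a rough sketch. It assumes headers are a plain string-to-bytes mapping, uses the "largeMessage" key and "1/20"-style values mentioned in the message, and the function names are hypothetical. A plugin like this only works if every party agrees on the key type, which is the portability concern being raised.

```python
def split_message(payload, chunk_size):
    """Split a payload into (headers, chunk) records tagged 'i/n'."""
    count = max(1, -(-len(payload) // chunk_size))  # ceiling division
    records = []
    for i in range(count):
        chunk = payload[i * chunk_size:(i + 1) * chunk_size]
        headers = {"largeMessage": ("%d/%d" % (i + 1, count)).encode("ascii")}
        records.append((headers, chunk))
    return records


def combine_message(records):
    """Reassemble the payload from records, in whatever order they arrive."""
    parts = {}
    total = None
    for headers, chunk in records:
        index, count = headers["largeMessage"].decode("ascii").split("/")
        parts[int(index)] = chunk
        total = int(count)
    if total is None or len(parts) != total:
        raise ValueError("missing chunks")
    return b"".join(parts[i] for i in range(1, total + 1))
```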
>>  > > > > > >> >> >>>>>>>>> not the end of the world within an
>>  > organization,
>>  > > > but
>>  > > > > > >> >> >>>>>> problematic if
>>  > > > > > >> >> >>>>>>>>> you
>>  > > > > > >> >> >>>>>>>>> want to enable an ecosystem
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <
>>  > > > > > >> >> >>>>>> [email protected]
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> wrote:
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> As others have laid out, I see strong reasons
>>  for
>>  > a
>>  > > > > common
>>  > > > > > >> >> >>>>>> message
>>  > > > > > >> >> >>>>>>>>>> metadata structure for the Kafka ecosystem. In
>>  > > > > > particular,
>>  > > > > > >> >> >>>> I've
>>  > > > > > >> >> >>>>>>>>> seen that
>>  > > > > > >> >> >>>>>>>>>> even within a single organization,
>>  infrastructure
>>  > > > teams
>>  > > > > > >> >> >>> often
>>  > > > > > >> >> >>>>>> own
>>  > > > > > >> >> >>>>>>>> the
>>  > > > > > >> >> >>>>>>>>>> message metadata while application teams own the
>>  > > > > > >> >> >>>>>> application-level
>>  > > > > > >> >> >>>>>>>>> data
>>  > > > > > >> >> >>>>>>>>>> format. Allowing metadata and content to have
>>  > > > different
>>  > > > > > >> >> >>>>>> structure
>>  > > > > > >> >> >>>>>>>>> and
>>  > > > > > >> >> >>>>>>>>>> evolve separately is very helpful for this.
>>  > Also, I
>>  > > > > think
>>  > > > > > >> >> >>>>>> there's
>>  > > > > > >> >> >>>>>>>> a
>>  > > > > > >> >> >>>>>>>>> lot of
>>  > > > > > >> >> >>>>>>>>>> value to having a common metadata structure
>>  shared
>>  > > > > across
>>  > > > > > >> >> >>> the
>>  > > > > > >> >> >>>>>> Kafka
>>  > > > > > >> >> >>>>>>>>>> ecosystem so that tools which leverage metadata
>>  > can
>>  > > > more
>>  > > > > > >> >> >>>> easily
>>  > > > > > >> >> >>>>>> be
>>  > > > > > >> >> >>>>>>>>> shared
>>  > > > > > >> >> >>>>>>>>>> across organizations and integrated together.
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> The question is, where does the metadata
>>  structure
>>  > > > > belong?
>>  > > > > > >> >> >>>>>> Here's
>>  > > > > > >> >> >>>>>>>>> my take:
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> We change the Kafka wire and on-disk format
>>  > from
>>  > > a
>>  > > > > > (key,
>>  > > > > > >> >> >>>>>> value)
>>  > > > > > >> >> >>>>>>>>> model to
>>  > > > > > >> >> >>>>>>>>>> a (key, metadata, value) model where all three
>>  are
>>  > > > byte
>>  > > > > > >> >> >>>> arrays
>>  > > > > > >> >> >>>>>> from
>>  > > > > > >> >> >>>>>>>>> the
>>  > > > > > >> >> >>>>>>>>>> broker's point of view. The primary reason for
>>  > this
>>  > > is
>>  > > > > > that
>>  > > > > > >> >> >>>> it
>>  > > > > > >> >> >>>>>>>>> provides a
>>  > > > > > >> >> >>>>>>>>>> backward compatible migration path forward.
>>  > > Producers
>>  > > > > can
>>  > > > > > >> >> >>>> start
>>  > > > > > >> >> >>>>>>>>> populating
>>  > > > > > >> >> >>>>>>>>>> metadata fields before all consumers understand
>>  > the
>>  > > > > > >> >> >>> metadata
>>  > > > > > >> >> >>>>>>>>> structure.
>>  > > > > > >> >> >>>>>>>>>> For people who already have custom envelope
>>  > > > structures,
>>  > > > > > >> >> >>> they
>>  > > > > > >> >> >>>> can
>>  > > > > > >> >> >>>>>>>>> populate
>>  > > > > > >> >> >>>>>>>>>> their existing structure and the new structure
>>  > for a
>>  > > > > while
>>  > > > > > >> >> >>> as
>>  > > > > > >> >> >>>>>> they
>>  > > > > > >> >> >>>>>>>>> make the
>>  > > > > > >> >> >>>>>>>>>> transition.
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> We could stop there and let the clients plug in
>>  a
>>  > > > > > >> >> >>>> KeySerializer,
>>  > > > > > >> >> >>>>>>>>>> MetadataSerializer, and ValueSerializer but I
>>  > think
>>  > > it
>>  > > > > would
>>  > > > > > >> >> >>>> also
>>  > > > > > >> >> >>>>>> be
>>  > > > > > >> >> >>>>>>>>> useful to
>>  > > > > > >> >> >>>>>>>>>> have a default MetadataSerializer that
>>  implements
>>  > a
>>  > > > > > >> >> >>> key-value
>>  > > > > > >> >> >>>>>> model
>>  > > > > > >> >> >>>>>>>>> similar
>>  > > > > > >> >> >>>>>>>>>> to AMQP or HTTP headers. Or we could go even
>>  > > further
>>  > > > > and
>>  > > > > > >> >> >>>>>>>> prescribe a
>>  > > > > > >> >> >>>>>>>>>> Map<String, byte[]> or Map<String, String> data
>>  > > model
>>  > > > > for
>>  > > > > > >> >> >>>>>> headers
>>  > > > > > >> >> >>>>>>>> in
>>  > > > > > >> >> >>>>>>>>> the
>>  > > > > > >> >> >>>>>>>>>> clients (while still allowing custom
>>  serialization
>>  > > of
>>  > > > > the
>>  > > > > > >> >> >>>> header
>>  > > > > > >> >> >>>>>>>> data
>>  > > > > > >> >> >>>>>>>>>> model).
>>  > > > > > >> >> >>>>>>>>>>
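One way to picture the (key, metadata, value) model with a default metadata serializer is the sketch below. It treats the metadata as a string-to-bytes mapping (the Map<String, byte[]> option mentioned above) and flattens it into a single byte array that the broker would store as opaque data. The encoding chosen here (int32 count, int16 key length, int32 value length, big-endian) and the function names are assumptions for illustration, not anything specified in the thread.

```python
import struct


def serialize_metadata(headers):
    """Flatten {str: bytes} into one opaque byte array (assumed encoding)."""
    out = [struct.pack(">i", len(headers))]
    for key, value in headers.items():
        key_bytes = key.encode("utf-8")
        out.append(struct.pack(">h", len(key_bytes)) + key_bytes)
        out.append(struct.pack(">i", len(value)) + value)
    return b"".join(out)


def deserialize_metadata(buf):
    """Inverse of serialize_metadata."""
    (count,) = struct.unpack_from(">i", buf, 0)
    offset = 4
    headers = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from(">h", buf, offset)
        offset += 2
        key = buf[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value_len,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        headers[key] = buf[offset:offset + value_len]
        offset += value_len
    return headers
```

A consumer that does not understand the metadata can ignore that byte array entirely and still deserialize key and value as before, which is the migration path described in the message.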
>>  > > > > > >> >> >>>>>>>>>> I think this would address Radai's concerns:
>>  > > > > > >> >> >>>>>>>>>> 1. All client code would not need to be updated
>>  to
>>  > > > know
>>  > > > > > >> >> >>> about
>>  > > > > > >> >> >>>>>> the
>>  > > > > > >> >> >>>>>>>>>> container.
>>  > > > > > >> >> >>>>>>>>>> 2. Middleware friendly clients would have a
>>  > standard
>>  > > > > > header
>>  > > > > > >> >> >>>> data
>>  > > > > > >> >> >>>>>>>>> model to
>>  > > > > > >> >> >>>>>>>>>> work with.
>>  > > > > > >> >> >>>>>>>>>> 3. KIP is required both b/c of broker changes
>>  and
>>  > > > > because
>>  > > > > > >> >> >>> of
>>  > > > > > >> >> >>>>>> client
>>  > > > > > >> >> >>>>>>>>> API
>>  > > > > > >> >> >>>>>>>>>> changes.
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> Cheers,
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> Roger
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> On Wed, Nov 2, 2016 at 4:38 PM, radai <
>>  > > > > > >> >> >>>>>> [email protected]>
>>  > > > > > >> >> >>>>>>>>> wrote:
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>> my biggest issues with a "standard" wrapper
>>  > format:
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>> 1. _ALL_ client _CODE_ (as opposed to kafka lib
>>  > > > > version)
>>  > > > > > >> >> >>>> must
>>  > > > > > >> >> >>>>>> be
>>  > > > > > >> >> >>>>>>>>> updated
>>  > > > > > >> >> >>>>>>>>>> to
>>  > > > > > >> >> >>>>>>>>>>> know about the container, because any old naive
>>  > > code
>>  > > > > > >> >> >>>> trying to
>>  > > > > > >> >> >>>>>>>>> directly
>>  > > > > > >> >> >>>>>>>>>>> deserialize its own payload would keel over and
>>  > die
>>  > > > (it
>>  > > > > > >> >> >>>> needs
>>  > > > > > >> >> >>>>>> to
>>  > > > > > >> >> >>>>>>>>> know to
>>  > > > > > >> >> >>>>>>>>>>> deserialize a container, and then dig in there
>>  > for
>>  > > > its
>>  > > > > > >> >> >>>>>> payload).
>>  > > > > > >> >> >>>>>>>>>>> 2. in order to write middleware-friendly
>>  clients
>>  > > that
>>  > > > > > >> >> >>>> utilize
>>  > > > > > >> >> >>>>>>>> such
>>  > > > > > >> >> >>>>>>>>> a
>>  > > > > > >> >> >>>>>>>>>>> container one would basically have to write
>>  their
>>  > > own
>>  > > > > > >> >> >>>>>>>>> producer/consumer
>>  > > > > > >> >> >>>>>>>>>> API
>>  > > > > > >> >> >>>>>>>>>>> on top of the open source kafka one.
>>  > > > > > >> >> >>>>>>>>>>> 3. if you were going to go with a wrapper
>>  format
>>  > > you
>>  > > > > > >> >> >>> really
>>  > > > > > >> >> >>>>>> don't
>>  > > > > > >> >> >>>>>>>>> need to
>>  > > > > > >> >> >>>>>>>>>>> bother with a kip (just open source your own
>>  > client
>>  > > > > stack
>>  > > > > > >> >> >>>>>> from #2
>>  > > > > > >> >> >>>>>>>>> above
>>  > > > > > >> >> >>>>>>>>>> so
>>  > > > > > >> >> >>>>>>>>>>> others could stop re-inventing it)
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>> On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <
>>  > > > > > >> >> >>>>>>>> [email protected]>
>>  > > > > > >> >> >>>>>>>>>> wrote:
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>>> How exactly would this work? Or maybe that's
>>  out
>>  > > of
>>  > > > > > >> >> >>> scope
>>  > > > > > >> >> >>>>>> for
>>  > > > > > >> >> >>>>>>>>> this
>>  > > > > > >> >> >>>>>>>>>> email.
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> The information contained in this email is
>>  strictly
>>  > > > > > >> confidential
>>  > > > > > >> >> >>>> and
>>  > > > > > >> >> >>>>>> for
>>  > > > > > >> >> >>>>>>>>> the use of the addressee only, unless otherwise
>>  > > > > indicated.
>>  > > > > > >> If you
>>  > > > > > >> >> >>>> are
>>  > > > > > >> >> >>>>>> not
>>  > > > > > >> >> >>>>>>>>> the intended recipient, please do not read, copy,
>>  > use
>>  > > > or
>>  > > > > > >> disclose
>>  > > > > > >> >> >>>> to
>>  > > > > > >> >> >>>>>>>> others
>>  > > > > > >> >> >>>>>>>>> this message or any attachment. Please also
>>  notify
>>  > > the
>>  > > > > > >> sender by
>>  > > > > > >> >> >>>>>> replying
>>  > > > > > >> >> >>>>>>>>> to this email or by telephone (+44(020 7896 0011)
>>  > and
>>  > > > > then
>>  > > > > > >> delete
>>  > > > > > >> >> >>>> the
>>  > > > > > >> >> >>>>>>>> email
>>  > > > > > >> >> >>>>>>>>> and any copies of it. Opinions, conclusion (etc)
>>  > that
>>  > > > do
>>  > > > > > not
>>  > > > > > >> >> >>>> relate to
>>  > > > > > >> >> >>>>>>>> the
>>  > > > > > >> >> >>>>>>>>> official business of this company shall be
>>  > understood
>>  > > > as
>>  > > > > > >> neither
>>  > > > > > >> >> >>>> given
>>  > > > > > >> >> >>>>>>>> nor
>>  > > > > > >> >> >>>>>>>>> endorsed by it. IG is a trading name of IG
>>  Markets
>>  > > > > Limited
>>  > > > > > (a
>>  > > > > > >> >> >>>> company
>>  > > > > > >> >> >>>>>>>>> registered in England and Wales, company number
>>  > > > 04008957)
>>  > > > > > >> and IG
>>  > > > > > >> >> >>>> Index
>>  > > > > > >> >> >>>>>>>>> Limited (a company registered in England and
>>  Wales,
>>  > > > > company
>>  > > > > > >> >> >>> number
>>  > > > > > >> >> >>>>>>>>> 01190902). Registered address at Cannon Bridge
>>  > House,
>>  > > > 25
>>  > > > > > >> Dowgate
>>  > > > > > >> >> >>>> Hill,
>>  > > > > > >> >> >>>>>>>>> London EC4R 2YA. Both IG Markets Limited
>>  (register
>>  > > > number
>>  > > > > > >> 195355)
>>  > > > > > >> >> >>>> and
>>  > > > > > >> >> >>>>>> IG
>>  > > > > > >> >> >>>>>>>>> Index Limited (register number 114059) are
>>  > authorised
>>  > > > and
>>  > > > > > >> >> >>>> regulated by
>>  > > > > > >> >> >>>>>>>> the
>>  > > > > > >> >> >>>>>>>>> Financial Conduct Authority.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>> --
>>  > > > > > >> >> >>>>>> Gwen Shapira
>>  > > > > > >> >> >>>>>> Product Manager | Confluent
>>  > > > > > >> >> >>>>>> 650.450.2760 | @gwenshap
>>  > > > > > >> >> >>>>>> Follow us: Twitter | blog
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> --
>>  > > > > > >> >> >>> Nacho (Ignacio) Solis
>>  > > > > > >> >> >>> Kafka
>>  > > > > > >> >> >>> [email protected]
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >
>>  > > > > > >> >> >
>>  > > > > > >> >> >
>>  > > > > > >> >>
>>  > > > > > >> >>
>>  > > > > > >>
>>  > > > > > >>
>>  > > > > > >>
>>  > > > > > >>
>>  > > > > > >
>>  > > > > > >
>>  > > > > >
>>  > > > >
>>  > > >
>>  > >
>>  >
>
> --
> Nacho - Ignacio Solis - [email protected]
