Re: [DISCUSS] KIP-82 - Add Record Headers

Jason Gustafson Fri, 17 Feb 2017 10:41:46 -0800

Sorry, my mistake. The consumer interceptor is per batch, though I'm not
sure that's an actual limitation since you still have access to the
individual records.


-Jason

On Fri, Feb 17, 2017 at 10:39 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Re headers as byte array and future use by broker. This doesn't take away
>> from that at all. Nor makes it difficult at all in my opinion.
>
>
> Yeah, I didn't say it was difficult, only awkward. You wouldn't write the
> schema that way if you were planning to use it on the brokers from the
> beginning. Note also that one of the benefits of letting the broker
> understand headers is that it can validate that they are properly
> formatted. If cost is the only concern, we should confirm its impact
> through performance testing.
>
> One of the key use cases requires access on consume at per event/message
>> level at the point that message is being processed, as such the batch
>> interceptors and batch consume api isn't suitable. It needs to be at the
>> record level.
>
>
> I'm not sure I understand the point about batching. Interceptors are
> applied per-message, right?
>
> My intent on interceptors is to keep the usage of headers well-defined so
> that they don't start leaking unnecessarily into applications. My guess is
> that it's probably inevitable, but isolating it in the interceptors would
> at least give people a second thought before deciding to use it. The main
> challenge in my mind is figuring out how an MM use case would work. It
> would be more cumbersome to replicate headers through an interceptor,
> though arguably MM should be working at a lower level anyway.
>
> -Jason
>
> On Fri, Feb 17, 2017 at 10:16 AM, Michael Pearce <michael.pea...@ig.com>
> wrote:
>
>> Re headers available on the record  va interceptors only
>>
>> One of the key use cases requires access on consume at per event/message
>> level at the point that message is being processed, as such the batch
>> interceptors and batch consume api isn't suitable. It needs to be at the
>> record level.
>>
>> This anyhow is similar to jms/http/amqp where headers are available to
>> consuming applications.
>>
>> Re headers as byte array and future use by broker. This doesn't take away
>> from that at all. Nor makes it difficult at all in my opinion.
>>
>>
>>
>> Sent using OWA for iPhone
>> ________________________________________
>> From: Jason Gustafson <ja...@confluent.io>
>> Sent: Friday, February 17, 2017 5:55:42 PM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>
>> >
>> > Would you be proposing in KIP-98 to convert the other message int’s (key
>> > length, value length) also to varint to keep it uniform.
>> > Also I assume there will be a static or helper method made to write/read
>> > these in the client and server.
>>
>>
>> Yes, that is what we are proposing, so using varints for headers would be
>> consistent with the rest of the message. We have used static helper
>> methods
>> in our prototype implementation.
>>
>> The cost of parsing, we want to parse/interpret the headers lazily (this
>> is
>> > a key point brought up earlier in discussions)
>>
>>
>> I'm a bit skeptical of this. Has anyone done the performance testing? I
>> can
>> probably implement it and test it if no one else has. I was also under the
>> impression that there may be use cases down the road where the broker
>> would
>> need to interpret headers. That wouldn't be off the table in the future if
>> it's represented as bytes, but it would be quite a bit more awkward,
>> right?
>>
>> By the way, one question I have been wondering about. My understanding is
>> that headers are primarily for use cases where a third-party components
>> wants to enrich messages without needing to understand or modify the
>> schema
>> of the message key and value. For the applications which directly produce
>> and consume the messages and control the key/value schema directly, it
>> seems we would rather have them implement headers directly in their own
>> schema. Supposing for the sake of argument that it was possible, my
>> question is whether it be sufficient to expose the headers in the
>> interceptor API and not in the common API?
>>
>> -Jason
>>
>> On Fri, Feb 17, 2017 at 3:26 AM, Michael Pearce <michael.pea...@ig.com>
>> wrote:
>>
>> > On the point of varInts
>> >
>> > Would you be proposing in KIP-98 to convert the other message int’s (key
>> > length, value length) also to varint to keep it uniform.
>> > Also I assume there will be a static or helper method made to write/read
>> > these in the client and server.
>> >
>> > Cheers
>> > Mike
>> >
>> >
>> >
>> > On 17/02/2017, 11:22, "Michael Pearce" <michael.pea...@ig.com> wrote:
>> >
>> >     On the point re: headers in the message protocol being a byte array
>> > and not a count of elements followed by the elements. Again this was
>> > discussed/argued previously.
>> >
>> >     It was agreed on for a few reasons some of which you have obviously
>> > picked up on:
>> >
>> >     Broker is able to pass it through opaquely
>> >     The cost of parsing, we want to parse/interpret the headers lazily
>> > (this is a key point brought up earlier in discussions)
>> >     Headers can be copied from consumer record to producer record (aka
>> > mirror makers etc) without parsing if no changes are being made or being
>> > looked at.
>> >     Keeps the broker agnostic to the format
>> >     You need an int32 either for the byte size of the headers, or for
>> the
>> > count of elements, so overheads are the same, but with going with an
>> opaque
>> > byte array has the above advantages.
>> >
>> >     Cheers
>> >     Mike
>> >
>> >
>> >     On 17/02/2017, 02:50, "Jason Gustafson" <ja...@confluent.io> wrote:
>> >
>> >         Sorry, should have noted that the performance testing was done
>> > using the
>> >         producer performance tool shipped with Kafka.
>> >
>> >         -Jason
>> >
>> >         On Thu, Feb 16, 2017 at 6:44 PM, Jason Gustafson <
>> > ja...@confluent.io> wrote:
>> >
>> >         > Hey Nacho,
>> >         >
>> >         > I've compared performance of our KIP-98 implementation with
>> and
>> > without
>> >         > varints. For messages around 128 bytes, we see an increase in
>> > throughput of
>> >         > about 30% using the default configuration settings. At 256
>> > bytes, the
>> >         > increase is around 16%. Obviously the performance converge as
>> > messages get
>> >         > larger, but it seems well worth the cost. Note that we are
>> also
>> > seeing a
>> >         > substantial performance increase against trunk primarily
>> because
>> > of the
>> >         > much more efficient packing that varints provide us. Anything
>> > adding to
>> >         > message overhead, such as record headers, would only increase
>> > the relative
>> >         > difference. (Of course take these numbers with a grain of salt
>> > since I have
>> >         > only used the default settings with both the producer and
>> broker
>> > on my
>> >         > local machine. We intend to provide more extensive performance
>> > details as
>> >         > part of the work for KIP-98.)
>> >         >
>> >         > The implementation we are using is from protobuf (
>> >         > https://developers.google.com/protocol-buffers/docs/encoding
>> ),
>> > which is
>> >         > also used in HBase. It is trivial to implement and as far as I
>> > know doesn't
>> >         > suffer from the aliasing problem you are describing. I checked
>> > with Magnus
>> >         > (the author of librdkafka) and he agreed that the savings
>> seemed
>> > worth the
>> >         > cost of implementation.
>> >         >
>> >         > -Jason
>> >         >
>> >         > On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis <
>> iso...@igso.net>
>> > wrote:
>> >         >
>> >         >> -VarInts
>> >         >>
>> >         >> I'm one of the people (if not the most) opposed to VarInts.
>> > VarInts
>> >         >> have a place, but this is not it.   (We had a large
>> discussion
>> > about
>> >         >> them at the beginning of KIP-82 time)
>> >         >>
>> >         >> If anybody has real life performance numbers of VarInts
>> > improving
>> >         >> things or significantly reducing resources I would like to
>> know
>> > what
>> >         >> that case may be. Yes, you can save some bytes here and
>> there,
>> > but
>> >         >> this is probably insignificant to the overall system behavior
>> > and
>> >         >> storage requirements.  -- I say this with respect to using
>> > VarInts in
>> >         >> the protocol itself, not as part of the data.
>> >         >>
>> >         >> VarInts require you to parse the Int before using it and
>> > depending on
>> >         >> the encoding they can suffer from aliasing (multiple
>> > representations
>> >         >> for the same value).
>> >         >>
>> >         >> Why add complexity?
>> >         >>
>> >         >> Nacho
>> >         >>
>> >         >>
>> >         >> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe <
>> > cmcc...@apache.org>
>> >         >> wrote:
>> >         >> > +1 for varints here-- it would save quite a bit of space.
>> > They are
>> >         >> > pretty quick to implement as well.
>> >         >> >
>> >         >> > I think it makes sense for values to be byte arrays.  Users
>> > might want
>> >         >> > to attach arbitrary payloads; they shouldn't be forced to
>> > serialize
>> >         >> > everything to Java strings.
>> >         >> >
>> >         >> > best,
>> >         >> > Colin
>> >         >> >
>> >         >> >
>> >         >> > On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
>> >         >> >> Hey Michael,
>> >         >> >>
>> >         >> >> Hmm, I guess the point of representing it as bytes is to
>> > allow the
>> >         >> broker
>> >         >> >> to pass it through opaquely? Is the cost of parsing them a
>> > concern, or
>> >         >> >> are
>> >         >> >> we simply trying to ensure that the broker stays agnostic
>> to
>> > the
>> >         >> format?
>> >         >> >>
>> >         >> >> On varints, I think adding support for them makes less
>> sense
>> > for an
>> >         >> >> isolated use case, but as part of a more holistic change
>> > (such as what
>> >         >> we
>> >         >> >> have proposed in KIP-98), I think they are justifiable. If
>> > we add them,
>> >         >> >> then the need to use attributes becomes quite a bit
>> weaker,
>> > right? The
>> >         >> >> other thing I find slightly odd is the fact that null
>> > headers has no
>> >         >> >> actual
>> >         >> >> semantic meaning for the message (unlike null keys and
>> > values). It is
>> >         >> >> just
>> >         >> >> a space optimization. It seems a bit better to always use
>> > size 0 to
>> >         >> >> indicate having no headers.
>> >         >> >>
>> >         >> >> Overall, the main point is ensuring that the message
>> schema
>> > remains
>> >         >> >> consistent, either within the larger protocol, or at a
>> > minimum within
>> >         >> the
>> >         >> >> message itself.
>> >         >> >>
>> >         >> >> -Jason
>> >         >> >>
>> >         >> >> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce <
>> > michael.pea...@ig.com
>> >         >> >
>> >         >> >> wrote:
>> >         >> >>
>> >         >> >> > Hi Jason,
>> >         >> >> >
>> >         >> >> > On point 1) in the message protocol the headers are
>> simply
>> > a byte
>> >         >> array,
>> >         >> >> > as like the key or value, this is to clearly demarcate
>> the
>> > header in
>> >         >> the
>> >         >> >> > core message. Then the header byte array in the core
>> > message is an
>> >         >> array of
>> >         >> >> > key, value pairs. This is what it is denoting.
>> >         >> >> >
>> >         >> >> > Then this would be I guess in the given notation:
>> >         >> >> >
>> >         >> >> > Headers => [KeyLength, Key, ValueLength, Value]
>> >         >> >> >     KeyLength => int32 <-----------------NEW size of the
>> > byte[] of
>> >         >> the
>> >         >> >> > serialised key value
>> >         >> >> >     Key => bytes <---------------------- NEW serialised
>> > string (UTF8)
>> >         >> >> > bytes of the header key
>> >         >> >> >     ValueLength => int32 <-------------- NEW size of the
>> > byte[] of
>> >         >> the
>> >         >> >> > serialised header value
>> >         >> >> >     Value => bytes <-------------------- NEW serialised
>> > form of the
>> >         >> header
>> >         >> >> > value
>> >         >> >> >
>> >         >> >> > The key length and value length is matching the way the
>> > protocol is
>> >         >> >> > defined in the core message currently.
>> >         >> >> >
>> >         >> >> >
>> >         >> >> >
>> >         >> >> >
>> >         >> >> > On point 2)
>> >         >> >> > Var sized ints, this was discussed much earlier on, in
>> > fact I had
>> >         >> >> > suggested it myself (with Hadoop references), the
>> > complexity of this
>> >         >> >> > compared to having a simpler protocol was argued and
>> > agreed it
>> >         >> wasn’t worth
>> >         >> >> > the complexity as all other clients in other languages
>> > would need to
>> >         >> ensure
>> >         >> >> > theyre using the right var size algorithm, as there is a
>> > few.
>> >         >> >> >
>> >         >> >> > On point 3)
>> >         >> >> > We did the attributes, optional approach as originally
>> > there was
>> >         >> marked
>> >         >> >> > concern that headers would cause a message size overhead
>> > for others,
>> >         >> who
>> >         >> >> > don’t want them. As such this is the clean solution to
>> > achieve that.
>> >         >> If
>> >         >> >> > that no longer holds, and we don’t care that we add
>> 4bytes
>> > overhead,
>> >         >> then
>> >         >> >> > im happy to remove.
>> >         >> >> >
>> >         >> >> > I’m personally in favour of keeping the message as small
>> > as possible
>> >         >> so
>> >         >> >> > people don’t get shocks in perf and throughputs dues to
>> > message size,
>> >         >> >> > unless they actively use the feature, as such I do
>> prefer
>> > the
>> >         >> attribute bit
>> >         >> >> > wise feature flag approach myself.
>> >         >> >> >
>> >         >> >> >
>> >         >> >> >
>> >         >> >> >
>> >         >> >> > On 16/02/2017, 05:40, "Jason Gustafson" <
>> > ja...@confluent.io> wrote:
>> >         >> >> >
>> >         >> >> >     We have proposed a few significant changes to the
>> > message format
>> >         >> in
>> >         >> >> > KIP-98
>> >         >> >> >     which now seems likely to pass (perhaps with some
>> > iterations on
>> >         >> >> >     implementation details). It would be good to try and
>> > coordinate
>> >         >> the
>> >         >> >> > changes
>> >         >> >> >     in both of the proposals to make sure they are
>> > consistent and
>> >         >> >> > compatible.
>> >         >> >> >
>> >         >> >> >     I think using the attributes to indicate null
>> headers
>> > is a
>> >         >> reasonable
>> >         >> >> >     approach. We have proposed to do the same thing for
>> > the message
>> >         >> key and
>> >         >> >> >     value. That said, I sympathize with Jay's argument.
>> > Having
>> >         >> multiple
>> >         >> >> > ways to
>> >         >> >> >     specify a null value increases the overall
>> complexity
>> > of the
>> >         >> protocol.
>> >         >> >> > You
>> >         >> >> >     can see this just from the fact that you need the
>> > extra verbiage
>> >         >> in the
>> >         >> >> >     protocol specification in this KIP and in KIP-98 to
>> > describe the
>> >         >> >> > dependence
>> >         >> >> >     between the fields and the attributes. It seems
>> like a
>> > slippery
>> >         >> slope
>> >         >> >> > if
>> >         >> >> >     you start allowing different request types to
>> > implement the
>> >         >> protocol
>> >         >> >> >     specification differently.
>> >         >> >> >
>> >         >> >> >     You can also argue that the messages already are and
>> > are likely
>> >         >> to
>> >         >> >> > remain a
>> >         >> >> >     special case. For example, there is currently no
>> > generality in
>> >         >> how
>> >         >> >> >     compressed message sets are represented that would
>> be
>> > applicable
>> >         >> for
>> >         >> >> > other
>> >         >> >> >     request types. Some might see this divergence as an
>> > unfortunate
>> >         >> >> > protocol
>> >         >> >> >     deficiency which should be fixed; others might see
>> it
>> > as sort of
>> >         >> the
>> >         >> >> >     inevitability of needing to optimize where it counts
>> > most. I'm
>> >         >> probably
>> >         >> >> >     somewhere in between, but I think we probably all
>> > share the
>> >         >> intuition
>> >         >> >> > that
>> >         >> >> >     the protocol should be kept as consistent as
>> possible.
>> > With that
>> >         >> in
>> >         >> >> > mind,
>> >         >> >> >     here are a few comments:
>> >         >> >> >
>> >         >> >> >     1. One thing I found a little odd when reading the
>> > current
>> >         >> proposal is
>> >         >> >> > that
>> >         >> >> >     the headers are both represented as an array of
>> bytes
>> > and as an
>> >         >> array
>> >         >> >> > of
>> >         >> >> >     key/value pairs. I'd probably suggest something like
>> > this:
>> >         >> >> >
>> >         >> >> >     Headers => [HeaderKey HeaderValue]
>> >         >> >> >      HeaderKey => String
>> >         >> >> >      HeaderValue => Bytes
>> >         >> >> >
>> >         >> >> >     An array in the Kafka protocol is represented as a
>> > 4-byte integer
>> >         >> >> >     indicating the number of elements in the array
>> > followed by the
>> >         >> >> >     serialization of the elements. Unless I'm
>> > misunderstanding, what
>> >         >> you
>> >         >> >> > have
>> >         >> >> >     instead is the total size of the headers in bytes
>> > followed by the
>> >         >> >> > elements.
>> >         >> >> >     I'm not sure I see any reason for this
>> inconsistency.
>> >         >> >> >
>> >         >> >> >     2. In KIP-98, we've introduced variable-length
>> integer
>> > fields.
>> >         >> >> > Effectively,
>> >         >> >> >     we've enriched (or "complicated" as Jay might say ;)
>> > the protocol
>> >         >> >> >     specification to include the following types:
>> VarInt,
>> > VarLong,
>> >         >> >> >     UnsignedVarInt and UnsignedVarLong.
>> >         >> >> >
>> >         >> >> >     Along with these primitives, we could introduce the
>> > following
>> >         >> types:
>> >         >> >> >
>> >         >> >> >     VarSizeArray => NumberOfItems Item1 Item2 .. ItemN
>> >         >> >> >       NumberOfItems => UnsignedVarInt
>> >         >> >> >
>> >         >> >> >     VarSizeNullableArray => NumberOfItemsOrNull Item1
>> > Item2 .. ItemN
>> >         >> >> >       NumberOfItemsOrNull => VarInt (-1 means null)
>> >         >> >> >
>> >         >> >> >     And similarly for the `String` and `Bytes` types.
>> > These types
>> >         >> can save
>> >         >> >> > a
>> >         >> >> >     considerable amount of space in this proposal
>> because
>> > they can
>> >         >> be used
>> >         >> >> > for
>> >         >> >> >     both the number of headers included in the message
>> and
>> > the
>> >         >> lengths of
>> >         >> >> > the
>> >         >> >> >     header keys and values. We could do this instead:
>> >         >> >> >
>> >         >> >> >     Headers => VarSizeArray[HeaderKey HeaderValue]
>> >         >> >> >       HeaderKey => VarSizeString
>> >         >> >> >       HeaderValue => VarSizeBytes
>> >         >> >> >
>> >         >> >> >     Combining the savings from the use of variable
>> length
>> > fields, the
>> >         >> >> > benefit
>> >         >> >> >     of using the attributes to represent null seems
>> pretty
>> > small.
>> >         >> >> >
>> >         >> >> >     3. Whichever way we go (whether we use the
>> attributes
>> > or not), we
>> >         >> >> > should at
>> >         >> >> >     least be consistent between this KIP and KIP-98. It
>> > would be very
>> >         >> >> > strange
>> >         >> >> >     to have two ways to represent null values in the
>> same
>> > schema.
>> >         >> Either
>> >         >> >> > way is
>> >         >> >> >     OK with me. I think some message-level optimizations
>> > are
>> >         >> justifiable,
>> >         >> >> > but
>> >         >> >> >     the savings here seem minimal (a few bytes per
>> > message), so
>> >         >> maybe it's
>> >         >> >> > not
>> >         >> >> >     worth the cost of letting the message diverge even
>> > further from
>> >         >> the
>> >         >> >> > rest of
>> >         >> >> >     the protocol.
>> >         >> >> >
>> >         >> >> >     -Jason
>> >         >> >> >
>> >         >> >> >
>> >         >> >> >     On Wed, Feb 15, 2017 at 8:52 AM, radai <
>> >         >> radai.rosenbl...@gmail.com>
>> >         >> >> > wrote:
>> >         >> >> >
>> >         >> >> >     > I've trimmed the inline contents as this mail is
>> > getting too
>> >         >> big for
>> >         >> >> > the
>> >         >> >> >     > apache mailing list software to deliver :-(
>> >         >> >> >     >
>> >         >> >> >     > 1. the important thing for interoperability is for
>> > different
>> >         >> >> > "interested
>> >         >> >> >     > parties" (plugins, infra layers/wrappers,
>> user-code)
>> > to be
>> >         >> able to
>> >         >> >> > stick
>> >         >> >> >     > pieces of metadata onto msgs without getting in
>> each
>> > other's
>> >         >> way. a
>> >         >> >> > common
>> >         >> >> >     > key scheme (Strings, as of the time of this
>> > writing?) is all
>> >         >> thats
>> >         >> >> > required
>> >         >> >> >     > for that. it is assumed that the other end
>> > interested in any
>> >         >> such
>> >         >> >> > piece of
>> >         >> >> >     > metadata knows the encoding, and byte[] provides
>> for
>> > the most
>> >         >> >> > flexibility.
>> >         >> >> >     > i believe this is the same logic behind core kafka
>> > being
>> >         >> >> > byte[]/byte[] -
>> >         >> >> >     > Strings are more "usable" but bytes are flexible
>> and
>> > so were
>> >         >> chosen.
>> >         >> >> >     > Also - core kafka doesnt even do that good of a
>> job
>> > on
>> >         >> usability of
>> >         >> >> > the
>> >         >> >> >     > payload (example - i have to specify the nop
>> byte[]
>> > "decoders"
>> >         >> >> > explicitly
>> >         >> >> >     > in conf), and again sacrificies usability for the
>> > sake of
>> >         >> >> > performance (no
>> >         >> >> >     > convenient single-record processing as poll is a
>> > batch, lots of
>> >         >> >> > obscure
>> >         >> >> >     > little config details exposing internals of the
>> > batching
>> >         >> mechanism,
>> >         >> >> > etc)
>> >         >> >> >     >
>> >         >> >> >     > this is also why i really dislike the idea of a
>> > "type system"
>> >         >> for
>> >         >> >> > header
>> >         >> >> >     > values, it further degrades the usability, adds
>> > complexity and
>> >         >> will
>> >         >> >> >     > eventually get in people's way, also, it would be
>> > the 2nd/3rd
>> >         >> >> > home-group
>> >         >> >> >     > serialization mechanism in core kafka (counting 2
>> > iterations
>> >         >> of the
>> >         >> >> > "type
>> >         >> >> >     > definition DSL")
>> >         >> >> >     >
>> >         >> >> >     > 2. this is an implementation detail, and not even
>> a
>> > very "user
>> >         >> >> > facing" one?
>> >         >> >> >     > to the best of my understanding the vote process
>> is
>> > on proposed
>> >         >> >> >     > API/behaviour. also - since we're willing to go
>> with
>> > strings
>> >         >> just
>> >         >> >> > serialize
>> >         >> >> >     > a 0-sized header blob and IIUC you dont need any
>> > optionals
>> >         >> anymore.
>> >         >> >> >     >
>> >         >> >> >     > 3. yes, we can :-)
>> >         >> >> >     >
>> >         >> >> >     > On Tue, Feb 14, 2017 at 11:56 PM, Michael Pearce <
>> >         >> >> > michael.pea...@ig.com>
>> >         >> >> >     > wrote:
>> >         >> >> >     >
>> >         >> >> >     > > Hi Jay,
>> >         >> >> >     > >
>> >         >> >> >     > > 1) There was some initial debate on the value
>> > part, as youll
>> >         >> note
>> >         >> >> > String,
>> >         >> >> >     > > String headers were discounted early on. The
>> > reason for this
>> >         >> is
>> >         >> >> >     > flexibility
>> >         >> >> >     > > and keeping in line with the flexibility of key,
>> > value of the
>> >         >> >> > message
>> >         >> >> >     > > object itself. I don’t think it takes away from
>> an
>> > ecosystem
>> >         >> as
>> >         >> >> > each
>> >         >> >> >     > plugin
>> >         >> >> >     > > will care for their own key, this way ints,
>> > booleans , exotic
>> >         >> >> > custom
>> >         >> >> >     > binary
>> >         >> >> >     > > can all be catered for=.
>> >         >> >> >     > > a. If you really wanted to push for a typed
>> value
>> > interface,
>> >         >> I
>> >         >> >> > wouldn’t
>> >         >> >> >     > > want just String values supported, but the the
>> > primatives
>> >         >> plus
>> >         >> >> > string and
>> >         >> >> >     > > also still keeping the ability to have a binary
>> > for custom
>> >         >> >> > binaries that
>> >         >> >> >     > > some organisations may have.
>> >         >> >> >     > > i. I have written this slight alternative here,
>> >         >> >> >     > https://cwiki.apache.org/
>> >         >> >> >     > > confluence/display/KAFKA/KIP-
>> > 82+-+Add+Record+Headers+-+Typed
>> >         >> >> >     > > ii. Essentially the value bytes, has a leading
>> > byte overhead.
>> >         >> >> >     > > 1.  This tells you what type the value is,
>> before
>> > reading
>> >         >> the rest
>> >         >> >> > of the
>> >         >> >> >     > > bytes, allowing serialisation/deserialization to
>> > and from the
>> >         >> >> > primitives,
>> >         >> >> >     > > string and byte[]. This is akin to some other
>> > messaging
>> >         >> systems.
>> >         >> >> >     > > 2) We are making it optional, so that for those
>> > not wanting
>> >         >> >> > headers have
>> >         >> >> >     > 0
>> >         >> >> >     > > bytes overhead (think of it as a feature flag),
>> I
>> > don’t
>> >         >> think this
>> >         >> >> > is
>> >         >> >> >     > > complex, especially if comparing to changes
>> > proposed in
>> >         >> other kips
>> >         >> >> > like
>> >         >> >> >     > > kip-98.
>> >         >> >> >     > > a. If you really really don’t like this, we can
>> > drop it, but
>> >         >> it
>> >         >> >> > would
>> >         >> >> >     > mean
>> >         >> >> >     > > buying into 4 bytes extra overhead for users who
>> > do not want
>> >         >> to use
>> >         >> >> >     > headers.
>> >         >> >> >     > > 3) In the summary yes, it is at a higher level,
>> > but I think
>> >         >> this
>> >         >> >> > is well
>> >         >> >> >     > > documented in the proposed changes section.
>> >         >> >> >     > > a. Added getHeaders method to Producer/Consumer
>> > record (that
>> >         >> is it)
>> >         >> >> >     > > b. We’ve also detailed the new Headers class
>> that
>> > this method
>> >         >> >> > returns
>> >         >> >> >     > that
>> >         >> >> >     > > encapsulates the headers protocol and logic.
>> >         >> >> >     > >
>> >         >> >> >     > > Best,
>> >         >> >> >     > > Mike
>> >         >> >> >     > >
>> >         >> >> >     > > ==Original questions from the vote thread from
>> > Jay.==
>> >         >> >> >     > >
>> >         >> >> >     > > Couple of things I think we still need to work
>> out:
>> >         >> >> >     > >
>> >         >> >> >     > >    1. I think we agree about the key, but I
>> think
>> > we haven't
>> >         >> >> > talked about
>> >         >> >> >     > >    the value yet. I think if our goal is an open
>> > ecosystem
>> >         >> of these
>> >         >> >> >     > header
>> >         >> >> >     > >    spread across many plugins from many systems
>> we
>> > should
>> >         >> consider
>> >         >> >> > making
>> >         >> >> >     > > this
>> >         >> >> >     > >    a string as well so it can be printed, set
>> via
>> > a UI, set
>> >         >> in
>> >         >> >> > config,
>> >         >> >> >     > etc.
>> >         >> >> >     > >    Basically encouraging pluggable serialization
>> > formats
>> >         >> here will
>> >         >> >> > lead
>> >         >> >> >     > to
>> >         >> >> >     > > a
>> >         >> >> >     > >    bit of a tower of babel.
>> >         >> >> >     > >    2. This proposal still includes a pretty big
>> > change to our
>> >         >> >> >     > serialization
>> >         >> >> >     > >    and protocol definition layer. Essentially
>> it is
>> >         >> introducing an
>> >         >> >> >     > optional
>> >         >> >> >     > >    type, where the format is data dependent. I
>> > think this is
>> >         >> >> > actually a
>> >         >> >> >     > big
>> >         >> >> >     > >    change though it doesn't seem like it. It
>> means
>> > you can no
>> >         >> >> > longer
>> >         >> >> >     > > specify
>> >         >> >> >     > >    this type with our type definition DSL, and
>> > likewise it
>> >         >> requires
>> >         >> >> >     > custom
>> >         >> >> >     > >    handling in client libs. This isn't a huge
>> > thing, since
>> >         >> the
>> >         >> >> > Record
>> >         >> >> >     > >    definition is custom anyway, but I think this
>> > kind of
>> >         >> protocol
>> >         >> >> >     > >    inconsistency is very non-desirable and ties
>> > you to
>> >         >> hand-coding
>> >         >> >> >     > things.
>> >         >> >> >     > > I
>> >         >> >> >     > >    think the type should instead by [Key Value]
>> in
>> > our BNF,
>> >         >> where
>> >         >> >> > key and
>> >         >> >> >     > >    value are both short strings as used
>> elsewhere.
>> > This
>> >         >> brings it
>> >         >> >> > in line
>> >         >> >> >     > > with
>> >         >> >> >     > >    the rest of the protocol.
>> >         >> >> >     > >    3. Could we get more specific about the exact
>> > Java API
>> >         >> change to
>> >         >> >> >     > >    ProducerRecord, ConsumerRecord, Record, etc?
>> >         >> >> >     > >
>> >         >> >> >     > > -Jay
>> >         >> >> >     > >
>> >         >> >> >     >
>> >         >> >> >
>> >         >> >> >
>> >         >> >> > The information contained in this email is strictly
>> > confidential and
>> >         >> for
>> >         >> >> > the use of the addressee only, unless otherwise
>> indicated.
>> > If you
>> >         >> are not
>> >         >> >> > the intended recipient, please do not read, copy, use or
>> > disclose to
>> >         >> others
>> >         >> >> > this message or any attachment. Please also notify the
>> > sender by
>> >         >> replying
>> >         >> >> > to this email or by telephone (+44(020 7896 0011) and
>> then
>> > delete
>> >         >> the email
>> >         >> >> > and any copies of it. Opinions, conclusion (etc) that do
>> > not relate
>> >         >> to the
>> >         >> >> > official business of this company shall be understood as
>> > neither
>> >         >> given nor
>> >         >> >> > endorsed by it. IG is a trading name of IG Markets
>> Limited
>> > (a company
>> >         >> >> > registered in England and Wales, company number
>> 04008957)
>> > and IG
>> >         >> Index
>> >         >> >> > Limited (a company registered in England and Wales,
>> > company number
>> >         >> >> > 01190902). Registered address at Cannon Bridge House, 25
>> > Dowgate
>> >         >> Hill,
>> >         >> >> > London EC4R 2YA. Both IG Markets Limited (register
>> number
>> > 195355)
>> >         >> and IG
>> >         >> >> > Index Limited (register number 114059) are authorised
>> and
>> > regulated
>> >         >> by the
>> >         >> >> > Financial Conduct Authority.
>> >         >> >> >
>> >         >>
>> >         >>
>> >         >>
>> >         >> --
>> >         >> Nacho - Ignacio Solis - iso...@igso.net
>> >         >>
>> >         >
>> >         >
>> >
>> >
>> >
>> >
>> > The information contained in this email is strictly confidential and for
>> > the use of the addressee only, unless otherwise indicated. If you are
>> not
>> > the intended recipient, please do not read, copy, use or disclose to
>> others
>> > this message or any attachment. Please also notify the sender by
>> replying
>> > to this email or by telephone (+44(020 7896 0011) and then delete the
>> email
>> > and any copies of it. Opinions, conclusion (etc) that do not relate to
>> the
>> > official business of this company shall be understood as neither given
>> nor
>> > endorsed by it. IG is a trading name of IG Markets Limited (a company
>> > registered in England and Wales, company number 04008957) and IG Index
>> > Limited (a company registered in England and Wales, company number
>> > 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>> > London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
>> > Index Limited (register number 114059) are authorised and regulated by
>> the
>> > Financial Conduct Authority.
>> >
>> The information contained in this email is strictly confidential and for
>> the use of the addressee only, unless otherwise indicated. If you are not
>> the intended recipient, please do not read, copy, use or disclose to others
>> this message or any attachment. Please also notify the sender by replying
>> to this email or by telephone (+44(020 7896 0011) and then delete the email
>> and any copies of it. Opinions, conclusion (etc) that do not relate to the
>> official business of this company shall be understood as neither given nor
>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>> registered in England and Wales, company number 04008957) and IG Index
>> Limited (a company registered in England and Wales, company number
>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>> London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
>> Index Limited (register number 114059) are authorised and regulated by the
>> Financial Conduct Authority.
>>
>
>

Re: [DISCUSS] KIP-82 - Add Record Headers

Reply via email to