Re: [DISCUSS] KIP-82 - Add Record Headers

Michael Pearce Fri, 17 Feb 2017 11:55:47 -0800

I am happy to move the definition of the header into the message body, but it
would prevent us from lazily initialising/parsing the headers, since we would
obviously have to traverse them while reading the message.


This was actually one of Jay’s requests:

“    2. I think we should think about creating the Map lazily to avoid
    parsing out all the headers into little objects. HashMaps themselves are
    kind of expensive and the consumer is very perf sensitive so and making
    gazillions of hashmaps that may or may not get used is probably a bad idea.”
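
As a rough sketch of what that lazy approach could look like on the client: the raw header bytes are kept as-is and the Map is only built the first time a header is looked up. The LazyHeaders class and its method names are hypothetical (they are not part of the KIP or of any Kafka API), and the layout assumed is the int32-length key/value layout discussed further down this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder: keeps the raw header bytes and parses them only on first access. */
final class LazyHeaders {
    private final byte[] raw;            // opaque header blob as read off the wire
    private Map<String, byte[]> parsed;  // built on demand, null until first lookup

    LazyHeaders(byte[] raw) {
        this.raw = raw;
    }

    /** Raw bytes, e.g. for a mirror-maker style copy that never inspects headers. */
    byte[] raw() {
        return raw;
    }

    /** True once the blob has been parsed into a Map. */
    boolean isParsed() {
        return parsed != null;
    }

    /** Parses repeated [int32 keyLength, key, int32 valueLength, value] on first call only. */
    byte[] get(String key) {
        if (parsed == null) {
            Map<String, byte[]> map = new LinkedHashMap<>();
            ByteBuffer buf = ByteBuffer.wrap(raw);
            while (buf.hasRemaining()) {
                byte[] k = new byte[buf.getInt()];
                buf.get(k);
                byte[] v = new byte[buf.getInt()];
                buf.get(v);
                map.put(new String(k, StandardCharsets.UTF_8), v);
            }
            parsed = map;
        }
        return parsed.get(key);
    }

    /** Helper for the example only: encodes a single header in the assumed layout. */
    static byte[] encode(String key, byte[] value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + k.length + value.length)
                .putInt(k.length).put(k)
                .putInt(value.length).put(value)
                .array();
    }
}
```

A consumer that never calls get() pays nothing beyond holding the byte[], which is the property that would be lost if the headers had to be traversed just to reach the rest of the message.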



  

On 17/02/2017, 19:44, "Michael Pearce" <michael.pea...@ig.com> wrote:

    Yes, exactly: we have access to the records, which is why the headers should be
    accessible via them and not hidden for only interceptors to access.
    
    ________________________________________
    From: Magnus Edenhill <mag...@edenhill.se>
    Sent: Friday, February 17, 2017 7:34:49 PM
    To: dev@kafka.apache.org
    Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
    
    Big +1 on VarInts.
    CPUs are fast, memory is slow.
    
    I agree with Jason that we'll want to continue verifying messages,
    including their headers, so while I appreciate the idea of the opaque
    header blob it won't be useful in practice.
    
    /Magnus
    
    2017-02-17 10:41 GMT-08:00 Jason Gustafson <ja...@confluent.io>:
    
    > Sorry, my mistake. The consumer interceptor is per batch, though I'm not
    > sure that's an actual limitation since you still have access to the
    > individual records.
    >
    > -Jason
    >
    > On Fri, Feb 17, 2017 at 10:39 AM, Jason Gustafson <ja...@confluent.io>
    > wrote:
    >
    > >> Re headers as byte array and future use by broker. This doesn't take away
    > >> from that at all. Nor makes it difficult at all in my opinion.
    > >
    > >
    > > Yeah, I didn't say it was difficult, only awkward. You wouldn't write the
    > > schema that way if you were planning to use it on the brokers from the
    > > beginning. Note also that one of the benefits of letting the broker
    > > understand headers is that it can validate that they are properly
    > > formatted. If cost is the only concern, we should confirm its impact
    > > through performance testing.
    > >
    > >> One of the key use cases requires access on consume at per event/message
    > >> level at the point that message is being processed, as such the batch
    > >> interceptors and batch consume api isn't suitable. It needs to be at the
    > >> record level.
    > >
    > >
    > > I'm not sure I understand the point about batching. Interceptors are
    > > applied per-message, right?
    > >
    > > My intent on interceptors is to keep the usage of headers well-defined so
    > > that they don't start leaking unnecessarily into applications. My guess is
    > > that it's probably inevitable, but isolating it in the interceptors would
    > > at least give people a second thought before deciding to use it. The main
    > > challenge in my mind is figuring out how an MM use case would work. It
    > > would be more cumbersome to replicate headers through an interceptor,
    > > though arguably MM should be working at a lower level anyway.
    > >
    > > -Jason
    > >
    > > On Fri, Feb 17, 2017 at 10:16 AM, Michael Pearce <michael.pea...@ig.com>
    > > wrote:
    > >
    > >> Re headers available on the record vs interceptors only
    > >>
    > >> One of the key use cases requires access on consume at per event/message
    > >> level at the point that message is being processed, as such the batch
    > >> interceptors and batch consume api isn't suitable. It needs to be at the
    > >> record level.
    > >>
    > >> This anyhow is similar to jms/http/amqp where headers are available to
    > >> consuming applications.
    > >>
    > >> Re headers as byte array and future use by broker. This doesn't take away
    > >> from that at all. Nor makes it difficult at all in my opinion.
    > >>
    > >>
    > >>
    > >> ________________________________________
    > >> From: Jason Gustafson <ja...@confluent.io>
    > >> Sent: Friday, February 17, 2017 5:55:42 PM
    > >> To: dev@kafka.apache.org
    > >> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
    > >>
    > >> >
    > >> > Would you be proposing in KIP-98 to convert the other message ints (key
    > >> > length, value length) also to varint to keep it uniform?
    > >> > Also I assume there will be a static or helper method made to write/read
    > >> > these in the client and server.
    > >>
    > >>
    > >> Yes, that is what we are proposing, so using varints for headers would be
    > >> consistent with the rest of the message. We have used static helper methods
    > >> in our prototype implementation.
    > >>
    > >> > The cost of parsing, we want to parse/interpret the headers lazily (this is
    > >> > a key point brought up earlier in discussions)
    > >>
    > >>
    > >> I'm a bit skeptical of this. Has anyone done the performance testing? I can
    > >> probably implement it and test it if no one else has. I was also under the
    > >> impression that there may be use cases down the road where the broker would
    > >> need to interpret headers. That wouldn't be off the table in the future if
    > >> it's represented as bytes, but it would be quite a bit more awkward, right?
    > >>
    > >> By the way, one question I have been wondering about. My understanding is
    > >> that headers are primarily for use cases where a third-party component
    > >> wants to enrich messages without needing to understand or modify the schema
    > >> of the message key and value. For the applications which directly produce
    > >> and consume the messages and control the key/value schema directly, it
    > >> seems we would rather have them implement headers directly in their own
    > >> schema. Supposing for the sake of argument that it was possible, my
    > >> question is whether it would be sufficient to expose the headers in the
    > >> interceptor API and not in the common API?
    > >>
    > >> -Jason
    > >>
    > >> On Fri, Feb 17, 2017 at 3:26 AM, Michael Pearce <michael.pea...@ig.com>
    > >> wrote:
    > >>
    > >> > On the point of varInts
    > >> >
    > >> > Would you be proposing in KIP-98 to convert the other message ints (key
    > >> > length, value length) also to varint to keep it uniform?
    > >> > Also I assume there will be a static or helper method made to write/read
    > >> > these in the client and server.
    > >> >
    > >> > Cheers
    > >> > Mike
    > >> >
    > >> >
    > >> >
    > >> > On 17/02/2017, 11:22, "Michael Pearce" <michael.pea...@ig.com> wrote:
    > >> >
    > >> >     On the point re: headers in the message protocol being a byte array
    > >> >     and not a count of elements followed by the elements. Again this was
    > >> >     discussed/argued previously.
    > >> >
    > >> >     It was agreed on for a few reasons, some of which you have obviously
    > >> >     picked up on:
    > >> >
    > >> >     - Broker is able to pass it through opaquely
    > >> >     - The cost of parsing, we want to parse/interpret the headers lazily
    > >> >       (this is a key point brought up earlier in discussions)
    > >> >     - Headers can be copied from consumer record to producer record (aka
    > >> >       mirror makers etc) without parsing if no changes are being made or
    > >> >       being looked at.
    > >> >     - Keeps the broker agnostic to the format
    > >> >     - You need an int32 either for the byte size of the headers, or for the
    > >> >       count of elements, so overheads are the same, but going with an
    > >> >       opaque byte array has the above advantages.
    > >> >
    > >> >     Cheers
    > >> >     Mike
    > >> >
    > >> >
    > >> >     On 17/02/2017, 02:50, "Jason Gustafson" <ja...@confluent.io>
    > wrote:
    > >> >
    > >> >         Sorry, should have noted that the performance testing was done
    > >> >         using the producer performance tool shipped with Kafka.
    > >> >
    > >> >         -Jason
    > >> >
    > >> >         On Thu, Feb 16, 2017 at 6:44 PM, Jason Gustafson <
    > >> > ja...@confluent.io> wrote:
    > >> >
    > >> >         > Hey Nacho,
    > >> >         >
    > >> >         > I've compared performance of our KIP-98 implementation with and
    > >> >         > without varints. For messages around 128 bytes, we see an increase
    > >> >         > in throughput of about 30% using the default configuration settings.
    > >> >         > At 256 bytes, the increase is around 16%. Obviously the performance
    > >> >         > converges as messages get larger, but it seems well worth the cost.
    > >> >         > Note that we are also seeing a substantial performance increase
    > >> >         > against trunk, primarily because of the much more efficient packing
    > >> >         > that varints provide us. Anything adding to message overhead, such as
    > >> >         > record headers, would only increase the relative difference. (Of
    > >> >         > course take these numbers with a grain of salt since I have only used
    > >> >         > the default settings with both the producer and broker on my local
    > >> >         > machine. We intend to provide more extensive performance details as
    > >> >         > part of the work for KIP-98.)
    > >> >         >
    > >> >         > The implementation we are using is from protobuf
    > >> >         > (https://developers.google.com/protocol-buffers/docs/encoding), which
    > >> >         > is also used in HBase. It is trivial to implement and as far as I
    > >> >         > know doesn't suffer from the aliasing problem you are describing. I
    > >> >         > checked with Magnus (the author of librdkafka) and he agreed that the
    > >> >         > savings seemed worth the cost of implementation.
    > >> >         >
    > >> >         > -Jason
    > >> >         >
    > >> >         > On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis <
    > >> iso...@igso.net>
    > >> > wrote:
    > >> >         >
    > >> >         >> -VarInts
    > >> >         >>
    > >> >         >> I'm one of the people (if not the most) opposed to VarInts. VarInts
    > >> >         >> have a place, but this is not it.   (We had a large discussion about
    > >> >         >> them at the beginning of KIP-82 time)
    > >> >         >>
    > >> >         >> If anybody has real life performance numbers of VarInts improving
    > >> >         >> things or significantly reducing resources I would like to know what
    > >> >         >> that case may be. Yes, you can save some bytes here and there, but
    > >> >         >> this is probably insignificant to the overall system behavior and
    > >> >         >> storage requirements.  -- I say this with respect to using VarInts in
    > >> >         >> the protocol itself, not as part of the data.
    > >> >         >>
    > >> >         >> VarInts require you to parse the Int before using it and depending on
    > >> >         >> the encoding they can suffer from aliasing (multiple representations
    > >> >         >> for the same value).
    > >> >         >>
    > >> >         >> Why add complexity?
    > >> >         >>
    > >> >         >> Nacho
    > >> >         >>
    > >> >         >>
    > >> >         >> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe <
    > >> > cmcc...@apache.org>
    > >> >         >> wrote:
    > >> >         >> > +1 for varints here-- it would save quite a bit of space. They are
    > >> >         >> > pretty quick to implement as well.
    > >> >         >> >
    > >> >         >> > I think it makes sense for values to be byte arrays. Users might want
    > >> >         >> > to attach arbitrary payloads; they shouldn't be forced to serialize
    > >> >         >> > everything to Java strings.
    > >> >         >> >
    > >> >         >> > best,
    > >> >         >> > Colin
    > >> >         >> >
    > >> >         >> >
    > >> >         >> > On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
    > >> >         >> >> Hey Michael,
    > >> >         >> >>
    > >> >         >> >> Hmm, I guess the point of representing it as bytes is to allow the
    > >> >         >> >> broker to pass it through opaquely? Is the cost of parsing them a
    > >> >         >> >> concern, or are we simply trying to ensure that the broker stays
    > >> >         >> >> agnostic to the format?
    > >> >         >> >>
    > >> >         >> >> On varints, I think adding support for them makes less sense for an
    > >> >         >> >> isolated use case, but as part of a more holistic change (such as
    > >> >         >> >> what we have proposed in KIP-98), I think they are justifiable. If
    > >> >         >> >> we add them, then the need to use attributes becomes quite a bit
    > >> >         >> >> weaker, right? The other thing I find slightly odd is the fact that
    > >> >         >> >> null headers have no actual semantic meaning for the message (unlike
    > >> >         >> >> null keys and values). It is just a space optimization. It seems a
    > >> >         >> >> bit better to always use size 0 to indicate having no headers.
    > >> >         >> >>
    > >> >         >> >> Overall, the main point is ensuring that the message schema remains
    > >> >         >> >> consistent, either within the larger protocol, or at a minimum
    > >> >         >> >> within the message itself.
    > >> >         >> >>
    > >> >         >> >> -Jason
    > >> >         >> >>
    > >> >         >> >> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce <
    > >> > michael.pea...@ig.com
    > >> >         >> >
    > >> >         >> >> wrote:
    > >> >         >> >>
    > >> >         >> >> > Hi Jason,
    > >> >         >> >> >
    > >> >         >> >> > On point 1) in the message protocol the headers are simply a byte
    > >> >         >> >> > array, like the key or value; this is to clearly demarcate the
    > >> >         >> >> > header in the core message. Then the header byte array in the core
    > >> >         >> >> > message is an array of key, value pairs. This is what it is denoting.
    > >> >         >> >> >
    > >> >         >> >> > Then this would be I guess in the given notation:
    > >> >         >> >> >
    > >> >         >> >> > Headers => [KeyLength, Key, ValueLength, Value]
    > >> >         >> >> >     KeyLength => int32    <-- NEW size of the byte[] of the serialised key value
    > >> >         >> >> >     Key => bytes          <-- NEW serialised string (UTF8) bytes of the header key
    > >> >         >> >> >     ValueLength => int32  <-- NEW size of the byte[] of the serialised header value
    > >> >         >> >> >     Value => bytes        <-- NEW serialised form of the header value
    > >> >         >> >> >
    > >> >         >> >> > The key length and value length match the way the protocol is
    > >> >         >> >> > defined in the core message currently.
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> > On point 2)
    > >> >         >> >> > Var sized ints: this was discussed much earlier on, in fact I had
    > >> >         >> >> > suggested it myself (with Hadoop references). The complexity of this
    > >> >         >> >> > compared to having a simpler protocol was argued, and it was agreed
    > >> >         >> >> > it wasn’t worth the complexity, as all other clients in other
    > >> >         >> >> > languages would need to ensure they're using the right var size
    > >> >         >> >> > algorithm, as there are a few.
    > >> >         >> >> >
    > >> >         >> >> > On point 3)
    > >> >         >> >> > We did the attributes, optional approach as originally there was marked
    > >> >         >> >> > concern that headers would cause a message size overhead for others who
    > >> >         >> >> > don’t want them. As such this is the clean solution to achieve that. If
    > >> >         >> >> > that no longer holds, and we don’t care that we add 4 bytes overhead,
    > >> >         >> >> > then I'm happy to remove.
    > >> >         >> >> >
    > >> >         >> >> > I’m personally in favour of keeping the message as small as possible so
    > >> >         >> >> > people don’t get shocks in perf and throughput due to message size
    > >> >         >> >> > unless they actively use the feature; as such I do prefer the attribute
    > >> >         >> >> > bitwise feature flag approach myself.
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> > On 16/02/2017, 05:40, "Jason Gustafson" <
    > >> > ja...@confluent.io> wrote:
    > >> >         >> >> >
    > >> >         >> >> >     We have proposed a few significant changes to the message format in KIP-98
    > >> >         >> >> >     which now seems likely to pass (perhaps with some iterations on
    > >> >         >> >> >     implementation details). It would be good to try and coordinate the changes
    > >> >         >> >> >     in both of the proposals to make sure they are consistent and compatible.
    > >> >         >> >> >
    > >> >         >> >> >     I think using the attributes to indicate null headers is a reasonable
    > >> >         >> >> >     approach. We have proposed to do the same thing for the message key and
    > >> >         >> >> >     value. That said, I sympathize with Jay's argument. Having multiple ways to
    > >> >         >> >> >     specify a null value increases the overall complexity of the protocol. You
    > >> >         >> >> >     can see this just from the fact that you need the extra verbiage in the
    > >> >         >> >> >     protocol specification in this KIP and in KIP-98 to describe the dependence
    > >> >         >> >> >     between the fields and the attributes. It seems like a slippery slope if
    > >> >         >> >> >     you start allowing different request types to implement the protocol
    > >> >         >> >> >     specification differently.
    > >> >         >> >> >
    > >> >         >> >> >     You can also argue that the messages already are and are likely to remain a
    > >> >         >> >> >     special case. For example, there is currently no generality in how
    > >> >         >> >> >     compressed message sets are represented that would be applicable for other
    > >> >         >> >> >     request types. Some might see this divergence as an unfortunate protocol
    > >> >         >> >> >     deficiency which should be fixed; others might see it as sort of the
    > >> >         >> >> >     inevitability of needing to optimize where it counts most. I'm probably
    > >> >         >> >> >     somewhere in between, but I think we probably all share the intuition that
    > >> >         >> >> >     the protocol should be kept as consistent as possible. With that in mind,
    > >> >         >> >> >     here are a few comments:
    > >> >         >> >> >
    > >> >         >> >> >     1. One thing I found a little odd when reading the current proposal is that
    > >> >         >> >> >     the headers are both represented as an array of bytes and as an array of
    > >> >         >> >> >     key/value pairs. I'd probably suggest something like this:
    > >> >         >> >> >
    > >> >         >> >> >     Headers => [HeaderKey HeaderValue]
    > >> >         >> >> >      HeaderKey => String
    > >> >         >> >> >      HeaderValue => Bytes
    > >> >         >> >> >
    > >> >         >> >> >     An array in the Kafka protocol is represented as a 4-byte integer
    > >> >         >> >> >     indicating the number of elements in the array followed by the
    > >> >         >> >> >     serialization of the elements. Unless I'm misunderstanding, what you have
    > >> >         >> >> >     instead is the total size of the headers in bytes followed by the elements.
    > >> >         >> >> >     I'm not sure I see any reason for this inconsistency.
    > >> >         >> >> >
    > >> >         >> >> >     2. In KIP-98, we've introduced variable-length integer fields. Effectively,
    > >> >         >> >> >     we've enriched (or "complicated" as Jay might say ;) the protocol
    > >> >         >> >> >     specification to include the following types: VarInt, VarLong,
    > >> >         >> >> >     UnsignedVarInt and UnsignedVarLong.
    > >> >         >> >> >
    > >> >         >> >> >     Along with these primitives, we could introduce the following types:
    > >> >         >> >> >
    > >> >         >> >> >     VarSizeArray => NumberOfItems Item1 Item2 .. ItemN
    > >> >         >> >> >       NumberOfItems => UnsignedVarInt
    > >> >         >> >> >
    > >> >         >> >> >     VarSizeNullableArray => NumberOfItemsOrNull Item1 Item2 .. ItemN
    > >> >         >> >> >       NumberOfItemsOrNull => VarInt (-1 means null)
    > >> >         >> >> >
    > >> >         >> >> >     And similarly for the `String` and `Bytes` types. These types can save a
    > >> >         >> >> >     considerable amount of space in this proposal because they can be used for
    > >> >         >> >> >     both the number of headers included in the message and the lengths of the
    > >> >         >> >> >     header keys and values. We could do this instead:
    > >> >         >> >> >
    > >> >         >> >> >     Headers => VarSizeArray[HeaderKey HeaderValue]
    > >> >         >> >> >       HeaderKey => VarSizeString
    > >> >         >> >> >       HeaderValue => VarSizeBytes
    > >> >         >> >> >
    > >> >         >> >> >     Combining the savings from the use of variable length fields, the benefit
    > >> >         >> >> >     of using the attributes to represent null seems pretty small.
    > >> >         >> >> >
    > >> >         >> >> >     3. Whichever way we go (whether we use the attributes or not), we should at
    > >> >         >> >> >     least be consistent between this KIP and KIP-98. It would be very strange
    > >> >         >> >> >     to have two ways to represent null values in the same schema. Either way is
    > >> >         >> >> >     OK with me. I think some message-level optimizations are justifiable, but
    > >> >         >> >> >     the savings here seem minimal (a few bytes per message), so maybe it's not
    > >> >         >> >> >     worth the cost of letting the message diverge even further from the rest of
    > >> >         >> >> >     the protocol.
    > >> >         >> >> >
    > >> >         >> >> >     -Jason
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> >     On Wed, Feb 15, 2017 at 8:52 AM, radai <
    > >> >         >> radai.rosenbl...@gmail.com>
    > >> >         >> >> > wrote:
    > >> >         >> >> >
    > >> >         >> >> >     > I've trimmed the inline contents as this mail is getting too big for the
    > >> >         >> >> >     > apache mailing list software to deliver :-(
    > >> >         >> >> >     >
    > >> >         >> >> >     > 1. the important thing for interoperability is for different "interested
    > >> >         >> >> >     > parties" (plugins, infra layers/wrappers, user-code) to be able to stick
    > >> >         >> >> >     > pieces of metadata onto msgs without getting in each other's way. a common
    > >> >         >> >> >     > key scheme (Strings, as of the time of this writing?) is all that's required
    > >> >         >> >> >     > for that. it is assumed that the other end interested in any such piece of
    > >> >         >> >> >     > metadata knows the encoding, and byte[] provides for the most flexibility.
    > >> >         >> >> >     > i believe this is the same logic behind core kafka being byte[]/byte[] -
    > >> >         >> >> >     > Strings are more "usable" but bytes are flexible and so were chosen.
    > >> >         >> >> >     > Also - core kafka doesn't even do that good of a job on usability of the
    > >> >         >> >> >     > payload (example - i have to specify the nop byte[] "decoders" explicitly
    > >> >         >> >> >     > in conf), and again sacrifices usability for the sake of performance (no
    > >> >         >> >> >     > convenient single-record processing as poll is a batch, lots of obscure
    > >> >         >> >> >     > little config details exposing internals of the batching mechanism, etc)
    > >> >         >> >> >     >
    > >> >         >> >> >     > this is also why i really dislike the idea of a "type system" for header
    > >> >         >> >> >     > values, it further degrades the usability, adds complexity and will
    > >> >         >> >> >     > eventually get in people's way, also, it would be the 2nd/3rd home-grown
    > >> >         >> >> >     > serialization mechanism in core kafka (counting 2 iterations of the "type
    > >> >         >> >> >     > definition DSL")
    > >> >         >> >> >     >
    > >> >         >> >> >     > 2. this is an implementation detail, and not even a very "user facing" one?
    > >> >         >> >> >     > to the best of my understanding the vote process is on proposed
    > >> >         >> >> >     > API/behaviour. also - since we're willing to go with strings just serialize
    > >> >         >> >> >     > a 0-sized header blob and IIUC you don't need any optionals anymore.
    > >> >         >> >> >     >
    > >> >         >> >> >     > 3. yes, we can :-)
    > >> >         >> >> >     >
    > >> >         >> >> >     > On Tue, Feb 14, 2017 at 11:56 PM, Michael
    > Pearce <
    > >> >         >> >> > michael.pea...@ig.com>
    > >> >         >> >> >     > wrote:
    > >> >         >> >> >     >
    > >> >         >> >> >     > > Hi Jay,
    > >> >         >> >> >     > >
    > >> >         >> >> >     > > 1) There was some initial debate on the value part, as you'll note String,
    > >> >         >> >> >     > > String headers were discounted early on. The reason for this is flexibility
    > >> >         >> >> >     > > and keeping in line with the flexibility of key, value of the message
    > >> >         >> >> >     > > object itself. I don’t think it takes away from an ecosystem as each plugin
    > >> >         >> >> >     > > will care for their own key, this way ints, booleans, exotic custom binary
    > >> >         >> >> >     > > can all be catered for.
    > >> >         >> >> >     > > a. If you really wanted to push for a typed value interface, I wouldn’t
    > >> >         >> >> >     > > want just String values supported, but the primitives plus string and
    > >> >         >> >> >     > > also still keeping the ability to have a binary for custom binaries that
    > >> >         >> >> >     > > some organisations may have.
    > >> >         >> >> >     > > i. I have written this slight alternative
    > here,
    > >> >         >> >> >     > https://cwiki.apache.org/
    > >> >         >> >> >     > > confluence/display/KAFKA/KIP-
    > >> > 82+-+Add+Record+Headers+-+Typed
    > >> >         >> >> >     > > ii. Essentially the value bytes have a 
leading
    > >> > byte overhead.
    > >> >         >> >> >     > > 1.  This tells you what type the value is,
    > >> before
    > >> > reading
    > >> >         >> the rest
    > >> >         >> >> > of the
    > >> >         >> >> >     > > bytes, allowing serialisation/deserialization
    > to
    > >> > and from the
    > >> >         >> >> > primitives,
    > >> >         >> >> >     > > string and byte[]. This is akin to some other
    > >> > messaging
    > >> >         >> systems.
    > >> >         >> >> >     > > 2) We are making it optional, so that for
    > those
    > >> > not wanting
    > >> >         >> >> > headers have
    > >> >         >> >> >     > 0
    > >> >         >> >> >     > > bytes overhead (think of it as a feature
    > flag),
    > >> I
    > >> > don’t
    > >> >         >> think this
    > >> >         >> >> > is
    > >> >         >> >> >     > > complex, especially if comparing to changes
    > >> > proposed in
    > >> >         >> other kips
    > >> >         >> >> > like
    > >> >         >> >> >     > > kip-98.
    > >> >         >> >> >     > > a. If you really really don’t like this, we
    > can
    > >> > drop it, but
    > >> >         >> it
    > >> >         >> >> > would
    > >> >         >> >> >     > mean
    > >> >         >> >> >     > > buying into 4 bytes extra overhead for users
    > who
    > >> > do not want
    > >> >         >> to use
    > >> >         >> >> >     > headers.
    > >> >         >> >> >     > > 3) In the summary yes, it is at a higher
    > level,
    > >> > but I think
    > >> >         >> this
    > >> >         >> >> > is well
    > >> >         >> >> >     > > documented in the proposed changes section.
    > >> >         >> >> >     > > a. Added getHeaders method to
    > Producer/Consumer
    > >> > record (that
    > >> >         >> is it)
    > >> >         >> >> >     > > b. We’ve also detailed the new Headers class
    > >> that
    > >> > this method
    > >> >         >> >> > returns
    > >> >         >> >> >     > that
    > >> >         >> >> >     > > encapsulates the headers protocol and logic.
    > >> >         >> >> >     > >
    > >> >         >> >> >     > > Best,
    > >> >         >> >> >     > > Mike
    > >> >         >> >> >     > >
    > >> >         >> >> >     > > ==Original questions from the vote thread 
from
    > >> > Jay.==
    > >> >         >> >> >     > >
    > >> >         >> >> >     > > Couple of things I think we still need to 
work
    > >> out:
    > >> >         >> >> >     > >
    > >> >         >> >> >     > >    1. I think we agree about the key, but I
    > >> think
    > >> > we haven't
    > >> >         >> >> > talked about
    > >> >         >> >> >     > >    the value yet. I think if our goal is an
    > open
    > >> > ecosystem
    > >> >         >> of these
    > >> >         >> >> >     > header
    > >> >         >> >> >     > >    spread across many plugins from many
    > systems
    > >> we
    > >> > should
    > >> >         >> consider
    > >> >         >> >> > making
    > >> >         >> >> >     > > this
    > >> >         >> >> >     > >    a string as well so it can be printed, set
    > >> via
    > >> > a UI, set
    > >> >         >> in
    > >> >         >> >> > config,
    > >> >         >> >> >     > etc.
    > >> >         >> >> >     > >    Basically encouraging pluggable
    > serialization
    > >> > formats
    > >> >         >> here will
    > >> >         >> >> > lead
    > >> >         >> >> >     > to
    > >> >         >> >> >     > > a
    > >> >         >> >> >     > >    bit of a tower of babel.
    > >> >         >> >> >     > >    2. This proposal still includes a pretty
    > big
    > >> > change to our
    > >> >         >> >> >     > serialization
    > >> >         >> >> >     > >    and protocol definition layer. Essentially
    > >> it is
    > >> >         >> introducing an
    > >> >         >> >> >     > optional
    > >> >         >> >> >     > >    type, where the format is data dependent. 
I
    > >> > think this is
    > >> >         >> >> > actually a
    > >> >         >> >> >     > big
    > >> >         >> >> >     > >    change though it doesn't seem like it. It
    > >> means
    > >> > you can no
    > >> >         >> >> > longer
    > >> >         >> >> >     > > specify
    > >> >         >> >> >     > >    this type with our type definition DSL, 
and
    > >> > likewise it
    > >> >         >> requires
    > >> >         >> >> >     > custom
    > >> >         >> >> >     > >    handling in client libs. This isn't a huge
    > >> > thing, since
    > >> >         >> the
    > >> >         >> >> > Record
    > >> >         >> >> >     > >    definition is custom anyway, but I think
    > this
    > >> > kind of
    > >> >         >> protocol
    > >> >         >> >> >     > >    inconsistency is very non-desirable and
    > ties
    > >> > you to
    > >> >         >> hand-coding
    > >> >         >> >> >     > things.
    > >> >         >> >> >     > > I
    > >> >         >> >> >     > >    think the type should instead be [Key
    > Value]
    > >> in
    > >> > our BNF,
    > >> >         >> where
    > >> >         >> >> > key and
    > >> >         >> >> >     > >    value are both short strings as used
    > >> elsewhere.
    > >> > This
    > >> >         >> brings it
    > >> >         >> >> > in line
    > >> >         >> >> >     > > with
    > >> >         >> >> >     > >    the rest of the protocol.
    > >> >         >> >> >     > >    3. Could we get more specific about the
    > exact
    > >> > Java API
    > >> >         >> change to
    > >> >         >> >> >     > >    ProducerRecord, ConsumerRecord, Record,
    > etc?
    > >> >         >> >> >     > >
    > >> >         >> >> >     > > -Jay
    > >> >         >> >> >     > >
    > >> >         >> >> >     >
    > >> >         >> >> >
    > >> >         >> >> >
    > >> >         >> >> > The information contained in this email is strictly
    > >> > confidential and
    > >> >         >> for
    > >> >         >> >> > the use of the addressee only, unless otherwise
    > >> indicated.
    > >> > If you
    > >> >         >> are not
    > >> >         >> >> > the intended recipient, please do not read, copy, use
    > or
    > >> > disclose to
    > >> >         >> others
    > >> >         >> >> > this message or any attachment. Please also notify 
the
    > >> > sender by
    > >> >         >> replying
    > >> >         >> >> > to this email or by telephone (+44(020 7896 0011) and
    > >> then
    > >> > delete
    > >> >         >> the email
    > >> >         >> >> > and any copies of it. Opinions, conclusion (etc) that
    > do
    > >> > not relate
    > >> >         >> to the
    > >> >         >> >> > official business of this company shall be understood
    > as
    > >> > neither
    > >> >         >> given nor
    > >> >         >> >> > endorsed by it. IG is a trading name of IG Markets
    > >> Limited
    > >> > (a company
    > >> >         >> >> > registered in England and Wales, company number
    > >> 04008957)
    > >> > and IG
    > >> >         >> Index
    > >> >         >> >> > Limited (a company registered in England and Wales,
    > >> > company number
    > >> >         >> >> > 01190902). Registered address at Cannon Bridge House,
    > 25
    > >> > Dowgate
    > >> >         >> Hill,
    > >> >         >> >> > London EC4R 2YA. Both IG Markets Limited (register
    > >> number
    > >> > 195355)
    > >> >         >> and IG
    > >> >         >> >> > Index Limited (register number 114059) are authorised
    > >> and
    > >> > regulated
    > >> >         >> by the
    > >> >         >> >> > Financial Conduct Authority.
    > >> >         >> >> >
    > >> >         >>
    > >> >         >>
    > >> >         >>
    > >> >         >> --
    > >> >         >> Nacho - Ignacio Solis - iso...@igso.net
    > >> >         >>
    > >> >         >
    > >> >         >
    > >> >
    > >> >
    > >> >
    > >> >
    > >> >
    > >>
    > >
    > >
    >
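
For readers following the typed-value alternative quoted above (a leading byte that tells you what type the value is before the rest of the bytes are read), here is a minimal Java sketch of that idea. It is an illustration only, not the KIP's actual encoding: the class name TypedHeaderValue, the method names, and the numeric type tags are all assumptions made for this example, and only String, int, boolean and byte[] are covered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: not part of Kafka or of the KIP-82 proposal text.
class TypedHeaderValue {
    // Assumed type tags; the thread does not specify the actual values.
    static final byte TYPE_BYTES = 0;
    static final byte TYPE_STRING = 1;
    static final byte TYPE_INT = 2;
    static final byte TYPE_BOOLEAN = 3;

    // Prefix the serialized value with one byte identifying its type.
    static byte[] encode(Object value) {
        if (value instanceof String) {
            byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + utf8.length).put(TYPE_STRING).put(utf8).array();
        } else if (value instanceof Integer) {
            return ByteBuffer.allocate(5).put(TYPE_INT).putInt((Integer) value).array();
        } else if (value instanceof Boolean) {
            return new byte[] {TYPE_BOOLEAN, (byte) (((Boolean) value) ? 1 : 0)};
        } else if (value instanceof byte[]) {
            byte[] raw = (byte[]) value;
            return ByteBuffer.allocate(1 + raw.length).put(TYPE_BYTES).put(raw).array();
        }
        throw new IllegalArgumentException("Unsupported header value type");
    }

    // Read the leading type byte first, then interpret the remaining bytes.
    static Object decode(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        byte type = buf.get();
        switch (type) {
            case TYPE_STRING:
                return new String(encoded, 1, encoded.length - 1, StandardCharsets.UTF_8);
            case TYPE_INT:
                return buf.getInt();
            case TYPE_BOOLEAN:
                return buf.get() != 0;
            case TYPE_BYTES:
                return Arrays.copyOfRange(encoded, 1, encoded.length);
            default:
                throw new IllegalArgumentException("Unknown type tag: " + type);
        }
    }
}
```

In this sketch every value costs one extra byte on the wire, which is the "leading byte overhead" mentioned in the quoted message. Custom binary payloads still work through the byte[] case, at the price of each plugin having to agree on what those bytes mean.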
