Hey Nacho,

I've compared performance of our KIP-98 implementation with and without
varints. For messages around 128 bytes, we see an increase in throughput of
about 30% using the default configuration settings. At 256 bytes, the
increase is around 16%. Obviously the performance converge as messages get
larger, but it seems well worth the cost. Note that we are also seeing a
substantial performance increase against trunk primarily because of the
much more efficient packing that varints provide us. Anything adding to
message overhead, such as record headers, would only increase the relative
difference. (Of course take these numbers with a grain of salt since I have
only used the default settings with both the producer and broker on my
local machine. We intend to provide more extensive performance details as
part of the work for KIP-98.)

The implementation we are using is from protobuf (
https://developers.google.com/protocol-buffers/docs/encoding), which is
also used in HBase. It is trivial to implement and as far as I know doesn't
suffer from the aliasing problem you are describing. I checked with Magnus
(the author of librdkafka) and he agreed that the savings seemed worth the
cost of implementation.

-Jason

On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis <iso...@igso.net> wrote:

> -VarInts
>
> I'm one of the people (if not the most) opposed to VarInts.  VarInts
> have a place, but this is not it.   (We had a large discussion about
> them at the beginning of KIP-82 time)
>
> If anybody has real life performance numbers of VarInts improving
> things or significantly reducing resources I would like to know what
> that case may be. Yes, you can save some bytes here and there, but
> this is probably insignificant to the overall system behavior and
> storage requirements.  -- I say this with respect to using VarInts in
> the protocol itself, not as part of the data.
>
> VarInts require you to parse the Int before using it and depending on
> the encoding they can suffer from aliasing (multiple representations
> for the same value).
>
> Why add complexity?
>
> Nacho
>
>
> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe <cmcc...@apache.org> wrote:
> > +1 for varints here-- it would save quite a bit of space.  They are
> > pretty quick to implement as well.
> >
> > I think it makes sense for values to be byte arrays.  Users might want
> > to attach arbitrary payloads; they shouldn't be forced to serialize
> > everything to Java strings.
> >
> > best,
> > Colin
> >
> >
> > On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
> >> Hey Michael,
> >>
> >> Hmm, I guess the point of representing it as bytes is to allow the
> broker
> >> to pass it through opaquely? Is the cost of parsing them a concern, or
> >> are
> >> we simply trying to ensure that the broker stays agnostic to the format?
> >>
> >> On varints, I think adding support for them makes less sense for an
> >> isolated use case, but as part of a more holistic change (such as what
> we
> >> have proposed in KIP-98), I think they are justifiable. If we add them,
> >> then the need to use attributes becomes quite a bit weaker, right? The
> >> other thing I find slightly odd is the fact that null headers has no
> >> actual
> >> semantic meaning for the message (unlike null keys and values). It is
> >> just
> >> a space optimization. It seems a bit better to always use size 0 to
> >> indicate having no headers.
> >>
> >> Overall, the main point is ensuring that the message schema remains
> >> consistent, either within the larger protocol, or at a minimum within
> the
> >> message itself.
> >>
> >> -Jason
> >>
> >> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce <michael.pea...@ig.com>
> >> wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > On point 1) in the message protocol the headers are simply a byte
> array,
> >> > as like the key or value, this is to clearly demarcate the header in
> the
> >> > core message. Then the header byte array in the core message is an
> array of
> >> > key, value pairs. This is what it is denoting.
> >> >
> >> > Then this would be I guess in the given notation:
> >> >
> >> > Headers => [KeyLength, Key, ValueLength, Value]
> >> >     KeyLength => int32 <-----------------NEW size of the byte[] of the
> >> > serialised key value
> >> >     Key => bytes <---------------------- NEW serialised string (UTF8)
> >> > bytes of the header key
> >> >     ValueLength => int32 <-------------- NEW size of the byte[] of the
> >> > serialised header value
> >> >     Value => bytes <-------------------- NEW serialised form of the
> header
> >> > value
> >> >
> >> > The key length and value length is matching the way the protocol is
> >> > defined in the core message currently.
> >> >
> >> >
> >> >
> >> >
> >> > On point 2)
> >> > Var sized ints, this was discussed much earlier on, in fact I had
> >> > suggested it myself (with Hadoop references), the complexity of this
> >> > compared to having a simpler protocol was argued and agreed it wasn’t
> worth
> >> > the complexity as all other clients in other languages would need to
> ensure
> >> > theyre using the right var size algorithm, as there is a few.
> >> >
> >> > On point 3)
> >> > We did the attributes, optional approach as originally there was
> marked
> >> > concern that headers would cause a message size overhead for others,
> who
> >> > don’t want them. As such this is the clean solution to achieve that.
> If
> >> > that no longer holds, and we don’t care that we add 4bytes overhead,
> then
> >> > im happy to remove.
> >> >
> >> > I’m personally in favour of keeping the message as small as possible
> so
> >> > people don’t get shocks in perf and throughputs dues to message size,
> >> > unless they actively use the feature, as such I do prefer the
> attribute bit
> >> > wise feature flag approach myself.
> >> >
> >> >
> >> >
> >> >
> >> > On 16/02/2017, 05:40, "Jason Gustafson" <ja...@confluent.io> wrote:
> >> >
> >> >     We have proposed a few significant changes to the message format
> in
> >> > KIP-98
> >> >     which now seems likely to pass (perhaps with some iterations on
> >> >     implementation details). It would be good to try and coordinate
> the
> >> > changes
> >> >     in both of the proposals to make sure they are consistent and
> >> > compatible.
> >> >
> >> >     I think using the attributes to indicate null headers is a
> reasonable
> >> >     approach. We have proposed to do the same thing for the message
> key and
> >> >     value. That said, I sympathize with Jay's argument. Having
> multiple
> >> > ways to
> >> >     specify a null value increases the overall complexity of the
> protocol.
> >> > You
> >> >     can see this just from the fact that you need the extra verbiage
> in the
> >> >     protocol specification in this KIP and in KIP-98 to describe the
> >> > dependence
> >> >     between the fields and the attributes. It seems like a slippery
> slope
> >> > if
> >> >     you start allowing different request types to implement the
> protocol
> >> >     specification differently.
> >> >
> >> >     You can also argue that the messages already are and are likely to
> >> > remain a
> >> >     special case. For example, there is currently no generality in how
> >> >     compressed message sets are represented that would be applicable
> for
> >> > other
> >> >     request types. Some might see this divergence as an unfortunate
> >> > protocol
> >> >     deficiency which should be fixed; others might see it as sort of
> the
> >> >     inevitability of needing to optimize where it counts most. I'm
> probably
> >> >     somewhere in between, but I think we probably all share the
> intuition
> >> > that
> >> >     the protocol should be kept as consistent as possible. With that
> in
> >> > mind,
> >> >     here are a few comments:
> >> >
> >> >     1. One thing I found a little odd when reading the current
> proposal is
> >> > that
> >> >     the headers are both represented as an array of bytes and as an
> array
> >> > of
> >> >     key/value pairs. I'd probably suggest something like this:
> >> >
> >> >     Headers => [HeaderKey HeaderValue]
> >> >      HeaderKey => String
> >> >      HeaderValue => Bytes
> >> >
> >> >     An array in the Kafka protocol is represented as a 4-byte integer
> >> >     indicating the number of elements in the array followed by the
> >> >     serialization of the elements. Unless I'm misunderstanding, what
> you
> >> > have
> >> >     instead is the total size of the headers in bytes followed by the
> >> > elements.
> >> >     I'm not sure I see any reason for this inconsistency.
> >> >
> >> >     2. In KIP-98, we've introduced variable-length integer fields.
> >> > Effectively,
> >> >     we've enriched (or "complicated" as Jay might say ;) the protocol
> >> >     specification to include the following types: VarInt, VarLong,
> >> >     UnsignedVarInt and UnsignedVarLong.
> >> >
> >> >     Along with these primitives, we could introduce the following
> types:
> >> >
> >> >     VarSizeArray => NumberOfItems Item1 Item2 .. ItemN
> >> >       NumberOfItems => UnsignedVarInt
> >> >
> >> >     VarSizeNullableArray => NumberOfItemsOrNull Item1 Item2 .. ItemN
> >> >       NumberOfItemsOrNull => VarInt (-1 means null)
> >> >
> >> >     And similarly for the `String` and `Bytes` types. These types can
> save
> >> > a
> >> >     considerable amount of space in this proposal because they can be
> used
> >> > for
> >> >     both the number of headers included in the message and the
> lengths of
> >> > the
> >> >     header keys and values. We could do this instead:
> >> >
> >> >     Headers => VarSizeArray[HeaderKey HeaderValue]
> >> >       HeaderKey => VarSizeString
> >> >       HeaderValue => VarSizeBytes
> >> >
> >> >     Combining the savings from the use of variable length fields, the
> >> > benefit
> >> >     of using the attributes to represent null seems pretty small.
> >> >
> >> >     3. Whichever way we go (whether we use the attributes or not), we
> >> > should at
> >> >     least be consistent between this KIP and KIP-98. It would be very
> >> > strange
> >> >     to have two ways to represent null values in the same schema.
> Either
> >> > way is
> >> >     OK with me. I think some message-level optimizations are
> justifiable,
> >> > but
> >> >     the savings here seem minimal (a few bytes per message), so maybe
> it's
> >> > not
> >> >     worth the cost of letting the message diverge even further from
> the
> >> > rest of
> >> >     the protocol.
> >> >
> >> >     -Jason
> >> >
> >> >
> >> >     On Wed, Feb 15, 2017 at 8:52 AM, radai <
> radai.rosenbl...@gmail.com>
> >> > wrote:
> >> >
> >> >     > I've trimmed the inline contents as this mail is getting too
> big for
> >> > the
> >> >     > apache mailing list software to deliver :-(
> >> >     >
> >> >     > 1. the important thing for interoperability is for different
> >> > "interested
> >> >     > parties" (plugins, infra layers/wrappers, user-code) to be able
> to
> >> > stick
> >> >     > pieces of metadata onto msgs without getting in each other's
> way. a
> >> > common
> >> >     > key scheme (Strings, as of the time of this writing?) is all
> thats
> >> > required
> >> >     > for that. it is assumed that the other end interested in any
> such
> >> > piece of
> >> >     > metadata knows the encoding, and byte[] provides for the most
> >> > flexibility.
> >> >     > i believe this is the same logic behind core kafka being
> >> > byte[]/byte[] -
> >> >     > Strings are more "usable" but bytes are flexible and so were
> chosen.
> >> >     > Also - core kafka doesnt even do that good of a job on
> usability of
> >> > the
> >> >     > payload (example - i have to specify the nop byte[] "decoders"
> >> > explicitly
> >> >     > in conf), and again sacrificies usability for the sake of
> >> > performance (no
> >> >     > convenient single-record processing as poll is a batch, lots of
> >> > obscure
> >> >     > little config details exposing internals of the batching
> mechanism,
> >> > etc)
> >> >     >
> >> >     > this is also why i really dislike the idea of a "type system"
> for
> >> > header
> >> >     > values, it further degrades the usability, adds complexity and
> will
> >> >     > eventually get in people's way, also, it would be the 2nd/3rd
> >> > home-group
> >> >     > serialization mechanism in core kafka (counting 2 iterations of
> the
> >> > "type
> >> >     > definition DSL")
> >> >     >
> >> >     > 2. this is an implementation detail, and not even a very "user
> >> > facing" one?
> >> >     > to the best of my understanding the vote process is on proposed
> >> >     > API/behaviour. also - since we're willing to go with strings
> just
> >> > serialize
> >> >     > a 0-sized header blob and IIUC you dont need any optionals
> anymore.
> >> >     >
> >> >     > 3. yes, we can :-)
> >> >     >
> >> >     > On Tue, Feb 14, 2017 at 11:56 PM, Michael Pearce <
> >> > michael.pea...@ig.com>
> >> >     > wrote:
> >> >     >
> >> >     > > Hi Jay,
> >> >     > >
> >> >     > > 1) There was some initial debate on the value part, as youll
> note
> >> > String,
> >> >     > > String headers were discounted early on. The reason for this
> is
> >> >     > flexibility
> >> >     > > and keeping in line with the flexibility of key, value of the
> >> > message
> >> >     > > object itself. I don’t think it takes away from an ecosystem
> as
> >> > each
> >> >     > plugin
> >> >     > > will care for their own key, this way ints, booleans , exotic
> >> > custom
> >> >     > binary
> >> >     > > can all be catered for=.
> >> >     > > a. If you really wanted to push for a typed value interface, I
> >> > wouldn’t
> >> >     > > want just String values supported, but the the primatives plus
> >> > string and
> >> >     > > also still keeping the ability to have a binary for custom
> >> > binaries that
> >> >     > > some organisations may have.
> >> >     > > i. I have written this slight alternative here,
> >> >     > https://cwiki.apache.org/
> >> >     > > confluence/display/KAFKA/KIP-82+-+Add+Record+Headers+-+Typed
> >> >     > > ii. Essentially the value bytes, has a leading byte overhead.
> >> >     > > 1.  This tells you what type the value is, before reading the
> rest
> >> > of the
> >> >     > > bytes, allowing serialisation/deserialization to and from the
> >> > primitives,
> >> >     > > string and byte[]. This is akin to some other messaging
> systems.
> >> >     > > 2) We are making it optional, so that for those not wanting
> >> > headers have
> >> >     > 0
> >> >     > > bytes overhead (think of it as a feature flag), I don’t think
> this
> >> > is
> >> >     > > complex, especially if comparing to changes proposed in other
> kips
> >> > like
> >> >     > > kip-98.
> >> >     > > a. If you really really don’t like this, we can drop it, but
> it
> >> > would
> >> >     > mean
> >> >     > > buying into 4 bytes extra overhead for users who do not want
> to use
> >> >     > headers.
> >> >     > > 3) In the summary yes, it is at a higher level, but I think
> this
> >> > is well
> >> >     > > documented in the proposed changes section.
> >> >     > > a. Added getHeaders method to Producer/Consumer record (that
> is it)
> >> >     > > b. We’ve also detailed the new Headers class that this method
> >> > returns
> >> >     > that
> >> >     > > encapsulates the headers protocol and logic.
> >> >     > >
> >> >     > > Best,
> >> >     > > Mike
> >> >     > >
> >> >     > > ==Original questions from the vote thread from Jay.==
> >> >     > >
> >> >     > > Couple of things I think we still need to work out:
> >> >     > >
> >> >     > >    1. I think we agree about the key, but I think we haven't
> >> > talked about
> >> >     > >    the value yet. I think if our goal is an open ecosystem of
> these
> >> >     > header
> >> >     > >    spread across many plugins from many systems we should
> consider
> >> > making
> >> >     > > this
> >> >     > >    a string as well so it can be printed, set via a UI, set in
> >> > config,
> >> >     > etc.
> >> >     > >    Basically encouraging pluggable serialization formats here
> will
> >> > lead
> >> >     > to
> >> >     > > a
> >> >     > >    bit of a tower of babel.
> >> >     > >    2. This proposal still includes a pretty big change to our
> >> >     > serialization
> >> >     > >    and protocol definition layer. Essentially it is
> introducing an
> >> >     > optional
> >> >     > >    type, where the format is data dependent. I think this is
> >> > actually a
> >> >     > big
> >> >     > >    change though it doesn't seem like it. It means you can no
> >> > longer
> >> >     > > specify
> >> >     > >    this type with our type definition DSL, and likewise it
> requires
> >> >     > custom
> >> >     > >    handling in client libs. This isn't a huge thing, since the
> >> > Record
> >> >     > >    definition is custom anyway, but I think this kind of
> protocol
> >> >     > >    inconsistency is very non-desirable and ties you to
> hand-coding
> >> >     > things.
> >> >     > > I
> >> >     > >    think the type should instead by [Key Value] in our BNF,
> where
> >> > key and
> >> >     > >    value are both short strings as used elsewhere. This
> brings it
> >> > in line
> >> >     > > with
> >> >     > >    the rest of the protocol.
> >> >     > >    3. Could we get more specific about the exact Java API
> change to
> >> >     > >    ProducerRecord, ConsumerRecord, Record, etc?
> >> >     > >
> >> >     > > -Jay
> >> >     > >
> >> >     >
> >> >
> >> >
> >> > The information contained in this email is strictly confidential and
> for
> >> > the use of the addressee only, unless otherwise indicated. If you are
> not
> >> > the intended recipient, please do not read, copy, use or disclose to
> others
> >> > this message or any attachment. Please also notify the sender by
> replying
> >> > to this email or by telephone (+44(020 7896 0011) and then delete the
> email
> >> > and any copies of it. Opinions, conclusion (etc) that do not relate
> to the
> >> > official business of this company shall be understood as neither
> given nor
> >> > endorsed by it. IG is a trading name of IG Markets Limited (a company
> >> > registered in England and Wales, company number 04008957) and IG Index
> >> > Limited (a company registered in England and Wales, company number
> >> > 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
> >> > London EC4R 2YA. Both IG Markets Limited (register number 195355) and
> IG
> >> > Index Limited (register number 114059) are authorised and regulated
> by the
> >> > Financial Conduct Authority.
> >> >
>
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>

Reply via email to