Sorry, should have noted that the performance testing was done using the producer performance tool shipped with Kafka.
-Jason On Thu, Feb 16, 2017 at 6:44 PM, Jason Gustafson <ja...@confluent.io> wrote: > Hey Nacho, > > I've compared performance of our KIP-98 implementation with and without > varints. For messages around 128 bytes, we see an increase in throughput of > about 30% using the default configuration settings. At 256 bytes, the > increase is around 16%. Obviously the performance converge as messages get > larger, but it seems well worth the cost. Note that we are also seeing a > substantial performance increase against trunk primarily because of the > much more efficient packing that varints provide us. Anything adding to > message overhead, such as record headers, would only increase the relative > difference. (Of course take these numbers with a grain of salt since I have > only used the default settings with both the producer and broker on my > local machine. We intend to provide more extensive performance details as > part of the work for KIP-98.) > > The implementation we are using is from protobuf ( > https://developers.google.com/protocol-buffers/docs/encoding), which is > also used in HBase. It is trivial to implement and as far as I know doesn't > suffer from the aliasing problem you are describing. I checked with Magnus > (the author of librdkafka) and he agreed that the savings seemed worth the > cost of implementation. > > -Jason > > On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis <iso...@igso.net> wrote: > >> -VarInts >> >> I'm one of the people (if not the most) opposed to VarInts. VarInts >> have a place, but this is not it. (We had a large discussion about >> them at the beginning of KIP-82 time) >> >> If anybody has real life performance numbers of VarInts improving >> things or significantly reducing resources I would like to know what >> that case may be. Yes, you can save some bytes here and there, but >> this is probably insignificant to the overall system behavior and >> storage requirements. -- I say this with respect to using VarInts in >> the protocol itself, not as part of the data. >> >> VarInts require you to parse the Int before using it and depending on >> the encoding they can suffer from aliasing (multiple representations >> for the same value). >> >> Why add complexity? >> >> Nacho >> >> >> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe <cmcc...@apache.org> >> wrote: >> > +1 for varints here-- it would save quite a bit of space. They are >> > pretty quick to implement as well. >> > >> > I think it makes sense for values to be byte arrays. Users might want >> > to attach arbitrary payloads; they shouldn't be forced to serialize >> > everything to Java strings. >> > >> > best, >> > Colin >> > >> > >> > On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote: >> >> Hey Michael, >> >> >> >> Hmm, I guess the point of representing it as bytes is to allow the >> broker >> >> to pass it through opaquely? Is the cost of parsing them a concern, or >> >> are >> >> we simply trying to ensure that the broker stays agnostic to the >> format? >> >> >> >> On varints, I think adding support for them makes less sense for an >> >> isolated use case, but as part of a more holistic change (such as what >> we >> >> have proposed in KIP-98), I think they are justifiable. If we add them, >> >> then the need to use attributes becomes quite a bit weaker, right? The >> >> other thing I find slightly odd is the fact that null headers has no >> >> actual >> >> semantic meaning for the message (unlike null keys and values). It is >> >> just >> >> a space optimization. It seems a bit better to always use size 0 to >> >> indicate having no headers. >> >> >> >> Overall, the main point is ensuring that the message schema remains >> >> consistent, either within the larger protocol, or at a minimum within >> the >> >> message itself. >> >> >> >> -Jason >> >> >> >> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce <michael.pea...@ig.com >> > >> >> wrote: >> >> >> >> > Hi Jason, >> >> > >> >> > On point 1) in the message protocol the headers are simply a byte >> array, >> >> > as like the key or value, this is to clearly demarcate the header in >> the >> >> > core message. Then the header byte array in the core message is an >> array of >> >> > key, value pairs. This is what it is denoting. >> >> > >> >> > Then this would be I guess in the given notation: >> >> > >> >> > Headers => [KeyLength, Key, ValueLength, Value] >> >> > KeyLength => int32 <-----------------NEW size of the byte[] of >> the >> >> > serialised key value >> >> > Key => bytes <---------------------- NEW serialised string (UTF8) >> >> > bytes of the header key >> >> > ValueLength => int32 <-------------- NEW size of the byte[] of >> the >> >> > serialised header value >> >> > Value => bytes <-------------------- NEW serialised form of the >> header >> >> > value >> >> > >> >> > The key length and value length is matching the way the protocol is >> >> > defined in the core message currently. >> >> > >> >> > >> >> > >> >> > >> >> > On point 2) >> >> > Var sized ints, this was discussed much earlier on, in fact I had >> >> > suggested it myself (with Hadoop references), the complexity of this >> >> > compared to having a simpler protocol was argued and agreed it >> wasn’t worth >> >> > the complexity as all other clients in other languages would need to >> ensure >> >> > theyre using the right var size algorithm, as there is a few. >> >> > >> >> > On point 3) >> >> > We did the attributes, optional approach as originally there was >> marked >> >> > concern that headers would cause a message size overhead for others, >> who >> >> > don’t want them. As such this is the clean solution to achieve that. >> If >> >> > that no longer holds, and we don’t care that we add 4bytes overhead, >> then >> >> > im happy to remove. >> >> > >> >> > I’m personally in favour of keeping the message as small as possible >> so >> >> > people don’t get shocks in perf and throughputs dues to message size, >> >> > unless they actively use the feature, as such I do prefer the >> attribute bit >> >> > wise feature flag approach myself. >> >> > >> >> > >> >> > >> >> > >> >> > On 16/02/2017, 05:40, "Jason Gustafson" <ja...@confluent.io> wrote: >> >> > >> >> > We have proposed a few significant changes to the message format >> in >> >> > KIP-98 >> >> > which now seems likely to pass (perhaps with some iterations on >> >> > implementation details). It would be good to try and coordinate >> the >> >> > changes >> >> > in both of the proposals to make sure they are consistent and >> >> > compatible. >> >> > >> >> > I think using the attributes to indicate null headers is a >> reasonable >> >> > approach. We have proposed to do the same thing for the message >> key and >> >> > value. That said, I sympathize with Jay's argument. Having >> multiple >> >> > ways to >> >> > specify a null value increases the overall complexity of the >> protocol. >> >> > You >> >> > can see this just from the fact that you need the extra verbiage >> in the >> >> > protocol specification in this KIP and in KIP-98 to describe the >> >> > dependence >> >> > between the fields and the attributes. It seems like a slippery >> slope >> >> > if >> >> > you start allowing different request types to implement the >> protocol >> >> > specification differently. >> >> > >> >> > You can also argue that the messages already are and are likely >> to >> >> > remain a >> >> > special case. For example, there is currently no generality in >> how >> >> > compressed message sets are represented that would be applicable >> for >> >> > other >> >> > request types. Some might see this divergence as an unfortunate >> >> > protocol >> >> > deficiency which should be fixed; others might see it as sort of >> the >> >> > inevitability of needing to optimize where it counts most. I'm >> probably >> >> > somewhere in between, but I think we probably all share the >> intuition >> >> > that >> >> > the protocol should be kept as consistent as possible. With that >> in >> >> > mind, >> >> > here are a few comments: >> >> > >> >> > 1. One thing I found a little odd when reading the current >> proposal is >> >> > that >> >> > the headers are both represented as an array of bytes and as an >> array >> >> > of >> >> > key/value pairs. I'd probably suggest something like this: >> >> > >> >> > Headers => [HeaderKey HeaderValue] >> >> > HeaderKey => String >> >> > HeaderValue => Bytes >> >> > >> >> > An array in the Kafka protocol is represented as a 4-byte integer >> >> > indicating the number of elements in the array followed by the >> >> > serialization of the elements. Unless I'm misunderstanding, what >> you >> >> > have >> >> > instead is the total size of the headers in bytes followed by the >> >> > elements. >> >> > I'm not sure I see any reason for this inconsistency. >> >> > >> >> > 2. In KIP-98, we've introduced variable-length integer fields. >> >> > Effectively, >> >> > we've enriched (or "complicated" as Jay might say ;) the protocol >> >> > specification to include the following types: VarInt, VarLong, >> >> > UnsignedVarInt and UnsignedVarLong. >> >> > >> >> > Along with these primitives, we could introduce the following >> types: >> >> > >> >> > VarSizeArray => NumberOfItems Item1 Item2 .. ItemN >> >> > NumberOfItems => UnsignedVarInt >> >> > >> >> > VarSizeNullableArray => NumberOfItemsOrNull Item1 Item2 .. ItemN >> >> > NumberOfItemsOrNull => VarInt (-1 means null) >> >> > >> >> > And similarly for the `String` and `Bytes` types. These types >> can save >> >> > a >> >> > considerable amount of space in this proposal because they can >> be used >> >> > for >> >> > both the number of headers included in the message and the >> lengths of >> >> > the >> >> > header keys and values. We could do this instead: >> >> > >> >> > Headers => VarSizeArray[HeaderKey HeaderValue] >> >> > HeaderKey => VarSizeString >> >> > HeaderValue => VarSizeBytes >> >> > >> >> > Combining the savings from the use of variable length fields, the >> >> > benefit >> >> > of using the attributes to represent null seems pretty small. >> >> > >> >> > 3. Whichever way we go (whether we use the attributes or not), we >> >> > should at >> >> > least be consistent between this KIP and KIP-98. It would be very >> >> > strange >> >> > to have two ways to represent null values in the same schema. >> Either >> >> > way is >> >> > OK with me. I think some message-level optimizations are >> justifiable, >> >> > but >> >> > the savings here seem minimal (a few bytes per message), so >> maybe it's >> >> > not >> >> > worth the cost of letting the message diverge even further from >> the >> >> > rest of >> >> > the protocol. >> >> > >> >> > -Jason >> >> > >> >> > >> >> > On Wed, Feb 15, 2017 at 8:52 AM, radai < >> radai.rosenbl...@gmail.com> >> >> > wrote: >> >> > >> >> > > I've trimmed the inline contents as this mail is getting too >> big for >> >> > the >> >> > > apache mailing list software to deliver :-( >> >> > > >> >> > > 1. the important thing for interoperability is for different >> >> > "interested >> >> > > parties" (plugins, infra layers/wrappers, user-code) to be >> able to >> >> > stick >> >> > > pieces of metadata onto msgs without getting in each other's >> way. a >> >> > common >> >> > > key scheme (Strings, as of the time of this writing?) is all >> thats >> >> > required >> >> > > for that. it is assumed that the other end interested in any >> such >> >> > piece of >> >> > > metadata knows the encoding, and byte[] provides for the most >> >> > flexibility. >> >> > > i believe this is the same logic behind core kafka being >> >> > byte[]/byte[] - >> >> > > Strings are more "usable" but bytes are flexible and so were >> chosen. >> >> > > Also - core kafka doesnt even do that good of a job on >> usability of >> >> > the >> >> > > payload (example - i have to specify the nop byte[] "decoders" >> >> > explicitly >> >> > > in conf), and again sacrificies usability for the sake of >> >> > performance (no >> >> > > convenient single-record processing as poll is a batch, lots of >> >> > obscure >> >> > > little config details exposing internals of the batching >> mechanism, >> >> > etc) >> >> > > >> >> > > this is also why i really dislike the idea of a "type system" >> for >> >> > header >> >> > > values, it further degrades the usability, adds complexity and >> will >> >> > > eventually get in people's way, also, it would be the 2nd/3rd >> >> > home-group >> >> > > serialization mechanism in core kafka (counting 2 iterations >> of the >> >> > "type >> >> > > definition DSL") >> >> > > >> >> > > 2. this is an implementation detail, and not even a very "user >> >> > facing" one? >> >> > > to the best of my understanding the vote process is on proposed >> >> > > API/behaviour. also - since we're willing to go with strings >> just >> >> > serialize >> >> > > a 0-sized header blob and IIUC you dont need any optionals >> anymore. >> >> > > >> >> > > 3. yes, we can :-) >> >> > > >> >> > > On Tue, Feb 14, 2017 at 11:56 PM, Michael Pearce < >> >> > michael.pea...@ig.com> >> >> > > wrote: >> >> > > >> >> > > > Hi Jay, >> >> > > > >> >> > > > 1) There was some initial debate on the value part, as youll >> note >> >> > String, >> >> > > > String headers were discounted early on. The reason for this >> is >> >> > > flexibility >> >> > > > and keeping in line with the flexibility of key, value of the >> >> > message >> >> > > > object itself. I don’t think it takes away from an ecosystem >> as >> >> > each >> >> > > plugin >> >> > > > will care for their own key, this way ints, booleans , exotic >> >> > custom >> >> > > binary >> >> > > > can all be catered for=. >> >> > > > a. If you really wanted to push for a typed value interface, >> I >> >> > wouldn’t >> >> > > > want just String values supported, but the the primatives >> plus >> >> > string and >> >> > > > also still keeping the ability to have a binary for custom >> >> > binaries that >> >> > > > some organisations may have. >> >> > > > i. I have written this slight alternative here, >> >> > > https://cwiki.apache.org/ >> >> > > > confluence/display/KAFKA/KIP-82+-+Add+Record+Headers+-+Typed >> >> > > > ii. Essentially the value bytes, has a leading byte overhead. >> >> > > > 1. This tells you what type the value is, before reading >> the rest >> >> > of the >> >> > > > bytes, allowing serialisation/deserialization to and from the >> >> > primitives, >> >> > > > string and byte[]. This is akin to some other messaging >> systems. >> >> > > > 2) We are making it optional, so that for those not wanting >> >> > headers have >> >> > > 0 >> >> > > > bytes overhead (think of it as a feature flag), I don’t >> think this >> >> > is >> >> > > > complex, especially if comparing to changes proposed in >> other kips >> >> > like >> >> > > > kip-98. >> >> > > > a. If you really really don’t like this, we can drop it, but >> it >> >> > would >> >> > > mean >> >> > > > buying into 4 bytes extra overhead for users who do not want >> to use >> >> > > headers. >> >> > > > 3) In the summary yes, it is at a higher level, but I think >> this >> >> > is well >> >> > > > documented in the proposed changes section. >> >> > > > a. Added getHeaders method to Producer/Consumer record (that >> is it) >> >> > > > b. We’ve also detailed the new Headers class that this method >> >> > returns >> >> > > that >> >> > > > encapsulates the headers protocol and logic. >> >> > > > >> >> > > > Best, >> >> > > > Mike >> >> > > > >> >> > > > ==Original questions from the vote thread from Jay.== >> >> > > > >> >> > > > Couple of things I think we still need to work out: >> >> > > > >> >> > > > 1. I think we agree about the key, but I think we haven't >> >> > talked about >> >> > > > the value yet. I think if our goal is an open ecosystem >> of these >> >> > > header >> >> > > > spread across many plugins from many systems we should >> consider >> >> > making >> >> > > > this >> >> > > > a string as well so it can be printed, set via a UI, set >> in >> >> > config, >> >> > > etc. >> >> > > > Basically encouraging pluggable serialization formats >> here will >> >> > lead >> >> > > to >> >> > > > a >> >> > > > bit of a tower of babel. >> >> > > > 2. This proposal still includes a pretty big change to our >> >> > > serialization >> >> > > > and protocol definition layer. Essentially it is >> introducing an >> >> > > optional >> >> > > > type, where the format is data dependent. I think this is >> >> > actually a >> >> > > big >> >> > > > change though it doesn't seem like it. It means you can no >> >> > longer >> >> > > > specify >> >> > > > this type with our type definition DSL, and likewise it >> requires >> >> > > custom >> >> > > > handling in client libs. This isn't a huge thing, since >> the >> >> > Record >> >> > > > definition is custom anyway, but I think this kind of >> protocol >> >> > > > inconsistency is very non-desirable and ties you to >> hand-coding >> >> > > things. >> >> > > > I >> >> > > > think the type should instead by [Key Value] in our BNF, >> where >> >> > key and >> >> > > > value are both short strings as used elsewhere. This >> brings it >> >> > in line >> >> > > > with >> >> > > > the rest of the protocol. >> >> > > > 3. Could we get more specific about the exact Java API >> change to >> >> > > > ProducerRecord, ConsumerRecord, Record, etc? >> >> > > > >> >> > > > -Jay >> >> > > > >> >> > > >> >> > >> >> > >> >> > The information contained in this email is strictly confidential and >> for >> >> > the use of the addressee only, unless otherwise indicated. If you >> are not >> >> > the intended recipient, please do not read, copy, use or disclose to >> others >> >> > this message or any attachment. Please also notify the sender by >> replying >> >> > to this email or by telephone (+44(020 7896 0011) and then delete >> the email >> >> > and any copies of it. Opinions, conclusion (etc) that do not relate >> to the >> >> > official business of this company shall be understood as neither >> given nor >> >> > endorsed by it. IG is a trading name of IG Markets Limited (a company >> >> > registered in England and Wales, company number 04008957) and IG >> Index >> >> > Limited (a company registered in England and Wales, company number >> >> > 01190902). Registered address at Cannon Bridge House, 25 Dowgate >> Hill, >> >> > London EC4R 2YA. Both IG Markets Limited (register number 195355) >> and IG >> >> > Index Limited (register number 114059) are authorised and regulated >> by the >> >> > Financial Conduct Authority. >> >> > >> >> >> >> -- >> Nacho - Ignacio Solis - iso...@igso.net >> > >