> (By the way, doesn't it feel a bit odd that we seem to be designing a feature which is optimized for people not using it?)
very. this (i claim :-) ) is the result of very intense opposition to the usefulness of the feature early on, and not a real design goal.

On Sun, Feb 19, 2017 at 2:11 PM, Michael Pearce <michael.pea...@ig.com> wrote:
> If having an array of Header[] means getting this through, I'm happy to change it, as I have already done.
>
> Looking at how this works, though, I don't believe it would provide much saving if the issue we are trying to resolve is object/garbage creation of Map.Entry objects.
>
> All you're doing here is replacing the equivalent of a HashMap.Node (Map.Entry), which simply holds references to the key and value objects, with a custom variant.
>
> On 19/02/2017, 20:22, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
> On points 1 & 2, I've updated the KIP to show varints (and removed the bit flag), on the assumption that KIP-98 gets agreement that the protocol is moving from int32 to varints as standard.
>
> On point 3, I've updated it to use an array of the Header class, instead of a MultiMap, in the Headers class object.
>
> On 19/02/2017, 20:06, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
> On points 1 and 2, I agree.
>
> This also affects KIP-98, so I would expect this to be resolved before that vote passes. If it is accepted there (I'm assuming this is being discussed on that KIP, as you're driving the move to varints), I am happy for this KIP to simply follow suit with whatever is agreed in KIP-98.
>
> On 3) Agreed, this is a simpler form, as long as no one is expecting hashmap lookup performance (O(1)). I would still prefer an encapsulated class, so that if we find in future that holding the headers as an array causes performance issues, the internals are not exposed to end users, allowing the internal structure to move to a map.
>
> Can we compromise on the following?
>
>     class ConsumerRecord<K, V> {
>         K key;
>         V value;
>         Headers headers;
>     }
>
>     class Headers {
>         Header[] headers;
>
>         void add(String key, byte[] value);
>         Collection<byte[]> get(String key);
>     }
>
>     class Header {
>         String key;
>         byte[] value;
>     }
>
> On 19/02/2017, 18:54, "Jason Gustafson" <ja...@confluent.io> wrote:
>
>> headers don't "leak" into application code. they are useful to application code as well.
>
> This is exactly what I have been trying to get at. The use cases documented here are middleware: https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers. If headers are intended for the application space as well, the document should be updated accordingly so that it is an explicit design goal and not an unfortunate design byproduct. There may be some disagreement on whether it _should_ be a design goal, but it may not matter much since the current interceptor API is probably insufficient for middleware applications (I haven't thought of a way to do this that isn't cumbersome).
>
> That aside, the three unresolved points for me are the following:
>
> 1. The use of varints. As a one-off for use only with record headers, the case was weak, but if we are using them throughout the message format, then we should do so here as well. The additional complexity is minimal and early performance testing fully justifies their use.
>
> 2. If varints are used, the case for using attributes to indicate null headers also becomes weak. We only add one additional byte to each message if there are no headers. Whatever the case, let us be consistent with how we treat null keys and values.
>
> 3. We have apparently agreed that the broker will validate headers (please add this to the KIP). That being the case, I would prefer to use Kafka's conventional format for arrays.
> The downside is that it takes more work to skip over the headers on the consumer, though it's unclear if the cost would matter in practice. Some concern was previously expressed about the allocation of maps. An alternative would be to use arrays, i.e.
>
>     class ConsumerRecord<K, V> {
>         K key;
>         V value;
>         Header[] headers;
>     }
>
>     class Header {
>         String key;
>         byte[] value;
>     }
>
> This would work nicely with the conventional array format, and my guess is it would obviate the need to do any lazy initialization. If we use the map as currently documented, then it is possible with either representation to slice the headers and initialize them lazily. Either way, it might be a good idea to use a separate object to represent the headers in case we need to modify it in the future in some way.
>
> (By the way, doesn't it feel a bit odd that we seem to be designing a feature which is optimized for people not using it?)
>
> If we can resolve these points, then at least you will get my vote.
>
> Thanks,
> Jason
>
> On Sun, Feb 19, 2017 at 7:30 AM, radai <radai.rosenbl...@gmail.com> wrote:
>
> headers don't "leak" into application code. they are useful to application code as well. IIUC Samza currently has headers "in-V" and would just switch over to Kafka headers if they exist. i'm sure plenty of other users of Kafka would have a use for headers. i'm pretty sure use cases exist around shuffling data into/out of Kafka (Kafka Connect or equivalent) where metadata from one end could be copied over to the other (S3, for example, uses HTTP headers for user-accessible metadata). it will be Kafka client code getting/setting those headers, not an interceptor.
>
> On Fri, Feb 17, 2017 at 1:41 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> For APM single-event tracing, we need access to the header at the point of processing, on the processing thread.
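The `Headers` compromise proposed above (an encapsulated class backed by a `Header[]`, exposing only `add` and `get`) can be sketched in plain Java as follows. This is a hypothetical illustration, not the API Kafka eventually adopted: `HeadersSketch` is an invented wrapper, and growing the array by copying on every `add` is an assumption chosen for brevity. `get` is a linear scan, which is the O(n) lookup trade-off Mike accepts in exchange for keeping the internals private.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class HeadersSketch {

    /** One header: a String key and an opaque byte[] value, as in the thread's sketch. */
    static final class Header {
        final String key;
        final byte[] value;

        Header(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Hides the Header[] behind add/get so the internal layout could later
     * change (e.g. to a map) without changing what callers see.
     */
    static final class Headers {
        private Header[] headers = new Header[0];

        void add(String key, byte[] value) {
            // Copy-on-grow keeps the sketch short; a real implementation would amortise growth.
            Header[] grown = Arrays.copyOf(headers, headers.length + 1);
            grown[headers.length] = new Header(key, value);
            headers = grown;
        }

        /** Linear O(n) scan; returns every value stored under the key, in insertion order. */
        Collection<byte[]> get(String key) {
            List<byte[]> values = new ArrayList<>();
            for (Header h : headers) {
                if (h.key.equals(key)) {
                    values.add(h.value);
                }
            }
            return values;
        }
    }

    public static void main(String[] args) {
        Headers headers = new Headers();
        headers.add("trace-id", "abc123".getBytes(StandardCharsets.UTF_8));
        headers.add("hop", "gateway".getBytes(StandardCharsets.UTF_8));
        headers.add("hop", "service-a".getBytes(StandardCharsets.UTF_8));

        // Repeated keys are allowed, so get() returns a collection rather than a single value.
        for (byte[] v : headers.get("hop")) {
            System.out.println(new String(v, StandardCharsets.UTF_8));
        }
        System.out.println(headers.get("missing").size());
    }
}
```

Because callers only touch `add` and `get`, swapping the array for a map later would not break them.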
>
> As such, interceptors will not work or be suitable for these, because they act on the ConsumerRecords as a batch, before the handling thread can split them out and process each message, which is the point at which these tools need to continue transaction tracing.
>
> Likewise, tools and infrastructure pieces will need access to the message outside the interceptor realm.
>
> On 17/02/2017, 21:26, "Jason Gustafson" <ja...@confluent.io> wrote:
>
>> That's exactly what we're doing: the headers are a slice of bytes, which then get parsed later if needed, or can be parsed right away. The headers are part of the protocol, so they can still be validated if wanted.
>> If you had a header count, then you would have to go through each header's key length and value length to work out how much to skip to get to, say, the value or any future component in the message after the headers. Having it as a byte[] with a length value makes this a lot easier to skip.
>
> So the broker will parse the headers and validate them. Good. The only reason remaining that I can see to leave the headers as a byte array is to make it easier for the client to skip past them. Are we sure this is not premature optimization? Are there any performance results which show that this is worthwhile?
>
>> What's the issue with exposing a method getHeaders on the producer/consumer record? It doesn't break anything. We don't need any special version.
>
> See my previous explanation. What I am trying to resist is the headers becoming a general application-level facility. The primary use case, as far as I can tell, is middleware, which is the use case that the interceptors are providing.
>
>> The current batch consumer model and consumer interceptors don't work where headers need to be acted on at a per-message level at the time of processing. A key case is APM (the core one), where the header value is used to continue tracing.
>
> I still don't understand the point about batching. The consumer records are exposed as a batch in the consumer interceptor, but you can still iterate them individually. It is no different for the consumer API itself.
>
> -Jason
>
> On Fri, Feb 17, 2017 at 12:48 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Re:
>
> "The point about creation of maps seems orthogonal. We can still represent the headers as a slice of bytes until the time it is accessed."
>
> That's exactly what we're doing: the headers are a slice of bytes, which then get parsed later if needed, or can be parsed right away. The headers are part of the protocol, so they can still be validated if wanted.
>
> If you had a header count, then you would have to go through each header's key length and value length to work out how much to skip to get to, say, the value or any future component in the message after the headers. Having it as a byte[] with a length value makes this a lot easier to skip.
>
> On 17/02/2017, 20:37, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
> What's the issue with exposing a method getHeaders on the producer/consumer record? It doesn't break anything. We don't need any special version.
>
> The current batch consumer model and consumer interceptors don't work where headers need to be acted on at a per-message level at the time of processing. A key case is APM (the core one), where the header value is used to continue tracing. JMS, HTTP, etc. all expose these without issues. I would NOT want to lock this down so that it is only accessible via interceptors, as we'd fail on one of the main goals.
>
> Regards
> Mike
>
> On 17/02/2017, 20:21, "Jason Gustafson" <ja...@confluent.io> wrote:
>
> The point about creation of maps seems orthogonal. We can still represent the headers as a slice of bytes until the time it is accessed.
>
>> Yes, exactly: we have access to the records, which is why the header should be accessible via them and not hidden so that only interceptors can access it.
>
> As explained above, the point is to make the intended usage clear. Applications should continue to rely on the key/value fields to serialize their own headers, and it would be better if we could avoid leaking third-party headers into applications. This is difficult to do with the current interceptors because they share the record objects with the common API. What I had in mind is something like an extension of the current interceptors which exposes a different object (e.g. `RecordAndHeaders`). The challenge is MM-like use cases. Let me see if I can come up with a concrete proposal for that problem.
>
> -Jason
>
> On Fri, Feb 17, 2017 at 11:55 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> I am happy to move the definition of the header into the message body, but it would prevent us from lazily initialising/parsing the headers, as obviously we would have to traverse them when reading the message.
>
> This was actually one of Jay's requests:
>
> "2. I think we should think about creating the Map lazily to avoid parsing out all the headers into little objects. HashMaps themselves are kind of expensive and the consumer is very perf sensitive, so making gazillions of hashmaps that may or may not get used is probably a bad idea."
>
> On 17/02/2017, 19:44, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
> Yes, exactly: we have access to the records, which is why the header should be accessible via them and not hidden so that only interceptors can access it.
>
> Sent using OWA for iPhone
> ________________________________________
> From: Magnus Edenhill <mag...@edenhill.se>
> Sent: Friday, February 17, 2017 7:34:49 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Big +1 on VarInts.
> CPUs are fast, memory is slow.
>
> I agree with Jason that we'll want to continue verifying messages, including their headers, so while I appreciate the idea of the opaque header blob, it won't be useful in practice.
>
> /Magnus
>
> 2017-02-17 10:41 GMT-08:00 Jason Gustafson <ja...@confluent.io>:
>
> Sorry, my mistake. The consumer interceptor is per batch, though I'm not sure that's an actual limitation, since you still have access to the individual records.
>
> -Jason
>
> On Fri, Feb 17, 2017 at 10:39 AM, Jason Gustafson <ja...@confluent.io> wrote:
>
>> Re headers as a byte array and future use by the broker: this doesn't take away from that at all, nor does it make it difficult, in my opinion.
>
> Yeah, I didn't say it was difficult, only awkward. You wouldn't write the schema that way if you were planning to use it on the brokers from the beginning. Note also that one of the benefits of letting the broker understand headers is that it can validate that they are properly formatted. If cost is the only concern, we should confirm its impact through performance testing.
>
>> One of the key use cases requires access on consume at a per-event/message level, at the point that the message is being processed; as such, the batch interceptors and batch consume API aren't suitable. It needs to be at the record level.
>
> I'm not sure I understand the point about batching. Interceptors are applied per-message, right?
>
> My intent with interceptors is to keep the usage of headers well-defined so that they don't start leaking unnecessarily into applications. My guess is that it's probably inevitable, but isolating it in the interceptors would at least give people a second thought before deciding to use it. The main challenge in my mind is figuring out how an MM use case would work. It would be more cumbersome to replicate headers through an interceptor, though arguably MM should be working at a lower level anyway.
>
> -Jason
>
> On Fri, Feb 17, 2017 at 10:16 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Re headers available on the record via interceptors only:
>
> One of the key use cases requires access on consume at a per-event/message level, at the point that the message is being processed; as such, the batch interceptors and batch consume API aren't suitable. It needs to be at the record level.
>
> This is similar to JMS/HTTP/AMQP, where headers are available to consuming applications.
>
> Re headers as a byte array and future use by the broker: this doesn't take away from that at all, nor does it make it difficult, in my opinion.
>
> Sent using OWA for iPhone
> ________________________________________
> From: Jason Gustafson <ja...@confluent.io>
> Sent: Friday, February 17, 2017 5:55:42 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
>> Would you be proposing in KIP-98 to convert the other message ints (key length, value length) to varints as well, to keep it uniform?
>> Also, I assume there will be a static or helper method made to write/read these in the client and server.
>
> Yes, that is what we are proposing, so using varints for headers would be consistent with the rest of the message. We have used static helper methods in our prototype implementation.
>
>> The cost of parsing: we want to parse/interpret the headers lazily (this is a key point brought up earlier in discussions).
>
> I'm a bit skeptical of this. Has anyone done the performance testing? I can probably implement it and test it if no one else has. I was also under the impression that there may be use cases down the road where the broker would need to interpret headers.
> That wouldn't be off the table in the future if it's represented as bytes, but it would be quite a bit more awkward, right?
>
> By the way, one question I have been wondering about. My understanding is that headers are primarily for use cases where third-party components want to enrich messages without needing to understand or modify the schema of the message key and value. For the applications which directly produce and consume the messages and control the key/value schema directly, it seems we would rather have them implement headers directly in their own schema. Supposing for the sake of argument that it was possible, my question is whether it would be sufficient to expose the headers in the interceptor API and not in the common API?
>
> -Jason
>
> On Fri, Feb 17, 2017 at 3:26 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> On the point of varints:
>
> Would you be proposing in KIP-98 to convert the other message ints (key length, value length) to varints as well, to keep it uniform?
> Also, I assume there will be a static or helper method made to write/read these in the client and server.
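Static read/write helpers of the kind Mike asks about might look like the sketch below. It assumes the protobuf-style encoding Jason mentions later in the thread: a ZigZag mapping of signed ints (as protobuf uses for `sint32`), then 7 bits per byte, low-order group first, with the high bit as a continuation flag. The class and method names are hypothetical and are not Kafka's actual helper methods.

```java
import java.nio.ByteBuffer;

public final class VarintSketch {

    private VarintSketch() {}

    /** ZigZag maps signed ints to unsigned ones so small negatives (e.g. -1) stay short. */
    static int zigZagEncode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int zigZagDecode(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    /** Writes 7 bits per byte; a set high bit means "more bytes follow". */
    static void writeVarint(int value, ByteBuffer buffer) {
        int v = zigZagEncode(value);
        while ((v & 0xFFFFFF80) != 0) {
            buffer.put((byte) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        buffer.put((byte) v);
    }

    /** Reads one varint written by writeVarint, rejecting encodings longer than 5 bytes. */
    static int readVarint(ByteBuffer buffer) {
        int result = 0;
        int shift = 0;
        while (true) {
            if (shift > 28) {
                throw new IllegalArgumentException("varint is longer than 5 bytes");
            }
            byte b = buffer.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return zigZagDecode(result);
            }
            shift += 7;
        }
    }

    /** Number of bytes writeVarint would use for this value (1 to 5). */
    static int sizeOfVarint(int value) {
        int v = zigZagEncode(value);
        int bytes = 1;
        while ((v & 0xFFFFFF80) != 0) {
            bytes++;
            v >>>= 7;
        }
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        writeVarint(-1, buf);  // ZigZag(-1) = 1, so one byte
        writeVarint(300, buf); // ZigZag(300) = 600, so two bytes
        buf.flip();
        System.out.println(readVarint(buf) + " " + readVarint(buf));
        System.out.println(sizeOfVarint(0) + " " + sizeOfVarint(63) + " " + sizeOfVarint(64));
    }
}
```

A length or count of 0 costs a single byte under this scheme, versus four bytes as an int32.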
>
> Cheers
> Mike
>
> On 17/02/2017, 11:22, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
> On the point re: headers in the message protocol being a byte array and not a count of elements followed by the elements: again, this was discussed/argued previously.
>
> It was agreed on for a few reasons, some of which you have obviously picked up on:
>
> The broker is able to pass it through opaquely.
> The cost of parsing: we want to parse/interpret the headers lazily (this is a key point brought up earlier in discussions).
> Headers can be copied from consumer record to producer record (e.g. by mirror makers) without parsing, if no changes are being made and they are not being looked at.
> It keeps the broker agnostic to the format.
> You need an int32 either for the byte size of the headers or for the count of elements, so the overheads are the same, but going with an opaque byte array has the above advantages.
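To make the skipping argument concrete, here is a sketch of the two layouts being debated, using the int32 length fields from Mike's proposed header schema (quoted later in this thread). With an opaque, length-prefixed blob, a reader skips the whole header section with one read; with a count-prefixed array, it has to read every key and value length to find the end. Both spend one int32 on framing, which is Mike's point about equal overheads. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class SkipHeadersSketch {

    private SkipHeadersSketch() {}

    /** Bytes for the entries alone: per header [keyLen:int32][key][valueLen:int32][value]. */
    static int entriesSize(String[] keys, byte[][] values) {
        int size = 0;
        for (int i = 0; i < keys.length; i++) {
            size += 4 + keys[i].getBytes(StandardCharsets.UTF_8).length + 4 + values[i].length;
        }
        return size;
    }

    static void writeEntries(ByteBuffer buf, String[] keys, byte[][] values) {
        for (int i = 0; i < keys.length; i++) {
            byte[] key = keys[i].getBytes(StandardCharsets.UTF_8);
            buf.putInt(key.length).put(key);
            buf.putInt(values[i].length).put(values[i]);
        }
    }

    /** Opaque-blob layout: [totalBytes:int32][entries...]. */
    static ByteBuffer encodeAsBlob(String[] keys, byte[][] values) {
        int size = entriesSize(keys, values);
        ByteBuffer buf = ByteBuffer.allocate(4 + size);
        buf.putInt(size);
        writeEntries(buf, keys, values);
        buf.flip();
        return buf;
    }

    /** Count-prefixed layout: [headerCount:int32][entries...]. */
    static ByteBuffer encodeAsCountedArray(String[] keys, byte[][] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 + entriesSize(keys, values));
        buf.putInt(keys.length);
        writeEntries(buf, keys, values);
        buf.flip();
        return buf;
    }

    /** Blob: one int read and one position jump, regardless of how many headers there are. */
    static void skipBlob(ByteBuffer buf) {
        int totalBytes = buf.getInt();
        buf.position(buf.position() + totalBytes);
    }

    /** Counted array: two length reads per header before the end of the section is known. */
    static void skipCountedArray(ByteBuffer buf) {
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            int keyLength = buf.getInt();
            buf.position(buf.position() + keyLength);
            int valueLength = buf.getInt();
            buf.position(buf.position() + valueLength);
        }
    }

    public static void main(String[] args) {
        String[] keys = {"trace-id", "origin"};
        byte[][] values = {{1, 2, 3}, {4}};
        ByteBuffer blob = encodeAsBlob(keys, values);
        ByteBuffer counted = encodeAsCountedArray(keys, values);
        skipBlob(blob);
        skipCountedArray(counted);
        // Same total size for both layouts, and both skips end exactly at the end of the section.
        System.out.println(blob.limit() + " " + blob.remaining() + " " + counted.limit() + " " + counted.remaining());
    }
}
```

Whether the per-header loop in `skipCountedArray` matters in practice is exactly what Jason asks to see measured.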
>
> Cheers
> Mike
>
> On 17/02/2017, 02:50, "Jason Gustafson" <ja...@confluent.io> wrote:
>
> Sorry, I should have noted that the performance testing was done using the producer performance tool shipped with Kafka.
>
> -Jason
>
> On Thu, Feb 16, 2017 at 6:44 PM, Jason Gustafson <ja...@confluent.io> wrote:
>
> Hey Nacho,
>
> I've compared the performance of our KIP-98 implementation with and without varints. For messages around 128 bytes, we see an increase in throughput of about 30% using the default configuration settings. At 256 bytes, the increase is around 16%. Obviously the performance converges as messages get larger, but it seems well worth the cost. Note that we are also seeing a substantial performance increase against trunk, primarily because of the much more efficient packing that varints provide us. Anything adding to message overhead, such as record headers, would only increase the relative difference.
> (Of course, take these numbers with a grain of salt, since I have only used the default settings with both the producer and broker on my local machine. We intend to provide more extensive performance details as part of the work for KIP-98.)
>
> The implementation we are using is from protobuf (https://developers.google.com/protocol-buffers/docs/encoding), which is also used in HBase. It is trivial to implement and, as far as I know, doesn't suffer from the aliasing problem you are describing. I checked with Magnus (the author of librdkafka) and he agreed that the savings seemed worth the cost of implementation.
>
> -Jason
>
> On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis <iso...@igso.net> wrote:
>
> -VarInts
>
> I'm one of the people (if not the most) opposed to VarInts. VarInts have a place, but this is not it.
> (We had a large discussion about them at the beginning of the KIP-82 work.)
>
> If anybody has real-life performance numbers of VarInts improving things or significantly reducing resources, I would like to know what that case may be. Yes, you can save some bytes here and there, but this is probably insignificant to the overall system behavior and storage requirements. I say this with respect to using VarInts in the protocol itself, not as part of the data.
>
> VarInts require you to parse the int before using it, and depending on the encoding they can suffer from aliasing (multiple representations for the same value).
>
> Why add complexity?
>
> Nacho
>
> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe <cmcc...@apache.org> wrote:
>
> +1 for varints here; it would save quite a bit of space. They are pretty quick to implement as well.
>
> I think it makes sense for values to be byte arrays.
> Users might want to attach arbitrary payloads; they shouldn't be forced to serialize everything to Java strings.
>
> best,
> Colin
>
> On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
>
> Hey Michael,
>
> Hmm, I guess the point of representing it as bytes is to allow the broker to pass it through opaquely? Is the cost of parsing them a concern, or are we simply trying to ensure that the broker stays agnostic to the format?
>
> On varints, I think adding support for them makes less sense for an isolated use case, but as part of a more holistic change (such as what we have proposed in KIP-98), I think they are justifiable. If we add them, then the need to use attributes becomes quite a bit weaker, right? The other thing I find slightly odd is the fact that null headers have no actual semantic meaning for the message (unlike null keys and values). It is just a space optimization. It seems a bit better to always use size 0 to indicate having no headers.
>
> Overall, the main point is ensuring that the message schema remains consistent, either within the larger protocol or, at a minimum, within the message itself.
>
> -Jason
>
> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Hi Jason,
>
> On point 1), in the message protocol the headers are simply a byte array, like the key or value; this is to clearly demarcate the header in the core message. The header byte array in the core message is then an array of key/value pairs. This is what it is denoting.
>
> In the given notation, I guess this would be:
>
>     Headers => [KeyLength, Key, ValueLength, Value]
>       KeyLength => int32    <-- NEW: size of the byte[] of the serialised header key
>       Key => bytes          <-- NEW: serialised string (UTF-8) bytes of the header key
>       ValueLength => int32  <-- NEW: size of the byte[] of the serialised header value
>       Value => bytes        <-- NEW: serialised form of the header value
>
> The key length and value length match the way the protocol currently defines them in the core message.
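A sketch of writing and parsing entries in the notation above follows. It assumes, as Mike describes in point 1, that the entries live inside the length-delimited header byte array, so parsing stops when that slice is exhausted rather than after an element count. The length checks illustrate the kind of validation a broker could perform if it parses headers; `HeaderSchemaSketch` and its nested `Header` class are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class HeaderSchemaSketch {

    private HeaderSchemaSketch() {}

    /** Minimal holder for one parsed header (hypothetical, for illustration only). */
    static final class Header {
        final String key;
        final byte[] value;

        Header(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Appends one entry: [KeyLength:int32][Key:UTF-8 bytes][ValueLength:int32][Value:bytes]. */
    static void writeEntry(ByteBuffer buf, String key, byte[] value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        buf.putInt(keyBytes.length).put(keyBytes);
        buf.putInt(value.length).put(value);
    }

    /**
     * Parses entries until the header slice is exhausted, rejecting any length
     * field that is truncated, negative, or runs past the end of the slice.
     */
    static List<Header> parse(ByteBuffer slice) {
        List<Header> headers = new ArrayList<>();
        while (slice.hasRemaining()) {
            String key = new String(readSized(slice), StandardCharsets.UTF_8);
            byte[] value = readSized(slice);
            headers.add(new Header(key, value));
        }
        return headers;
    }

    private static byte[] readSized(ByteBuffer buf) {
        if (buf.remaining() < 4) {
            throw new IllegalArgumentException("truncated length field");
        }
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("invalid length " + length);
        }
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeEntry(buf, "trace-id", new byte[] {1, 2, 3});
        writeEntry(buf, "origin", "eu-west".getBytes(StandardCharsets.UTF_8));
        buf.flip();
        for (Header h : parse(buf)) {
            System.out.println(h.key + " -> " + h.value.length + " bytes");
        }
    }
}
```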
On point 2) Variable-sized ints: this was discussed much earlier on; in fact I had suggested it myself (with Hadoop references). The complexity of this, compared to having a simpler protocol, was argued, and it was agreed it wasn't worth it, as all other clients in other languages would need to ensure they're using the right variable-size algorithm, as there are a few.

On point 3) We took the optional, attributes-based approach because originally there was marked concern that headers would cause a message size overhead for others who don't want them. As such, this is the clean solution to achieve that. If that no longer holds, and we don't care that we add 4 bytes of overhead, then I'm happy to remove it.
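A minimal Java sketch contrasting the two options in point 3, under loud assumptions: `HEADERS_PRESENT` is a hypothetical attribute bit (its position is not fixed anywhere in this thread), the record is reduced to just the attributes byte plus the header field, and the header field is an int32-size-prefixed blob. All names are illustrative only.

```java
import java.nio.ByteBuffer;

/** Contrasts a flag-gated header field with an always-present one (illustrative only). */
public class HeaderFlagSketch {
    public static final byte HEADERS_PRESENT = 0x10; // hypothetical bit, not from the KIP

    public static boolean hasHeaders(byte attributes) {
        return (attributes & HEADERS_PRESENT) != 0;
    }

    /** Flag approach: the header field is written only when the flag bit is set. */
    public static byte[] writeRecordPrefix(byte attributes, byte[] headerBlob) {
        boolean present = headerBlob != null;
        byte attrs = present
                ? (byte) (attributes | HEADERS_PRESENT)
                : (byte) (attributes & ~HEADERS_PRESENT);
        ByteBuffer buf = ByteBuffer.allocate(1 + (present ? 4 + headerBlob.length : 0));
        buf.put(attrs);
        if (present) {
            buf.putInt(headerBlob.length).put(headerBlob); // int32 size, then the blob
        }
        return buf.array();
    }

    /** Always-present approach: a size of 0 means "no headers", costing 4 bytes per record. */
    public static byte[] writeRecordPrefixAlways(byte attributes, byte[] headerBlob) {
        byte[] blob = headerBlob == null ? new byte[0] : headerBlob;
        return ByteBuffer.allocate(1 + 4 + blob.length)
                .put(attributes)
                .putInt(blob.length)
                .put(blob)
                .array();
    }
}
```

For a record without headers, the flagged variant writes only the attributes byte, while the always-present variant adds a 4-byte zero size; that difference is the trade-off being weighed here.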
I'm personally in favour of keeping the message as small as possible, so people don't get shocks in performance and throughput due to message size unless they actively use the feature; as such, I do prefer the attribute bitwise feature-flag approach myself.

On 16/02/2017, 05:40, "Jason Gustafson" <ja...@confluent.io> wrote:

We have proposed a few significant changes to the message format in KIP-98, which now seems likely to pass (perhaps with some iterations on implementation details). It would be good to try and coordinate the changes in both of the proposals to make sure they are consistent and compatible.

I think using the attributes to indicate null headers is a reasonable approach. We have proposed to do the same thing for the message key and value. That said, I sympathize with Jay's argument. Having multiple ways to specify a null value increases the overall complexity of the protocol. You can see this just from the fact that you need the extra verbiage in the protocol specification in this KIP and in KIP-98 to describe the dependence between the fields and the attributes. It seems like a slippery slope if you start allowing different request types to implement the protocol specification differently.

You can also argue that the messages already are, and are likely to remain, a special case. For example, there is currently no generality in how compressed message sets are represented that would be applicable to other request types. Some might see this divergence as an unfortunate protocol deficiency which should be fixed; others might see it as the inevitability of needing to optimize where it counts most. I'm probably somewhere in between, but I think we all share the intuition that the protocol should be kept as consistent as possible. With that in mind, here are a few comments:

1. One thing I found a little odd when reading the current proposal is that the headers are represented both as an array of bytes and as an array of key/value pairs. I'd probably suggest something like this:

Headers => [HeaderKey HeaderValue]
  HeaderKey => String
  HeaderValue => Bytes

An array in the Kafka protocol is represented as a 4-byte integer indicating the number of elements in the array, followed by the serialization of the elements. Unless I'm misunderstanding, what you have instead is the total size of the headers in bytes followed by the elements. I'm not sure I see any reason for this inconsistency.

2. In KIP-98, we've introduced variable-length integer fields. Effectively, we've enriched (or "complicated", as Jay might say ;) the protocol specification to include the following types: VarInt, VarLong, UnsignedVarInt and UnsignedVarLong.

Along with these primitives, we could introduce the following types:

VarSizeArray => NumberOfItems Item1 Item2 .. ItemN
  NumberOfItems => UnsignedVarInt

VarSizeNullableArray => NumberOfItemsOrNull Item1 Item2 .. ItemN
  NumberOfItemsOrNull => VarInt (-1 means null)

And similarly for the `String` and `Bytes` types.
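To make the proposed types concrete, a minimal Java sketch of `UnsignedVarInt`, zigzag `VarInt`, and a `VarSizeArray` of header key/value pairs. It assumes protobuf-style base-128 varints (7 data bits per byte, high bit as a continuation flag) with zigzag mapping for signed values, and that `VarSizeString`/`VarSizeBytes` carry an `UnsignedVarInt` length prefix; those details and all names here are illustrative assumptions, not the KIP-98 text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Illustrative varint encodings; names and exact rules are assumptions. */
public class VarIntHeaderSketch {

    /** UnsignedVarInt: 7 data bits per byte, high bit set on every byte except the last. */
    public static void writeUnsignedVarInt(int value, ByteArrayOutputStream out) {
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift so the loop also ends for negative ints
        }
        out.write(value);
    }

    /** VarInt: zigzag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so -1 stays one byte. */
    public static void writeVarInt(int value, ByteArrayOutputStream out) {
        writeUnsignedVarInt((value << 1) ^ (value >> 31), out);
    }

    /** VarSizeBytes (and VarSizeString after UTF-8 encoding): varint length, then bytes. */
    public static void writeVarSizeBytes(byte[] bytes, ByteArrayOutputStream out) {
        writeUnsignedVarInt(bytes.length, out);
        out.write(bytes, 0, bytes.length);
    }

    private static void writePairs(List<Map.Entry<String, byte[]>> headers, ByteArrayOutputStream out) {
        for (Map.Entry<String, byte[]> h : headers) {
            writeVarSizeBytes(h.getKey().getBytes(StandardCharsets.UTF_8), out); // HeaderKey
            writeVarSizeBytes(h.getValue(), out);                                 // HeaderValue
        }
    }

    /** Headers => VarSizeArray[HeaderKey HeaderValue]: UnsignedVarInt count, then pairs. */
    public static byte[] encodeHeaders(List<Map.Entry<String, byte[]>> headers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarInt(headers.size(), out);
        writePairs(headers, out);
        return out.toByteArray();
    }

    /** VarSizeNullableArray: signed VarInt count, where -1 means the array is null. */
    public static byte[] encodeNullableHeaders(List<Map.Entry<String, byte[]>> headers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (headers == null) {
            writeVarInt(-1, out);
        } else {
            writeVarInt(headers.size(), out);
            writePairs(headers, out);
        }
        return out.toByteArray();
    }
}
```

Under these assumptions an empty header list costs one byte, and a single header with an 8-byte key and a 3-byte value costs 1 + (1 + 8) + (1 + 3) = 14 bytes; with int32 counts and lengths the same header would need 4 + (4 + 8) + (4 + 3) = 23 bytes.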
These types can save a considerable amount of space in this proposal because they can be used for both the number of headers included in the message and the lengths of the header keys and values. We could do this instead:

Headers => VarSizeArray[HeaderKey HeaderValue]
  HeaderKey => VarSizeString
  HeaderValue => VarSizeBytes

Given the savings from using variable-length fields, the benefit of using the attributes to represent null seems pretty small.

3. Whichever way we go (whether we use the attributes or not), we should at least be consistent between this KIP and KIP-98. It would be very strange to have two ways to represent null values in the same schema. Either way is OK with me.
I think some message-level optimizations are justifiable, but the savings here seem minimal (a few bytes per message), so maybe it's not worth the cost of letting the message diverge even further from the rest of the protocol.

-Jason

On Wed, Feb 15, 2017 at 8:52 AM, radai <radai.rosenbl...@gmail.com> wrote:

I've trimmed the inline contents as this mail is getting too big for the apache mailing list software to deliver :-(

1. the important thing for interoperability is for different "interested parties" (plugins, infra layers/wrappers, user-code) to be able to stick pieces of metadata onto msgs without getting in each other's way. a common key scheme (Strings, as of the time of this writing?) is all that's required for that.
it is assumed that the other end interested in any such piece of metadata knows the encoding, and byte[] provides the most flexibility. i believe this is the same logic behind core kafka being byte[]/byte[]: Strings are more "usable", but bytes are flexible and so were chosen. Also, core kafka doesn't even do that good a job on usability of the payload (example: i have to specify the no-op byte[] "decoders" explicitly in conf), and again sacrifices usability for the sake of performance (no convenient single-record processing, as poll is a batch; lots of obscure little config details exposing internals of the batching mechanism; etc.)

this is also why i really dislike the idea of a "type system" for header values: it further degrades usability, adds complexity and will eventually get in people's way. also, it would be the 2nd/3rd home-grown serialization mechanism in core kafka (counting 2 iterations of the "type definition DSL").

2. this is an implementation detail, and not even a very "user facing" one? to the best of my understanding the vote process is on proposed API/behaviour. also, since we're willing to go with strings, just serialize a 0-sized header blob and IIUC you don't need any optionals anymore.

3. yes, we can :-)

On Tue, Feb 14, 2017 at 11:56 PM, Michael Pearce <michael.pea...@ig.com> wrote:

Hi Jay,

1) There was some initial debate on the value part; as you'll note, String, String headers were discounted early on.
The reason for this is flexibility, and keeping in line with the flexibility of the key and value of the message object itself. I don't think it takes away from an ecosystem, as each plugin will care for its own key; this way ints, booleans and exotic custom binaries can all be catered for.
  a. If you really wanted to push for a typed value interface, I wouldn't want just String values supported, but the primitives plus string, while still keeping the ability to have a binary type for the custom binaries that some organisations may have.
    i. I have written this slight alternative here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers+-+Typed
    ii. Essentially the value bytes have a leading byte of overhead.
      1. This tells you what type the value is before reading the rest of the bytes, allowing serialisation/deserialisation to and from the primitives, string and byte[]. This is akin to some other messaging systems.
2) We are making it optional so that those not wanting headers have 0 bytes of overhead (think of it as a feature flag). I don't think this is complex, especially compared to the changes proposed in other KIPs like KIP-98.
  a. If you really, really don't like this, we can drop it, but it would mean buying into 4 bytes of extra overhead for users who do not want to use headers.
3) In the summary, yes, it is at a higher level, but I think this is well documented in the proposed changes section.
  a. Added a getHeaders method to the Producer/Consumer record (that is it).
  b. We've also detailed the new Headers class that this method returns, which encapsulates the headers protocol and logic.

Best,
Mike

==Original questions from the vote thread from Jay.==

Couple of things I think we still need to work out:

1. I think we agree about the key, but I think we haven't talked about the value yet. I think if our goal is an open ecosystem of these headers spread across many plugins from many systems, we should consider making this a string as well so it can be printed, set via a UI, set in config, etc. Basically, encouraging pluggable serialization formats here will lead to a bit of a tower of babel.

2. This proposal still includes a pretty big change to our serialization and protocol definition layer. Essentially it is introducing an optional type, where the format is data dependent. I think this is actually a big change, though it doesn't seem like it. It means you can no longer specify this type with our type definition DSL, and likewise it requires custom handling in client libs. This isn't a huge thing, since the Record definition is custom anyway, but I think this kind of protocol inconsistency is very non-desirable and ties you to hand-coding things.
I think the type should instead be [Key Value] in our BNF, where key and value are both short strings as used elsewhere. This brings it in line with the rest of the protocol.

3. Could we get more specific about the exact Java API change to ProducerRecord, ConsumerRecord, Record, etc?

-Jay

The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it.
Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.

--
Nacho - Ignacio Solis - iso...@igso.net