I don't have a lot of feedback on this, but at Zendesk we could definitely use a standardized header system. Using ints as keys sounds tedious, but if that's a necessary tradeoff I'd be okay with it.
On Fri, Dec 2, 2016 at 5:44 AM Todd Palino <tpal...@gmail.com> wrote:

Come on, I’ve done at least 2 talks on this one :)

Producing counts to a topic is part of it, but that’s only part. So you count that you have 100 messages in topic A. When you mirror topic A to another cluster, you have 99 messages. Where was your problem? Or worse, you have 100 messages, but one producer duplicated messages and another one lost messages. You need details about where the message came from in order to pinpoint problems when they happen: source producer info, where it was produced into your infrastructure, and when it was produced. This requires you to add the information to the message.

And yes, you still need to maintain your clients. So maybe my original example was not the best. My thoughts on not wanting to be responsible for message formats stand, because that’s very much separate from the client. As you know, we have our own internal client library that can insert the right headers, and right now inserts the right audit information into the message fields, if they exist and assuming the message is Avro encoded. What if someone wants to use JSON instead for a good reason? What if user X wants to encrypt messages, but user Y does not? Maintaining the client library is still much easier than maintaining the message formats.

-Todd

On Thu, Dec 1, 2016 at 6:21 PM, Gwen Shapira <g...@confluent.io> wrote:

Based on your last sentence, consider me convinced :)

I get why headers are critical for Mirroring (you need tags to prevent loops and sometimes to route messages to the correct destination). But why do you need headers to audit? We are auditing by producing counts to a side topic (and I was under the impression you do the same), so we never need to modify the message.

Another thing - after we add headers, wouldn't you be in the business of making sure everyone uses them properly?
Making sure everyone includes the right headers you need, not using the header names you intend to use, etc. I don't think the "policing" business will ever go away.

On Thu, Dec 1, 2016 at 5:25 PM, Todd Palino <tpal...@gmail.com> wrote:

Got it. As an ops guy, I'm not very happy with the workaround. Avro means that I have to be concerned with the format of the messages in order to run the infrastructure (audit, mirroring, etc.). That means that I have to handle the schemas, and I have to enforce rules about good formats. This is not something I want to be in the business of, because I should be able to run a service infrastructure without needing to be in the weeds of dealing with customer data formats.

Trust me, a sizable portion of my support time is spent dealing with schema issues. I really would like to get away from that. Maybe I'd have more time for other hobbies. Like writing. ;)

-Todd

On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira <g...@confluent.io> wrote:

I'm pretty satisfied with the current workarounds (Avro container format), so I'm not too excited about the extra work required to do headers in Kafka. I absolutely don't mind it if you do it...
I think the Apache convention for "good idea, but not willing to put any work toward it" is +0.5? anyway, that's what I was trying to convey :)

On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino <tpal...@gmail.com> wrote:

Well I guess my question for you, then, is what is holding you back from full support for headers? What’s the bit that you’re missing that has you under a full +1?
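Todd's audit point above is that equal totals can hide one producer's duplicates and another producer's losses, and per-message source details expose both. The small sketch below illustrates this. It assumes, hypothetically, that each record carries `producer` and `seq` header entries stamped by the client library. Todd lists source producer, entry point and produce time. This sketch uses only a producer id plus a per-producer sequence number.

```python
from collections import Counter

# Hypothetical record shape: {"headers": {...}, "payload": bytes}.
# The "producer" and "seq" header names are assumptions for this sketch.

def reconcile(source, mirror):
    """Per producer, list sequence numbers lost or duplicated between two copies."""
    src = Counter((r["headers"]["producer"], r["headers"]["seq"]) for r in source)
    dst = Counter((r["headers"]["producer"], r["headers"]["seq"]) for r in mirror)
    report = {}
    for producer, seq in sorted(src.keys() | dst.keys()):
        entry = report.setdefault(producer, {"missing": [], "duplicated": []})
        if src[(producer, seq)] > dst[(producer, seq)]:
            entry["missing"].append(seq)
        elif dst[(producer, seq)] > src[(producer, seq)]:
            entry["duplicated"].append(seq)
    return report

def rec(producer, seq):
    return {"headers": {"producer": producer, "seq": seq}, "payload": b""}

# p1 duplicated seq 0 and p2 lost seq 1, yet the totals still match.
source = [rec("p1", 0), rec("p1", 1), rec("p1", 2), rec("p2", 0), rec("p2", 1)]
mirror = [rec("p1", 0), rec("p1", 0), rec("p1", 1), rec("p1", 2), rec("p2", 0)]
print(len(source), len(mirror))   # 5 5
report = reconcile(source, mirror)
print(report["p1"])               # {'missing': [], 'duplicated': [0]}
print(report["p2"])               # {'missing': [1], 'duplicated': []}
```

A side-topic count would report 5 and 5 here. The per-record headers are what make the two compensating faults visible.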
-Todd

On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira <g...@confluent.io> wrote:

I know why people who support headers support them, and I've seen what the discussion is like.

This is why I'm asking people who are against headers (especially committers) what will make them change their mind - so we can get this part over one way or another.

If I sound frustrated it is not at Radai, Jun or you (Todd)... I am just looking for something concrete we can do to move the discussion along to the yummy design details (which is the argument I really am looking forward to).

On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino <tpal...@gmail.com> wrote:

So, Gwen, to your question (even though I’m not a committer)...

I have always been a strong supporter of introducing the concept of an envelope to messages, which headers accomplish. The message key is already an example of a piece of envelope information. By providing a means to do this within Kafka itself, and not relying on use-case specific implementations, you make it much easier for components to interoperate. It simplifies development of all these things (message routing, auditing, encryption, etc.) because each one does not have to reinvent the wheel.

It also makes it much easier from a client point of view if the headers are defined as part of the protocol and/or message format in general, because you can easily produce and consume messages without having to take into account specific cases.
For example, I want to route messages, but client A doesn’t support the way audit implemented headers, and client B doesn’t support the way encryption or routing implemented headers, so now my application has to create some really fragile (my autocorrect just tried to make that “tragic”, which is probably appropriate too) code to strip everything off, rather than just consuming the messages, picking out the 1 or 2 headers it’s interested in, and performing its function.

Honestly, this discussion has been going on for a long time, and it’s always “Oh, you came up with 2 use cases, and yeah, those use cases are real things that someone would want to do. Here’s an alternate way to implement them so let’s not do headers.” If we have a few use cases that we actually came up with, you can be sure that over the next year there are a dozen others that we didn’t think of that someone would like to do. I really think it’s time to stop rehashing this discussion and instead focus on a workable standard that we can adopt.

-Todd

On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino <tpal...@gmail.com> wrote:

> C. per message encryption
> One drawback of this approach is that this significantly reduces the effectiveness of compression, which happens on a set of serialized messages. An alternative is to enable SSL for wire encryption and rely on the storage system (e.g. LUKS) for at rest encryption.

Jun, this is not sufficient.
While this does cover the case of removing a drive from the system, it will not satisfy most compliance requirements for encryption of data, as whoever has access to the broker itself still has access to the unencrypted data. For end-to-end encryption you need to encrypt at the producer, before it enters the system, and decrypt at the consumer, after it exits the system.

-Todd

On Thu, Dec 1, 2016 at 1:03 PM, radai <radai.rosenbl...@gmail.com> wrote:

another big plus of headers in the protocol is that it would enable rapid iteration on ideas outside of core kafka and would reduce the number of future wire format changes required.

a lot of what is currently a KIP represents use cases that are not 100% relevant to all users, and some of them require rather invasive wire protocol changes. a good recent example of this is kip-98. tx-utilizing traffic is expected to be a very small fraction of total traffic and yet the changes are invasive.

every such wire format change translates into painful and slow adoption of new versions.

i think a lot of functionality currently in KIPs could be "spun out" and implemented as opt-in plugins transmitting data over headers.
this would keep the core wire format stable(r), the core codebase smaller, and avoid the "burden of proof" that's sometimes required to prove a certain feature is useful enough for a wide-enough audience to warrant a wire format change and code complexity additions.

(to be clear - kip-98 goes beyond "mere" wire format changes and i'm not saying it could have been completely done with headers, but exactly-once delivery certainly could)

On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <g...@confluent.io> wrote:

> On Thu, Dec 1, 2016 at 10:24 AM, radai <radai.rosenbl...@gmail.com> wrote:
> "For use cases within an organization, one could always use other approaches such as company-wise containers"
> this is what linkedin has traditionally done but there are now cases (read - topics) where this is not acceptable. this makes headers useful even within single orgs for cases where one-container-fits-all cannot apply.
>
> as for the particular use cases listed, i don't want this to devolve to a discussion of particular use cases - i think it's enough that some of them

I think a main point of contention is this: we identified a few use-cases where headers are useful; do we want Kafka to be a system that supports those use-cases?
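Radai proposed opt-in plugins carried over headers, and Todd described a consumer that just picks out the 1 or 2 headers it is interested in. Both could look roughly like the sketch below. It is hypothetical throughout: the integer keys, the plugin functions and the `(key, value)` list shape are assumptions, not any Kafka API.

```python
from typing import Callable, Dict, List, Tuple

Header = Tuple[int, bytes]  # (int key, byte[] value), the shape debated in this thread

# Hypothetical key assignments; nothing here is registered anywhere.
AUDIT_KEY = 1001
ROUTING_KEY = 2001

def route_plugin(value: bytes) -> str:
    return "route to " + value.decode()

def audit_plugin(value: bytes) -> str:
    return "audited by " + value.decode()

PLUGINS: Dict[int, Callable[[bytes], str]] = {
    ROUTING_KEY: route_plugin,
    AUDIT_KEY: audit_plugin,
}

def process(headers: List[Header], enabled: List[int]) -> List[str]:
    """Run only the enabled plugins; headers nobody asked for are ignored, not stripped."""
    actions = []
    for key, value in headers:
        if key in enabled and key in PLUGINS:
            actions.append(PLUGINS[key](value))
    return actions

headers = [(AUDIT_KEY, b"host-a"), (9999, b"opaque"), (ROUTING_KEY, b"dc-west")]
print(process(headers, enabled=[ROUTING_KEY]))  # ['route to dc-west']
```

The routing-only application never needs to understand how audit (or an unknown key such as 9999) encoded its data. The header list is left untouched for the next consumer.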
For example, Jun said:
"Not sure how widely useful record-level lineage is though since the overhead could be significant."

We know NiFi supports record level lineage. I don't think it was developed for lols, I think it is safe to assume that the NSA needed that functionality. We also know that certain financial institutions need to track tampering with records at a record level and there are federal regulations that absolutely require this. They also need to prove that routing apps that "touch" the messages and either read or update headers couldn't have possibly modified the payload itself. They use record level encryption to do that - apps can read and (sometimes) modify headers but can't touch the payload.

We can totally say "those are corner cases and not worth adding headers to Kafka for", they should use a different pubsub system for that (NiFi or one of the other 1000 that cater specifically to the financial industry).

But this gets us into a catch-22:
If we discuss a specific use-case, someone can always say it isn't interesting enough for Kafka. If we discuss more general trends, others can say "well, we are not sure any of them really needs headers specifically. This is just hand waving and not interesting."
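Gwen describes a financial-services requirement: routing apps may read and sometimes modify headers, but must provably leave the payload unaltered. The sketch below approximates that requirement. Note that it swaps in a different technique. It uses an HMAC over the payload only, which gives tamper evidence, instead of the record-level encryption mentioned above. The shared key and the `payload-mac` header name are hypothetical. A symmetric MAC also only shows that some key holder sealed the record; identifying which party sealed it would need digital signatures, which the Python standard library does not provide.

```python
import hashlib
import hmac

# Hypothetical key shared only by the producer and the final consumer;
# routing apps in between never hold it.
SECRET = b"producer-consumer-shared-key"

def seal(payload: bytes, headers: dict) -> dict:
    """Producer side: the MAC covers the payload only, so headers stay editable."""
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    sealed_headers = dict(headers)
    sealed_headers["payload-mac"] = tag
    return {"headers": sealed_headers, "payload": payload}

def verify(record: dict) -> bool:
    """Consumer side: recompute the MAC and compare in constant time."""
    expected = hmac.new(SECRET, record["payload"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["headers"]["payload-mac"])

record = seal(b"transfer 100", {"route": "eu"})
record["headers"]["route"] = "us"    # a router rewrites a header
print(verify(record))                # True
record["payload"] = b"transfer 900"  # someone alters the payload
print(verify(record))                # False
```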
I think discussing use-cases in specifics is super important to decide implementation details for headers (my use-cases lean toward numerical keys with namespaces and object values, others differ), but I think we need to answer the general "Are we going to have headers" question first.

I'd love to hear from the other committers in the discussion:
What would it take to convince you that headers in Kafka are a good idea in general, so we can move ahead and try to agree on the details?

I feel like we keep moving the goal posts and this is truly exhausting.

For the record, I mildly support adding headers to Kafka (+0.5?).
The community can continue to find workarounds to the issue and there are some benefits to keeping the message format and clients simpler. But I see the usefulness of headers to many use-cases and if we can find a good and generally useful way to add it to Kafka, it will make Kafka easier to use for many, a worthy goal in my eyes.

> are interesting/feasible, but:
> A+B. i think there are use cases for polyglot topics. especially if kafka is being used to "trunk" something else.
> D. multiple topics would make it harder to write portable consumer code. partition remapping would mess with locality of consumption guarantees.
> E+F. a use case I see for lineage/metadata is billing/chargeback.
> for that use case it is not enough to simply record the point of origin, but every replication stop (think mirror maker) must also add a record to form a "transit log".
>
> as for stream processing on top of kafka - i know samza has a metadata map which they carry around in addition to user values. headers are the perfect fit for these things.

On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Michael,

In order to answer the first two questions, it would be helpful if we could identify 1 or 2 strong use cases for headers in the space for third-party vendors. For use cases within an organization, one could always use other approaches such as company-wise containers to get around w/o headers. I went through the use cases in the KIP and in Radai's wiki (https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers). The following are the ones that I understand and could be in the third-party use case category.

A. content-type
It seems that in general, content-type should be set at the topic level.
Not sure if mixing messages with different content types should be encouraged.

B. schema id
Since the value is mostly useless without schema id, it seems that storing the schema id together with serialized bytes in the value is better?

C. per message encryption
One drawback of this approach is that this significantly reduces the effectiveness of compression, which happens on a set of serialized messages. An alternative is to enable SSL for wire encryption and rely on the storage system (e.g. LUKS) for at rest encryption.

D. cluster ID for mirroring across Kafka clusters
This is actually interesting. Today, to avoid introducing cycles when doing mirroring across data centers, one would either have to set up two Kafka clusters (a local and an aggregate) per data center or rename topics. Neither is ideal. With headers, the producer could tag each message with the producing cluster ID in the header. MirrorMaker could then avoid mirroring messages to a cluster if they are tagged with the same cluster id.

However, an alternative approach is to introduce something like a hierarchical topic and store messages from different clusters in different partitions under the same topic.
This approach avoids filtering out unneeded data and makes offset preserving easier to support. It may make compaction trickier though, since the same key may show up in different partitions.

E. record-level lineage
For example, a source connector could store in the message the metadata (e.g. UUID) of the source record. Similarly, if a stream job transforms messages from topic A to topic B, the library could include the source message offset in the header of each transformed message. Not sure how widely useful record-level lineage is though since the overhead could be significant.

F. auditing metadata
We could put things like clientId/host/user in the header in each message for auditing. These metadata are really at the producer level though. So, a more efficient way is to only include a "producerId" per message and send the producerId -> metadata mapping independently. KIP-98 is actually proposing including such a producerId natively in the message.

So, overall, I'm not sure that I am fully convinced of the strong third-party use cases of headers yet. Perhaps we could discuss a bit more to make one or two really convincing use cases.
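Jun's option D tags each message with the producing cluster ID so MirrorMaker can skip it. Radai's "transit log" goes further, with every replication hop appending a record. The combination might look like the sketch below. It is hypothetical: the `transit` header name, the comma-separated encoding and the `mirror` function are assumptions, not MirrorMaker's actual behavior.

```python
from typing import List, Optional

TRANSIT = "transit"  # hypothetical header listing the clusters a record has visited

def hops(record: dict) -> List[str]:
    raw = record["headers"].get(TRANSIT, b"")
    return raw.decode().split(",") if raw else []

def produce(payload: bytes, cluster: str) -> dict:
    """The producer stamps the cluster the record was first written to."""
    return {"headers": {TRANSIT: cluster.encode()}, "payload": payload}

def mirror(record: dict, dest: str) -> Optional[dict]:
    """Copy a record to `dest` unless it has already passed through it."""
    visited = hops(record)
    if dest in visited:
        return None  # mirroring back would create a cycle
    new_headers = dict(record["headers"])
    new_headers[TRANSIT] = ",".join(visited + [dest]).encode()
    return {"headers": new_headers, "payload": record["payload"]}

r = produce(b"event", "dc-a")
r_b = mirror(r, "dc-b")      # dc-a -> dc-b: copied, hop appended
print(hops(r_b))             # ['dc-a', 'dc-b']
print(mirror(r_b, "dc-a"))   # None: would loop back to dc-a
```

The same header that prevents loops also records the full path. That path is what radai's billing/chargeback case would need.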
Another orthogonal question is whether headers should be exposed in stream processing systems such as Kafka Streams, Samza, and Spark Streaming. Currently, those systems just deal with key/value pairs. Should we expose headers there as a third thing too, or somehow map headers to the key or value?

Thanks,

Jun

On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <michael.pea...@ig.com> wrote:

I assume that, after a period of a week, there are no concerns now with points 1 and 2, and we now have agreement that headers are useful and needed in Kafka. As such, if put to a KIP vote, this wouldn't be a reason to reject.

@Ignacio on point 4).
I think, for the purpose of getting this KIP moving past this, we can state the key will be a 4-byte space that will naturally be interpreted as an Int32 (if namespacing is later wanted you can easily split this into two int16 spaces). From the wire protocol implementation point of view, I don't believe this makes any difference. Is this reasonable to all?

On 5), as per point 4, I'm therefore happy we keep with 32 bits.
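Michael proposes a 4-byte key read as an Int32, which a higher layer can split into two int16 namespaces if it wants. The short sketch below checks that idea. It treats both halves as unsigned and uses big-endian byte order. Both are assumptions for illustration, since the thread fixes neither.

```python
import struct

def pack_key(namespace: int, local_id: int) -> int:
    """Combine two unsigned 16-bit halves into one 32-bit key."""
    if not (0 <= namespace < 2**16 and 0 <= local_id < 2**16):
        raise ValueError("each half must fit in 16 bits")
    return (namespace << 16) | local_id

def unpack_key(key: int) -> tuple:
    """Recover (namespace, local_id); only a layer that cares about namespaces needs this."""
    return key >> 16, key & 0xFFFF

def key_to_wire(key: int) -> bytes:
    """Serialize as 4 bytes, big-endian unsigned; the wire layer just sees 4 bytes."""
    return struct.pack(">I", key)

key = pack_key(7, 3)
print(hex(key))                # 0x70003
print(key_to_wire(key).hex())  # 00070003
print(unpack_key(key))         # (7, 3)
```

The wire layer carries the same 4 bytes either way. A plain Int32 key and an {int16, int16} key differ only in how the layer above reads them, and equality checks work the same for both.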
On 18/11/2016, 20:34, "ignacio.so...@gmail.com on behalf of Ignacio Solis" <ignacio.so...@gmail.com on behalf of iso...@igso.net> wrote:

Summary:

3) Yes - Header value as byte[]

4a) Int,Int - No
4b) Int - Yes
4c) String - Reluctant maybe

5) I believe the header system should take a single int. I think 32 bits is a good size; if you want to interpret this as two 16-bit numbers in the layer above, go right ahead. If somebody wants to argue for 16 bits or 64 bits of header key space I would listen.

Discussion:
Dividing the key space into sub_key_1 and sub_key_2 makes no sense to me at this layer. Are we going to start providing APIs to get all the sub_key_1s? Or all the sub_key_2s? If there are no distinguishing functions that are applied to each one, then they should be a single value. At this layer all we're doing is equality.
If the above layer wants to interpret this as 2, 3 or more values, that's a different question.
I personally think it's all one keyspace that is getting assigned using some structure, but if you want to sub-assign parts of it then that's fine.

The same discussion applies to strings. If somebody argued for strings, would we be arguing to divide the strings with dots ('.') as a requirement? Would we want them to give us the different name segments separately? Would we be performing any actions on this key other than matching?

Nacho

On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <michael.pea...@ig.com> wrote:

#jay #jun any concerns on 1 and 2 still?

@all
To get this moving along a bit more, I'd also like to ask for clarity on the last points below:

3) I believe we're all roughly happy with the header value being a byte[]?

4) I believe consensus has been for a namespace-based int approach {int,int} for the key. Any objections if this is what we go with?

5) If the assumption in (4) is correct and we have {int,int} keys, should both ints be int16 or int32?
I'm for them being int16 (2 bytes). Combined, they give a space of 4 bytes as per the original, which gives plenty of combinations for the foreseeable future and keeps the overhead small.

Do we see any benefit in another KIP call to discuss these at all?

Cheers
Mike
________________________________________
From: K Burstev <k.burs...@yandex.com>
Sent: Friday, November 18, 2016 7:07:07 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

For what it is worth, I also agree. As a user:

1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

14.11.2016, 21:15, "Ignacio Solis" <iso...@igso.net>:
1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <michael.pea...@ig.com> wrote:

Hi Roger,

The KIP details/examples the original proposal for key spacing, not the new namespace idea mentioned in the discussion.
We will need to update the KIP when we get agreement that this is a better approach (which seems to be the case, if I have understood the general feeling in the conversation).

Re the variable ints, at a very early stage we did think about this. I think the added complexity isn't worth the saving. If we want to reduce overhead and size, I'd rather go with int16 (2-byte) keys, as it keeps it simple.

On the note of no headers: as per the KIP, we use an attribute bit to denote whether headers are present, so there is currently zero overhead if headers are not used.

As radai mentions, I think it would be good first to get clarity on whether we now have general consensus that (1) headers are worthwhile and useful, and (2) we want them as a top level entity.

Just to state the obvious, I believe (1) headers are worthwhile and (2) agree with them as a top level entity.
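Michael notes that an attribute bit signals whether headers are present, so records without headers pay nothing extra. A toy encoding makes this concrete. The bit position (0x10), the one-byte header count, and the int32-key/int16-length layout are all hypothetical. They are not the KIP-82 wire format.

```python
import struct
from typing import List, Tuple

HAS_HEADERS = 0x10  # hypothetical attribute bit

def encode(value: bytes, headers: List[Tuple[int, bytes]]) -> bytes:
    attributes = HAS_HEADERS if headers else 0
    out = bytearray(struct.pack(">B", attributes))
    if headers:  # header bytes are written only when the bit is set
        out += struct.pack(">B", len(headers))
        for key, val in headers:
            # int32 key, uint16 value length, then the value bytes
            out += struct.pack(">iH", key, len(val)) + val
    out += struct.pack(">I", len(value)) + value
    return bytes(out)

def decode(buf: bytes) -> Tuple[bytes, List[Tuple[int, bytes]]]:
    attributes = buf[0]
    pos = 1
    headers = []
    if attributes & HAS_HEADERS:
        count = buf[pos]
        pos += 1
        for _ in range(count):
            key, length = struct.unpack_from(">iH", buf, pos)
            pos += 6
            headers.append((key, buf[pos:pos + length]))
            pos += length
    (vlen,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    return buf[pos:pos + vlen], headers

plain = encode(b"hello", [])
tagged = encode(b"hello", [(7, b"x")])
print(len(plain), len(tagged))  # 10 18
print(decode(tagged))           # (b'hello', [(7, b'x')])
```

In this toy layout the attribute byte is present either way. A record without headers therefore costs exactly what it did before.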
Cheers
Mike
________________________________________
From: Roger Hoover <roger.hoo...@gmail.com>
Sent: Wednesday, November 9, 2016 9:10:47 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Sorry for going a little into the weeds, but thanks for the replies regarding varint.

Agreed that a prefix and {int, int} can be the same. It doesn't look like that's what the KIP is saying in the "Open" section. The example shows 2100001 for New Relic and 210002 for App Dynamics, implying that the New Relic organization will have only a single header id to work with. Or is 2100001 a prefix? The main point of a namespace or prefix is to reduce the overhead of config mapping or registration, depending on how namespaces/prefixes are managed.

Would love to hear more feedback on the higher-level questions though...
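Roger distinguishes between registering a single header id and registering a prefix. The distinction can be made concrete with a hypothetical registry that hands each organization one namespace. After that, the organization mints its own header keys without registering again. The registry class, the organization name and the 16/16 bit split are illustrative assumptions.

```python
class NamespaceRegistry:
    """Hypothetical central registry: one registration per organization."""

    def __init__(self) -> None:
        self._next_ns = 1
        self._owners = {}

    def register(self, org: str) -> int:
        if org in self._owners.values():
            raise ValueError(org + " already has a namespace")
        ns = self._next_ns
        self._owners[ns] = org
        self._next_ns += 1
        return ns

    def owner(self, key: int) -> str:
        return self._owners.get(key >> 16, "unregistered")

class OrgKeys:
    """The organization assigns local ids itself; no further registration needed."""

    def __init__(self, namespace: int) -> None:
        self.namespace = namespace
        self._next_local = 0

    def new_key(self) -> int:
        if self._next_local >= 2**16:
            raise RuntimeError("namespace exhausted")
        key = (self.namespace << 16) | self._next_local
        self._next_local += 1
        return key

registry = NamespaceRegistry()
vendor = OrgKeys(registry.register("vendor-a"))
k1, k2 = vendor.new_key(), vendor.new_key()
print(k1, k2, registry.owner(k2))  # 65536 65537 vendor-a
```

Under a single-id scheme, each new plugin would need its own `register` call. With a prefix, one call covers up to 65536 keys.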
Cheers,
Roger

On Wed, Nov 9, 2016 at 11:38 AM, radai <radai.rosenbl...@gmail.com> wrote:

I think this discussion is getting a bit into the weeds on technical implementation details. I'd like to step back a minute and try to establish where we are in the larger picture:

(re-wording Nacho's last paragraph)
1. Are we all in agreement that headers are a worthwhile and useful addition to have? This was contested early on.
2. Are we all in agreement on headers as a top-level entity vs. headers squirreled away in V (the record value)?

Are there still concerns around these two points (#jay? #jun?)?

(And now back to our normal programming...)

Varints are nice.
Having said that, they add complexity (see https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java, the first Google result). They would also require anyone writing other clients (C? Python? Go? Bash? ;-) ) to get or implement the same, for relatively little gain (int vs. string is an order of magnitude; this isn't).

Int namespacing vs. {int, int} namespacing are basically the same thing: you're just namespacing an int64 and giving people whole 2^32 ranges at a time. The part I like about this is that it lets people have a large swath of numbers with one registration, so they don't have to come back for every single plugin/header they want to "reserve".
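To help weigh the varint question, here is a sketch of the unsigned base-128 varint described in the Protocol Buffers encoding documentation that Roger links to. Each byte carries seven payload bits, the least-significant group comes first, and the high bit is set on every byte except the last. Signed (zigzag) values are left out, and the function names are illustrative.

```python
def encode_uvarint(value: int) -> bytes:
    """Unsigned base-128 varint: 7 bits per byte, low group first,
    continuation (high) bit set on every byte but the last."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def decode_uvarint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Values below 128, such as a header count of 0 or a small header id, take one byte. The value 300 takes two bytes (`AC 02`). That saving is Roger's point. The cost radai raises is that every other client implementation (C, Python, Go, ...) has to carry an equivalent pair of functions.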
On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <roger.hoo...@gmail.com> wrote:

Since some of the debate has been about overhead + performance, I'm wondering if we have considered a varint encoding (https://developers.google.com/protocol-buffers/docs/encoding#varints) for the header length field (int32 in the proposal) and for header ids? If you don't use headers, the overhead would be a single byte, and each header id < 128 would also need only a single byte?

On Wed, Nov 9, 2016 at 6:43 AM, radai <radai.rosenbl...@gmail.com> wrote:

@magnus - and very dangerous (you're essentially downloading and executing arbitrary code off the internet on your servers... a bad idea without a sandbox, and even with one).

As for it being a purely administrative task: I disagree.
I wish it were, really, because then my earlier point on the complexity of the remapping process would be invalid. But at LinkedIn, for example, we (the team I'm in) run Kafka as a service. We don't really know what our users (developing applications that use Kafka) are up to at any given moment. It is very possible (given the existence of headers and a corresponding plugin ecosystem) for some application to "equip" its producers and consumers with the required plugin without us knowing. I don't mean to imply that's bad; I just want to make the point that keeping it in sync across a large-enough organization is not that simple.
On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <mag...@edenhill.se> wrote:

I think there is a piece missing in the Strings discussion, where pro-Stringers reason that by providing unique string identifiers for each header, everything will just magically work for all parts of the stream pipeline.

But the strings don't mean anything by themselves. We could probably envision some auto plugin loader that downloads, compiles, links and runs plugins on demand as soon as they're seen by a consumer, but I don't really see a use case for something so dynamic (and fragile) in practice.
In the real world, an application will be configured with a set of plugins to either add (producer) or read (consumer) headers. This is an administrative task based on what features a client needs/provides, and it results in some sort of configuration to enable and configure the desired plugins.

This needs to be kept somewhat in sync across an organisation (there is no point in having producers add headers no consumers will read, and vice versa). Given that, the added complexity of assigning an id namespace to each plugin as it is being configured should be tolerable.
/Magnus

2016-11-09 13:06 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:

Just following/catching up on what seems to have been an active night :)

@Radai, sorry if it seems obvious, but what does MD stand for?

My take on String vs Int:

I will state first that I am pro Int (16 or 32).

Playing devil's advocate, though, I do see a big plus in the argument for String keys: integrating into an existing ecosystem.

As many other systems use String-based headers (Flume, JMS), String keys make it much easier for those systems to be incorporated/integrated.
With Int-based headers, how could we provide a way or guidance to make this integration simple and easy for flows transitioning over to Kafka?

* Tough luck, buddy, you're on your own.
* Simply hash the string into an int code and hope for no collisions (how to convert back, though?).
* HTTP/2 style, as mentioned by Nacho.

Cheers,
Mike

________________________________________
From: radai <radai.rosenbl...@gmail.com>
Sent: Wednesday, November 9, 2016 8:12 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Thinking about it some more, the best way to transmit the header remapping data to consumers would be to put it in the MD response payload, so maybe
it should be discussed now.

On Wed, Nov 9, 2016 at 12:09 AM, radai <radai.rosenbl...@gmail.com> wrote:

I'm not opposed to the idea of namespace mapping. All I'm saying is that it's not part of the "MVP" and, since it requires no wire format change, it can always be added later. Also, it's not as simple as just configuring MM (MirrorMaker) to do the transform. Let's say I've implemented large message support as {666,1}, and on some mirror target cluster it's been remapped to {999,1}. The consumer plugin code would also need to be told to look for the large message "part X of Y" header under {999,1}. Doable, but tricky.
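A small sketch makes radai's MirrorMaker example concrete. Header keys are modeled as `(namespace, id)` tuples. `mirror_remap` stands in for what a mirroring tool would do with a per-cluster remapping table, and `find_part_header` stands in for the consumer plugin. All names and the dict-based table are hypothetical.

```python
from typing import Optional

# A header key modeled as (namespace, id); tuple[...] at runtime needs Python 3.9+.
HeaderKey = tuple[int, int]

# Hypothetical key the large-message plugin was written against ("part X of Y").
LARGE_MESSAGE_PART: HeaderKey = (666, 1)


def mirror_remap(headers: dict[HeaderKey, bytes],
                 remap: dict[HeaderKey, HeaderKey]) -> dict[HeaderKey, bytes]:
    """Rewrite header keys the way a mirroring tool would for its target cluster."""
    return {remap.get(key, key): value for key, value in headers.items()}


def find_part_header(headers: dict[HeaderKey, bytes],
                     remap: dict[HeaderKey, HeaderKey]) -> Optional[bytes]:
    """Consumer side: the plugin must apply the same table to find its header."""
    return headers.get(remap.get(LARGE_MESSAGE_PART, LARGE_MESSAGE_PART))


remap = {(666, 1): (999, 1)}
mirrored = mirror_remap({(666, 1): b"part 2 of 5"}, remap)
print(mirrored)                           # {(999, 1): b'part 2 of 5'}
print(mirrored.get(LARGE_MESSAGE_PART))   # None: a plugin unaware of the table misses it
print(find_part_header(mirrored, remap))  # b'part 2 of 5'
```

The remapping step itself is one line. The tricky part is that the table has to reach every consumer plugin on the target cluster, which is why radai suggests shipping it to consumers.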
On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <g...@confluent.io> wrote:

While you can do whatever you want with a namespace and your code, what I'd expect is for each app to make its namespace configurable...

So if I accidentally used 666 for my HR department and still want to run RadaiApp, I can configure "namespace=42" for RadaiApp and everything will look normal.

This means you only need to sync usage inside your own organization. Still hard, but somewhat easier than syncing with the entire world.
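Gwen's `namespace=42` suggestion amounts to reading the namespace from configuration instead of hard-coding it. The sketch below assumes a plain dict of string config values and a hypothetical `AppHeaderKeys` helper. Only the config key name `namespace` comes from her example.

```python
class AppHeaderKeys:
    """Builds (namespace, id) header keys for one application.

    The namespace comes from configuration, so an operator can move the app
    to another namespace if its default is already taken locally.
    """

    def __init__(self, config: dict[str, str], default_namespace: int):
        self.namespace = int(config.get("namespace", default_namespace))

    def key(self, local_id: int) -> tuple[int, int]:
        # The app's own ids never change; only the namespace is configurable.
        return (self.namespace, local_id)


# RadaiApp defaults to namespace 666, but HR already uses 666 in this organization.
default_keys = AppHeaderKeys({}, default_namespace=666)
moved_keys = AppHeaderKeys({"namespace": "42"}, default_namespace=666)
print(default_keys.key(1))  # (666, 1)
print(moved_keys.key(1))    # (42, 1)
```

Because only the namespace comes from config, an operator can resolve a local clash without touching the app's code. Coordination then only has to happen within one organization.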
On Tue, Nov 8, 2016 at 10:07 PM, radai <radai.rosenbl...@gmail.com> wrote:

And we can start with {namespace, id} and no remapping support, and always add it later on if/when collisions actually happen (I don't think they'd be a problem).

Every interested party (orgs or individuals) could then register a prefix (0 = reserved, 1 = Confluent ...
666 = me :-) ) and do whatever they like with the second ID. So once LinkedIn registers, say, 3, LinkedIn devs are free to use {3, *} with a reasonable expectation not to collide with anything else. Further partitioning of that * becomes LinkedIn's problem, but the "upstream registration" of a namespace only has to happen once.
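radai's registration scheme can be combined with his remark that {int, int} namespacing is just an int64 split into 2^32-wide ranges. The sketch below does that. The registry contents mirror the examples in the thread (0 reserved, 1 Confluent, 3 LinkedIn, 666 radai). `pack_key`, `unpack_key` and `owner` are hypothetical helpers, and placing the namespace in the high 32 bits is an assumption.

```python
# Registry entries taken from the thread's examples (LinkedIn's 3 is radai's "say, 3").
REGISTRY = {0: "reserved", 1: "confluent", 3: "linkedin", 666: "radai"}

MASK_32 = (1 << 32) - 1


def pack_key(namespace: int, local_id: int) -> int:
    """Pack {namespace, id} into one 64-bit value, namespace in the high 32 bits."""
    if not (0 <= namespace <= MASK_32 and 0 <= local_id <= MASK_32):
        raise ValueError("namespace and id must each fit in 32 unsigned bits")
    return (namespace << 32) | local_id


def unpack_key(packed: int) -> tuple[int, int]:
    """Split a packed key back into (namespace, id)."""
    return packed >> 32, packed & MASK_32


def owner(packed: int) -> str:
    """Who registered the namespace half of a packed key."""
    return REGISTRY.get(unpack_key(packed)[0], "unregistered")


print(pack_key(3, 7))              # 12884901895
print(unpack_key(pack_key(3, 7)))  # (3, 7)
print(owner(pack_key(3, 123)))     # linkedin
```

Every id LinkedIn assigns internally lands in the same 2^32-wide block under namespace 3, so the upstream registration happens only once. If the packed value had to fit a signed 64-bit integer such as a Java `long`, namespaces at or above 2^31 would come out negative. Python's unbounded ints hide that detail.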
On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <wushuja...@gmail.com> wrote:

> On Nov 8, 2016, at 5:54 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> Thank you so much for this clear and fair summary of the arguments.
>
> I'm in favor of ints. Not a deal-breaker, but in favor.
>
> Even more in favor of Magnus's decentralized suggestion with Roger's tweak: add a namespace for headers. This