@Matthias - oh. I think that over the course of this thread enough use cases have been presented for things that can be done/solved with headers that, even if every single potential use case has a better custom implementation (which I don't believe), headers are clearly one of the best possible Kafka modifications in terms of "bang for your buck"/ROI.
On Thu, Dec 15, 2016 at 5:08 PM, Jun Rao <j...@confluent.io> wrote:

> Hi, Michael,
>
> Thanks for the response.
>
> 100. Is there any other metadata associated with the UUID that APM sends to
> the central coordinator? What kind of things could you do once the tracing
> is embedded in each message?
>
> 103. How do you preserve the per-key ordering when switching to a different
> DC at IG? Are you doing 2-way mirroring?
>
> 105. Got it. So, you don't need to use headers for encryption itself. But
> if there is another use case for headers, it's hard to put that info into
> the encrypted payload.
>
> 106. Embedding all metadata instead of just the producer id per message is
> likely more verbose, right? Similarly, in 100 above, only a UUID is
> embedded in each message.
>
> 107. Yes, this kind of UUID is proposed in KIP-98 for deduping.
>
> Jun
>
> On Thu, Dec 8, 2016 at 12:12 AM, Michael Pearce <michael.pea...@ig.com>
> wrote:
>
> > Hi Jun
> >
> > 100) Each time a transaction exits a JVM for a remote system (HTTP/JMS/
> > hopefully one day Kafka), the APM tools stitch in a unique id (though I
> > believe it contains the end2end UUID embedded in this id). On receiving the
> > message at the receiving JVM, the APM code takes this out and continues its
> > tracing on that new thread. Both JVMs (and other languages the APM
> > tool supports) send this data async back to the central controllers, where
> > the stitching together occurs. For this they need some header space
> > to put this id in.
> >
> > 101) Yes indeed, we have a business transaction id in the payload. Though
> > this is system-level tracing, which we need to marry up. Also, as per the
> > note on end2end encryption, we’d be unable to prove the flow if the payload
> > is encrypted, as we’d not have access to it at certain points of the flow
> > through the infrastructure/platform.
> >
> > 103) As said, we use this mechanism at IG very successfully. As stated, per
> > key we guarantee that the transaction-producing app handles the transactions
> > of a key at one DC, unless at a point of critical failure where we have to
> > flip processing to another. We care about key ordering.
> > I disagree on the offset comment for the partition solution: unless you do
> > full ISR or expensive full XA transactions, even with partitions you cannot
> > fully guarantee offsets would match.
> >
> > 105) Very much so. I need to have access at the platform level to all the
> > other metadata mentioned, without needing access to the
> > encryption keys of the payload.
> >
> > 106)
> > Technically yes for AZ/Region/Cluster, but then we’d need to have a global
> > producerId register, which would be very hard to enforce/ensure is current
> > and correct, just to understand a message's origin
> > region/AZ/cluster for routing.
> > The client wrapper version: the producerId can stay the same, as obviously
> > the producer could upgrade its wrapper, so we need to know what wrapper
> > version the message was created with.
> > Likewise the IP address: as stated, our producer can move, in which case
> > its IP would change.
> >
> > 107)
> > The UUID is set on the message by interceptors before the actual producer
> > transport send. This is for a platform-level message dedupe guarantee; the
> > business payload should be agnostic to this. Please see
> > https://activemq.apache.org/artemis/docs/1.5.0/duplicate-detection.html
> > and note this does not touch business payloads.
> >
> > On 06/12/2016, 18:22, "Jun Rao" <j...@confluent.io> wrote:
> >
> >     Hi, Michael,
> >
> >     Thanks for the reply. I find it very helpful.
> >
> >     Data lineage:
> >     100. I'd like to understand the APM use case a bit more. It sounds like
> >     those APM plugins can generate a transaction id that we could
> >     potentially put in the header of every message.
> >     How would you typically
> >     make use of such transaction ids? Are there other metadata associated
> >     with the transaction id and, if so, how are they propagated downstream?
> >
> >     101. For the finance use case, if the concept of a transaction is
> >     important, wouldn't it typically be included in the message payload
> >     instead of as an optional header field?
> >
> >     102. The data lineage that Atlas and Navigator support seems to be at
> >     the dataset level, not the per-record level? So, not sure if per-message
> >     headers are relevant there.
> >
> >     Mirroring:
> >     103. The benefit of using separate partitions is that it potentially
> >     makes it easy to preserve offsets during mirroring. This will make it
> >     easier for consumers to switch clusters. Currently, consumers can switch
> >     clusters by using the timestampToOffset() api, but they have to deal
> >     with duplicates.
> >     Good point on the issue with log compaction; I am not sure how to
> >     address this. However, even if we mirror into the existing partitions,
> >     the ordering for messages generated from different clusters seems
> >     non-deterministic anyway. So, it seems that the consumers already have
> >     to deal with that? If a topic is compacted, does that mean which
> >     messages are preserved is also non-deterministic across clusters?
> >
> >     104. Good point on partition key.
> >
> >     End-to-end encryption:
> >     105. So, it seems end-to-end encryption is useful. Are headers useful
> >     there?
> >
> >     Auditing:
> >     106. It seems other than the UUID, all other metadata are per producer?
> >
> >     EOS:
> >     107. How are those UUIDs generated? I am not sure if they can be
> >     generated in the producer library. An application may send messages
> >     through a load balancer and on retry, the same message could be routed
> >     to a different producer instance. So, it seems that the application has
> >     to generate the UUIDs.
In that case, shouldn't the application just put the UUID in > the > > payload? > > > > Thanks, > > > > Jun > > > > > > On Fri, Dec 2, 2016 at 4:57 PM, Michael Pearce < > michael.pea...@ig.com> > > wrote: > > > > > Hi Jun. > > > > > > Per Transaction Tracing / Data Lineage. > > > > > > As Stated in the KIP this has the first use case of how many APM > > tools now > > > work. > > > I would find it impossible for any one to argue this is not > > important or a > > > niche market as it has its own gartner report for this space. Such > > > companies as Appdynamics, NewRelic, Dynatrace, Hawqular are but a > > few. > > > > > > Likewise these APM tools can help very rapidly track down issues > and > > > automatically capture metrics, perform actions based on unexpected > > behavior > > > to auto recover services. > > > > > > Before mentioning looking at aggregated stats, in these cases where > > > actually on critical flows we cannot afford to have aggregated > > rolled up > > > stats only. > > > > > > With the APM tool we use its actually able to detect a single > > transaction > > > failure and capture the thread traces in the JVM where it failed > and > > > everything for us, to the point it sends us alerts where we have > this > > > giving the line number of the code that caused it, the transaction > > trace > > > through all the services and endpoints (supported) upto the point > of > > > failure, it can also capture the data in and out (so we can > replay). > > > Because atm Kafka doesn’t support us being able to stich in these > > tracing > > > transaction ids natively, we cannot get these benefits as such is > > limiting > > > our ability support apps and monitor them to the same standards we > > come to > > > expect when on a kafka flow. > > > > > > This actually ties in with Data Lineage, as the same tracing can be > > used > > > to back stich this. 
> > > Essentially, many times, due to the sums of money
> > > involved, there are disputes, and typically as a financial institution the
> > > easiest and cleanest way to prove things when disputes arise is to present
> > > the actual flow and processes involved in a transaction.
> > >
> > > Likewise, as Hadoop matures it's evident this case is important, as tools
> > > such as Atlas (Hortonworks led) and Navigator (Cloudera led) show.
> > > I also believe the importance here is very much NOT just a financial issue.
> > >
> > > From an MDM point of view, for any company wanting to care about Data
> > > Quality and Data Governance, Data Lineage is a key piece in this puzzle.
> > >
> > > RE Mirroring,
> > >
> > > As per the KIP, in fact this is exactly what we do re cluster id, to mirror
> > > a network of clusters between AZs / Regions. We know a transaction for a
> > > key will be done within an AZ/Region, so we know the write to Kafka
> > > will be ordered per key. But we need an eventual view of that across our
> > > other regions/AZs. When we have a complete AZ or Region failure, we know
> > > there will be a brief interruption whilst those transactions are moved to
> > > another region, but we expect processing to continue after it.
> > >
> > > As mentioned, having separate partitions to do this starts to get
> > > ugly/complicated for us:
> > > How would I do compaction where a key is in two partitions?
> > > How do we balance consumers so that multiple partitions with the same key
> > > go to the same consumer?
> > > What do you do if cluster 1 has 5 partitions but cluster 20 has 10 because
> > > it's larger kit in our more core DCs, so that key-to-partition mappings for
> > > consumers get even more complicated?
> > > What do you do if we add or remove a complete region?
> > >
> > > Whereas a simple mirror will work; we just need to ensure we don’t have a
> > > cycle, which we can do with clusterId.
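
[Editor's sketch] The clusterId cycle check mentioned just above could look roughly like the following. This is only a minimal sketch under stated assumptions: the `Message` type, the header name `origin-cluster-id`, and the functions `stamp_origin`, `should_mirror` and `mirror` are all hypothetical and are not taken from the KIP, from MirrorMaker, or from IG's implementation. It assumes each message is tagged once with the cluster it was first produced into, and that a mirror refuses to copy a message back into that cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical header name; the thread only speaks of a "clusterId".
ORIGIN_HEADER = "origin-cluster-id"


@dataclass
class Message:
    """Stand-in for a record with headers; not a real Kafka client type."""
    key: str
    value: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def stamp_origin(msg: Message, local_cluster: str) -> Message:
    """Record the origin cluster on first sight; never overwrite it."""
    msg.headers.setdefault(ORIGIN_HEADER, local_cluster)
    return msg


def should_mirror(msg: Message, source_cluster: str, destination_cluster: str) -> bool:
    """A message must not be copied back into the cluster it originated in."""
    origin = msg.headers.get(ORIGIN_HEADER, source_cluster)
    return origin != destination_cluster


def mirror(batch: List[Message], source_cluster: str, destination_cluster: str) -> List[Message]:
    """Return the messages that would be forwarded to the destination."""
    forwarded = []
    for msg in batch:
        stamp_origin(msg, source_cluster)
        if should_mirror(msg, source_cluster, destination_cluster):
            forwarded.append(msg)
    return forwarded
```

Note that the check only reads headers, so it works without looking at (or being able to decrypt) the business payload. It prevents cycles only; in a full mesh the same message can still arrive by two paths, which is where a per-message UUID for dedupe would come in.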
> > > > > > We even have started to look at shortest path mirror routing based > on > > > clusterId, if we also had the region and az info on the originating > > > message, this we have not implemented but some ideas come from > > network > > > routing, and also the dispatcher router in apache qpid. > > > > > > Also we need to have data perimeters e.g. certain data cannot leave > > > certain countries borders. We want this all automated so that at > the > > > platform level without having to touch or look at the business data > > inside > > > we can have headers we can put tags into so that we can ensure this > > doesn’t > > > occur when we mirror. (actually links in to data lineage / tracing > > as again > > > we need to tag messages at a platform level) Examples are we are > not > > > allowed Private customer details to leave Switzerland, yet we need > > those > > > systems integrated. > > > > > > Lastly around mirroring we have a partionKey field, as the key used > > for > > > portioning logic != compaction key all the time but we want to > > preserve it > > > for when we mirror so that if source cluster partition count != > > destination > > > cluster partition count we can honour the same partitioning logic. > > > > > > > > > > > > RE End 2 End encryption > > > > > > As I believe mentioned just before, the solution you mention just > > doesn’t > > > cut the mustard these days with many regulators. An operations > > person with > > > access to the box should not be able to have access to the data. > > Many now > > > actually impose quite literally the implementation expected being > > end2end > > > encryption for certain data (Singapore for us is one that I am most > > aware > > > of). In fact we’re even now needing encrypt the data and store the > > keys in > > > HSM modules. 
> > > > > > Likewise the performance penalty on encrypting decrypting as you > > produce > > > over wire, then again encrypt decrypt as the data is stored on the > > brokers > > > disks and back again, then again encrypted and decrypted back over > > the wire > > > each time for each consumer all adds up, ignoring this doubling > with > > mirror > > > makers etc. simply encrypting the value once on write by the client > > and > > > again decrypting on consume by the consumer is far more performant, > > but > > > then the routing and platform meta data needs to be separate (thus > > headers) > > > > > > > > > > > > RE Auditing: > > > > > > Our Auditing needs are: > > > Producer Id, > > > Origin Cluster Id that message first produced into > > > Origin AZ – agreed we can derive this if we have cluster id, but it > > makes > > > resolving this for audit reporting a lot easier. > > > Origin Region – agreed we can derive this if we have cluster id, > but > > it > > > makes resolving this for audit reporting a lot easier. > > > Unique Message Identification (this is not the same as transaction > > > tracing) – note offset and partition are not the same, as when we > > mirror or > > > have for what ever system failure duplicate send, > > > Custom Client wrapper version (where organizations have to wrap the > > kafka > > > client for added features) so we know what version of the wrapper > is > > used > > > Producer IP address (in case of clients being in our vm/open stack > > infra > > > where they can move around, producer id will stay the same but this > > would > > > change) > > > > > > > > > > > > RE Once and only once delivery case > > > > > > Using the same Message UUID for auditing we can achieve this quite > > simply. 
> > > > > > As per how some other brokers do this (cough qpid, artemis) message > > uuid > > > are used to dedupe where message is sent and produced but the > client > > didn’t > > > receive the ack, and there for replays the send, by having a unique > > message > > > id per message, this can be filtered out, on consumers where > message > > > delivery may occur twice for what ever reasons a message uuid can > be > > used > > > to remove duplicates being deliverd , like wise we can do this in > the > > > mirrormakers so if we detect a dupe message we can avoid > replicating > > it. > > > > > > > > > > > > > > > Cheers > > > Mike > > > > > > > > > > > > On 02/12/2016, 22:09, "Jun Rao" <j...@confluent.io> wrote: > > > > > > Since this KIP affects message format, wire protocol, apis, I > > think > > > it's > > > worth spending a bit more time to nail down the concrete use > > cases. It > > > would be bad if we add this feature, but when start > implementing > > it > > > for say > > > mirroring, we then realize that header is not the best > approach. > > > Initially, > > > I thought I was convinced of the use cases of headers and was > > trying to > > > write down a few use cases to convince others. That's when I > > became > > > less > > > certain. For me to be convinced, I just want to see two strong > > use > > > cases > > > (instead of 10 maybe use cases) in the third-party space. The > > reason is > > > that when we discussed the use cases within a company, often it > > ends > > > with > > > "we can't force everyone to use this standard since we may have > > to > > > integrate with third-party tools". > > > > > > At present, I am not sure why headers are useful for things > like > > > schemaId > > > or encryption. In order to do anything useful to the value, one > > needs > > > to > > > know the schemaId or how data is encrypted, but header is > > optional. > > > But, I > > > can be convinced if someone (Radai, Sean, Todd?) 
provides more > > details > > > on > > > the argument. > > > > > > I am not very sure header is the best approach for mirroring > > either. If > > > someone has thought about this more, I'd be happy to hear. > > > > > > I can see the data lineage use case. I am just not sure how > > widely > > > applicable this is. If someone familiar with this space can > > justify > > > this is > > > a significant use case, say in the finance industry, this would > > be a > > > strong > > > use case. > > > > > > I can see the auditing use case. I am just not sure if a native > > > producer id > > > solves that problem. If there are additional metadata that's > > worth > > > collecting but not covered by the producer id, that would make > > this a > > > strong use case. > > > > > > Thanks, > > > > > > Jun > > > > > > > > > On Fri, Dec 2, 2016 at 1:41 PM, radai < > > radai.rosenbl...@gmail.com> > > > wrote: > > > > > > > this KIP is about enabling headers, nothing more nothing less > > - so > > > no, > > > > broker-side use of headers is not in the KIP scope. > > > > > > > > obviously though, once you have headers potential use cases > > could > > > include > > > > broker-side header-aware interceptors (which would be the > > topic of > > > other > > > > future KIPs). > > > > > > > > a trivially clear use case (to me) would be using such > > broker-side > > > > interceptors to enforce compliance with organizational > > policies - it > > > would > > > > make our SREs lives much easier if instead of retroactively > > > discovering > > > > "rogue" topics/users those messages would have been rejected > > > up-front. > > > > > > > > the kafka broker code is lacking any such extensibility > support > > > (beyond > > > > maybe authorizer) which is why these use cases were left out > > of the > > > "case > > > > for headers" doc - broker extensibility is a separate > > discussion. 
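
[Editor's sketch] The policy check radai describes (reject up front any message that lacks the headers the organization requires) can be illustrated as below. This is a hedged sketch only: as the message above says, the Kafka broker has no such interceptor extension point, so nothing here is a Kafka API. The header names, `PolicyViolation`, `missing_headers` and `enforce_policy` are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical organizational policy: headers every message must carry.
REQUIRED_HEADERS = ("producer-id", "origin-cluster-id")


class PolicyViolation(Exception):
    """Raised when a message does not conform to the header policy."""


def missing_headers(headers: Dict[str, str],
                    required: Sequence[str] = REQUIRED_HEADERS) -> List[str]:
    """Return the required header names that are absent or empty."""
    return [name for name in required if not headers.get(name)]


def enforce_policy(headers: Dict[str, str],
                   required: Sequence[str] = REQUIRED_HEADERS) -> None:
    """Reject the message (by raising) if any required header is missing."""
    missing = missing_headers(headers, required)
    if missing:
        raise PolicyViolation("missing required headers: " + ", ".join(missing))
```

The point of the sketch is that the decision depends only on the headers, so whoever runs the check never has to parse the message value or know its format.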
> > > > > > > > On Fri, Dec 2, 2016 at 12:59 PM, Gwen Shapira < > > g...@confluent.io> > > > wrote: > > > > > > > > > Woah, I wasn't aware this is something we'll do. It wasn't > > in the > > > KIP, > > > > > right? > > > > > > > > > > I guess we could do it the same way ACLs currently work. > > > > > I had in mind something that will allow admins to apply > > rules to > > > the > > > > > new create/delete/config topic APIs. So Todd can decide to > > reject > > > > > "create topic" requests that ask for more than 40 > > partitions, or > > > > > require exactly 3 replicas, or no more than 50GB partition > > size, > > > etc. > > > > > > > > > > ACLs were added a bit ad-hoc, if we are planning to apply > > more > > > rules > > > > > to requests (and I think we should), we may want a bit more > > generic > > > > > design around that. > > > > > > > > > > On Fri, Dec 2, 2016 at 7:16 AM, radai < > > radai.rosenbl...@gmail.com> > > > > wrote: > > > > > > "wouldn't you be in the business of making sure everyone > > uses > > > them > > > > > > properly?" > > > > > > > > > > > > thats where a broker-side plugin would come handy - any > > incoming > > > > message > > > > > > that does not conform to org policy (read - does not have > > the > > > proper > > > > > > headers) gets thrown out (with an error returned to user) > > > > > > > > > > > > On Thu, Dec 1, 2016 at 8:44 PM, Todd Palino < > > tpal...@gmail.com> > > > wrote: > > > > > > > > > > > >> Come on, I’ve done at least 2 talks on this one :) > > > > > >> > > > > > >> Producing counts to a topic is part of it, but that’s > only > > > part. So > > > > you > > > > > >> count you have 100 messages in topic A. When you mirror > > topic A > > > to > > > > > another > > > > > >> cluster, you have 99 messages. Where was your problem? > Or > > > worse, you > > > > > have > > > > > >> 100 messages, but one producer duplicated messages and > > another > > > one > > > > lost > > > > > >> messages. 
You need details about where the message came > > from in > > > order > > > > to > > > > > >> pinpoint problems when they happen. Source producer > info, > > where > > > it was > > > > > >> produced into your infrastructure, and when it was > > produced. > > > This > > > > > requires > > > > > >> you to add the information to the message. > > > > > >> > > > > > >> And yes, you still need to maintain your clients. So > > maybe my > > > original > > > > > >> example was not the best. My thoughts on not wanting to > be > > > responsible > > > > > for > > > > > >> message formats stands, because that’s very much > separate > > from > > > the > > > > > client. > > > > > >> As you know, we have our own internal client library > that > > can > > > insert > > > > the > > > > > >> right headers, and right now inserts the right audit > > > information into > > > > > the > > > > > >> message fields. If they exist, and assuming the message > > is Avro > > > > encoded. > > > > > >> What if someone wants to use JSON instead for a good > > reason? > > > What if > > > > > user X > > > > > >> wants to encrypt messages, but user Y does not? > > Maintaining the > > > client > > > > > >> library is still much easier than maintaining the > message > > > formats. > > > > > >> > > > > > >> -Todd > > > > > >> > > > > > >> > > > > > >> On Thu, Dec 1, 2016 at 6:21 PM, Gwen Shapira < > > g...@confluent.io > > > > > > > > wrote: > > > > > >> > > > > > >> > Based on your last sentence, consider me convinced :) > > > > > >> > > > > > > >> > I get why headers are critical for Mirroring (you need > > tags to > > > > prevent > > > > > >> > loops and sometimes to route messages to the correct > > > destination). > > > > > >> > But why do you need headers to audit? We are auditing > by > > > producing > > > > > >> > counts to a side topic (and I was under the impression > > you do > > > the > > > > > >> > same), so we never need to modify the message. 
> > > > > >> > > > > > > >> > Another thing - after we added headers, wouldn't you > be > > in the > > > > > >> > business of making sure everyone uses them properly? > > Making > > > sure > > > > > >> > everyone includes the right headers you need, not > using > > the > > > header > > > > > >> > names you intend to use, etc. I don't think the > > "policing" > > > business > > > > > >> > will ever go away. > > > > > >> > > > > > > >> > On Thu, Dec 1, 2016 at 5:25 PM, Todd Palino < > > > tpal...@gmail.com> > > > > > wrote: > > > > > >> > > Got it. As an ops guy, I'm not very happy with the > > > workaround. > > > > Avro > > > > > >> means > > > > > >> > > that I have to be concerned with the format of the > > messages > > > in > > > > > order to > > > > > >> > run > > > > > >> > > the infrastructure (audit, mirroring, etc.). That > > means > > > that I > > > > have > > > > > to > > > > > >> > > handle the schemas, and I have to enforce rules > about > > good > > > > formats. > > > > > >> This > > > > > >> > is > > > > > >> > > not something I want to be in the business of, > > because I > > > should be > > > > > able > > > > > >> > to > > > > > >> > > run a service infrastructure without needing to be > in > > the > > > weeds of > > > > > >> > dealing > > > > > >> > > with customer data formats. > > > > > >> > > > > > > > >> > > Trust me, a sizable portion of my support time is > > spent > > > dealing > > > > with > > > > > >> > schema > > > > > >> > > issues. I really would like to get away from that. > > Maybe > > > I'd have > > > > > more > > > > > >> > time > > > > > >> > > for other hobbies. Like writing. 
;) > > > > > >> > > > > > > > >> > > -Todd > > > > > >> > > > > > > > >> > > On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira < > > > g...@confluent.io> > > > > > wrote: > > > > > >> > > > > > > > >> > >> I'm pretty satisfied with the current workarounds > > (Avro > > > container > > > > > >> > >> format), so I'm not too excited about the extra > work > > > required to > > > > do > > > > > >> > >> headers in Kafka. I absolutely don't mind it if you > > do > > > it... > > > > > >> > >> I think the Apache convention for "good idea, but > not > > > willing to > > > > > put > > > > > >> > >> any work toward it" is +0.5? anyway, that's what I > > was > > > trying to > > > > > >> > >> convey :) > > > > > >> > >> > > > > > >> > >> On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino < > > > tpal...@gmail.com> > > > > > >> wrote: > > > > > >> > >> > Well I guess my question for you, then, is what > is > > > holding you > > > > > back > > > > > >> > from > > > > > >> > >> > full support for headers? What’s the bit that > > you’re > > > missing > > > > that > > > > > >> has > > > > > >> > you > > > > > >> > >> > under a full +1? > > > > > >> > >> > > > > > > >> > >> > -Todd > > > > > >> > >> > > > > > > >> > >> > > > > > > >> > >> > On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira < > > > > g...@confluent.io> > > > > > >> > wrote: > > > > > >> > >> > > > > > > >> > >> >> I know why people who support headers support > > them, and > > > I've > > > > > seen > > > > > >> > what > > > > > >> > >> >> the discussion is like. > > > > > >> > >> >> > > > > > >> > >> >> This is why I'm asking people who are against > > headers > > > > > (especially > > > > > >> > >> >> committers) what will make them change their > mind > > - so > > > we can > > > > > get > > > > > >> > this > > > > > >> > >> >> part over one way or another. > > > > > >> > >> >> > > > > > >> > >> >> If I sound frustrated it is not at Radai, Jun or > > you > > > (Todd)... 
> > > > > I am > > > > > >> > >> >> just looking for something concrete we can do to > > move > > > the > > > > > >> discussion > > > > > >> > >> >> along to the yummy design details (which is the > > > argument I > > > > > really > > > > > >> am > > > > > >> > >> >> looking forward to). > > > > > >> > >> >> > > > > > >> > >> >> On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino < > > > > tpal...@gmail.com> > > > > > >> > wrote: > > > > > >> > >> >> > So, Gwen, to your question (even though I’m > not > > a > > > > > committer)... > > > > > >> > >> >> > > > > > > >> > >> >> > I have always been a strong supporter of > > introducing > > > the > > > > > concept > > > > > >> > of an > > > > > >> > >> >> > envelope to messages, which headers > > accomplishes. The > > > > message > > > > > key > > > > > >> > is > > > > > >> > >> >> > already an example of a piece of envelope > > > information. By > > > > > >> > providing a > > > > > >> > >> >> means > > > > > >> > >> >> > to do this within Kafka itself, and not > relying > > on > > > use-case > > > > > >> > specific > > > > > >> > >> >> > implementations, you make it much easier for > > > components to > > > > > >> > >> interoperate. > > > > > >> > >> >> It > > > > > >> > >> >> > simplifies development of all these things > > (message > > > routing, > > > > > >> > auditing, > > > > > >> > >> >> > encryption, etc.) because each one does not > > have to > > > reinvent > > > > > the > > > > > >> > >> wheel. > > > > > >> > >> >> > > > > > > >> > >> >> > It also makes it much easier from a client > > point of > > > view if > > > > > the > > > > > >> > >> headers > > > > > >> > >> >> are > > > > > >> > >> >> > defined as part of the protocol and/or message > > format > > > in > > > > > general > > > > > >> > >> because > > > > > >> > >> >> > you can easily produce and consume messages > > without > > > having > > > > to > > > > > >> take > > > > > >> > >> into > > > > > >> > >> >> > account specific cases. 
For example, I want to > > route > > > > messages, > > > > > >> but > > > > > >> > >> >> client A > > > > > >> > >> >> > doesn’t support the way audit implemented > > headers, and > > > > client > > > > > B > > > > > >> > >> doesn’t > > > > > >> > >> >> > support the way encryption or routing > > implemented > > > headers, > > > > so > > > > > now > > > > > >> > my > > > > > >> > >> >> > application has to create some really fragile > > (my > > > > autocorrect > > > > > >> just > > > > > >> > >> tried > > > > > >> > >> >> to > > > > > >> > >> >> > make that “tragic”, which is probably > > appropriate > > > too) code > > > > to > > > > > >> > strip > > > > > >> > >> >> > everything off, rather than just consuming the > > > messages, > > > > > picking > > > > > >> > out > > > > > >> > >> the > > > > > >> > >> >> 1 > > > > > >> > >> >> > or 2 headers it’s interested in, and > performing > > its > > > > function. > > > > > >> > >> >> > > > > > > >> > >> >> > Honestly, this discussion has been going on > for > > a > > > long time, > > > > > and > > > > > >> > it’s > > > > > >> > >> >> > always “Oh, you came up with 2 use cases, and > > yeah, > > > those > > > > use > > > > > >> cases > > > > > >> > >> are > > > > > >> > >> >> > real things that someone would want to do. > > Here’s an > > > > alternate > > > > > >> way > > > > > >> > to > > > > > >> > >> >> > implement them so let’s not do headers.” If we > > have a > > > few > > > > use > > > > > >> cases > > > > > >> > >> that > > > > > >> > >> >> we > > > > > >> > >> >> > actually came up with, you can be sure that > > over the > > > next > > > > year > > > > > >> > >> there’s a > > > > > >> > >> >> > dozen others that we didn’t think of that > > someone > > > would like > > > > > to > > > > > >> > do. I > > > > > >> > >> >> > really think it’s time to stop rehashing this > > > discussion and > > > > > >> > instead > > > > > >> > >> >> focus > > > > > >> > >> >> > on a workable standard that we can adopt. 
> > > > > >> > >> >> > > > > > > >> > >> >> > -Todd > > > > > >> > >> >> > > > > > > >> > >> >> > > > > > > >> > >> >> > On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino < > > > > > tpal...@gmail.com> > > > > > >> > >> wrote: > > > > > >> > >> >> > > > > > > >> > >> >> >> C. per message encryption > > > > > >> > >> >> >>> One drawback of this approach is that this > > > significantly > > > > > reduce > > > > > >> > the > > > > > >> > >> >> >>> effectiveness of compression, which happens > > on a > > > set of > > > > > >> > serialized > > > > > >> > >> >> >>> messages. An alternative is to enable SSL > for > > wire > > > > > encryption > > > > > >> and > > > > > >> > >> rely > > > > > >> > >> >> on > > > > > >> > >> >> >>> the storage system (e.g. LUKS) for at rest > > > encryption. > > > > > >> > >> >> >> > > > > > >> > >> >> >> > > > > > >> > >> >> >> Jun, this is not sufficient. While this does > > cover > > > the case > > > > > of > > > > > >> > >> removing > > > > > >> > >> >> a > > > > > >> > >> >> >> drive from the system, it will not satisfy > most > > > compliance > > > > > >> > >> requirements > > > > > >> > >> >> for > > > > > >> > >> >> >> encryption of data as whoever has access to > the > > > broker > > > > itself > > > > > >> > still > > > > > >> > >> has > > > > > >> > >> >> >> access to the unencrypted data. For > end-to-end > > > encryption > > > > you > > > > > >> > need to > > > > > >> > >> >> >> encrypt at the producer, before it enters the > > > system, and > > > > > >> decrypt > > > > > >> > at > > > > > >> > >> the > > > > > >> > >> >> >> consumer, after it exits the system. 
> > > > > >> > >> >> >> > > > > > >> > >> >> >> -Todd > > > > > >> > >> >> >> > > > > > >> > >> >> >> > > > > > >> > >> >> >> On Thu, Dec 1, 2016 at 1:03 PM, radai < > > > > > >> radai.rosenbl...@gmail.com > > > > > >> > > > > > > > >> > >> >> wrote: > > > > > >> > >> >> >> > > > > > >> > >> >> >>> another big plus of headers in the protocol > > is that > > > it > > > > would > > > > > >> > enable > > > > > >> > >> >> rapid > > > > > >> > >> >> >>> iteration on ideas outside of core kafka and > > would > > > reduce > > > > > the > > > > > >> > >> number of > > > > > >> > >> >> >>> future wire format changes required. > > > > > >> > >> >> >>> > > > > > >> > >> >> >>> a lot of what is currently a KIP represents > > use > > > cases that > > > > > are > > > > > >> > not > > > > > >> > >> 100% > > > > > >> > >> >> >>> relevant to all users, and some of them > > require > > > rather > > > > > invasive > > > > > >> > wire > > > > > >> > >> >> >>> protocol changes. a thing a good recent > > example of > > > this is > > > > > >> > kip-98. > > > > > >> > >> >> >>> tx-utilizing traffic is expected to be a > very > > small > > > > > fraction of > > > > > >> > >> total > > > > > >> > >> >> >>> traffic and yet the changes are invasive. > > > > > >> > >> >> >>> > > > > > >> > >> >> >>> every such wire format change translates > into > > > painful and > > > > > slow > > > > > >> > >> >> adoption of > > > > > >> > >> >> >>> new versions. > > > > > >> > >> >> >>> > > > > > >> > >> >> >>> i think a lot of functionality currently in > > KIPs > > > could be > > > > > "spun > > > > > >> > out" > > > > > >> > >> >> and > > > > > >> > >> >> >>> implemented as opt-in plugins transmitting > > data over > > > > > headers. 
this would keep the core wire format stable(r), core codebase smaller, and avoid the "burden of proof" that's sometimes required to prove a certain feature is useful enough for a wide-enough audience to warrant a wire format change and code complexity additions.

(to be clear - kip-98 goes beyond "mere" wire format changes and i'm not saying it could have been completely done with headers, but exactly-once delivery certainly could)

On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <g...@confluent.io> wrote:

On Thu, Dec 1, 2016 at 10:24 AM, radai <radai.rosenbl...@gmail.com> wrote:
> "For use cases within an organization, one could always use other approaches such as company-wise containers"
> this is what linkedin has traditionally done but there are now cases (read - topics) where this is not acceptable. this makes headers useful even within single orgs for cases where one-container-fits-all cannot apply.
>
> as for the particular use cases listed, i don't want this to devolve to a discussion of particular use cases - i think it's enough that some of them

I think a main point of contention is that: We identified a few use-cases where headers are useful, do we want Kafka to be a system that supports those use-cases?

For example, Jun said:
"Not sure how widely useful record-level lineage is though since the overhead could be significant."

We know NiFi supports record level lineage. I don't think it was developed for lols, I think it is safe to assume that the NSA needed that functionality. We also know that certain financial institutes need to track tampering with records at a record level and there are federal regulations that absolutely require this.
They also need to prove that routing apps that "touch" the messages and either read or update headers couldn't have possibly modified the payload itself. They use record level encryption to do that - apps can read and (sometimes) modify headers but can't touch the payload.

We can totally say "those are corner cases and not worth adding headers to Kafka for", they should use a different pubsub message for that (Nifi or one of the other 1000 that cater specifically to the financial industry).

But this gets us into a catch 22:
If we discuss a specific use-case, someone can always say it isn't interesting enough for Kafka. If we discuss more general trends, others can say "well, we are not sure any of them really needs headers specifically. This is just hand waving and not interesting.".

I think discussing use-cases in specifics is super important to decide implementation details for headers (my use-cases lean toward numerical keys with namespaces and object values, others differ), but I think we need to answer the general "Are we going to have headers" question first.

I'd love to hear from the other committers in the discussion:
What would it take to convince you that headers in Kafka are a good idea in general, so we can move ahead and try to agree on the details?

I feel like we keep moving the goal posts and this is truly exhausting.

For the record, I mildly support adding headers to Kafka (+0.5?).
The community can continue to find workarounds to the issue and there are some benefits to keeping the message format and clients simpler.
But I see the usefulness of headers to many use-cases and if we can find a good and generally useful way to add it to Kafka, it will make Kafka easier to use for many - a worthy goal in my eyes.

> are interesting/feasible, but:
> A+B. i think there are use cases for polyglot topics. especially if kafka is being used to "trunk" something else.
> D. multiple topics would make it harder to write portable consumer code. partition remapping would mess with locality of consumption guarantees.
> E+F. a use case I see for lineage/metadata is billing/chargeback. for that use case it is not enough to simply record the point of origin, but every replication stop (think mirror maker) must also add a record to form a "transit log".
>
> as for stream processing on top of kafka - i know samza has a metadata map which they carry around in addition to user values.
> headers are the perfect fit for these things.

On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Michael,

In order to answer the first two questions, it would be helpful if we could identify 1 or 2 strong use cases for headers in the space for third-party vendors. For use cases within an organization, one could always use other approaches such as company-wise containers to get around w/o headers. I went through the use cases in the KIP and in Radai's wiki (https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers). The following are the ones that I understand and could be in the third-party use case category.

A. content-type
It seems that in general, content-type should be set at the topic level.
Not sure if mixing messages with different content types should be encouraged.

B. schema id
Since the value is mostly useless without schema id, it seems that storing the schema id together with serialized bytes in the value is better?

C. per message encryption
One drawback of this approach is that this significantly reduces the effectiveness of compression, which happens on a set of serialized messages. An alternative is to enable SSL for wire encryption and rely on the storage system (e.g. LUKS) for at rest encryption.

D. cluster ID for mirroring across Kafka clusters
This is actually interesting. Today, to avoid introducing cycles when doing mirroring across data centers, one would either have to set up two Kafka clusters (a local and an aggregate) per data center or rename topics. Neither is ideal.
With headers, the producer could tag each message with the producing cluster ID in the header. MirrorMaker could then avoid mirroring messages to a cluster if they are tagged with the same cluster id.

However, an alternative approach is to introduce something like hierarchical topics and store messages from different clusters in different partitions under the same topic. This approach avoids filtering out unneeded data and makes offset preserving easier to support. It may make compaction trickier though since the same key may show up in different partitions.

E. record-level lineage
For example, a source connector could store in the message the metadata (e.g. UUID) of the source record. Similarly, if a stream job transforms messages from topic A to topic B, the library could include the source message offset in each of the transformed messages in the header.
Not sure how widely useful record-level lineage is though since the overhead could be significant.

F. auditing metadata
We could put things like clientId/host/user in the header in each message for auditing. These metadata are really at the producer level though. So, a more efficient way is to only include a "producerId" per message and send the producerId -> metadata mapping independently. KIP-98 is actually proposing including such a producerId natively in the message.

So, overall, I am not sure that I am fully convinced of the strong third-party use cases of headers yet. Perhaps we could discuss a bit more to make one or two really convincing use cases.

Another orthogonal question is whether headers should be exposed in stream processing systems such as Kafka Streams, Samza, and Spark Streaming.
Currently, those systems just deal with key/value pairs. Should we expose a third thing header there too or somehow map header to key or value?

Thanks,

Jun

On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <michael.pea...@ig.com> wrote:

I assume, that after a period of a week, that there are no concerns now with points 1 and 2, and now we have agreement that headers are useful and needed in Kafka. As such if put to a KIP vote, this wouldn’t be a reason to reject.

@ Ignacio on point 4).
I think for the purpose of getting this KIP moving past this, we can state the key will be a 4 byte space that will be naturally interpreted as an Int32 (if namespacing is later wanted you can easily split this into two int16 spaces); from the wire protocol implementation this makes no difference, I don’t believe. Is this reasonable to all?

On 5) as per point 4, therefore happy we keep with 32 bits.
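
A minimal sketch of the point above, that one 4-byte key can later be read as two int16 spaces without changing the wire format. The function names are hypothetical and not taken from the KIP, and the key is treated as unsigned for simplicity, which is an assumption (a Java Int32 is signed).

```python
import struct

def join_key(namespace: int, local_id: int) -> int:
    """Combine two 16-bit values into one 32-bit header key (hypothetical helper)."""
    if not (0 <= namespace <= 0xFFFF and 0 <= local_id <= 0xFFFF):
        raise ValueError("both parts must fit in 16 bits")
    return (namespace << 16) | local_id

def split_key(key: int) -> tuple:
    """Read a 32-bit header key as (namespace, local_id), 16 bits each."""
    if not 0 <= key <= 0xFFFFFFFF:
        raise ValueError("key must fit in 32 bits")
    return key >> 16, key & 0xFFFF

# The bytes on the wire are the same either way: one big-endian 4-byte field
# or two consecutive big-endian 2-byte fields.
key = join_key(21, 1)
as_one_int32 = struct.pack(">I", key)
as_two_int16 = struct.pack(">HH", 21, 1)
```

The layer that matches keys only ever compares the 4 bytes for equality; any namespace reading is left to the layer above.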

On 18/11/2016, 20:34, "ignacio.so...@gmail.com on behalf of Ignacio Solis" <ignacio.so...@gmail.com on behalf of iso...@igso.net> wrote:

Summary:

3) Yes - Header value as byte[]

4a) Int,Int - No
4b) Int - Yes
4c) String - Reluctant maybe

5) I believe the header system should take a single int. I think 32bits is a good size, if you want to interpret this as two 16bit numbers in the layer above go right ahead. If somebody wants to argue for 16 bits or 64 bits of header key space I would listen.

Discussion:
Dividing the key space into sub_key_1 and sub_key_2 makes no sense to me at this layer. Are we going to start providing APIs to get all the sub_key_1s? or all the sub_key_2s? If there are no distinguishing functions that are applied to each one then they should be a single value. At this layer all we're doing is equality.
If the above layer wants to interpret this as 2, 3 or more values that's a different question. I personally think it's all one keyspace that is getting assigned using some structure, but if you want to sub-assign parts of it then that's fine.

The same discussion applies to strings.
If somebody argued for strings, would we be arguing to divide the strings with dots ('.') as a requirement? Would we want them to give us the different name segments separately? Would we be performing any actions on this key other than matching?

Nacho

On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <michael.pea...@ig.com> wrote:

#jay #jun any concerns on 1 and 2 still?

@all
To get this moving along a bit more I'd also like to ask to get clarity on the below last points:

3) I believe we're all roughly happy with the header value being a byte[]?

4) I believe consensus has been for a namespace based int approach {int,int} for the key. Any objections if this is what we go with?

5) If the assumption in (4) is correct and we have {int,int} keys: should both ints be int16 or int32?
I'm for them being int16 (2 bytes), as combined that is a space of 4 bytes as per the original, gives plenty of combinations for the foreseeable future, and keeps the overhead small.

Do we see any benefit in another kip call to discuss these at all?

Cheers
Mike
________________________________________
From: K Burstev <k.burs...@yandex.com>
Sent: Friday, November 18, 2016 7:07:07 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

For what it is worth also i agree.
As a user:

1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

14.11.2016, 21:15, "Ignacio Solis" <iso...@igso.net>:

1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <michael.pea...@ig.com> wrote:

Hi Roger,

The kip details/examples show the original proposal for key spacing, not the new namespace idea mentioned in the discussion.

We will need to update the kip, when we get agreement this is a better approach (which seems to be the case if I have understood the general feeling in the conversation)

Re the variable ints, at a very early stage we did think about this. I think the added complexity for the saving isn't worth it. If we want to reduce overheads and size, I'd rather go with int16 (2 bytes) keys as it keeps it simple.

On the note of no headers: as per the kip, we use an attribute bit to denote whether headers are present or not, which currently gives zero overhead if headers are not used.
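
A rough sketch of the attribute-bit idea above, to show why a record without headers pays nothing extra. The layout is a toy one invented for illustration: the flag position `HEADERS_PRESENT`, the field widths and the function names are all assumptions, not the format described in the kip.

```python
import struct

HEADERS_PRESENT = 0x10  # hypothetical flag bit in the attributes byte

def encode(value: bytes, headers=None) -> bytes:
    """Toy record: attributes byte, optional header section, then the value."""
    attributes = 0
    header_section = b""
    if headers:
        attributes |= HEADERS_PRESENT
        header_section = struct.pack(">H", len(headers))  # header count
        for key, val in sorted(headers.items()):
            # int32 key, uint32 value length, then the raw value bytes
            header_section += struct.pack(">iI", key, len(val)) + val
    return struct.pack(">B", attributes) + header_section + value

def has_headers(record: bytes) -> bool:
    """A reader checks one bit and skips header parsing entirely when it is unset."""
    return bool(record[0] & HEADERS_PRESENT)

plain = encode(b"abc")
tagged = encode(b"abc", {1: b"x"})
```

In this toy layout `plain` is just the attributes byte plus the value, so the header section adds no bytes unless the flag is set.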

I think, as radai mentions, it would be good first to get clarity on whether we now have general consensus that (1) headers are worthwhile and useful, and (2) we want it as a top level entity.

Just to state the obvious i believe (1) headers are worthwhile and (2) agree as a top level entity.

Cheers
Mike
________________________________________
From: Roger Hoover <roger.hoo...@gmail.com>
Sent: Wednesday, November 9, 2016 9:10:47 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Sorry for going a little in the weeds but thanks for the replies regarding varint.

Agreed that a prefix and {int, int} can be the same. It doesn't look like that's what the KIP is saying in the "Open" section. The example shows 2100001 for New Relic and 210002 for App Dynamics implying that the New Relic organization will have only a single header id to work with. Or is 2100001 a prefix? The main point of a namespace or prefix is to reduce the overhead of config mapping or registration depending on how namespaces/prefixes are managed.

Would love to hear more feedback on the higher-level questions though...
> > > > > >> > >> >> >>> > >> > > >>
> > > > > >> > >> >> >>> > >> > > >> Cheers,
> > > > > >> > >> >> >>> > >> > > >>
> > > > > >> > >> >> >>> > >> > > >> Roger
> > > > > >> > >> >> >>> > >> > > >>
> > > > > >> > >> >> >>> > >> > > >> On Wed, Nov 9, 2016 at 11:38 AM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > >> > >> >> >>> > >> > > >>
> > > > > >> > >> >> >>> > >> > > >> > I think this discussion is getting a bit into the weeds on technical implementation details.
> > > > > >> > >> >> >>> > >> > > >> > I'd like to step back a minute and try and establish where we are in the larger picture:
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> > (re-wording nacho's last paragraph)
> > > > > >> > >> >> >>> > >> > > >> > 1. are we all in agreement that headers are a worthwhile and useful addition to have? this was contested early on
> > > > > >> > >> >> >>> > >> > > >> > 2. are we all in agreement on headers as top level entity vs headers squirreled-away in V?
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> > if there are still concerns around these #2 points (#jay? #jun?)?
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> > (and now back to our normal programming ...)
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> > varints are nice. having said that, its adding complexity (see https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java as 1st google result) and would require anyone writing other clients (C? Python? Go? Bash? ;-) ) to get/implement the same, and for relatively little gain (int vs string is order of magnitude, this isnt).
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> > int namespacing vs {int, int} namespacing are basically the same thing - youre just namespacing an int64 and giving people whole 2^32 ranges at a time.
> > > > > >> > >> >> >>> > >> > > >> > the part i like about this is letting people have a large swath of numbers with one registration so they dont have to come back for every single plugin/header they want to "reserve".
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <roger.hoo...@gmail.com> wrote:
> > > > > >> > >> >> >>> > >> > > >> >
> > > > > >> > >> >> >>> > >> > > >> > > Since some of the debate has been about overhead + performance, I'm wondering if we have considered a varint encoding (https://developers.google.com/protocol-buffers/docs/encoding#varints) for the header length field (int32 in the proposal) and for header ids? If you don't use headers, the overhead would be a single byte, and each header id < 128 would also need only a single byte?
> > > > > >> > >> >> >>> > >> > > >> > >
> > > > > >> > >> >> >>> > >> > > >> > >
> > > > > >> > >> >> >>> > >> > > >> > >
> > > > > >> > >> >> >>> > >> > > >> > > On Wed, Nov 9, 2016 at 6:43 AM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > >> > >> >> >>> > >> > > >> > >
> > > > > >> > >> >> >>> > >> > > >> > > > @magnus - and very dangerous (youre essentially downloading and executing arbitrary code off the internet on your servers ... bad idea without a sandbox, even with)
> > > > > >> > >> >> >>> > >> > > >> > > >
> > > > > >> > >> >> >>> > >> > > >> > > > as for it being a purely administrative task - i disagree.
> > > > > >> > >> >> >>> > >> > > >> > > >
> > > > > >> > >> >> >>> > >> > > >> > > > i wish it would, really, because then my earlier point on the complexity of the remapping process would be invalid, but at linkedin, for example, we (the team im in) run kafka as a service. we dont really know what our users (developing applications that use kafka) are up to at any given moment. it is very possible (given the existence of headers and a corresponding plugin ecosystem) for some application to "equip" their producers and consumers with the required plugin without us knowing. i dont mean to imply thats bad, i just want to make the point that its not as simple as keeping it in sync across a large-enough organization.
> > > > > >> > >> >> >>> > >> > > >> > > >
> > > > > >> > >> >> >>> > >> > > >> > > >
> > > > > >> > >> >> >>> > >> > > >> > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <mag...@edenhill.se> wrote:
> > > > > >> > >> >> >>> > >> > > >> > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > I think there is a piece missing in the Strings discussion, where pro-Stringers reason that by providing unique string identifiers for each header everything will just magically work for all parts of the stream pipeline.
> > > > > >> > >> >> >>> > >> > > >> > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > But the strings dont mean anything by themselves, and while we could probably envision some auto plugin loader that downloads, compiles, links and runs plugins on-demand as soon as they're seen by a consumer, I dont really see a use-case for something so dynamic (and fragile) in practice.
> > > > > >> > >> >> >>> > >> > > >> > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > In the real world an application will be configured with a set of plugins to either add (producer) or read (consumer) headers. This is an administrative task based on what features a client needs/provides and results in some sort of configuration to enable and configure the desired plugins.
> > > > > >> > >> >> >>> > >> > > >> > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > Since this needs to be kept somewhat in sync across an organisation (there is no point in having producers add headers no consumers will read, and vice versa), the added complexity of assigning an id namespace for each plugin as it is being configured should be tolerable.
> > > > > >> > >> >> >>> > >> > > >> > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > /Magnus
> > > > > >> > >> >> >>> > >> > > >> > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:
> > > > > >> > >> >> >>> > >> > > >> > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > Just following/catching up on what seems to be an active night :)
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > @Radai sorry if it may seem obvious but what does MD stand for?
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > My take on String vs Int:
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > I will state first I am pro Int (16 or 32).
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > I do though, playing devil's advocate, see a big plus with the argument for String keys: this is around integrating into an existing eco-system.
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > As many other systems use String based headers (Flume, JMS) it makes it much easier for these to be incorporated/integrated into.
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > How with Int based headers could we provide a way/guidance to make this integration simple / easy with transition flows over to kafka?
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > * tough luck buddy you're on your own
> > > > > >> > >> >> >>> > >> > > >> > > > > > * simply hash the string into int code and hope for no collisions (how to convert back though?)
> > > > > >> > >> >> >>> > >> > > >> > > > > > * http2 style as mentioned by nacho.
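The "hash the string into int code" option in the list above, including its "how to convert back" problem, might look roughly like the sketch below. It is an illustration only: the `HeaderKeyRegistry` class is hypothetical, and CRC32 is just a conveniently available 32-bit hash, not something proposed in the thread. A hash cannot be inverted, so converting back needs a lookup table that each client keeps locally or shares by some other means.

```python
import zlib


class HeaderKeyRegistry:
    """Hypothetical helper: maps string header names to 32-bit int codes and back."""

    def __init__(self):
        self._name_by_code = {}  # reverse table, since the hash itself is one-way

    def register(self, name: str) -> int:
        # zlib.crc32 returns an unsigned 32-bit integer in Python 3
        code = zlib.crc32(name.encode("utf-8"))
        existing = self._name_by_code.setdefault(code, name)
        if existing != name:
            # two different names hashed to the same code: "hope" was not enough
            raise ValueError(f"collision: {name!r} and {existing!r} share code {code}")
        return code

    def name_for(self, code: int):
        """Return the registered name for a code, or None if it was never registered."""
        return self._name_by_code.get(code)
```

The reverse table only knows names that were registered in the same process, which is the weakness Mike's parenthetical hints at: a consumer that sees an unknown code has no way to recover the original string.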
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > cheers,
> > > > > >> > >> >> >>> > >> > > >> > > > > > Mike
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > ________________________________________
> > > > > >> > >> >> >>> > >> > > >> > > > > > From: radai <radai.rosenbl...@gmail.com>
> > > > > >> > >> >> >>> > >> > > >> > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
> > > > > >> > >> >> >>> > >> > > >> > > > > > To: dev@kafka.apache.org
> > > > > >> > >> >> >>> > >> > > >> > > > > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > thinking about it some more, the best way to transmit the header remapping data to consumers would be to put it in the MD response payload, so maybe it should be discussed now.
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > >> > >> >> >>> > >> > > >> > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > > im not opposed to the idea of namespace mapping. all im saying is that its not part of the "mvp" and, since it requires no wire format change, can always be added later.
> > > > > >> > >> >> >>> > >> > > >> > > > > > > also, its not as simple as just configuring MM to do the transform: lets say i've implemented large message support as {666,1} and on some mirror target cluster its been remapped to {999,1}. the consumer plugin code would also need to be told to look for the large message "part X of Y" header under {999,1}. doable, but tricky.
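A small sketch of the {666,1} to {999,1} scenario just described, to show why remapping in the mirroring step alone is not enough. Everything here is hypothetical: headers are modelled as a plain dict keyed by (namespace, id), and `remap_headers` and `LargeMessagePlugin` are invented names, not MirrorMaker or Kafka APIs.

```python
def remap_headers(headers, namespace_map):
    """Return a copy of headers with namespaces rewritten per namespace_map.

    headers: dict mapping (namespace, header_id) -> bytes value.
    namespace_map: dict mapping source namespace -> target namespace.
    Namespaces missing from the map pass through unchanged. This sketch does not
    guard against a remapped key colliding with a key already in the target range.
    """
    return {
        (namespace_map.get(ns, ns), header_id): value
        for (ns, header_id), value in headers.items()
    }


class LargeMessagePlugin:
    """Hypothetical consumer plugin that reads a 'part X of Y' header."""

    PART_HEADER_ID = 1

    def __init__(self, namespace):
        # the namespace must be configured per cluster, which is the tricky part
        self.namespace = namespace

    def part_header(self, headers):
        return headers.get((self.namespace, self.PART_HEADER_ID))
```

A plugin still configured with namespace 666 finds nothing on the mirror target, while one told about 999 does, so the remapping has to reach the consumer configuration as well as the mirroring step.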
> > > > > >> > >> >> >>> > >> > > >> > > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > > > >> > >> >> >>> > >> > > >> > > > > > >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> While you can do whatever you want with a namespace and your code, what I'd expect is for each app to make namespaces configurable...
> > > > > >> > >> >> >>> > >> > > >> > > > > > >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> So if I accidentally used 666 for my HR department, and still want to run RadaiApp, I can config "namespace=42" for RadaiApp and everything will look normal.
> > > > > >> > >> >> >>> > >> > > >> > > > > > >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> This means you only need to sync usage inside your own organization. Still hard, but somewhat easier than syncing with the entire world.
> > > > > >> > >> >> >>> > >> > > >> > > > > > >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> > and we can start with {namespace, id} and no re-mapping support and always add it later on if/when collisions actually happen (i dont think they'd be a problem).
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> > every interested party (so orgs or individuals) could then register a prefix (0 = reserved, 1 = confluent ... 666 = me :-) ) and do whatever with the 2nd ID - so once linkedin registers, say 3, then linkedin devs are free to use {3, *} with a reasonable expectation not to collide with anything else. further partitioning of that * becomes linkedin's problem, but the "upstream registration" of a namespace only has to happen once.
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <wushuja...@gmail.com> wrote:
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > Thank you so much for this clear and fair summary of the arguments.
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > I'm in favor of ints. Not a deal-breaker, but in favor.
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > Even more in favor of Magnus's decentralized suggestion with Roger's tweak: add a namespace for headers. This will allow each app to just use whatever IDs it wants internally, and then let the admin deploying the app figure out an available namespace ID for the app to live in. So io.confluent.schema-registry can be namespace 0x01 on my deployment and 0x57 on yours, and the poor guys developing the app don't need to worry about that.
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> >
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> Gwen, if I understand your example right, an application deployer might decide to use 0x01 in one deployment, and that means that once the message is written into the broker, it will be saved on the broker with that specific namespace (0x01).
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> If you were to mirror that message into another cluster, the 0x01 would accompany the message, right? What if the deployers of the same app in the other cluster use 0x57? They won't understand each other?
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >>
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> I'm not sure that's an avoidable problem. I think it simply means that in order to share data, you have to also have a shared (agreed upon) understanding of what the namespaces mean. Which I think makes sense, because the alternative (sharing *nothing* at all) would mean that there would be no way to understand each other.
> > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> -James > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > Gwen > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> > On Tue, > > Nov 8, > > > 2016 at > > > > > 4:23 > > > > > >> > PM, > > > > > >> > >> >> radai < > > > > > >> > >> >> >>> > >> > > >> > > > > > > radai.rosenbl...@gmail.com > > > > > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> wrote: > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> >> +1 for > > sean's > > > > document. > > > > > it > > > > > >> > >> covers > > > > > >> > >> >> >>> pretty > > > > > >> > >> >> >>> > >> much > > > > > >> > >> >> >>> > >> > all > > > > > >> > >> >> >>> > >> > > the > > > > > >> > >> >> >>> > >> > > >> > > > trade-offs > > > > > >> > >> >> >>> > >> > > >> > > > > > and > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> >> provides > > > concrete > > > > > figures > > > > > >> to > > > > > >> > >> argue > > > > > >> > >> >> >>> about > > > > > >> > >> >> >>> > :-) > > > > > >> > >> >> >>> > >> > > >> > > > > > >> >> >> > > (nit-picking - > > > used > > > > the > > > > > >> same > > > > > >> > >> xkcd > > > > > >> > >> >> >>> twice, > > > > > >> > >> >> >>> > >> also > > > > > >> > >> >> >>> > >> > trove > > > > > >> > >> >> >>> > >> > > has > > > > > >> > >> >> >>> > >> > > >> > been > > > > > >> > >> >> >>> > >> > > >> > > > > > >> superceded > > > > > >> > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > -- > > > > > >> > Gwen Shapira > > > > > >> > Product Manager | Confluent > > > > > >> > 650.450.2760 | @gwenshap > > > > > >> > Follow us: Twitter | blog > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > >> -- > > > > > >> *Todd Palino* > > > > > >> Staff Site Reliability Engineer > > > > > >> Data Infrastructure Streaming > > > > > >> > > > > > >> > > > > > >> > 
> > > > > > > linkedin.com/in/toddpalino
> > > > >
> > > > > --
> > > > > Gwen Shapira
> > > > > Product Manager | Confluent
> > > > > 650.450.2760 | @gwenshap
> > > > > Follow us: Twitter | blog
> > >
> > > The information contained in this email is strictly confidential and
> > > for the use of the addressee only, unless otherwise indicated. If you
> > > are not the intended recipient, please do not read, copy, use or
> > > disclose to others this message or any attachment. Please also notify
> > > the sender by replying to this email or by telephone (+44(020 7896
> > > 0011) and then delete the email and any copies of it. Opinions,
> > > conclusion (etc) that do not relate to the official business of this
> > > company shall be understood as neither given nor endorsed by it. IG is
> > > a trading name of IG Markets Limited (a company registered in England
> > > and Wales, company number 04008957) and IG Index Limited (a company
> > > registered in England and Wales, company number 01190902). Registered
> > > address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both
> > > IG Markets Limited (register number 195355) and IG Index Limited
> > > (register number 114059) are authorised and regulated by the Financial
> > > Conduct Authority.
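
To make the namespace question from the quoted exchange between Gwen and James concrete, here is a minimal sketch of what happens when the same app is given namespace 0x01 in one deployment and 0x57 in another, and a message is mirrored between them. Everything here is hypothetical and for illustration only: the function names, the tuple layout of a header, and the "schema-id" key are assumptions, not part of Kafka or of any proposal in this thread.

```python
# Hypothetical sketch: per-deployment namespace IDs for message headers.
# None of these names come from Kafka; they only illustrate the thread.

SCHEMA_REGISTRY = "io.confluent.schema-registry"

# Each deployment's admin assigns its own namespace byte to the same app.
CLUSTER_A = {SCHEMA_REGISTRY: 0x01}
CLUSTER_B = {SCHEMA_REGISTRY: 0x57}


def write_header(namespaces, app, key, value):
    """Tag a header with the namespace byte the local deployment assigned."""
    return (namespaces[app], key, value)


def read_header(namespaces, app, header):
    """Return the value only if the namespace byte matches this deployment."""
    namespace_id, _key, value = header
    return value if namespace_id == namespaces[app] else None


def mirror(header, source, target):
    """Rewrite the namespace byte using a mapping both sides agreed on."""
    namespace_id, key, value = header
    by_id = {ns: app for app, ns in source.items()}
    app = by_id[namespace_id]
    return (target[app], key, value)


header = write_header(CLUSTER_A, SCHEMA_REGISTRY, "schema-id", b"42")

# Copied byte-for-byte, the header is not recognised in the other cluster.
unreadable = read_header(CLUSTER_B, SCHEMA_REGISTRY, header)

# With a shared understanding of the namespaces, the mirror can translate it.
readable = read_header(
    CLUSTER_B, SCHEMA_REGISTRY, mirror(header, CLUSTER_A, CLUSTER_B)
)
```

The translation step only works because `mirror` is handed both deployments' mappings, which is James's point: sharing data across clusters requires some agreed-upon understanding of what each namespace means.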