I agree with all of it. I just want to elaborate on a few things:

3. There are two different use cases:
  (a) the one you describe: I want to shut down NOW and don't want to wait. I agree with your observations, etc.
  (b) we intentionally want to "drain" the stream processing topology before shutting down. Yes, if I have a lot of intermediate data this might take some time, but I want/need a clean shutdown like this.
Case 3(b) is currently not possible, and it is exactly what we need for the "Incremental Batch KIP" -- there are other use cases for 3(b), too.

4. The point that "it's just a client thing" is true, but it should also work for clients that are not aware of the messages. I.e., we need an opt-in mechanism, so some changes are required -- not to the brokers, though. But it cannot be done "external" to the clients; otherwise people would need to change their client code.

About "embedded control messages" vs. an "extra control message stream": IMHO, there are use cases for both, and the two approaches complement each other (they are not conflicting).

-Matthias

On 12/14/16 8:36 PM, Ignacio Solis wrote:
> I'm renaming this thread in case we start deep diving.
>
> I'm in favor of so-called "control messages", at least the notion of them. However, I'm not sure about the design.
>
> What I understood from the original mail:
>
> A. Provide a message that does not get returned by poll()
> B. Provide a way for applications to consume these messages (sign up?)
> C. Control messages would be associated with a topic.
> D. Control messages should be _in_ the topic.
>
> 1. The first thing to point out is that this can be done with headers. I assume that's why you sent it on the header thread. As you state, if we had headers, you would not require a separate KIP. So, in a way, you're trying to provide a concrete use case for headers. I wanted to move the discussion to a separate thread mostly because, while I like the idea, and I like the fact that it can be done with headers, people might want to discuss alternatives.
>
> 2. I'm also assuming that you're intentionally trying to preserve order. Headers could do this natively, of course. You could also achieve this with a separate topic, given identifiers, sequence numbers, headers, etc. However...
>
> 3. There are a few use cases where ordering is important but out-of-band is even more important.
> We have a few large workloads where this is of interest to us. Obviously we can achieve this with a separate topic, but having a control channel for a topic that can send high-priority data would be interesting. And yes, we would learn a lot from the TCP experience with the urgent pointer (https://tools.ietf.org/html/rfc6093) and other out-of-band communication techniques.
>
> You have an example of a "shutdown marker". This works OK as a terminator; however, it is not very fast. If I have a 4 TB backlog because of asynchronous processing, then a shutdown marker at the end of the 4 TB is not as useful as an out-of-band message that tells me immediately that those 4 TB should not be processed. So, from this perspective, I prefer to have a separate topic and not embed control messages in the data.
>
> If the messages are part of the data, or associated with specific data, then they should be in the data. If they are about the process, we need an out-of-band mechanism.
>
> 4. The general feeling I have gotten from a few people on the list is: why not just do this above the Kafka clients? After all, you could have a system to ignore certain schemas.
>
> Effectively, if we had headers, it would be done from a client perspective, without the need to modify anything major.
>
> If we wanted to do it with a separate topic, that could also be done without any broker changes. But you could imagine wanting some broker changes: if the broker understands that two streams are tied together, then it may make decisions based on that. This would be similar to the handling of file system forks (https://en.wikipedia.org/wiki/Fork_(file_system)).
>
> 5. Also heard in discussions about headers: we don't know if this is generally useful. Maybe only a couple of institutions need it? It may not be worth it to modify the whole stack for that.
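To make the trade-off in point 3 concrete: an in-band marker takes effect only after everything queued ahead of it, while a control channel that is checked before each batch can stop work right away. The following is a minimal Python simulation, assuming plain lists and a `deque` stand in for the data topic and a separate control topic; `stop_in_band`, `stop_out_of_band`, and the `(kind, payload)` record shape are hypothetical and use no Kafka client API.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical record shape: (kind, payload) where kind is "data" or "control".
Record = Tuple[str, str]


def stop_in_band(topic: List[Record]) -> int:
    """Consume in order and stop at the first in-band "shutdown" marker.

    Returns how many data records were processed first; the marker only
    takes effect after everything queued ahead of it.
    """
    processed = 0
    for kind, payload in topic:
        if kind == "control":
            if payload == "shutdown":
                break
            continue  # other control messages are not data
        processed += 1
    return processed


def stop_out_of_band(data: List[Record], control: Deque[str], batch_size: int = 100) -> int:
    """Check a separate control channel before each batch of data.

    An "abort" on the control channel stops processing right away,
    no matter how much backlog remains in the data topic.
    """
    processed = 0
    for start in range(0, len(data), batch_size):
        while control:  # pending control messages take priority over data
            if control.popleft() == "abort":
                return processed
        processed += len(data[start:start + batch_size])
    return processed


backlog = [("data", str(i)) for i in range(1_000)]
print(stop_in_band(backlog + [("control", "shutdown")]))  # 1000: whole backlog first
print(stop_out_of_band(backlog, deque(["abort"])))        # 0: stops immediately
```

The out-of-band variant gives up ordering between control messages and data, which is exactly the property the in-band marker preserves; that is why the two approaches complement rather than replace each other.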
>
> I would again say that with headers you could pull it off easily, even if only a subset of clients/applications wanted to use it.
>
> So, in summary: I like the idea. I see benefits in implementing it through headers, but I also see benefits in having it as a separate stream. I'm not too in favor of having a separate message-handling pipeline for the same topic, though.
>
> Nacho
>
> On Wed, Dec 14, 2016 at 9:51 AM, Matthias J. Sax <matth...@confluent.io> wrote:
>> Yes and no. I did overload the term "control message".
>>
>> EOS control messages are for client-broker communication and are thus never exposed to any application. And I think this is a good design, because the broker needs to understand those control messages. Thus, this should be a protocol change.
>>
>> The type of control message I have in mind is for client-client (application-application) communication, and the broker is agnostic to them. Thus, it should not be a protocol change.
>>
>> -Matthias
>>
>> On 12/14/16 9:42 AM, radai wrote:
>>> aren't control messages getting pushed as their own top-level protocol change (and a fairly massive one) for the transactions KIP?
>>>
>>> On Tue, Dec 13, 2016 at 5:54 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to add a completely new angle to this discussion. For this, I want to propose an extension of the headers feature that enables new use cases -- and those new use cases might convince people to support headers (including, of course, the larger-scoped proposal).
>>>>
>>>> Extended Proposal:
>>>>
>>>> Allow messages with a certain header key to be special "control messages" (with or without payload) that are not exposed to an application via .poll().
>>>>
>>>> Thus, a consumer client would automatically skip over those messages.
>>>> If an application knows about embedded control messages, it can "sign up" for those messages with the consumer client and either get a callback, or the consumer's auto-drop for these messages gets disabled (allowing it to consume those messages via poll()).
>>>>
>>>> (The details need further consideration/discussion. I just want to sketch the main idea.)
>>>>
>>>> Usage:
>>>>
>>>> There is a shared topic (i.e., used by multiple applications), and a producer application wants to embed a special message in the topic for a dedicated consumer application. Because only one application will understand this message, it cannot be a regular message, as this would break all applications that do not understand it. The producer application would set a special metadata key, and no consumer application would see this control message by default, because they did not enable their consumer client to return this message in poll() (the client would just drop messages with the special metadata key). Only the single application that should receive this message will subscribe to it on its consumer client and process it.
>>>>
>>>> Concrete Use Case: Kafka Streams
>>>>
>>>> In Kafka Streams, we would like to propagate "control messages" from subtopology to subtopology. There are multiple scenarios for which this would be useful. For example, we currently do not guarantee a "consistent shutdown" of an application. By this, I mean that input records might not be completely processed by the whole topology, because the application shutdown happens "in between" and an intermediate result gets "stuck" in an intermediate topic. Thus, a user would see a committed offset on the application's source topic, but no corresponding result record in the output topic.
>>>>
>>>> Having "shutdown markers" would allow us to first stop the upstream subtopology and write this marker into the intermediate topic; the downstream subtopology would only shut itself down after it sees the "shutdown marker". Thus, we can guarantee on shutdown that no "in-flight" messages get stuck in intermediate topics.
>>>>
>>>> A similar usage would be for KIP-95 (Incremental Batch Processing). There was a discussion about the proposed metadata topic, and we could avoid this metadata topic if we had "control messages".
>>>>
>>>> Right now, we cannot insert an "application control message", because Kafka Streams does not own all the topics it reads/writes and thus might break other consumer applications (as described above) if we inject arbitrary messages that other apps do not understand.
>>>>
>>>> Of course, one can work around "embedded control messages" by using an additional topic to propagate control messages between applications (as suggested in KIP-95 via a metadata topic for Kafka Streams). But there are major concerns about adding this metadata topic in the KIP, and this shows that other applications that need a similar pattern might profit from topic-embedded "control messages", too.
>>>>
>>>> One last important consideration: those "control messages" are used for client-to-client communication and are not understood by the broker. Thus, those messages should not be enabled within the message format (cf. the tombstone flag, KIP-87). However, "client-land" record headers would be a nice way to implement them. Because KIP-82 did consider key namespaces for metadata keys, this extension should not be a KIP of its own but should be included in KIP-82, to reserve a namespace for "control messages" in the first place.
>>>>
>>>> Sorry for the long email... Looking forward to your feedback.
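The opt-in filtering from the proposal, combined with the shutdown-marker drain, can be sketched as below. This is a toy model under stated assumptions: an in-memory list stands in for one partition of the intermediate topic, `CONTROL_HEADER` is a hypothetical reserved header key (the thread only proposes reserving such a namespace in KIP-82), and `ControlAwareConsumer`, `on_control`, and `drain_then_stop` are illustrative names, not part of the Kafka consumer API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical reserved header key; KIP-82 would have to reserve a namespace for it.
CONTROL_HEADER = "__control"


@dataclass
class Record:
    value: Optional[str]
    headers: Dict[str, str] = field(default_factory=dict)


class ControlAwareConsumer:
    """Toy consumer over an in-memory partition that hides control messages by default."""

    def __init__(self, records: List[Record]):
        self._records = records
        self._pos = 0
        self._handlers: Dict[str, Callable[[Record], None]] = {}

    def on_control(self, kind: str, handler: Callable[[Record], None]) -> None:
        """Opt in ("sign up") to one kind of control message."""
        self._handlers[kind] = handler

    def poll(self, max_records: int = 10) -> List[Record]:
        batch: List[Record] = []
        while self._pos < len(self._records) and len(batch) < max_records:
            rec = self._records[self._pos]
            self._pos += 1
            kind = rec.headers.get(CONTROL_HEADER)
            if kind is None:
                batch.append(rec)          # regular record: returned as usual
            elif kind in self._handlers:
                self._handlers[kind](rec)  # subscribed: delivered via callback
            # otherwise: a control message nobody asked for, silently skipped
        return batch


def drain_then_stop(consumer: ControlAwareConsumer) -> List[Optional[str]]:
    """Downstream subtopology: keep processing until the upstream shutdown marker arrives."""
    marker_seen: List[bool] = []
    consumer.on_control("shutdown", lambda rec: marker_seen.append(True))
    processed: List[Optional[str]] = []
    while not marker_seen:
        batch = consumer.poll()
        if not batch and not marker_seen:
            break  # a real consumer would keep polling; the toy partition is exhausted
        processed.extend(r.value for r in batch)
    return processed


# Upstream stopped, then appended the marker after its last in-flight record.
partition = [Record("a"), Record("b"), Record(None, {CONTROL_HEADER: "shutdown"})]
print([r.value for r in ControlAwareConsumer(partition).poll()])  # ['a', 'b']
print(drain_then_stop(ControlAwareConsumer(partition)))           # ['a', 'b']
```

An application that never calls `on_control` sees only `a` and `b`, so the shared topic stays safe for consumers that do not understand the marker; the downstream subtopology that did opt in stops only once the marker arrives, so nothing in flight is left behind.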
>>>>
>>>> -Matthias
>>>>
>>>> On 12/8/16 12:12 AM, Michael Pearce wrote:
>>>>> Hi Jun
>>>>>
>>>>> 100) Each time a transaction exits a JVM for a remote system (HTTP/JMS/hopefully one day Kafka), the APM tools stitch in a unique id (though I believe it contains the end-to-end UUID embedded in this id). On receiving the message at the receiving JVM, the APM code takes this id out and continues its tracing on that new thread. Both JVMs (and other languages the APM tool supports) send this data asynchronously back to the central controllers, where the stitching together occurs. For this, they need some header space to put this id in.
>>>>>
>>>>> 101) Yes, indeed we have a business transaction id in the payload. But this is system-level tracing that we need to marry up. Also, as per the note on end-to-end encryption, we'd be unable to prove the flow if the payload is encrypted, as we'd not have access to it at certain points of the flow through the infrastructure/platform.
>>>>>
>>>>> 103) As said, we use this mechanism at IG very successfully. As stated, per key we guarantee that the transaction-producing app handles the transaction of a key in one DC, except at a point of critical failure where we have to flip processing to another. We care about key ordering.
>>>>> I disagree with the offset comment for the partition solution: unless you do full ISR or expensive full XA transactions, even with partitions you cannot fully guarantee that offsets would match.
>>>>>
>>>>> 105) Very much so. I need access at the platform level to all the other metadata mentioned, without needing access to the encryption keys of the payload.
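Item 100 describes APM agents stitching an id into outgoing messages and reading it back on the receiving JVM. Assuming records carried a header map (as KIP-82 proposes), a producer/consumer hook pair could look roughly like this. `TRACE_HEADER`, `on_send`, and `on_receive` are hypothetical names for illustration, not Kafka's interceptor interfaces; the payload is never read, which is what keeps this compatible with end-to-end encryption.

```python
import uuid
from typing import Dict, Optional

# Hypothetical header key an APM agent might reserve.
TRACE_HEADER = "apm.trace.id"


def on_send(headers: Dict[str, bytes], active_trace: Optional[str]) -> Dict[str, bytes]:
    """Producer side: stitch the current trace id into the record headers.

    Starts a new trace if the calling thread has none. The (possibly
    encrypted) value is never inspected.
    """
    trace_id = active_trace or str(uuid.uuid4())
    stamped = dict(headers)  # leave the caller's header map untouched
    stamped[TRACE_HEADER] = trace_id.encode("utf-8")
    return stamped


def on_receive(headers: Dict[str, bytes]) -> Optional[str]:
    """Consumer side: recover the trace id so tracing continues on the receiving thread."""
    raw = headers.get(TRACE_HEADER)
    return raw.decode("utf-8") if raw is not None else None


sent = on_send({"producer.id": b"p1"}, active_trace="txn-42")
print(on_receive(sent))  # txn-42
```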
>>>>>
>>>>> 106)
>>>>> Technically yes for AZ/region/cluster, but then we'd need a global producerId registry, which would be very hard to enforce/keep current and correct, just to determine the region/AZ/cluster a message originated from for routing.
>>>>> As for the client wrapper version, the producerId can stay the same while the producer upgrades its wrapper, so we need to know which wrapper version the message was created with.
>>>>> Likewise the IP address: as stated, our producers can move, in which case their IP would change.
>>>>>
>>>>> 107)
>>>>> The UUID is set on the message by interceptors before the actual producer transport send. This is for a platform-level message dedupe guarantee; the business payload should be agnostic to it. Please see https://activemq.apache.org/artemis/docs/1.5.0/duplicate-detection.html -- note this does not touch business payloads.
>>>>>
>>>>> On 06/12/2016, 18:22, "Jun Rao" <j...@confluent.io> wrote:
>>>>>
>>>>> Hi, Michael,
>>>>>
>>>>> Thanks for the reply. I find it very helpful.
>>>>>
>>>>> Data lineage:
>>>>> 100. I'd like to understand the APM use case a bit more. It sounds like those APM plugins can generate a transaction id that we could potentially put in the header of every message. How would you typically make use of such transaction ids? Is there other metadata associated with the transaction id, and if so, how is it propagated downstream?
>>>>>
>>>>> 101. For the finance use case, if the concept of a transaction is important, wouldn't it typically be included in the message payload instead of in an optional header field?
>>>>>
>>>>> 102. The data lineage that Atlas and Navigator support seems to be at the dataset level, not the per-record level? So I'm not sure per-message headers are relevant there.
>>>>>
>>>>> Mirroring:
>>>>> 103. The benefit of using separate partitions is that it potentially makes it easy to preserve offsets during mirroring. This will make it easier for consumers to switch clusters. Currently, consumers can switch clusters by using the timestampToOffset() API, but they have to deal with duplicates. Good point on the issue with log compaction; I am not sure how to address this. However, even if we mirror into the existing partitions, the ordering of messages generated in different clusters seems non-deterministic anyway. So, it seems that consumers already have to deal with that? If a topic is compacted, does that mean which messages are preserved is also non-deterministic across clusters?
>>>>>
>>>>> 104. Good point on the partition key.
>>>>>
>>>>> End-to-end encryption:
>>>>> 105. So, it seems end-to-end encryption is useful. Are headers useful there?
>>>>>
>>>>> Auditing:
>>>>> 106. It seems that, other than the UUID, all other metadata is per producer?
>>>>>
>>>>> EOS:
>>>>> 107. How are those UUIDs generated? I am not sure they can be generated in the producer library. An application may send messages through a load balancer, and on retry the same message could be routed to a different producer instance. So, it seems that the application has to generate the UUIDs. In that case, shouldn't the application just put the UUID in the payload?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>>> On Fri, Dec 2, 2016 at 4:57 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>>>>>
>>>>> > Hi Jun.
>>>>> >
>>>>> > Per Transaction Tracing / Data Lineage.
>>>>> >
>>>>> > As stated in the KIP, this is the first use case: it is how many APM tools now work.
>>>>> > I would find it impossible for anyone to argue that this is unimportant or a niche market, as it has its own Gartner report. Companies such as AppDynamics, New Relic, Dynatrace, and Hawkular are but a few.
>>>>> >
>>>>> > Likewise, these APM tools can help track down issues very rapidly, automatically capture metrics, and perform actions based on unexpected behavior to auto-recover services.
>>>>> >
>>>>> > Before anyone mentions looking at aggregated stats: on critical flows, we cannot afford to have only aggregated, rolled-up stats.
>>>>> >
>>>>> > The APM tool we use is actually able to detect a single transaction failure and capture the thread traces in the JVM where it failed, to the point that it sends us alerts giving the line number of the code that caused it and the transaction trace through all the (supported) services and endpoints up to the point of failure; it can also capture the data in and out (so we can replay). Because Kafka currently doesn't let us stitch in these tracing transaction ids natively, we cannot get these benefits on a Kafka flow, which limits our ability to support and monitor apps to the standard we have come to expect.
>>>>> >
>>>>> > This actually ties in with data lineage, as the same tracing can be used to stitch the lineage back together. Because of the sums of money involved, disputes arise often, and as a financial institution the easiest and cleanest way to prove our case when they do is to present the actual flow and processes involved in a transaction.
>>>>> >
>>>>> > Likewise, as Hadoop matures, it's evident that this case is important, given tools such as Atlas (Hortonworks-led) and Navigator (Cloudera-led). I believe the importance here is very much NOT just a financial issue.
>>>>> >
>>>>> > From an MDM point of view, for any company that cares about data quality and data governance, data lineage is a key piece of the puzzle.
>>>>> >
>>>>> > RE Mirroring,
>>>>> >
>>>>> > As per the KIP, this is in fact exactly what we do with the cluster id, to mirror a network of clusters between AZs/regions. We know a transaction for a key will be done within one AZ/region, so we know the writes to Kafka will be ordered per key. But we need an eventual view of that in our other regions/AZs. When we have a complete AZ or region failure, we know there will be a brief interruption while those transactions are moved to another region, but we expect processing to continue afterwards.
>>>>> >
>>>>> > As mentioned, having separate partitions to do this starts to get ugly/complicated for us:
>>>>> > How would I do compaction where a key is in two partitions?
>>>>> > How do we balance consumers so that multiple partitions with the same key go to the same consumer?
>>>>> > What do you do if cluster 1 has 5 partitions but cluster 20 has 10, because it's larger kit in our more core DCs? Key-to-partition mappings for consumers then get even more complicated.
>>>>> > What do you do if we add or remove a complete region?
>>>>> >
>>>>> > A simple mirror, by contrast, will work: we just need to ensure we don't have a cycle, which we can do with the clusterId.
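One possible way to enforce the "no cycle" rule with cluster ids, sketched under the assumption that each record carries a hypothetical `mirror.route` header listing the clusters it has visited, starting with its origin. This is not MirrorMaker's behavior, and `stamp_origin` and `mirror_headers` are illustrative names. If every mirror only forwards records that originated in its source cluster, the origin id alone is enough; recording the full route also covers multi-hop topologies such as a ring.

```python
from typing import Dict, List, Optional

# Hypothetical header: comma-separated ids of the clusters a record has visited.
# Assumes cluster ids never contain commas.
ROUTE_HEADER = "mirror.route"


def stamp_origin(headers: Dict[str, str], cluster_id: str) -> Dict[str, str]:
    """Called when a record is first produced; the route starts at its origin cluster."""
    out = dict(headers)
    out.setdefault(ROUTE_HEADER, cluster_id)
    return out


def mirror_headers(headers: Dict[str, str], dest_cluster: str) -> Optional[Dict[str, str]]:
    """Headers for the mirrored copy, or None if copying to dest would form a cycle."""
    route = headers.get(ROUTE_HEADER, "")
    visited: List[str] = route.split(",") if route else []
    if dest_cluster in visited:
        return None  # the record originated in, or already passed through, dest
    out = dict(headers)
    out[ROUTE_HEADER] = ",".join(visited + [dest_cluster])
    return out


# Ring of mirrors A -> B -> C -> A: the copy stops before it returns to A.
h = stamp_origin({}, "A")
at_b = mirror_headers(h, "B")
at_c = mirror_headers(at_b, "C")
print(at_c[ROUTE_HEADER])         # A,B,C
print(mirror_headers(at_c, "A"))  # None
```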
>>>>> >
>>>>> > We have even started to look at shortest-path mirror routing based on the clusterId, if we also had the region and AZ info on the originating message. We have not implemented this, but some ideas come from network routing, and also from the Qpid Dispatch Router in Apache Qpid.
>>>>> >
>>>>> > We also need data perimeters, e.g. certain data cannot leave certain countries' borders. We want this all automated at the platform level: without having to touch or look at the business data inside, we can put tags into headers so that we can ensure this doesn't happen when we mirror. (This actually links in to data lineage/tracing, as again we need to tag messages at a platform level.) For example, we are not allowed to let private customer details leave Switzerland, yet we need those systems integrated.
>>>>> >
>>>>> > Lastly, around mirroring, we have a partitionKey field, as the key used for the partitioning logic is not always the compaction key, but we want to preserve it when we mirror, so that if the source cluster partition count != the destination cluster partition count, we can honour the same partitioning logic.
>>>>> >
>>>>> > RE End 2 End encryption
>>>>> >
>>>>> > As I believe was mentioned just before, the solution you mention just doesn't cut the mustard these days with many regulators. An operations person with access to the box should not be able to access the data. Many regulators now quite literally prescribe end-to-end encryption as the expected implementation for certain data (Singapore is the one I am most aware of). In fact, we now even need to encrypt the data and store the keys in HSMs.
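The mirroring rules described above (perimeter tags that keep certain data inside a country, and a preserved `partitionKey` so partitioning survives a change in partition count), together with the end-to-end encryption requirement, mean a mirror has to decide from headers and keys alone. A minimal sketch with hypothetical header names and function names; `zlib.crc32` is a stand-in hash, and since Kafka's default partitioner hashes keys with murmur2, a real mirror would have to use whatever hash the source producers used in order to honour the same partitioning logic.

```python
import zlib
from typing import Dict, Optional, Set

# Hypothetical header names for platform-level metadata.
PARTITION_KEY_HEADER = "partitionKey"
PERIMETER_HEADER = "data.perimeter"  # e.g. "CH": must stay inside Switzerland


def choose_partition(partition_key: bytes, num_partitions: int) -> int:
    """Deterministic key-to-partition mapping (crc32 stands in for the real hash)."""
    return zlib.crc32(partition_key) % num_partitions


def mirror_target(headers: Dict[str, str], record_key: bytes,
                  dest_partitions: int, dest_perimeters: Set[str]) -> Optional[int]:
    """Destination partition for a mirrored record, or None if it must not be copied.

    Only headers and the record key are consulted, so the value can stay
    end-to-end encrypted.
    """
    perimeter = headers.get(PERIMETER_HEADER)
    if perimeter is not None and perimeter not in dest_perimeters:
        return None  # tagged data may not leave its perimeter
    # Prefer the preserved partitioning key; it may differ from the compaction key.
    pk = headers.get(PARTITION_KEY_HEADER)
    key = pk.encode("utf-8") if pk is not None else record_key
    return choose_partition(key, dest_partitions)


swiss = {PERIMETER_HEADER: "CH", PARTITION_KEY_HEADER: "cust-42"}
print(mirror_target(swiss, b"order-1", 10, {"UK"}))  # None: may not leave CH
same = {PARTITION_KEY_HEADER: "cust-42"}
print(mirror_target(same, b"order-1", 7, set()) == mirror_target(same, b"order-2", 7, set()))  # True
```

Two records for the same customer but with different compaction keys (`order-1`, `order-2`) land in the same destination partition, whatever the destination's partition count.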
>>>>> >
>>>>> > Likewise, the performance penalty adds up: encrypting/decrypting as you produce over the wire, then encrypting/decrypting again as the data is stored on the brokers' disks and read back, then encrypting and decrypting again over the wire for each consumer, not to mention the doubling with mirror makers etc. Simply encrypting the value once on write by the client and decrypting it once on consume by the consumer is far more performant, but then the routing and platform metadata needs to be separate (thus headers).
>>>>> >
>>>>> > RE Auditing:
>>>>> >
>>>>> > Our auditing needs are:
>>>>> > Producer id
>>>>> > Origin cluster id that the message was first produced into
>>>>> > Origin AZ: agreed, we can derive this if we have the cluster id, but having it makes resolving this for audit reporting a lot easier.
>>>>> > Origin region: agreed, we can derive this if we have the cluster id, but having it makes resolving this for audit reporting a lot easier.
>>>>> > Unique message identification (this is not the same as transaction tracing): note that offset and partition are not sufficient, as they differ when we mirror, or when a send is duplicated because of some system failure.
>>>>> > Custom client wrapper version (where organizations have to wrap the Kafka client for added features), so we know which version of the wrapper was used
>>>>> > Producer IP address (in case clients are in our VM/OpenStack infrastructure, where they can move around; the producer id will stay the same, but this would change)
>>>>> >
>>>>> > RE Once and only once delivery case
>>>>> >
>>>>> > Using the same message UUID as for auditing, we can achieve this quite simply.
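UUID-based deduplication (compare the Artemis duplicate-detection link in item 107) could be applied by consumers or mirror makers roughly as follows. A minimal sketch, assuming each message already carries a UUID that is assigned once per logical message and reused on retry (where that id is generated is exactly Jun's question in 107); the bounded LRU window is an assumption to cap memory, and `UuidDeduplicator` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Hashable


class UuidDeduplicator:
    """Flags messages whose UUID is among the `capacity` most recently seen ids."""

    def __init__(self, capacity: int = 100_000):
        self.capacity = capacity
        self._seen: "OrderedDict[Hashable, None]" = OrderedDict()

    def is_duplicate(self, message_uuid: Hashable) -> bool:
        if message_uuid in self._seen:
            self._seen.move_to_end(message_uuid)  # refresh: retries tend to cluster
            return True
        self._seen[message_uuid] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the least recently seen id
        return False


# A replayed send (the producer never got the ack) carries the same UUID.
dedupe = UuidDeduplicator()
stream = ["m1", "m2", "m2", "m3", "m1"]
print([m for m in stream if not dedupe.is_duplicate(m)])  # ['m1', 'm2', 'm3']
```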
>>>>> >
>>>>> > As with how some other brokers do this (cough, Qpid, Artemis), message UUIDs are used to dedupe when a message is sent and produced but the client didn't receive the ack and therefore replays the send: with a unique id per message, the replay can be filtered out. On consumers, where a message may be delivered twice for whatever reason, the message UUID can be used to remove duplicate deliveries. Likewise, we can do this in the mirror makers, so that if we detect a duplicate message we can avoid replicating it.
>>>>> >
>>>>> > Cheers
>>>>> > Mike
>>>>> >
>>>>> > On 02/12/2016, 22:09, "Jun Rao" <j...@confluent.io> wrote:
>>>>> >
>>>>> > Since this KIP affects the message format, wire protocol, and APIs, I think it's worth spending a bit more time to nail down the concrete use cases. It would be bad if we added this feature and then, when we start implementing it for, say, mirroring, realized that headers are not the best approach. Initially, I thought I was convinced of the use cases for headers and was trying to write down a few use cases to convince others. That's when I became less certain. For me to be convinced, I just want to see two strong use cases (instead of 10 maybe-use-cases) in the third-party space. The reason is that when we discussed the use cases within a company, it often ended with "we can't force everyone to use this standard, since we may have to integrate with third-party tools".
>>>>> >
>>>>> > At present, I am not sure why headers are useful for things like schemaId or encryption. In order to do anything useful with the value, one needs to know the schemaId or how the data is encrypted, but a header is optional.
>>>>> > But I can be convinced if someone (Radai, Sean, Todd?) provides more details on the argument.
>>>>> >
>>>>> > I am not very sure headers are the best approach for mirroring either. If someone has thought about this more, I'd be happy to hear.
>>>>> >
>>>>> > I can see the data lineage use case. I am just not sure how widely applicable it is. If someone familiar with this space can justify that this is a significant use case, say in the finance industry, this would be a strong use case.
>>>>> >
>>>>> > I can see the auditing use case. I am just not sure whether a native producer id solves that problem. If there is additional metadata that's worth collecting but not covered by the producer id, that would make this a strong use case.
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Jun
>>>>> >
>>>>> > On Fri, Dec 2, 2016 at 1:41 PM, radai <radai.rosenbl...@gmail.com> wrote:
>>>>> >
>>>>> > > this KIP is about enabling headers, nothing more, nothing less -- so no, broker-side use of headers is not in the KIP scope.
>>>>> > >
>>>>> > > obviously, though, once you have headers, potential use cases could include broker-side header-aware interceptors (which would be the topic of other future KIPs).
>>>>> > >
>>>>> > > a trivially clear use case (to me) would be using such broker-side interceptors to enforce compliance with organizational policies -- it would make our SREs' lives much easier if, instead of retroactively discovering "rogue" topics/users, those messages were rejected up-front.
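As radai notes, the broker has no such plugin point, so the following only sketches the check a hypothetical header-aware broker interceptor might run before appending a produce batch, returning errors to the client instead of silently dropping records. The required header names and `check_produce_batch` are invented for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical organizational policy: headers every produced record must carry.
REQUIRED_HEADERS = ("producer.id", "origin.cluster")


def check_produce_batch(batch: Sequence[Dict[str, bytes]],
                        required: Sequence[str] = REQUIRED_HEADERS) -> List[str]:
    """Return one error per non-conforming record; an empty list means the batch may be appended."""
    errors: List[str] = []
    for i, headers in enumerate(batch):
        missing = [name for name in required if name not in headers]
        if missing:
            errors.append(f"record {i}: missing header(s) {', '.join(missing)}")
    return errors


ok = {"producer.id": b"p1", "origin.cluster": b"A"}
print(check_produce_batch([ok, {"producer.id": b"p2"}]))
# ['record 1: missing header(s) origin.cluster']
```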
>>>>> > >
>>>>> > > the kafka broker code lacks any such extensibility support (beyond maybe the authorizer), which is why these use cases were left out of the "case for headers" doc -- broker extensibility is a separate discussion.
>>>>> > >
>>>>> > > On Fri, Dec 2, 2016 at 12:59 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>> > >
>>>>> > > > Woah, I wasn't aware this is something we'll do. It wasn't in the KIP, right?
>>>>> > > >
>>>>> > > > I guess we could do it the same way ACLs currently work.
>>>>> > > > I had in mind something that will allow admins to apply rules to the new create/delete/config topic APIs. So Todd can decide to reject "create topic" requests that ask for more than 40 partitions, or require exactly 3 replicas, or no more than 50GB partition size, etc.
>>>>> > > >
>>>>> > > > ACLs were added a bit ad hoc; if we are planning to apply more rules to requests (and I think we should), we may want a somewhat more generic design around that.
>>>>> > > >
>>>>> > > > On Fri, Dec 2, 2016 at 7:16 AM, radai <radai.rosenbl...@gmail.com> wrote:
>>>>> > > > > "wouldn't you be in the business of making sure everyone uses them properly?"
>>>>> > > > >
>>>>> > > > > that's where a broker-side plugin would come in handy -- any incoming message that does not conform to org policy (read: does not have the proper headers) gets thrown out (with an error returned to the user)
>>>>> > > > >
>>>>> > > > > On Thu, Dec 1, 2016 at 8:44 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>> > > > >
>>>>> > > > >> Come on, I've done at least 2 talks on this one :)
>>>>> > > > >>
>>>>> > > > >> Producing counts to a topic is part of it, but that's only part.
>>>>> > > > >> So you count that you have 100 messages in topic A. When you mirror topic A to another cluster, you have 99 messages. Where was your problem? Or worse, you have 100 messages, but one producer duplicated messages and another one lost messages. You need details about where each message came from in order to pinpoint problems when they happen: source producer info, where it was produced into your infrastructure, and when it was produced. This requires you to add the information to the message.
>>>>> > > > >>
>>>>> > > > >> And yes, you still need to maintain your clients. So maybe my original example was not the best. My thoughts on not wanting to be responsible for message formats stand, because that's very much separate from the client. As you know, we have our own internal client library that can insert the right headers, and right now inserts the right audit information into the message fields -- if they exist, and assuming the message is Avro-encoded. What if someone wants to use JSON instead, for a good reason? What if user X wants to encrypt messages, but user Y does not? Maintaining the client library is still much easier than maintaining the message formats.
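Todd's example (matching totals that hide a duplicate from one producer and a loss from another) becomes visible once each record carries its producer id and a message id. A minimal sketch, assuming records have already been reduced to hypothetical `(producer_id, message_id)` pairs read from audit headers; `reconcile` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Each record reduced to (producer_id, message_id), read from hypothetical audit headers.
AuditKey = Tuple[str, str]


def reconcile(source: Iterable[AuditKey], mirrored: Iterable[AuditKey]) -> Dict[str, Dict[str, int]]:
    """Per-producer counts of messages lost and duplicated between two clusters."""
    src, dst = Counter(source), Counter(mirrored)
    report: Dict[str, Dict[str, int]] = {}
    for key in src.keys() | dst.keys():
        diff = dst[key] - src[key]
        if diff == 0:
            continue
        entry = report.setdefault(key[0], {"lost": 0, "duplicated": 0})
        if diff < 0:
            entry["lost"] -= diff  # diff is negative: copies missing downstream
        else:
            entry["duplicated"] += diff
    return report


source = [("p1", "m1"), ("p1", "m2"), ("p2", "m3"), ("p2", "m4")]
mirrored = [("p1", "m1"), ("p1", "m2"), ("p1", "m2"), ("p2", "m3")]
# Totals match (4 == 4), yet p1 duplicated one message and p2 lost one.
print(reconcile(source, mirrored))
```

Plain per-topic counts (4 vs. 4) would report no problem here; the per-producer breakdown is what pinpoints where the loss and the duplicate happened.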
>>>>> > > > >>
>>>>> > > > >> -Todd
>>>>> > > > >>
>>>>> > > > >> On Thu, Dec 1, 2016 at 6:21 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>> > > > >>
>>>>> > > > >> > Based on your last sentence, consider me convinced :)
>>>>> > > > >> >
>>>>> > > > >> > I get why headers are critical for mirroring (you need tags to prevent loops and sometimes to route messages to the correct destination). But why do you need headers to audit? We are auditing by producing counts to a side topic (and I was under the impression you do the same), so we never need to modify the message.
>>>>> > > > >> >
>>>>> > > > >> > Another thing -- after we add headers, wouldn't you be in the business of making sure everyone uses them properly? Making sure everyone includes the right headers you need, doesn't use the header names you intend to use, etc. I don't think the "policing" business will ever go away.
>>>>> > > > >> >
>>>>> > > > >> > On Thu, Dec 1, 2016 at 5:25 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>> > > > >> > > Got it. As an ops guy, I'm not very happy with the workaround. Avro means that I have to be concerned with the format of the messages in order to run the infrastructure (audit, mirroring, etc.). That means that I have to handle the schemas, and I have to enforce rules about good formats.
This is not something I want to be in the business of, because I should be able to run a service infrastructure without needing to be in the weeds of dealing with customer data formats.

Trust me, a sizable portion of my support time is spent dealing with schema issues. I really would like to get away from that. Maybe I'd have more time for other hobbies. Like writing. ;)

-Todd

On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira <g...@confluent.io> wrote:
I'm pretty satisfied with the current workarounds (Avro container format), so I'm not too excited about the extra work required to do headers in Kafka. I absolutely don't mind it if you do it...
I think the Apache convention for "good idea, but not willing to put any work toward it" is +0.5? Anyway, that's what I was trying to convey :)

On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino <tpal...@gmail.com> wrote:
Well I guess my question for you, then, is what is holding you back from full support for headers? What’s the bit that you’re missing that has you under a full +1?
-Todd

On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira <g...@confluent.io> wrote:
I know why people who support headers support them, and I've seen what the discussion is like.

This is why I'm asking people who are against headers (especially committers) what will make them change their mind, so we can get this part over one way or another.

If I sound frustrated it is not at Radai, Jun or you (Todd)... I am just looking for something concrete we can do to move the discussion along to the yummy design details (which is the argument I really am looking forward to).

On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino <tpal...@gmail.com> wrote:
So, Gwen, to your question (even though I’m not a committer)...

I have always been a strong supporter of introducing the concept of an envelope to messages, which is what headers accomplish. The message key is already an example of a piece of envelope information.
By providing a means to do this within Kafka itself, and not relying on use-case-specific implementations, you make it much easier for components to interoperate. It simplifies development of all these things (message routing, auditing, encryption, etc.) because each one does not have to reinvent the wheel.

It also makes it much easier from a client point of view if the headers are defined as part of the protocol and/or message format in general, because you can easily produce and consume messages without having to take into account specific cases.
For example, I want to route messages, but client A doesn’t support the way audit implemented headers, and client B doesn’t support the way encryption or routing implemented headers, so now my application has to create some really fragile (my autocorrect just tried to make that “tragic”, which is probably appropriate too) code to strip everything off, rather than just consuming the messages, picking out the 1 or 2 headers it’s interested in, and performing its function.

Honestly, this discussion has been going on for a long time, and it’s always “Oh, you came up with 2 use cases, and yeah, those use cases are real things that someone would want to do. Here’s an alternate way to implement them so let’s not do headers.” If we have a few use cases that we actually came up with, you can be sure that over the next year there are a dozen others that we didn’t think of that someone would like to do.
I really think it’s time to stop rehashing this discussion and instead focus on a workable standard that we can adopt.

-Todd

On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino <tpal...@gmail.com> wrote:
> C. per message encryption
> One drawback of this approach is that this significantly reduces the effectiveness of compression, which happens on a set of serialized messages. An alternative is to enable SSL for wire encryption and rely on the storage system (e.g. LUKS) for at rest encryption.

Jun, this is not sufficient. While this does cover the case of removing a drive from the system, it will not satisfy most compliance requirements for encryption of data, as whoever has access to the broker itself still has access to the unencrypted data. For end-to-end encryption you need to encrypt at the producer, before it enters the system, and decrypt at the consumer, after it exits the system.
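Jun's compression concern can be checked with a quick experiment. This is a rough sketch, not a benchmark. It uses `zlib` as a stand-in for Kafka's batch codecs. It also substitutes random bytes from `os.urandom` for real ciphertext, on the assumption that well-encrypted output is statistically close to random. A batch of similar plaintext records compresses very well, while the same number of "encrypted" records of equal length barely compresses at all.

```python
import json
import os
import zlib


def make_records(n):
    """Build n similar JSON records, like a typical batch on one topic."""
    return [json.dumps({"user": f"user-{i % 10}", "event": "page_view",
                        "path": "/home"}).encode() for i in range(n)]


def compressed_ratio(records):
    """Compress the concatenated batch and return compressed size / raw size."""
    raw = b"".join(records)
    return len(zlib.compress(raw)) / len(raw)


records = make_records(1000)
# Stand-in for per-record encryption: same lengths, random-looking bytes.
ciphertexts = [os.urandom(len(r)) for r in records]

plain_ratio = compressed_ratio(records)
cipher_ratio = compressed_ratio(ciphertexts)
print(f"plaintext batch: {plain_ratio:.3f}")
print(f"encrypted batch: {cipher_ratio:.3f}")
```

Because batch compression relies on redundancy across records, encrypting each record first removes that redundancy before the codec ever sees it.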
-Todd

On Thu, Dec 1, 2016 at 1:03 PM, radai <radai.rosenbl...@gmail.com> wrote:
another big plus of headers in the protocol is that it would enable rapid iteration on ideas outside of core kafka and would reduce the number of future wire format changes required.

a lot of what is currently a KIP represents use cases that are not 100% relevant to all users, and some of them require rather invasive wire protocol changes. a good recent example of this is kip-98. tx-utilizing traffic is expected to be a very small fraction of total traffic and yet the changes are invasive.

every such wire format change translates into painful and slow adoption of new versions.

i think a lot of functionality currently in KIPs could be "spun out" and implemented as opt-in plugins transmitting data over headers.
this would keep the core wire format stable(r), core codebase smaller, and avoid the "burden of proof" that's sometimes required to prove a certain feature is useful enough for a wide-enough audience to warrant a wire format change and code complexity additions.

(to be clear, kip-98 goes beyond "mere" wire format changes and i'm not saying it could have been completely done with headers, but exactly-once delivery certainly could)

On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <g...@confluent.io> wrote:
On Thu, Dec 1, 2016 at 10:24 AM, radai <radai.rosenbl...@gmail.com> wrote:
> "For use cases within an organization, one could always use other approaches such as company-wise containers"
> this is what linkedin has traditionally done but there are now cases (read: topics) where this is not acceptable.
> this makes headers useful even within single orgs for cases where one-container-fits-all cannot apply.
>
> as for the particular use cases listed, i don't want this to devolve to a discussion of particular use cases. i think it's enough that some of them

I think a main point of contention is this: we identified a few use-cases where headers are useful; do we want Kafka to be a system that supports those use-cases?

For example, Jun said:
"Not sure how widely useful record-level lineage is though since the overhead could be significant."

We know NiFi supports record-level lineage. I don't think it was developed for lols; I think it is safe to assume that the NSA needed that functionality.
We also know that certain financial institutions need to track tampering with records at a record level, and there are federal regulations that absolutely require this. They also need to prove that routing apps that "touch" the messages and either read or update headers couldn't possibly have modified the payload itself. They use record-level encryption to do that: apps can read and (sometimes) modify headers but can't touch the payload.

We can totally say "those are corner cases and not worth adding headers to Kafka for"; they should use a different pubsub system for that (NiFi or one of the other 1000 that cater specifically to the financial industry).

But this gets us into a catch-22:
If we discuss a specific use-case, someone can always say it isn't interesting enough for Kafka. If we discuss more general trends, others can say "well, we are not sure any of them really needs headers specifically.
This is just hand waving and not interesting."

I think discussing use-cases in specifics is super important to decide implementation details for headers (my use-cases lean toward numerical keys with namespaces and object values, others differ), but I think we need to answer the general "Are we going to have headers" question first.

I'd love to hear from the other committers in the discussion:
What would it take to convince you that headers in Kafka are a good idea in general, so we can move ahead and try to agree on the details?

I feel like we keep moving the goal posts and this is truly exhausting.

For the record, I mildly support adding headers to Kafka (+0.5?).
The community can continue to find workarounds to the issue and there are some benefits to keeping the message format and clients simpler.
But I see the usefulness of headers to many use-cases and if we can find a good and generally useful way to add them to Kafka, it will make Kafka easier to use for many, a worthy goal in my eyes.

> are interesting/feasible, but:
> A+B. i think there are use cases for polyglot topics, especially if kafka is being used to "trunk" something else.
> D. multiple topics would make it harder to write portable consumer code. partition remapping would mess with locality of consumption guarantees.
> E+F. a use case I see for lineage/metadata is billing/chargeback. for that use case it is not enough to simply record the point of origin, but every replication stop (think mirror maker) must also add a record to form a "transit log".
>
> as for stream processing on top of kafka, i know samza has a metadata map which they carry around in addition to user values.
> headers are the perfect fit for these things.

On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <j...@confluent.io> wrote:
Hi, Michael,

In order to answer the first two questions, it would be helpful if we could identify 1 or 2 strong use cases for headers in the space for third-party vendors. For use cases within an organization, one could always use other approaches such as company-wise containers to get by without headers. I went through the use cases in the KIP and in Radai's wiki (https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers). The following are the ones that I understand and that could be in the third-party use case category.

A. content-type
It seems that in general, content-type should be set at the topic level. Not sure if mixing messages with different content types should be encouraged.

B. schema id
Since the value is mostly useless without schema id, it seems that storing the schema id together with serialized bytes in the value is better?

C. per message encryption
One drawback of this approach is that this significantly reduces the effectiveness of compression, which happens on a set of serialized messages. An alternative is to enable SSL for wire encryption and rely on the storage system (e.g. LUKS) for at rest encryption.

D. cluster ID for mirroring across Kafka clusters
This is actually interesting.
Today, to avoid introducing cycles when doing mirroring across data centers, one would either have to set up two Kafka clusters (a local and an aggregate) per data center or rename topics. Neither is ideal. With headers, the producer could tag each message with the producing cluster ID in the header. MirrorMaker could then avoid mirroring messages to a cluster if they are tagged with the same cluster id.

However, an alternative approach is to introduce something like hierarchical topics and store messages from different clusters in different partitions under the same topic. This approach avoids filtering out unneeded data and makes offset preserving easier to support. It may make compaction trickier though, since the same key may show up in different partitions.
E. record-level lineage
For example, a source connector could store in the message the metadata (e.g. UUID) of the source record. Similarly, if a stream job transforms messages from topic A to topic B, the library could include the source message offset in the header of each transformed message. Not sure how widely useful record-level lineage is though, since the overhead could be significant.

F. auditing metadata
We could put things like clientId/host/user in the header in each message for auditing. These metadata are really at the producer level though. So, a more efficient way is to only include a "producerId" per message and send the producerId -> metadata mapping independently.
KIP-98 is actually proposing including such a producerId natively in the message.

So, overall, I'm not sure that I am fully convinced of the strong third-party use cases of headers yet. Perhaps we could discuss a bit more to make one or two really convincing use cases.

Another orthogonal question is whether headers should be exposed in stream processing systems such as Kafka Streams, Samza, and Spark Streaming. Currently, those systems just deal with key/value pairs. Should we expose headers there too as a third element, or somehow map headers to the key or value?
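Item D, tagging each message with its producing cluster ID so that MirrorMaker can skip messages that originated in the target cluster, can be sketched as a simple filter. This is a minimal Python illustration, not MirrorMaker's actual code. Records are plain dicts, and the `origin_cluster` header name and the cluster IDs `"A"`/`"B"` are hypothetical.

```python
ORIGIN_HEADER = "origin_cluster"  # hypothetical header name


def produce(cluster_id, value):
    """Producer side: tag the record with the cluster it is produced into."""
    return {"value": value, "headers": {ORIGIN_HEADER: cluster_id}}


def mirror(records, target_cluster):
    """Mirror side: copy records to target_cluster, skipping any that
    originated there. Mirrored copies keep their original tag, so they
    are never copied back to where they started."""
    return [r for r in records if r["headers"].get(ORIGIN_HEADER) != target_cluster]


# Active-active pair: A mirrors into B, and B mirrors into A.
cluster_a = [produce("A", "a1"), produce("A", "a2")]
cluster_b = [produce("B", "b1")] + mirror(cluster_a, "B")  # b1, a1, a2
back_to_a = mirror(cluster_b, "A")
print([r["value"] for r in back_to_a])  # ['b1']: a1 and a2 do not loop back
```

A real mirror would also need to track which records it has already copied. This sketch only shows the tag-based filter that prevents cycles without a separate aggregate cluster or topic renaming.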
Thanks,

Jun

On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <michael.pea...@ig.com> wrote:
I assume that, after a period of a week, there are no concerns now with points 1 and 2, and that we now have agreement that headers are useful and needed in Kafka. As such, if put to a KIP vote, this wouldn’t be a reason to reject.

@Ignacio on point 4):
For the purpose of getting this KIP moving past this, I think we can state that the key will be a 4-byte space that will naturally be interpreted as an Int32 (if namespacing is wanted later, you can easily split this into two int16 spaces). I don’t believe this makes any difference from the wire protocol implementation's point of view. Is this reasonable to all?

On 5): as per point 4, I'm therefore happy that we keep 32 bits.
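Michael's proposal is a 4-byte key that reads naturally as an Int32 and that a higher layer can later split into two int16 namespaces. It can be sketched with Python's `struct` module. The big-endian layout, the choice to put the namespace in the high 16 bits, and the helper names are all assumptions for illustration. Nothing in the thread fixes a byte order or a particular split.

```python
import struct


def pack_key(namespace, local_id):
    """Pack two unsigned 16-bit values into a 4-byte, big-endian header key."""
    return struct.pack(">HH", namespace, local_id)


def key_as_int32(key):
    """Lower-layer view: the same 4 bytes read as one signed Int32."""
    (value,) = struct.unpack(">i", key)
    return value


def split_key(key):
    """Upper-layer view: the same 4 bytes read back as (namespace, local_id)."""
    return struct.unpack(">HH", key)


key = pack_key(1, 42)
print(len(key), key_as_int32(key), split_key(key))  # 4 65578 (1, 42)
```

The wire format only ever carries the 4 bytes, so the split is purely a convention of the layer above. That matches Ignacio's argument that the lower layer only needs equality on the key.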
On 18/11/2016, 20:34, "ignacio.so...@gmail.com on behalf of Ignacio Solis" <ignacio.so...@gmail.com on behalf of iso...@igso.net> wrote:

Summary:

3) Yes - Header value as byte[]

4a) Int,Int - No
4b) Int - Yes
4c) String - Reluctant maybe

5) I believe the header system should take a single int. I think 32 bits is a good size; if you want to interpret this as two 16-bit numbers in the layer above, go right ahead. If somebody wants to argue for 16 bits or 64 bits of header key space I would listen.
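Ignacio's summary calls for a single int key and a byte[] value per header, with only equality matching at this layer. That suggests a very small consumer-side model, and it also covers Todd's earlier point about a consumer picking out the 1 or 2 headers it cares about. This Python sketch assumes headers arrive as a list of `(int key, bytes value)` pairs. The key constants are hypothetical. The consumer keeps the keys it understands and ignores everything else, without parsing or stripping unknown headers.

```python
from typing import Dict, Iterable, List, Tuple

Header = Tuple[int, bytes]  # (32-bit key, opaque byte[] value)

# Hypothetical key assignments, for illustration only.
ROUTING_KEY = 0x00010001
AUDIT_KEY = 0x00020001


def pick(headers: List[Header], wanted: Iterable[int]) -> Dict[int, bytes]:
    """Keep only the headers this consumer understands and ignore the rest.

    Keys are compared by plain integer equality; any structure inside a key
    (such as a namespace in the high bits) is left to a higher layer. If a
    key repeats, the last value wins.
    """
    wanted_keys = set(wanted)
    return {k: v for k, v in headers if k in wanted_keys}


headers = [
    (AUDIT_KEY, b"host-17"),
    (0x7FFF0001, b"\x00\x01opaque"),  # some other team's header, never parsed here
    (ROUTING_KEY, b"dc-west"),
]
print(pick(headers, [ROUTING_KEY]))  # {65537: b'dc-west'}
```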

Discussion:
Dividing the key space into sub_key_1 and sub_key_2 makes no sense to me at this layer. Are we going to start providing APIs to get all the sub_key_1s? Or all the sub_key_2s? If there are no distinguishing functions applied to each one, then they should be a single value. At this layer all we're doing is equality.
If the layer above wants to interpret this as 2, 3 or more values, that's a different question. I personally think it's all one keyspace that is getting assigned using some structure, but if you want to sub-assign parts of it, then that's fine.

The same discussion applies to strings.
If somebody argued for strings, would we be arguing to divide the strings with dots ('.') as a requirement? Would we want them to give us the different name segments separately? Would we be performing any actions on this key other than matching?

Nacho


On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <michael.pea...@ig.com> wrote:

#jay #jun any concerns on 1 and 2 still?

@all
To get this moving along a bit more, I'd also like to get clarity on the last points below:

3) I believe we're all roughly happy with the header value being a byte[]?

4) I believe consensus has been for a namespace-based int approach, {int,int}, for the key. Any objections if this is what we go with?

5) If the assumption in (4) is correct, we have {int,int} keys. Should both ints be int16 or int32?
I'm for them being int16 (2 bytes), as combined that is a space of 4 bytes, as per the original, which gives plenty of combinations for the foreseeable future and keeps the overhead small.

Do we see any benefit in another KIP call to discuss these at all?

Cheers
Mike
________________________________________
From: K Burstev <k.burs...@yandex.com>
Sent: Friday, November 18, 2016 7:07:07 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

For what it is worth, I also agree. As a user:

1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

14.11.2016, 21:15, "Ignacio Solis" <iso...@igso.net>:
1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option

On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <michael.pea...@ig.com> wrote:

Hi Roger,

The KIP details/examples the original proposal for key spacing, not the new namespace idea mentioned in the discussion.

We will need to update the KIP once we get agreement that this is a better approach (which seems to be the case, if I have understood the general feeling in the conversation).

Re the variable ints, at a very early stage we did think about this. I think the added complexity isn't worth the saving. If we want to reduce overheads and size, I'd rather go with int16 (2-byte) keys, as it keeps it simple.

On the note of no headers: as per the KIP, we use an attribute bit to denote whether headers are present or not, so there is currently zero overhead if headers are not used.

I think, as radai mentions, it would be good first to get clarity on whether we now have general consensus that (1) headers are worthwhile and useful, and (2) we want them as a top level entity.

Just to state the obvious, I believe (1) headers are worthwhile and (2) agree with them as a top level entity.
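The zero-overhead point can be sketched as follows. This assumes a hypothetical layout: the flag value `HEADERS_PRESENT = 0x80` and the int32 count/key/length fields are chosen purely for illustration. They are not the KIP's actual bit assignment or wire format. The sketch only shows that a record without headers costs nothing beyond the attributes byte it already carries.

```python
import struct

# Hypothetical values for illustration only; not the KIP's actual bit
# assignment or header wire format.
HEADERS_PRESENT = 0x80

def encode_headers(attributes: int, headers: dict[int, bytes]) -> bytes:
    """Attributes byte, followed by a header block only when headers exist."""
    if not headers:
        # Flag cleared: nothing beyond the attributes byte is written.
        return struct.pack(">B", attributes & ~HEADERS_PRESENT)
    out = bytearray(struct.pack(">Bi", attributes | HEADERS_PRESENT, len(headers)))
    for key, value in headers.items():
        out += struct.pack(">ii", key, len(value)) + value
    return bytes(out)

def decode_headers(buf: bytes) -> tuple[int, dict[int, bytes]]:
    """Inverse of encode_headers: returns (attributes without flag, headers)."""
    (attributes,) = struct.unpack_from(">B", buf, 0)
    headers: dict[int, bytes] = {}
    if attributes & HEADERS_PRESENT:
        (count,) = struct.unpack_from(">i", buf, 1)
        offset = 5
        for _ in range(count):
            key, length = struct.unpack_from(">ii", buf, offset)
            offset += 8
            headers[key] = buf[offset:offset + length]
            offset += length
    return attributes & ~HEADERS_PRESENT, headers

print(len(encode_headers(0, {})))           # 1
print(len(encode_headers(0, {1: b"abc"})))  # 16: 1 + 4 + (4 + 4 + 3)
```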

Cheers
Mike
________________________________________
From: Roger Hoover <roger.hoo...@gmail.com>
Sent: Wednesday, November 9, 2016 9:10:47 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Sorry for going a little into the weeds, but thanks for the replies regarding varint.

Agreed that a prefix and {int, int} can be the same. It doesn't look like that's what the KIP is saying in the "Open" section.
The example shows 2100001 for New Relic and 210002 for App Dynamics, implying that the New Relic organization will have only a single header id to work with. Or is 2100001 a prefix? The main point of a namespace or prefix is to reduce the overhead of config mapping or registration, depending on how namespaces/prefixes are managed.

Would love to hear more feedback on the higher-level questions though...

Cheers,

Roger

On Wed, Nov 9, 2016 at 11:38 AM, radai <radai.rosenbl...@gmail.com> wrote:

I think this discussion is getting a bit into the weeds on technical implementation details.
I'd like to step back a minute and try to establish where we are in the larger picture:

(re-wording nacho's last paragraph)
1. Are we all in agreement that headers are a worthwhile and useful addition to have? This was contested early on.
2. Are we all in agreement on headers as a top level entity vs headers squirreled away in V?

Are there still concerns around these two points (#jay? #jun?)?

(and now back to our normal programming ...)

Varints are nice. Having said that, they add complexity (see https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java as the 1st Google result) and would require anyone writing other clients (C? Python? Go? Bash? ;-) ) to get/implement the same, for relatively little gain (int vs string is an order of magnitude; this isn't).

Int namespacing vs {int, int} namespacing are basically the same thing: you're just namespacing an int64 and giving people whole 2^32 ranges at a time. The part I like about this is letting people have a large swath of numbers with one registration, so they don't have to come back for every single plugin/header they want to "reserve".
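radai's observation that {int, int} namespacing is just an int64 with its high half handed out per registration can be sketched as follows. The sketch assumes both halves are unsigned 32-bit values. The `owns` check is a hypothetical helper, not something the KIP defines.

```python
# Sketch: a {namespace, id} pair of unsigned 32-bit ints packed into one
# 64-bit key. Registering one namespace covers 2**32 ids.

MASK_32 = 0xFFFFFFFF

def to_int64_key(namespace: int, local_id: int) -> int:
    if not (0 <= namespace <= MASK_32 and 0 <= local_id <= MASK_32):
        raise ValueError("both halves must fit in 32 unsigned bits")
    return (namespace << 32) | local_id

def from_int64_key(key: int) -> tuple[int, int]:
    return key >> 32, key & MASK_32

def owns(namespace: int, key: int) -> bool:
    """Hypothetical check: does `key` fall in the range reserved by `namespace`?"""
    return from_int64_key(key)[0] == namespace

key = to_int64_key(7, 42)
print(from_int64_key(key))                # (7, 42)
print(owns(7, to_int64_key(7, MASK_32)))  # True
print(owns(7, to_int64_key(8, 0)))        # False
```

A single registration of namespace 7 therefore reserves ids 0 through 2**32 - 1, with no further round trips for each new plugin or header.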


On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <roger.hoo...@gmail.com> wrote:

Since some of the debate has been about overhead + performance, I'm wondering if we have considered a varint encoding (https://developers.google.com/protocol-buffers/docs/encoding#varints) for the header length field (int32 in the proposal) and for header ids? If you don't use headers, the overhead would be a single byte, and each header id < 128 would also need only a single byte?
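Roger's single-byte figures follow from how Protocol Buffers base-128 varints work. Each byte carries 7 payload bits, least-significant group first, and the high bit marks that another byte follows. Here is a minimal sketch for non-negative integers only; negative values and zigzag encoding are left out.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint, as in the Protocol Buffers encoding docs."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # final byte, high bit clear
            return bytes(out)

def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7

print(encode_varint(127).hex())  # 7f
print(encode_varint(300).hex())  # ac02
```

A header id below 128 costs one byte. The largest int32 costs five bytes, one more than a fixed int32. That is the complexity-vs-gain trade-off radai weighs in his reply.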


On Wed, Nov 9, 2016 at 6:43 AM, radai <radai.rosenbl...@gmail.com> wrote:

@magnus - and very dangerous (you're essentially downloading and executing arbitrary code off the internet on your servers ... bad idea without a sandbox, even with one)

As for it being a purely administrative task - I disagree.

I wish it were, really, because then my earlier point on the complexity of the remapping process would be invalid. But at LinkedIn, for example, we (the team I'm in) run Kafka as a service. We don't really know what our users (developing applications that use Kafka) are up to at any given moment.
It is very possible (given the existence of headers and a corresponding plugin ecosystem) for some application to "equip" their producers and consumers with the required plugin without us knowing. I don't mean to imply that's bad; I just want to make the point that it's not as simple as keeping it in sync across a large-enough organization.


On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <mag...@edenhill.se> wrote:

I think there is a piece missing in the Strings discussion, where pro-Stringers reason that by providing unique string identifiers for each header, everything will just magically work for all parts of the stream pipeline.

But the strings don't mean anything by themselves. While we could probably envision some auto plugin loader that downloads, compiles, links and runs plugins on demand as soon as they're seen by a consumer, I don't really see a use case for something so dynamic (and fragile) in practice.

In the real world an application will be configured with a set of plugins to either add (producer) or read (consumer) headers.
This is an administrative task based on what features a client needs/provides, and it results in some sort of configuration to enable and configure the desired plugins.

Since this needs to be kept somewhat in sync across an organisation (there is no point in having producers add headers no consumers will read, and vice versa), the added complexity of assigning an id namespace for each plugin as it is being configured should be tolerable.


/Magnus

2016-11-09 13:06 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:

Just following/catching up on what seems to have been an active night :)

@Radai sorry if it seems obvious, but what does MD stand for?

My take on String vs Int:

I will state first that I am pro Int (16 or 32).

Playing devil's advocate, though, I do see a big plus in the argument for String keys: integrating into an existing ecosystem.

As many other systems use String-based headers (Flume, JMS), String keys make it much easier for these to be incorporated/integrated.

With Int-based headers, how could we provide a way/guidance to make this integration simple/easy for transition flows over to Kafka?
>>>>> > > > >> > >> >> >>> > >> > > >> > > > > > >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > * tough luck >>>> buddy >>>>> > you're on >>>>> > > your >>>>> > > > >> own >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > * simply hash >>>> the string >>>>> > into >>>>> > > int >>>>> > > > >> code >>>>> > > > >> > >> and >>>>> > > > >> > >> >> hope >>>>> > > > >> > >> >> >>> > for >>>>> > > > >> > >> >> >>> > >> no >>>>> > > > >> > >> >> >>> > >> > > collisions >>>>> > > > >> > >> >> >>> > >> > > >> > > (how >>>>> > > > >> > >> >> >>> > >> > > >> > > > to >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > convert back >>>> though?) >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > * http2 style as >>>>> > mentioned by >>>>> > > > nacho. >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > cheers, >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > Mike >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > >>>>> > ______________________________ >>>>> > > > >> > __________ >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > From: radai < >>>>> > > > >> > radai.rosenbl...@gmail.com> >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > Sent: Wednesday, >>>>> > November 9, >>>>> > > 2016 >>>>> > > > >> > 8:12 AM >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > To: >>>> dev@kafka.apache.org >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > Subject: Re: >>>> [DISCUSS] >>>>> > KIP-82 - >>>>> > > > Add >>>>> > > > >> > >> Record >>>>> > > > >> > >> >> >>> Headers >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > >>>>> > > > >> > >> >> >>> > >> > > >> > > > > > thinking about >>>> it some >>>>> > more, >>>>> > > the >>>>> > > > >> best >>>>> > > > >> > >> way to >>>>> > > > >> > >> >> >>> > transmit >>>>> > > > >> > >> >> >>> > >> > the >>>>> > > > >> > >> >> >>> > >> > > header >>>>> > > > >> > >> >> >>> > >> > > >> > > > > remapping >>>>> > > > >> > 
> data to consumers would be to put it in the MD response payload, so maybe
> it should be discussed now.
>
> On Wed, Nov 9, 2016 at 12:09 AM, radai <radai.rosenbl...@gmail.com> wrote:
>
>> im not opposed to the idea of namespace mapping. all im saying is that its
>> not part of the "mvp" and, since it requires no wire format change, can
>> always be added later.
>> also, its not as simple as just configuring MM to do the transform: lets
>> say i've implemented large message support as {666,1} and on some mirror
>> target cluster its been remapped to {999,1}. the consumer plugin code would
>> also need to be told to look for the large message "part X of Y" header
>> under {999,1}. doable, but tricky.
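A minimal sketch of the remapping problem radai describes, assuming int-keyed headers are modeled as a list of `((namespace, id), value)` pairs. `remap_namespaces`, `find_part_header` and `PART_OF_HEADER_ID` are hypothetical names for illustration only; none of this is Kafka client API. It shows why rewriting namespaces in the mirror is not enough on its own: the consumer-side plugin must also be told the new namespace.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of int-keyed headers: each key is a (namespace, id) pair.
HeaderKey = Tuple[int, int]
Headers = List[Tuple[HeaderKey, bytes]]

# Hypothetical id of the large-message "part X of Y" header within its namespace.
PART_OF_HEADER_ID = 1


def remap_namespaces(headers: Headers, mapping: Dict[int, int]) -> Headers:
    """Rewrite namespaces as a mirroring tool might; unmapped namespaces pass through."""
    return [((mapping.get(ns, ns), hid), value) for (ns, hid), value in headers]


def find_part_header(headers: Headers, namespace: int) -> Optional[bytes]:
    """Consumer-plugin lookup: only matches the namespace it was configured with."""
    for (ns, hid), value in headers:
        if ns == namespace and hid == PART_OF_HEADER_ID:
            return value
    return None


source = [((666, PART_OF_HEADER_ID), b"part 2 of 5")]
mirrored = remap_namespaces(source, {666: 999})
# A plugin still configured for 666 misses the header on the target cluster...
print(find_part_header(mirrored, 666))  # None
# ...and only finds it once it is told about the remapped namespace 999.
print(find_part_header(mirrored, 999))  # b'part 2 of 5'
```

Under this model the mirror-side rewrite is a one-liner; the "tricky" part is that every consumer plugin reading the mirrored data needs the same mapping.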
>>
>> On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <g...@confluent.io> wrote:
>>
>>> While you can do whatever you want with a namespace and your code,
>>> what I'd expect is for each app to make namespaces configurable...
>>>
>>> So if I accidentally used 666 for my HR department, and still want to
>>> run RadaiApp, I can config "namespace=42" for RadaiApp and everything
>>> will look normal.
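A minimal sketch of the per-app configurable namespace Gwen describes, assuming the app only knows its own small header ids and reads its namespace from a deployment setting such as `namespace=42`. `load_namespace` and `NamespacedHeaderKeys` are hypothetical names, not part of any Kafka client.

```python
from dataclasses import dataclass
from typing import Tuple


def load_namespace(setting: str) -> int:
    """Parse a hypothetical 'namespace=<int>' deployment setting."""
    name, sep, value = setting.partition("=")
    if not sep or name.strip() != "namespace":
        raise ValueError(f"expected 'namespace=<int>', got {setting!r}")
    return int(value.strip())


@dataclass(frozen=True)
class NamespacedHeaderKeys:
    """Builds full (namespace, id) header keys from an app's internal ids."""
    namespace: int

    def key(self, local_id: int) -> Tuple[int, int]:
        return (self.namespace, local_id)


# The app code is identical in every deployment; only the setting changes.
# Here 666 is already used by the HR department, so RadaiApp is given 42.
radai_app = NamespacedHeaderKeys(load_namespace("namespace=42"))
print(radai_app.key(1))  # (42, 1)
```

Keeping the namespace in deployment config rather than in code is what lets the same app coexist with an unrelated in-house user of 666.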
>>>
>>> This means you only need to sync usage inside your own organization.
>>> Still hard, but somewhat easier than syncing with the entire world.
>>>
>>> On Tue, Nov 8, 2016 at 10:07 PM, radai <radai.rosenbl...@gmail.com>
>>> wrote:
>>>
>>>> and we can start with {namespace, id} and no re-mapping support and
>>>> always add it later on if/when collisions actually happen (i dont think
>>>> they'd be a problem).
>>>>
>>>> every interested party (so orgs or individuals) could then register a
>>>> prefix (0 = reserved, 1 = confluent ... 666 = me :-) ) and do whatever
>>>> with the 2nd ID - so once linkedin registers, say 3, then linkedin devs
>>>> are free to use {3, *} with a reasonable expectation not to collide with
>>>> anything else.
>>>> further partitioning of that * becomes linkedin's problem, but the
>>>> "upstream registration" of a namespace only has to happen once.
>>>>
>>>> On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <wushuja...@gmail.com> wrote:
>>>>
>>>>>
>>>>>> On Nov 8, 2016, at 5:54 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>>>
>>>>>> Thank you so much for this clear and fair summary of the arguments.
>>>>>>
>>>>>> I'm in favor of ints. Not a deal-breaker, but in favor.
>>>>>>
>>>>>> Even more in favor of Magnus's decentralized suggestion with Roger's
>>>>>> tweak: add a namespace for headers. This will allow each app to just
>>>>>> use whatever IDs it wants internally, and then let the admin deploying
>>>>>> the app figure out an available namespace ID for the app to live in.
>>>>>> So io.confluent.schema-registry can be namespace 0x01 on my deployment
>>>>>> and 0x57 on yours, and the poor guys developing the app don't need to
>>>>>> worry about that.
>>>>>>
>>>>>
>>>>> Gwen, if I understand your example right, an application deployer might
>>>>> decide to use 0x01 in one deployment, and that means that once the
>>>>> message is written into the broker, it will be saved on the broker with
>>>>> that specific namespace (0x01).
>>>>>
>>>>> If you were to mirror that message into another cluster, the 0x01 would
>>>>> accompany the message, right? What if the deployers of the same app in
>>>>> the other cluster use 0x57? They won't understand each other?
>>>>>
>>>>> I'm not sure that's an avoidable problem. I think it simply means that in
>>>>> order to share data, you have to also have a shared (agreed upon)
>>>>> understanding of what the namespaces mean. Which I think makes sense,
>>>>> because the alternate (sharing *nothing* at all) would mean that there
>>>>> would be no way to understand each other.
>>>>>
>>>>> -James
>>>>>
>>>>>> Gwen
>>>>>>
>>>>>> On Tue, Nov 8, 2016 at 4:23 PM, radai <radai.rosenbl...@gmail.com> wrote:
>>>>>>> +1 for sean's document. it covers pretty much all the trade-offs and
>>>>>>> provides concrete figures to argue about :-)
>>>>>>> (nit-picking - used the same xkcd twice, also trove has been superseded

--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

--
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming

linkedin.com/in/toddpalino

The information contained in this email is strictly confidential and for
the use of the addressee only, unless otherwise indicated. If you are not
the intended recipient, please do not read, copy, use or disclose to others
this message or any attachment. Please also notify the sender by replying
to this email or by telephone (+44(020 7896 0011) and then delete the email
and any copies of it. Opinions, conclusion (etc) that do not relate to the
official business of this company shall be understood as neither given nor
endorsed by it. IG is a trading name of IG Markets Limited (a company
registered in England and Wales, company number 04008957) and IG Index
Limited (a company registered in England and Wales, company number
01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
Index Limited (register number 114059) are authorised and regulated by the
Financial Conduct Authority.