Magnus and I had a brief discussion following the KIP call. KIP-35 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version> wiki has been updated accordingly. Please review the KIP and vote on the corresponding vote thread.
On Mon, Mar 14, 2016 at 11:27 PM, Ashish Singh <asi...@cloudera.com> wrote: > I think there is a bit of misunderstanding going on here regarding > protocol documentation and its versioning. It could be that I am the one > who misunderstood it, please correct me if so. > > Taking Gwen's example. > > 1. 0.10.0 (protocol v4) is released with current KIP-35 > 2. On trunk, modify produce requests and bump to v5 > 3. On trunk, we modify metadata requests and bump to v6 > 4. Now we decide that the metadata change fixes a super critical issue and > want to backport the change. What's the protocol version of the next > release of 0.10.0 - which supports v6 protocol only partially? > > As per my understanding, this will be v7. When we say a broker is on > ApiVersion 7, we do not necessarily mean that it also supports ApiVersion > up to v7. A broker on ApiVersion v7 should probably mean, please refer v7 > of protocol documentation to find out supported protocol versions of this > broker. > > I just added an example on the KIP wiki to elaborate more on protocol > documentation versioning. Below is the excerpt. > > For instance say we have two brokers, BrokerA has ApiVersion 4 and BrokerB > has ApiVersion 5. This means we should have protocol documentations for > ApiVersions 4 and 5. Say we have the following as protocol documentation > for these two versions. > > Sample Protocol Documentation V4 > Version: 4 // Comes from ApiVersion > REQ_A_0: ... > REQ_A_1: ... > RESP_A_0: ... > RESP_A_1: ... > > Sample Protocol Documentation V5 > Version: 5 // Comes from ApiVersion > REQ_A_1: ... > REQ_A_2: ... > RESP_A_1: ... > RESP_A_2: ... > > All a client needs to know to be able to successfully communicate with a > broker is what is the supported ApiVersion of the broker. Say via some > mechanism, discussed below, client gets to know that BrokerA has ApiVersion > 4 and BrokerB has ApiVersion 5. With that information, and the available > protocol documentations for those ApiVersions client can deduce what > protocol versions does the broker supports. In this case client will deduce > that it can use v0 and v1 of REQ_A and RESP_A while talking to BrokerA, > while it can use v1 and v2 of REQ_A and RESP_A while talking to BrokerB. > > On Mon, Mar 14, 2016 at 10:50 PM, Ewen Cheslack-Postava <e...@confluent.io > > wrote: > >> Yeah, Gwen's example is a good one. And it doesn't even have to be thought >> of in terms of the implementation -- you can think of the protocol itself >> as effectively being possible to branch and have changes cherry-picked. >> Given the way some changes interact and that only some may be feasible to >> backport, this may be important. >> >> Similarly, it's difficult to make that definition . In practice, we >> sometimes branch and effectively merge the protocol -- i.e. we develop 2 >> KIPs with independent changes at the same time. If you force a linear >> model, you also *force* the ordering of implementation, which will be a >> pretty serious constraint in a lot of cases. Two protocol-changing KIPs >> may >> occur near in time, but one may be a much larger change. >> >> Finally, it might be worth noting that from a client developer's >> perspective, the linear order may not be all that intuitive when we pile >> on >> a bunch of protocol changes in one release. They probably don't actually >> care about that global protocol version. They'll care more about the types >> of things Dana was talking about previously: LZ4 support (which isn't even >> a protocol change, but an important feature clients might need to know >> about!), Kafka-backed offset storage (requires 2 protocol changes), etc. >> While we want to encourage supporting all features, we should be realistic >> about how client developers tackle feature development and limited >> bandwidth. They are probably more feature driven than version driven. >> >> This is what Gwen was saying I had mentioned. The idea of features is >> actually separate from what has been described so far and *does* require a >> mapping to protocol versions, but also allows you to capture more than >> that >> and at more flexible granularity (single request type protocol version >> bump >> or the whole set of requests could change). The idea isn't quite the same >> as browser feature detection, but that's my frame of reference for it (see >> https://developer.mozilla.org/en-US/docs/Browser_Feature_Detection), the >> process of trying to sort out supported features and protocols based on >> browser version IDs (sort of equivalent to broker implementation versions >> here) is a huge mess. Going entirely the other route (say, only enabling a >> feature in CSS3 if *all* CSS3 features are implemented) is really >> restrictive. >> >> I don't have a concrete proposal right now, but something like "features" >> that sit somewhere between a global protocol version number and only >> individual request versions more accurately captures what we want to >> express, in my opinion. >> >> -Ewen >> >> On Mon, Mar 14, 2016 at 6:45 PM, Gwen Shapira <g...@confluent.io> wrote: >> >> > Jay, >> > >> > Ewen had a good example: >> > >> > 1. 0.10.0 (protocol v4) is released with current KIP-35 >> > 2. On trunk, modify produce requests and bump to v5 >> > 3. On trunk, we modify metadata requests and bump to v6 >> > 4. Now we decide that the metadata change fixes a super critical issue >> and >> > want to backport the change. What's the protocol version of the next >> > release of 0.10.0 - which supports v6 protocol only partially? >> > >> > Gwen >> > >> > On Mon, Mar 14, 2016 at 6:38 PM, Jay Kreps <j...@confluent.io> wrote: >> > >> > > Hey Dana, >> > > >> > > I am actually thinking about it differently. Basically I think you are >> > > imagining a world in which the Kafka code is the source of truth, and >> > > the Kafka developers make random changes that inflict pain on you at >> > > will. The protocol documentation is basically just some semi-accurate >> > > description of what the code does. It sounds like this isn't too far >> > > from the actual world. :-) In that world I agree that the best we >> > > could do would be to assign some id to versions (the md5 of the Kafka >> > > jar, maybe) and put in various checks around that in clients to try to >> > > keep things working. >> > > >> > > But imagine a different approach where we try to really treat the >> > > protocol as a document and treat that as the source of truth. We try >> > > to make this document cover what is and isn't specified and make it >> > > cover enough to support client implementations and a given Kafka >> > > version covers some range of protocols explicitly. There is a version >> > > of this document for each protocol version. The code implements the >> > > protocol rather than vice versa. >> > > >> > > So in other words protocol changes are totally ordered and separate >> > > from code development (we might develop them together but the protocol >> > > assignment would come when you checked in the new protocol version >> > > which would happen with your code). >> > > >> > > This was really the intention with the protocol originally (though we >> > > were doing it on a per-api basis), but I think that understanding was >> > > not shared by the full team and we have not done a great job of >> > > important things like documentation or explaining how this are >> > > supposed to work so we fall back on the "the protocol is whatever the >> > > code does" thing. >> > > >> > > Does that make sense? In that sense think one of the more important >> > > things we could get out of this would not be more versioning features >> > > so much as clear docs and processes around protocol versioning. >> > > >> > > -Jay >> > > >> > > On Mon, Mar 14, 2016 at 6:22 PM, Dana Powers <dana.pow...@gmail.com> >> > > wrote: >> > > > Is a linear protocol int consistent with the current release model? >> It >> > > > seems like that would break down w/ the multiple release branches >> that >> > > are >> > > > all simultaneously maintained? Or is it implicit that no patch >> release >> > > can >> > > > ever bump the protocol int? Or maybe the protocol int gets some >> extra >> > > > "wiggle" on minor / major releases to create unallocated version >> ints >> > > that >> > > > could be used on future patch releases / backports? >> > > > >> > > > I think the protocol version int does make sense for folks deploying >> > from >> > > > trunk. >> > > > >> > > > -Dana >> > > > >> > > > On Mon, Mar 14, 2016 at 6:13 PM, Jay Kreps <j...@confluent.io> >> wrote: >> > > > >> > > >> Yeah I think that is the point--we have a proposal for a new >> protocol >> > > >> versioning scheme and a vote on it but it doesn't actually describe >> > > >> how versioning will work yet! I gave my vague impression based on >> this >> > > >> thread, but I want to make sure that is correct and get it written >> > > >> down before we adopt it. >> > > >> >> > > >> -Jay >> > > >> >> > > >> On Mon, Mar 14, 2016 at 5:58 PM, Gwen Shapira <g...@confluent.io> >> > > wrote: >> > > >> > On Mon, Mar 14, 2016 at 5:31 PM, Jay Kreps <j...@confluent.io> >> > wrote: >> > > >> > >> > > >> >> Couple of missing things: >> > > >> >> >> > > >> >> This KIP doesn't have a proposal on versioning it just gives >> > > different >> > > >> >> options, it'd be good to get a concrete proposal in the KIP. >> Here >> > is >> > > my >> > > >> >> understanding of what we are proposing (can someone sanity check >> > and >> > > if >> > > >> >> correct, update the kip): >> > > >> >> >> > > >> >> 1. We will augment the existing api_version field in the >> header >> > > with >> > > >> a >> > > >> >> protocol_version that will begin at some initial value and >> > > increment >> > > >> by >> > > >> >> 1 >> > > >> >> every time we make a changes to any of the api_versions >> > (question: >> > > >> >> including internal apis?). >> > > >> >> >> > > >> > >> > > >> > Jay, this part was not in the KIP and was never discussed. >> > > >> > Are you proposing adding this? Or is it just an assumption you >> made? >> > > >> > >> > > >> > >> > > >> > >> > > >> >> 2. The protocol_version will be added to the metadata request >> > > >> >> 3. We will also add a string that this proposal is calling >> > > >> VersionString >> > > >> >> which will describe the build of kafka in some way. The >> clients >> > > >> should >> > > >> >> not >> > > >> >> under any circumstances do anything with this string other >> than >> > > >> print it >> > > >> >> out to the user. >> > > >> >> >> > > >> >> One thing I'm not sure about: I think currently metadata sits in >> > the >> > > >> client >> > > >> >> for 10 mins by default. Say a client bootstraps and then a >> server >> > is >> > > >> >> downgraded to an earlier version, won't the client's metadata >> > version >> > > >> >> indicate that that client handles a version it doesn't actually >> > > handle >> > > >> any >> > > >> >> more? We need to document how clients will handle this. >> > > >> >> >> > > >> >> Here are some comments on other details: >> > > >> >> >> > > >> >> 1. As a minor thing I think we should avoid naming the fields >> > > >> VersionId >> > > >> >> and VersionString which sort of implies they are both used >> for >> > > >> >> versioning. >> > > >> >> I think we should call them something like ProtocolVersion >> and >> > > >> >> BuildDescription, with BuildDescription being totally >> > unspecified >> > > >> other >> > > >> >> than that it is some kind of human readable string >> describing a >> > > >> >> particular >> > > >> >> Kafka build. We really don't want a client attempting to use >> > this >> > > >> >> string in >> > > >> >> any way as that would always be the wrong thing to do in the >> > > >> versioning >> > > >> >> scheme we are proposing, you should always use the protocol >> > > version. >> > > >> >> 2. Does making the topics field in the metadata request >> nullable >> > > >> >> actually make sense? We have a history of wanting to add >> magical >> > > >> values >> > > >> >> rather than fields. Currently topics=[a] means give me >> > information >> > > >> about >> > > >> >> topic a, topics=[] means give me information about all >> topics, >> > > and we >> > > >> >> are >> > > >> >> proposing topics=null would mean don't give me topics. I >> don't >> > > have a >> > > >> >> strong opinion. >> > > >> >> 3. I prefer Jason's proposal on using a conservative metadata >> > > version >> > > >> >> versus the empty response hack. However I think that may >> > actually >> > > >> >> exacerbate the downgrade scenario I described. >> > > >> >> 4. I agree with Jason that we should really look at the >> details >> > of >> > > >> the >> > > >> >> implementation so we know it works--implementing server >> support >> > > >> without >> > > >> >> actually trying it is kind of risky. >> > > >> >> >> > > >> >> As a meta comment: I'd really like to encourage us to think of >> the >> > > >> protocol >> > > >> >> as a document that includes the following things: >> > > >> >> >> > > >> >> - The binary format, error codes, etc >> > > >> >> - The request/response interaction >> > > >> >> - The semantics of each request in different cases >> > > >> >> - Instructions on how to use this to implement a client >> > > >> >> >> > > >> >> This document is versioned with the protocol number and is the >> > > source of >> > > >> >> truth for the protocol. >> > > >> >> >> > > >> >> Part of any protocol change needs to be an update to the >> > > instructions on >> > > >> >> how to use that part of the protocol. We should be opinionated. >> If >> > > there >> > > >> >> are two options there should be a reason, and then we need to >> > > document >> > > >> both >> > > >> >> and say exactly when to use each. >> > > >> >> >> > > >> >> I think we also need to get a "how to" document on protocol >> changes >> > > >> just so >> > > >> >> people know what they need to do to add a new protocol feature. >> > > >> >> >> > > >> >> -Jay >> > > >> >> >> > > >> >> On Mon, Mar 14, 2016 at 4:51 PM, Jason Gustafson < >> > ja...@confluent.io >> > > > >> > > >> >> wrote: >> > > >> >> >> > > >> >> > > >> > > >> >> > > Are you suggesting this as a solution for the problem where >> a >> > > KIP-35 >> > > >> >> > aware >> > > >> >> > > client sends a higher version of metadata req, say v2, to a >> > > KIP-35 >> > > >> >> aware >> > > >> >> > > broker that only supports up to v1. >> > > >> >> > >> > > >> >> > >> > > >> >> > Yes, that's right. In that case, the client first sends v1, >> finds >> > > out >> > > >> >> that >> > > >> >> > the broker supports v2, and then sends v2 (if it has any >> reason >> > to >> > > do >> > > >> >> so). >> > > >> >> > >> > > >> >> > We had a bit of discussion on such scenarios, and it seemed to >> > be a >> > > >> >> chicken >> > > >> >> > > and egg problem that is hard to avoid. Your suggestion >> > definitely >> > > >> makes >> > > >> >> > > sense, however it falls under the purview of clients. >> > > >> >> > >> > > >> >> > >> > > >> >> > That basically means clients should figure it out for >> themselves? >> > > >> Might >> > > >> >> be >> > > >> >> > nice to have a better answer. >> > > >> >> > >> > > >> >> > KIP-35 only aims on adding support for getting the version >> info >> > > from a >> > > >> >> > > broker. This definitely can be utilized by our clients. >> > However, >> > > >> that >> > > >> >> can >> > > >> >> > > follow KIP-35 changes. Does this sound reasonable to you? >> > > >> >> > >> > > >> >> > >> > > >> >> > It may be OK, but I'm a little concerned about offering a >> feature >> > > >> that we >> > > >> >> > don't support ourselves. Sometimes it's not until >> implementation >> > > that >> > > >> we >> > > >> >> > find out whether it really works as expected. And if we're >> > > eventually >> > > >> >> > planning to support it, I feel we should think through some of >> > the >> > > >> cases >> > > >> >> a >> > > >> >> > bit more. For example, the upgrade and downgrade cases that >> > Becket >> > > >> >> > mentioned earlier. It doesn't feel too great to support this >> > > feature >> > > >> >> unless >> > > >> >> > we can offer guidance on how to use it. >> > > >> >> > >> > > >> >> > -Jason >> > > >> >> > >> > > >> >> > >> > > >> >> > >> > > >> >> > On Mon, Mar 14, 2016 at 4:20 PM, Ashish Singh < >> > asi...@cloudera.com >> > > > >> > > >> >> wrote: >> > > >> >> > >> > > >> >> > > Hi Jason, >> > > >> >> > > >> > > >> >> > > On Mon, Mar 14, 2016 at 4:04 PM, Jason Gustafson < >> > > >> ja...@confluent.io> >> > > >> >> > > wrote: >> > > >> >> > > >> > > >> >> > > > Perhaps clients should always send the oldest version of >> the >> > > >> metadata >> > > >> >> > > > request which supports KIP-35 when initially connecting to >> > the >> > > >> >> cluster. >> > > >> >> > > > Depending on the versions in the response, it can upgrade >> to >> > a >> > > >> more >> > > >> >> > > recent >> > > >> >> > > > version. Then maybe we don't need the empty response hack? >> > > >> >> > > > >> > > >> >> > > Are you suggesting this as a solution for the problem where >> a >> > > KIP-35 >> > > >> >> > aware >> > > >> >> > > client sends a higher version of metadata req, say v2, to a >> > > KIP-35 >> > > >> >> aware >> > > >> >> > > broker that only supports up to v1. >> > > >> >> > > >> > > >> >> > > We had a bit of discussion on such scenarios, and it seemed >> to >> > > be a >> > > >> >> > chicken >> > > >> >> > > and egg problem that is hard to avoid. Your suggestion >> > definitely >> > > >> makes >> > > >> >> > > sense, however it falls under the purview of clients. >> > > >> >> > > >> > > >> >> > > > >> > > >> >> > > > One thing that's not clear to me is whether the ultimate >> goal >> > > of >> > > >> this >> > > >> >> > KIP >> > > >> >> > > > is to have our clients support multiple broker versions. >> It >> > > would >> > > >> be >> > > >> >> a >> > > >> >> > > > little weird to have this feature if our own clients don't >> > use >> > > it. >> > > >> >> > > > >> > > >> >> > > KIP-35 only aims on adding support for getting the version >> info >> > > >> from a >> > > >> >> > > broker. This definitely can be utilized by our clients. >> > However, >> > > >> that >> > > >> >> can >> > > >> >> > > follow KIP-35 changes. Does this sound reasonable to you? >> > > >> >> > > >> > > >> >> > > > >> > > >> >> > > > -Jason >> > > >> >> > > > >> > > >> >> > > > On Mon, Mar 14, 2016 at 3:34 PM, Ashish Singh < >> > > >> asi...@cloudera.com> >> > > >> >> > > wrote: >> > > >> >> > > > >> > > >> >> > > > > On Mon, Mar 14, 2016 at 2:12 PM, Ismael Juma < >> > > ism...@juma.me.uk >> > > >> > >> > > >> >> > > wrote: >> > > >> >> > > > > >> > > >> >> > > > > > On Mon, Mar 14, 2016 at 8:45 PM, Gwen Shapira < >> > > >> g...@confluent.io >> > > >> >> > >> > > >> >> > > > wrote: >> > > >> >> > > > > > > >> > > >> >> > > > > > > > I don't see how it helps. If the client is >> > > communicating >> > > >> >> with a >> > > >> >> > > > > broker >> > > >> >> > > > > > > that >> > > >> >> > > > > > > > does not support KIP-35, that broker will simply >> > close >> > > the >> > > >> >> > > > > connection. >> > > >> >> > > > > > If >> > > >> >> > > > > > > > the broker supports KIP-35, then it will provide >> the >> > > >> broker >> > > >> >> > > > version. >> > > >> >> > > > > I >> > > >> >> > > > > > > > don't envisage a scenario where a broker does not >> > > support >> > > >> >> > KIP-35, >> > > >> >> > > > but >> > > >> >> > > > > > > > implements the new behaviour of sending an empty >> > > >> response. Do >> > > >> >> > > you? >> > > >> >> > > > > > > > >> > > >> >> > > > > > > > Are you sure about that? Per KIP-35, the broker >> > > supplies >> > > >> the >> > > >> >> > > > version >> > > >> >> > > > > in >> > > >> >> > > > > > > response to Metadata request, not in response to >> > anything >> > > >> else. >> > > >> >> > > > > > > If the client sends producer request version 42 >> > > >> (accidentally >> > > >> >> or >> > > >> >> > > due >> > > >> >> > > > to >> > > >> >> > > > > > > premature upgrade) to KIP-35-compactible broker - we >> > > want to >> > > >> >> see >> > > >> >> > an >> > > >> >> > > > > empty >> > > >> >> > > > > > > packet and not a connection close. >> > > >> >> > > > > > > Sending a broker version was deemed impractical >> IIRC. >> > > >> >> > > > > > > >> > > >> >> > > > > > >> > > >> >> > > > > > OK, so this is a different case than the one Ashish >> > > described >> > > >> >> > > ("client >> > > >> >> > > > > that >> > > >> >> > > > > > wants to support broker versions that do not provide >> > broker >> > > >> >> version >> > > >> >> > > in >> > > >> >> > > > > > metadata and broker versions that provides version >> info >> > in >> > > >> >> > > metadata"). >> > > >> >> > > > > So, >> > > >> >> > > > > > you are suggesting that if a client is communicating >> > with a >> > > >> >> broker >> > > >> >> > > that >> > > >> >> > > > > > implements KIP-35 and it receives an empty response, >> it >> > > will >> > > >> >> assume >> > > >> >> > > > that >> > > >> >> > > > > > the broker doesn't support the request version and it >> > won't >> > > >> try >> > > >> >> to >> > > >> >> > > > parse >> > > >> >> > > > > > the response? I think it would be good to explain this >> > > kind of >> > > >> >> > thing >> > > >> >> > > in >> > > >> >> > > > > > detail in the KIP. >> > > >> >> > > > > > >> > > >> >> > > > > Actually even in this case and the case I mentioned, >> > closing >> > > >> >> > connection >> > > >> >> > > > > should be fine. Lets think about possible reasons that >> > could >> > > >> lead >> > > >> >> to >> > > >> >> > > this >> > > >> >> > > > > issue. >> > > >> >> > > > > >> > > >> >> > > > > 1. Client has incorrect mapping of supported protocols >> for >> > a >> > > >> broker >> > > >> >> > > > > version. >> > > >> >> > > > > 2. Client misread broker version from metadata response. >> > > >> >> > > > > 3. Client constructed unsupported protocol version by >> > > mistake. >> > > >> >> > > > > >> > > >> >> > > > > In all the above cases irrespective of what broker does, >> > > client >> > > >> >> will >> > > >> >> > > keep >> > > >> >> > > > > sending wrong request version. >> > > >> >> > > > > >> > > >> >> > > > > At this point, I think sending an empty packet instead >> of >> > > >> closing >> > > >> >> > > > > connection is a nice to have and not mandatory >> requirement. >> > > >> Like in >> > > >> >> > the >> > > >> >> > > > > above case, a client can catch parsing error and be sure >> > that >> > > >> there >> > > >> >> > is >> > > >> >> > > > > something wrong in the protocol version it is sending. >> > > However, >> > > >> a >> > > >> >> > > generic >> > > >> >> > > > > connection close does not really provide any >> information on >> > > >> >> probable >> > > >> >> > > > cause. >> > > >> >> > > > > >> > > >> >> > > > > What do you guys suggest? >> > > >> >> > > > > >> > > >> >> > > > > > >> > > >> >> > > > > > Ismael >> > > >> >> > > > > > >> > > >> >> > > > > >> > > >> >> > > > > >> > > >> >> > > > > >> > > >> >> > > > > -- >> > > >> >> > > > > >> > > >> >> > > > > Regards, >> > > >> >> > > > > Ashish >> > > >> >> > > > > >> > > >> >> > > > >> > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> > > >> >> > > -- >> > > >> >> > > >> > > >> >> > > Regards, >> > > >> >> > > Ashish >> > > >> >> > > >> > > >> >> > >> > > >> >> >> > > >> >> > > >> > >> >> >> >> -- >> Thanks, >> Ewen >> > > > > -- > > Regards, > Ashish > -- Regards, Ashish