This is a fantastic and much-needed KIP. All third-party clients have had to deal with this issue. In my experience most clients are either declaring they only support version broker version X, or are spending a lot of time hacking around the issue. I think the community of non-java drivers would see significant benefit from this proposal.
My specific thought is that for kafka-python it has been easier to manage compatibility using broker release version to gate various features by api-protocol version. For example, only enable group coordination apis if >= (0, 9), kafka-backed offsets >= (0, 8, 2), etc. As an example, here are some backwards compatibility issues that I think are difficult to capture w/ just the protocol versions: - LZ4 compression only supported in brokers >= 0.8.2, but no protocol change. - kafka-backed offset storage, in additional to requiring new offset commit/fetch protocol versions, also requires adding support for tracking the group coordinator. - 0.8.2-beta OffsetCommit api [different than 0.8.X release] Could release version be added to the api response in this proposal? Perhaps include a KafkaRelease string in the Response before the array of api versions? Thanks for the great KIP, Magnus. And thanks for restarting the discussion, Ashish. I also would like to see this addressed in 0.10 -Dana On Mon, Feb 29, 2016 at 3:55 PM, Ashish Singh <asi...@cloudera.com> wrote: > I hope it is OK for me to make some progress here. I have made the > following changes. > > 1. Updated KIP-35, to adopt Jay's suggestion on maintaining separate list > of deprecated versions, instead of using a version of -1. > 2. Added information on required permissions, Describe action on Cluster > resource, to be able to retrieve protocol versions from a auth enabled > Kafka cluster. > > Created https://issues.apache.org/jira/browse/KAFKA-3304. Primary patch is > available to review, https://github.com/apache/kafka/pull/986 > > On Thu, Feb 25, 2016 at 1:27 PM, Ashish Singh <asi...@cloudera.com> wrote: > > > Kafka clients in Hadoop ecosystem, Flume, Spark, etc, have found it > really > > difficult to cope up with Kafka releases as they want to support users on > > different Kafka versions. Capability to retrieve protocol version will > go a > > long way to ease out those pain points. I will be happy to help out with > > the work on this KIP. @Magnus, thanks for driving this, is it OK if I > carry > > forward the work from here. It will be ideal to have this in 0.10.0.0. > > > > On Mon, Oct 12, 2015 at 9:29 PM, Jay Kreps <j...@confluent.io> wrote: > > > >> I wonder if we need to solve the error problem? I think this KIP gives a > >> descent work around. > >> > >> Probably we should have included an error in the response header, but we > >> debated it at the time decided not to and now it is pretty hard to add > >> because the headers aren't versioned (d'oh). > >> > >> It seems like any other solution is going to be kind of a hack, right? > >> Sending malformed responses back seems like not a clean solution... > >> > >> (Not sure if I was pro- having a top-level error or not, but in any case > >> the rationale for the decision was that so many of the requests were > >> per-partition or per-topic or whatever and hence fail or succeed at that > >> level and this makes it hard to know what the right top-level error code > >> is > >> and hard for the client to figure out what to do with the top level > error > >> if some of the partitions succeed but there is a top-level error). > >> > >> I think actually this new API actually gives a way to handle this > >> gracefully on the client side by just having clients that want to be > >> graceful check for support for their version. Clients that do that will > >> have a graceful message. > >> > >> At some point if we're ever reworking the headers we should really > >> consider > >> (a) versioning them and (b) adding a top-level error code in the > response. > >> But given this would be a big breaking change and this is really just to > >> give a nicer error message seems like it probably isn't worth it to try > to > >> do something now. > >> > >> -Jay > >> > >> > >> > >> On Mon, Oct 12, 2015 at 8:11 PM, Jiangjie Qin <j...@linkedin.com.invalid > > > >> wrote: > >> > >> > I am thinking instead of returning an empty response, it would be > >> better to > >> > return an explicit UnsupportedVersionException code. > >> > > >> > Today KafkaApis handles the error in the following way: > >> > 1. For requests/responses using old Scala classes, KafkaApis uses > >> > RequestOrResponse.handleError() to return an error response. > >> > 2. For requests/response using Java classes (only JoinGroupRequest and > >> > Heartbeat now), KafkaApis calls AbstractRequest.getErrorResponse() to > >> > return an error response. > >> > > >> > In KAFKA-2512, I am returning an UnsupportedVersionException for case > >> [1] > >> > when see an unsupported version. This will put the error code per > topic > >> or > >> > partition for most of the requests, but might not work all the time. > >> e.g. > >> > TopicMetadataRequest with an empty topic set. > >> > > >> > Case [2] does not quite work for unsupported version, because we will > >> > thrown an uncaught exception when version is not recognized (BTW this > >> is a > >> > bug). Part of the reason is that for some response types, error code > is > >> not > >> > part of the response level field. > >> > > >> > Maybe it worth checking how each response is dealing with error code > >> today. > >> > A scan of the response formats gives the following result: > >> > 1. TopicMetadataResponse - per topic error code, does not work when > the > >> > topic set is empty in the request. > >> > 2. ProduceResonse - per partition error code. > >> > 3. OffsetCommitResponse - per partition. > >> > 4. OffsetFetchResponse - per partition. > >> > 5. OffsetResponse - per partition. > >> > 6. FetchResponse - per partition > >> > 7. ConsumerMetadataResponse - response level > >> > 8. ControlledShutdownResponse - response level > >> > 9. JoinGroupResponse - response level > >> > 10. HearbeatResponse - response level > >> > 11. LeaderAndIsrResponse - response level > >> > 12. StopReplicaResponse - response level > >> > 13. UpdateMetadataResponse - response level > >> > > >> > So from the list above it looks for each response we are actually able > >> to > >> > return an error code, as long as we make sure the topic or partition > >> won't > >> > be empty when the error code is at topic or partition level. Luckily > in > >> the > >> > above list we only need to worry about TopicMetadataResponse. > >> > > >> > Maybe error handling is out of the scope of this KIP, but I prefer we > >> think > >> > through how to deal with error code for the requests, because there > are > >> > more request types to be added in KAFKA-2464 and future patches. > >> > > >> > Thanks, > >> > > >> > Jiangjie (Becket) Qin > >> > > >> > On Mon, Oct 12, 2015 at 6:04 PM, Jay Kreps <j...@confluent.io> wrote: > >> > > >> > > Two quick pieces of feedback: > >> > > 1. The use of a version of -1 as magical entry dividing between > >> > deprecated > >> > > versions is a bit hacky. What about instead having an array of > >> supported > >> > > versions and a separate array of deprecated versions. The deprecated > >> > > versions would always be a subset of the supported versions. Or, > >> > > alternately, since deprecation has no functional impact and is just > a > >> > > message to developers, we could just leave it out of the protocol > and > >> > just > >> > > have it in release notes etc. > >> > > 2. I think including the api name may cause some problems. Currently > >> the > >> > > api key is the primary key that we keep consistent but we have > >> actually > >> > > evolved the english description of the apis as they have changed. > The > >> > only > >> > > use I can think of for the name would be if people used the logical > >> name > >> > > and tried to resolve the api key, but that would be wrong. Not sure > >> if we > >> > > actually need the english name, if there is a use case I guess we'll > >> just > >> > > have to be very clear that the name is just documentation and can > >> change > >> > > any time. > >> > > > >> > > -Jay > >> > > > >> > > On Fri, Sep 25, 2015 at 2:53 PM, Magnus Edenhill < > mag...@edenhill.se> > >> > > wrote: > >> > > > >> > > > Good evening, > >> > > > > >> > > > KIP-35 was created to address current and future broker-client > >> > > > compatibility. > >> > > > > >> > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version > >> > > > > >> > > > Summary: > >> > > > * allow clients to retrieve the broker's protocol version > >> > > > * make broker handle unknown protocol requests gracefully > >> > > > > >> > > > Feedback and comments welcome! > >> > > > > >> > > > Regards, > >> > > > Magnus > >> > > > > >> > > > >> > > >> > > > > > > > > -- > > > > Regards, > > Ashish > > > > > > -- > > Regards, > Ashish >