Everyone, thanks for your comments and input this far, here follows an update of the proposal based on the discussions.
BrokerMetadataRequest => [NodeId] NodeId => int32 // Request Metadata for these brokers only. // Empty array: retrieve for all brokers. // Use -1 for current broker only. BrokerMetadataResponse => [BrokerId Host Port ProtocolVersionMin ProtocolVersionMax [Key Value]] NodeId => int32 // Broker NodeId Host => string // Broker Host Port => int32 // Broker Port ProtocolVersionMin => int32 // Broker's minimum supported protocol version ProtocolVersionMax => int32 // Broker's maximum supported protocol version Key => string // Tag name Value => stirng // Tag value Builtin tags: "broker.id" = "9" "broker.version" = "0.9.0.0-SNAPSHOT-d12ca4f" "broker.version.int" = "0x00090000" "compression.codecs" = "gzip,snappy,lz4" "message.max.bytes" = "1000000" "message.formats" = "v1,v2" // KIP-31 "endpoints" = "plaintext://host:9092,ssl://host:9192" These are all documented, including their value format and how to parse it. The "broker.id" has multiple purposes: * allows upgrading the bootstrap broker connection to a proper one since the broker_id is initially not known, but would be with this. * verifying that the broker connected to is actually the broker id that was learnt through TopicMetadata. The BrokerMetadata may be used in broker-broker communication during upgrades to decide on a common protocol version between brokers with different versions. User-provided tags (server.properties), examples: "aws.zone" = "eu-central-1" "rack" = "r8a9" "cluster" = "kafka3" User provided tags are configured through the broker configuration file, server.properties, e.g.: tag.aws.zone = eu-central-1 tag.rack = r8a9 tag.cluster = kafka3 /Magnus 2015-10-01 0:11 GMT+02:00 Magnus Edenhill <mag...@edenhill.se>: > > Very good points, Todd, totally agree. > > > 2015-09-30 1:04 GMT+02:00 Todd Palino <tpal...@gmail.com>: > >> We should also consider what else should be negotiated between the broker >> and the client as this comes together. The version is definitely first, >> but >> there are other things, such as the max message size, that should not need >> to be replicated on both the broker and the client. Granted, max message >> size has per-topic overrides as well, but that should also be considered >> (possibly as an addition to the topic metadata response). >> >> Ideally you never want a requirement that is enforced by the broker to be >> a >> surprise to the client, whether that's a supported version or a >> configuration parameter. The client should not have to know it in advance >> (except for the most basic of connection parameters), and even if it does >> have it as a configuration option, it should be able to know before it >> even >> starts running that what it has configured is in conflict with the server. >> >> -Todd >> >> >> On Tue, Sep 29, 2015 at 11:08 AM, Mayuresh Gharat < >> gharatmayures...@gmail.com> wrote: >> >> > Right. But there should be a max old version that the broker should >> support >> > to avoid these incompatibility issues. >> > For example, if the broker is at version X, it should be able to support >> > the versions (clients and interbroker) till X-2. In case we have brokers >> > and clients older than that it can send a response warning them to >> upgrade >> > till X-2 minimum. >> > The backward compatibility limit can be discussed further. This will >> help >> > for rolling upgrades. >> > >> > Thanks, >> > >> > Mayuresh >> > >> > On Tue, Sep 29, 2015 at 8:25 AM, Grant Henke <ghe...@cloudera.com> >> wrote: >> > >> > > If we create a protocol version negotiation api for clients, can we >> use >> > it >> > > to replace or improve the ease of upgrades that break inter-broker >> > > messaging? >> > > >> > > Currently upgrades that break the wire protocol take 2 rolling >> restarts. >> > > The first restart we set inter.broker.protocol.version telling all >> > brokers >> > > to communicate on the old version, and then we restart again removing >> the >> > > inter.broker.protocol.version. With this api the brokers could agree >> on a >> > > version to communicate with, and when bounced re-negotiate to the new >> > > version. >> > > >> > > >> > > On Mon, Sep 28, 2015 at 10:26 PM, Mayuresh Gharat < >> > > gharatmayures...@gmail.com> wrote: >> > > >> > > > Nice write-up. >> > > > >> > > > Just had a question, instead of returning an empty response back to >> the >> > > > client, would it be better for the broker to return a response that >> > gives >> > > > some more info to the client regarding the min version they need to >> > > upgrade >> > > > to in order to communicate with the broker. >> > > > >> > > > >> > > > Thanks, >> > > > >> > > > Mayuresh >> > > > >> > > > On Mon, Sep 28, 2015 at 6:36 PM, Jiangjie Qin >> > <j...@linkedin.com.invalid >> > > > >> > > > wrote: >> > > > >> > > > > Thanks for the writeup. I also think having a specific protocol >> for >> > > > > client-broker version negotiation is better. >> > > > > >> > > > > I'm wondering is it better to let the broker to decide the >> version to >> > > > use? >> > > > > It might have some value If brokers have preference for a >> particular >> > > > > version. >> > > > > Using a global version is a good approach. For the client-broker >> > > > > negotiation, I am thinking about something like: >> > > > > >> > > > > ProtocolSyncRequest => ClientType [ProtocolVersion] >> > > > > ClientType => int8 >> > > > > ProtocolVersion => int16 >> > > > > >> > > > > ProtocolSyncResponse => PreferredVersion >> > > > > PreferredVersion => int16 >> > > > > >> > > > > Thanks, >> > > > > >> > > > > Jiangjie (Becket) Qin >> > > > > >> > > > > On Mon, Sep 28, 2015 at 11:59 AM, Jun Rao <j...@confluent.io> >> wrote: >> > > > > >> > > > > > I agree with Ewen that having the keys explicitly specified in >> the >> > > > > response >> > > > > > is better. >> > > > > > >> > > > > > In addition to the supported protocol version, there are other >> > > > > interesting >> > > > > > metadata at the broker level that could be of interest to things >> > like >> > > > > admin >> > > > > > tools (e.g., used disk space, remaining disk space, etc). I am >> > > > wondering >> > > > > if >> > > > > > we should separate those into different requests. For inquiring >> the >> > > > > > protocol version, we can have a separate BrokerProtocolRequest. >> The >> > > > > > response will just include the broker version and perhaps a >> list of >> > > > > > supported requests and versions? >> > > > > > >> > > > > > As for sending an empty response for unrecognized requests, do >> you >> > > how >> > > > is >> > > > > > that handled in other similar systems? >> > > > > > >> > > > > > Thanks, >> > > > > > >> > > > > > Jun >> > > > > > >> > > > > > On Mon, Sep 28, 2015 at 10:47 AM, Jason Gustafson < >> > > ja...@confluent.io> >> > > > > > wrote: >> > > > > > >> > > > > > > Having the version API can make clients more robust, so I'm in >> > > favor. >> > > > > One >> > > > > > > note on the addition of the "rack" field. Since this is a >> > > > > broker-specific >> > > > > > > setting, the client would have to query BrokerMetadata for >> every >> > > new >> > > > > > broker >> > > > > > > it connects to (unless we also expose rack in TopicMetadata). >> > This >> > > is >> > > > > > also >> > > > > > > kind of unfortunate for admin utilities leveraging this API. >> It >> > > might >> > > > > be >> > > > > > > more convenient to allow this API to return broker metadata >> for >> > the >> > > > > full >> > > > > > > cluster, assuming all of it could be made available in >> Zookeeper. >> > > > > > > >> > > > > > > As for using the empty response to indicate an incompatible >> API, >> > it >> > > > > seems >> > > > > > > like that could work. I think some of the clients might catch >> > > > response >> > > > > > > parsing exceptions and retry anyway, but that's no worse than >> > > > retrying >> > > > > > > because of a disconnect in the same case. >> > > > > > > >> > > > > > > -Jason >> > > > > > > >> > > > > > > On Fri, Sep 25, 2015 at 9:34 PM, Ewen Cheslack-Postava < >> > > > > > e...@confluent.io> >> > > > > > > wrote: >> > > > > > > >> > > > > > > > The basic functionality is definitely useful here. I'm >> > generally >> > > in >> > > > > > favor >> > > > > > > > of exposing some info about broker versions to clients. >> > > > > > > > >> > > > > > > > I'd prefer to keep the key/values explicit. Making them >> > > extensible >> > > > > > > > string/string pairs is good for avoiding unnecessary version >> > > > changes >> > > > > in >> > > > > > > the >> > > > > > > > protocol, but I think we should explicitly define the valid >> > > > key/value >> > > > > > > > formats in each version of the protocol. New keys can >> safely be >> > > > > > ignored, >> > > > > > > > but actually specifying the format of the values is >> important >> > if >> > > we >> > > > > > ever >> > > > > > > > need to evolve those formats. >> > > > > > > > >> > > > > > > > I like some of the examples you've provided for returned >> > > key/value >> > > > > > pairs >> > > > > > > > and I think we should provide some of them even when the >> values >> > > > > should >> > > > > > be >> > > > > > > > obvious from the broker version. >> > > > > > > > >> > > > > > > > * broker.version - are we definitely standardizing on this >> > > > versioning >> > > > > > > > format? 4 digits, with each level indicating the intuitive >> > levels >> > > > of >> > > > > > > > compatibility? Is there any chance we'll have a 0.10.0.0? >> This >> > > > might >> > > > > > seem >> > > > > > > > like a trivial consideration, but after fighting versioning >> in >> > > > > > different >> > > > > > > > packaging systems, I'm a bit more sensitive to the annoying >> > > effects >> > > > > > that >> > > > > > > > not considering this carefully can have. We're at a >> > particularly >> > > > > > > sensitive >> > > > > > > > point as we hit .9 -> .10 or .9 -> 1.0 (!!!!). >> > > > > > > > * supported.compression.codecs - nit, but I'd like to figure >> > out >> > > a >> > > > > good >> > > > > > > way >> > > > > > > > to keep these as close to the actual config name as >> possible. >> > in >> > > > this >> > > > > > > case, >> > > > > > > > the setting is compression.codec, so just >> "compression.codecs" >> > > > seems >> > > > > > > ideal >> > > > > > > > to me. >> > > > > > > > * rack: I think there's a ton of demand for something like >> > this, >> > > > but >> > > > > > I'd >> > > > > > > > really like to think through it more before exposing >> anything. >> > > > "rack" >> > > > > > is >> > > > > > > > *very* specific to a deployment scenario. I think we're >> > > comfortable >> > > > > > > > adapting the terminology, but the meaning can change >> > drastically, >> > > > > even >> > > > > > > > under seemingly similar deployments. For example, if you >> deploy >> > > in >> > > > > EC2, >> > > > > > > you >> > > > > > > > might create instances in multiple AZs. Within an AZ, you >> might >> > > > > > consider >> > > > > > > > all the nodes on the same "rack". But there are also >> placement >> > > > groups >> > > > > > > > within each AZ that provide better guarantees on network >> > > > performance. >> > > > > > Are >> > > > > > > > any nodes in the same AZ considered on the same rack or do >> you >> > > need >> > > > > > > special >> > > > > > > > guarantees to be on the same "rack". In general, I don't >> think >> > > > > there's >> > > > > > a >> > > > > > > > generic "rack" identifier we can expose -- we'll need to do >> > > > something >> > > > > > > > specialized depending on different environments. This is a >> case >> > > > where >> > > > > > > > extensibility in the format is probably really useful. >> > > > > > > > * Properties like supported.compression.codecs can >> presumably >> > be >> > > > > > > determined >> > > > > > > > just via the broker version. Is there a good reason for >> > including >> > > > > them? >> > > > > > > > Perhaps explicit info >> implicit info? >> > > > > > > > * "All existing clients should be able to handle this >> > gracefully >> > > > > > without >> > > > > > > > any alterations to the code" - which clients does this refer >> > to? >> > > > For >> > > > > > > > example, will the Java clients continue to behave the same >> way >> > > they >> > > > > do >> > > > > > > > today? I'm curious because empty responses seem *very* >> > different >> > > > from >> > > > > > > > closing connections. >> > > > > > > > >> > > > > > > > >> > > > > > > > -Ewen >> > > > > > > > >> > > > > > > > On Fri, Sep 25, 2015 at 2:53 PM, Magnus Edenhill < >> > > > mag...@edenhill.se >> > > > > > >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > > > Good evening, >> > > > > > > > > >> > > > > > > > > KIP-35 was created to address current and future >> > broker-client >> > > > > > > > > compatibility. >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version >> > > > > > > > > >> > > > > > > > > Summary: >> > > > > > > > > * allow clients to retrieve the broker's protocol version >> > > > > > > > > * make broker handle unknown protocol requests gracefully >> > > > > > > > > >> > > > > > > > > Feedback and comments welcome! >> > > > > > > > > >> > > > > > > > > Regards, >> > > > > > > > > Magnus >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > -- >> > > > > > > > Thanks, >> > > > > > > > Ewen >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > -Regards, >> > > > Mayuresh R. Gharat >> > > > (862) 250-7125 >> > > > >> > > >> > > >> > > >> > > -- >> > > Grant Henke >> > > Software Engineer | Cloudera >> > > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke >> > > >> > >> > >> > >> > -- >> > -Regards, >> > Mayuresh R. Gharat >> > (862) 250-7125 >> > >> > >