Thanks for the write-up and discussion. This would have been super
useful in our last round of deployments at LinkedIn for which we ended
up having to hack around a number of incompatibilities. I could list
all of the compatibility issues that have hit us, but some are
irrelevant to this specific KIP (e.g., ZooKeeper registration
versions). So I should perhaps just list two that I think are
relevant:

First, is the way that our metrics collection works. We have a special
metrics producer on every service that emits to a separate metrics
cluster. Kafka brokers also use this producer to emit to the
(separate) metrics cluster. So when we upgrade our test clusters, the
metrics producer in those clusters end up sending the latest produce
request version to the yet to be upgraded metrics cluster. This caused
an issue for us in the last round of deployments which bumped up the
protocol version for the quota-related throttle-time response field.
We got around that by just setting the metrics producer requiredAcks
to zero (since the error occurs on parsing the response - and the old
broker fortunately did not check the request version).

Second, the inter-broker protocol versioning scheme works fine across
official Apache releases but we picked up intermediate versions that
contained some request version bumps, and then follow-up versions that
picked up some more request bumps. For people deploying off trunk,
protocol version lookup would help.

General comments on the discussion and KIP:

I like Grant’s suggestion on using this to avoid the explicit
inter-broker-protocol-version - this will not only help address the
second compatibility issue above, but I’m all for anything that
eliminates an hour of config deployment (our deployments can take that
long!)

+1 on explicit response fields vs. key-value pairs - I don’t see this
reflected on the wiki though.

Aggregate protocol version vs specific request version: so you are
associating an increasing aggregate version (for each request version
bump). It may be useful to allow look up of the supported version (or
version range) for each request type. The BrokerMetadataResponse could
alternately return a vector of supported version ranges for each
request type.

Error response for unrecognized request versions: One option raised in
the discussion was to always include the highest supported version of
that request type in the response, but it may be worthwhile avoiding
that (since it is irrelevant most of the time) and fold that into the
BrokerMetadataRequest instead.

Max-message size/compression-codec: I actually prefer having this only
in TopicMetadataResponse and leave it out of the
BrokerMetadataRequest/Response (even for the defaults) since these are
really topic-specific fields. Rack-info on the other hand should
probably be there (at some point) in the BrokerMetadataResponse, and
this should perhaps be just a raw string that would need some
pluggable (deployment-specific) parsing.

Thanks,

Joel


On Wed, Sep 30, 2015 at 3:18 PM, Magnus Edenhill <mag...@edenhill.se> wrote:
> Everyone, thanks for your comments and input this far, here
> follows an update of the proposal based on the discussions.
>
>
>  BrokerMetadataRequest => [NodeId]
>    NodeId => int32   // Request Metadata for these brokers only.
>                      // Empty array: retrieve for all brokers.
>                      // Use -1 for current broker only.
>
>  BrokerMetadataResponse => [BrokerId Host Port ProtocolVersionMin
> ProtocolVersionMax [Key Value]]
>   NodeId => int32              // Broker NodeId
>   Host => string               // Broker Host
>   Port => int32                // Broker Port
>   ProtocolVersionMin => int32  // Broker's minimum supported protocol
> version
>   ProtocolVersionMax => int32  // Broker's maximum supported protocol
> version
>   Key => string                // Tag name
>   Value => stirng              // Tag value
>
>
> Builtin tags:
>  "broker.id"          = "9"
>  "broker.version"     = "0.9.0.0-SNAPSHOT-d12ca4f"
>  "broker.version.int" = "0x00090000"
>  "compression.codecs" = "gzip,snappy,lz4"
>  "message.max.bytes"  = "1000000"
>  "message.formats"    = "v1,v2"  // KIP-31
>  "endpoints"          = "plaintext://host:9092,ssl://host:9192"
>
> These are all documented, including their value format and how to parse it.
>
> The "broker.id" has multiple purposes:
>  * allows upgrading the bootstrap broker connection to a proper one since
> the
>    broker_id is initially not known, but would be with this.
>  * verifying that the broker connected to is actually the broker id that
> was learnt
>    through TopicMetadata.
>
>
> The BrokerMetadata may be used in broker-broker communication during
> upgrades
> to decide on a common protocol version between brokers with different
> versions.
>
>
>
> User-provided tags (server.properties), examples:
>  "aws.zone"           = "eu-central-1"
>  "rack"               = "r8a9"
>  "cluster"            = "kafka3"
>
> User provided tags are configured through the broker configuration file,
> server.properties, e.g.:
>
>    tag.aws.zone = eu-central-1
>    tag.rack = r8a9
>    tag.cluster = kafka3
>
>
>
>
> /Magnus
>
>
> 2015-10-01 0:11 GMT+02:00 Magnus Edenhill <mag...@edenhill.se>:
>
>>
>> Very good points, Todd, totally agree.
>>
>>
>> 2015-09-30 1:04 GMT+02:00 Todd Palino <tpal...@gmail.com>:
>>
>>> We should also consider what else should be negotiated between the broker
>>> and the client as this comes together. The version is definitely first,
>>> but
>>> there are other things, such as the max message size, that should not need
>>> to be replicated on both the broker and the client. Granted, max message
>>> size has per-topic overrides as well, but that should also be considered
>>> (possibly as an addition to the topic metadata response).
>>>
>>> Ideally you never want a requirement that is enforced by the broker to be
>>> a
>>> surprise to the client, whether that's a supported version or a
>>> configuration parameter. The client should not have to know it in advance
>>> (except for the most basic of connection parameters), and even if it does
>>> have it as a configuration option, it should be able to know before it
>>> even
>>> starts running that what it has configured is in conflict with the server.
>>>
>>> -Todd
>>>
>>>
>>> On Tue, Sep 29, 2015 at 11:08 AM, Mayuresh Gharat <
>>> gharatmayures...@gmail.com> wrote:
>>>
>>> > Right. But there should be a max old version that the broker should
>>> support
>>> > to avoid these incompatibility issues.
>>> > For example, if the broker is at version X, it should be able to support
>>> > the versions (clients and interbroker) till X-2. In case we have brokers
>>> > and clients older than that it can send a response warning them to
>>> upgrade
>>> > till X-2 minimum.
>>> > The backward compatibility limit can be discussed further. This will
>>> help
>>> > for rolling upgrades.
>>> >
>>> > Thanks,
>>> >
>>> > Mayuresh
>>> >
>>> > On Tue, Sep 29, 2015 at 8:25 AM, Grant Henke <ghe...@cloudera.com>
>>> wrote:
>>> >
>>> > > If we create a protocol version negotiation api for clients, can we
>>> use
>>> > it
>>> > > to replace or improve the ease of upgrades that break inter-broker
>>> > > messaging?
>>> > >
>>> > > Currently upgrades that break the wire protocol take 2 rolling
>>> restarts.
>>> > > The first restart we set inter.broker.protocol.version telling all
>>> > brokers
>>> > > to communicate on the old version, and then we restart again removing
>>> the
>>> > > inter.broker.protocol.version. With this api the brokers could agree
>>> on a
>>> > > version to communicate with, and when bounced re-negotiate to the new
>>> > > version.
>>> > >
>>> > >
>>> > > On Mon, Sep 28, 2015 at 10:26 PM, Mayuresh Gharat <
>>> > > gharatmayures...@gmail.com> wrote:
>>> > >
>>> > > > Nice write-up.
>>> > > >
>>> > > > Just had a question, instead of returning an empty response back to
>>> the
>>> > > > client, would it be better for the broker to return a response that
>>> > gives
>>> > > > some more info to the client regarding the min version they need to
>>> > > upgrade
>>> > > > to in order to communicate with the broker.
>>> > > >
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Mayuresh
>>> > > >
>>> > > > On Mon, Sep 28, 2015 at 6:36 PM, Jiangjie Qin
>>> > <j...@linkedin.com.invalid
>>> > > >
>>> > > > wrote:
>>> > > >
>>> > > > > Thanks for the writeup. I also think having a specific protocol
>>> for
>>> > > > > client-broker version negotiation is better.
>>> > > > >
>>> > > > > I'm wondering is it better to let the broker to decide the
>>> version to
>>> > > > use?
>>> > > > > It might have some value If brokers have preference for a
>>> particular
>>> > > > > version.
>>> > > > > Using a global version is a good approach. For the client-broker
>>> > > > > negotiation, I am thinking about something like:
>>> > > > >
>>> > > > > ProtocolSyncRequest => ClientType [ProtocolVersion]
>>> > > > >     ClientType => int8
>>> > > > >     ProtocolVersion => int16
>>> > > > >
>>> > > > > ProtocolSyncResponse => PreferredVersion
>>> > > > >     PreferredVersion => int16
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > Jiangjie (Becket) Qin
>>> > > > >
>>> > > > > On Mon, Sep 28, 2015 at 11:59 AM, Jun Rao <j...@confluent.io>
>>> wrote:
>>> > > > >
>>> > > > > > I agree with Ewen that having the keys explicitly specified in
>>> the
>>> > > > > response
>>> > > > > > is better.
>>> > > > > >
>>> > > > > > In addition to the supported protocol version, there are other
>>> > > > > interesting
>>> > > > > > metadata at the broker level that could be of interest to things
>>> > like
>>> > > > > admin
>>> > > > > > tools (e.g., used disk space, remaining disk space, etc). I am
>>> > > > wondering
>>> > > > > if
>>> > > > > > we should separate those into different requests. For inquiring
>>> the
>>> > > > > > protocol version, we can have a separate BrokerProtocolRequest.
>>> The
>>> > > > > > response will just include the broker version and perhaps a
>>> list of
>>> > > > > > supported requests and versions?
>>> > > > > >
>>> > > > > > As for sending an empty response for unrecognized requests, do
>>> you
>>> > > how
>>> > > > is
>>> > > > > > that handled in other similar systems?
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > > Jun
>>> > > > > >
>>> > > > > > On Mon, Sep 28, 2015 at 10:47 AM, Jason Gustafson <
>>> > > ja...@confluent.io>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > Having the version API can make clients more robust, so I'm in
>>> > > favor.
>>> > > > > One
>>> > > > > > > note on the addition of the "rack" field. Since this is a
>>> > > > > broker-specific
>>> > > > > > > setting, the client would have to query BrokerMetadata for
>>> every
>>> > > new
>>> > > > > > broker
>>> > > > > > > it connects to (unless we also expose rack in TopicMetadata).
>>> > This
>>> > > is
>>> > > > > > also
>>> > > > > > > kind of unfortunate for admin utilities leveraging this API.
>>> It
>>> > > might
>>> > > > > be
>>> > > > > > > more convenient to allow this API to return broker metadata
>>> for
>>> > the
>>> > > > > full
>>> > > > > > > cluster, assuming all of it could be made available in
>>> Zookeeper.
>>> > > > > > >
>>> > > > > > > As for using the empty response to indicate an incompatible
>>> API,
>>> > it
>>> > > > > seems
>>> > > > > > > like that could work. I think some of the clients might catch
>>> > > > response
>>> > > > > > > parsing exceptions and retry anyway, but that's no worse than
>>> > > > retrying
>>> > > > > > > because of a disconnect in the same case.
>>> > > > > > >
>>> > > > > > > -Jason
>>> > > > > > >
>>> > > > > > > On Fri, Sep 25, 2015 at 9:34 PM, Ewen Cheslack-Postava <
>>> > > > > > e...@confluent.io>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > The basic functionality is definitely useful here. I'm
>>> > generally
>>> > > in
>>> > > > > > favor
>>> > > > > > > > of exposing some info about broker versions to clients.
>>> > > > > > > >
>>> > > > > > > > I'd prefer to keep the key/values explicit. Making them
>>> > > extensible
>>> > > > > > > > string/string pairs is good for avoiding unnecessary version
>>> > > > changes
>>> > > > > in
>>> > > > > > > the
>>> > > > > > > > protocol, but I think we should explicitly define the valid
>>> > > > key/value
>>> > > > > > > > formats in each version of the protocol. New keys can
>>> safely be
>>> > > > > > ignored,
>>> > > > > > > > but actually specifying the format of the values is
>>> important
>>> > if
>>> > > we
>>> > > > > > ever
>>> > > > > > > > need to evolve those formats.
>>> > > > > > > >
>>> > > > > > > > I like some of the examples you've provided for returned
>>> > > key/value
>>> > > > > > pairs
>>> > > > > > > > and I think we should provide some of them even when the
>>> values
>>> > > > > should
>>> > > > > > be
>>> > > > > > > > obvious from the broker version.
>>> > > > > > > >
>>> > > > > > > > * broker.version - are we definitely standardizing on this
>>> > > > versioning
>>> > > > > > > > format? 4 digits, with each level indicating the intuitive
>>> > levels
>>> > > > of
>>> > > > > > > > compatibility? Is there any chance we'll have a 0.10.0.0?
>>> This
>>> > > > might
>>> > > > > > seem
>>> > > > > > > > like a trivial consideration, but after fighting versioning
>>> in
>>> > > > > > different
>>> > > > > > > > packaging systems, I'm a bit more sensitive to the annoying
>>> > > effects
>>> > > > > > that
>>> > > > > > > > not considering this carefully can have. We're at a
>>> > particularly
>>> > > > > > > sensitive
>>> > > > > > > > point as we hit .9 -> .10 or .9 -> 1.0 (!!!!).
>>> > > > > > > > * supported.compression.codecs - nit, but I'd like to figure
>>> > out
>>> > > a
>>> > > > > good
>>> > > > > > > way
>>> > > > > > > > to keep these as close to the actual config name as
>>> possible.
>>> > in
>>> > > > this
>>> > > > > > > case,
>>> > > > > > > > the setting is compression.codec, so just
>>> "compression.codecs"
>>> > > > seems
>>> > > > > > > ideal
>>> > > > > > > > to me.
>>> > > > > > > > * rack: I think there's a ton of demand for something like
>>> > this,
>>> > > > but
>>> > > > > > I'd
>>> > > > > > > > really like to think through it more before exposing
>>> anything.
>>> > > > "rack"
>>> > > > > > is
>>> > > > > > > > *very* specific to a deployment scenario. I think we're
>>> > > comfortable
>>> > > > > > > > adapting the terminology, but the meaning can change
>>> > drastically,
>>> > > > > even
>>> > > > > > > > under seemingly similar deployments. For example, if you
>>> deploy
>>> > > in
>>> > > > > EC2,
>>> > > > > > > you
>>> > > > > > > > might create instances in multiple AZs. Within an AZ, you
>>> might
>>> > > > > > consider
>>> > > > > > > > all the nodes on the same "rack". But there are also
>>> placement
>>> > > > groups
>>> > > > > > > > within each AZ that provide better guarantees on network
>>> > > > performance.
>>> > > > > > Are
>>> > > > > > > > any nodes in the same AZ considered on the same rack or do
>>> you
>>> > > need
>>> > > > > > > special
>>> > > > > > > > guarantees to be on the same "rack". In general, I don't
>>> think
>>> > > > > there's
>>> > > > > > a
>>> > > > > > > > generic "rack" identifier we can expose -- we'll need to do
>>> > > > something
>>> > > > > > > > specialized depending on different environments. This is a
>>> case
>>> > > > where
>>> > > > > > > > extensibility in the format is probably really useful.
>>> > > > > > > > * Properties like supported.compression.codecs can
>>> presumably
>>> > be
>>> > > > > > > determined
>>> > > > > > > > just via the broker version. Is there a good reason for
>>> > including
>>> > > > > them?
>>> > > > > > > > Perhaps explicit info >> implicit info?
>>> > > > > > > > * "All existing clients should be able to handle this
>>> > gracefully
>>> > > > > > without
>>> > > > > > > > any alterations to the code" - which clients does this refer
>>> > to?
>>> > > > For
>>> > > > > > > > example, will the Java clients continue to behave the same
>>> way
>>> > > they
>>> > > > > do
>>> > > > > > > > today? I'm curious because empty responses seem *very*
>>> > different
>>> > > > from
>>> > > > > > > > closing connections.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > -Ewen
>>> > > > > > > >
>>> > > > > > > > On Fri, Sep 25, 2015 at 2:53 PM, Magnus Edenhill <
>>> > > > mag...@edenhill.se
>>> > > > > >
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > Good evening,
>>> > > > > > > > >
>>> > > > > > > > > KIP-35 was created to address current and future
>>> > broker-client
>>> > > > > > > > > compatibility.
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version
>>> > > > > > > > >
>>> > > > > > > > > Summary:
>>> > > > > > > > >  * allow clients to retrieve the broker's protocol version
>>> > > > > > > > >  * make broker handle unknown protocol requests gracefully
>>> > > > > > > > >
>>> > > > > > > > > Feedback and comments welcome!
>>> > > > > > > > >
>>> > > > > > > > > Regards,
>>> > > > > > > > > Magnus
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > --
>>> > > > > > > > Thanks,
>>> > > > > > > > Ewen
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > -Regards,
>>> > > > Mayuresh R. Gharat
>>> > > > (862) 250-7125
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Grant Henke
>>> > > Software Engineer | Cloudera
>>> > > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > -Regards,
>>> > Mayuresh R. Gharat
>>> > (862) 250-7125
>>> >
>>>
>>
>>

Reply via email to