Hi Colin,

Thanks a lot for the excellent feedback and great
ideas/questions/suggestions. I have updated the KIP based on the feedback.
Please find my response below to the comments.

> It would be good to note that deprecating a feature version (in other
words, increasing minVersionLevel on the broker) is an incompatible
> change, which requires a major release of Kafka.

(Kowshik): Great point! Done. I have updated the KIP mentioning this in
'Non-goals' as well as 'Feature version deprecation' sections.

===

> I think the strategy for transitioning from not having a /features znode
to having one could use some work. The current proposal is to wait for all
the
> brokers to fill in their feature znodes and then pick the highest common
versions.  But that requires blocking in the controller startup code until
the
> whole cluster is active (technically, a point in time which we never
really know that we have reached...)

> Instead, I think it would be better to have a strategy like this:
> 1. If the controller comes up and there is no /features znode AND the IBP
is less than 2.6, create a /features znode where all the features are
disabled.
> 2. If the controller comes up and there is no /features znode AND the IBP
is greater than or equal to 2.6, create a /features znode where all the
> features are enabled at the highest versions supported by the controller.

> People upgrading from the pre-KIP-584
<https://issues.apache.org/jira/browse/KIP-584> world will end up in case
#1 because they have to do a double roll to upgrade, and during the first
roll, the IBP
> is unchanged.  People creating new clusters from scratch will end up in
case #2, which is what we want since we don't want a brand new cluster to be
> using old feature flag versions.

(Kowshik): Great point! However for case #1, I'm not sure why we need to
create a '/features' ZK node with disabled features. Instead, do you see
any drawback if we just do not create it? i.e. if IBP is less than 2.6, the
controller treats the case as though the versioning system is completely
disabled, and would not create a non-existing '/features' node.

I have updated the KIP with your suggestions, please refer to this section:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584
<https://issues.apache.org/jira/browse/KIP-584>
%3A+Versioning+scheme+for+features#KIP-584
<https://issues.apache.org/jira/browse/KIP-584>
:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues


===

> UpdateFeaturesResponse#ErrorMessage should specify nullableVersions since
null is a valid value here
> Also, in the message format, the tags we use need to start at 0.

(Kowshik): Great point! Done.

===

> I don't think we need the FEATURE_UPDATE_IN_PROGRESS error code.  The
controller is basically single-threaded and will only do one of these
operations at once.  Even
> if it weren't, though, we could simply block the second operation behind
the first one.

(Kowshik): Great point! Done, eliminated now.

===

> For updateFeatures, it would be good to specify that if a single feature
version update in the batch can't be done, none of them are done.  I think
this was the
>intention, but I wasn't able to find it spelled out (maybe i missed it).

(Kowshik): Done. I have changed the wording a bit in the KIP to bring
emphasis to this point, although it was mentioned previously. Please refer
to this section: https://cwiki.apache.org/confluence/display/KAFKA/KIP-584
<https://issues.apache.org/jira/browse/KIP-584>
%3A+Versioning+scheme+for+features#KIP-584
<https://issues.apache.org/jira/browse/KIP-584>
:Versioningschemeforfeatures-ChangestoKafkaController.

===

> And now, something a little bit bigger (sorry).  For finalized features,
why do we need both min_version_level and max_version_level?  Assuming that
we want all the
>  brokers to be on the same feature version level, we really only care
about three numbers for each feature, right?  The minimum supported version
level, the maximum
>  supported version level, and the current active version level.

> We don't actually want different brokers to be on different versions of
the same feature, right?  So we can just have one number for current
version level, rather
>  than two.  At least that's what I was thinking -- let me know if I
missed something.

(Kowshik): It is my understanding that the "current active version level"
that you have mentioned, is the "max_version_level". But we still
maintain/publish both min and max version levels, because, the detail about
min level is useful to external clients. This is described below.

For any feature F, think of the closed range: [min_version_level,
max_version_level] as the range of finalized versions, that's guaranteed to
be supported by all brokers in the cluster.
 - "max_version_level" is the finalized highest common version among all
brokers,
 - "min_version_level" is the finalized lowest common version among all
brokers.

Next, think of "client" here as the "user of the new feature versions
system". Imagine that such a client learns about finalized feature
versions, and exercises some logic based on the version. These clients can
be of 2 types:
1. Some part of the broker code itself could behave like a client trying to
use some feature that's "internal" to the broker cluster. Such a client
would learn the latest finalized features via ZK.
2. An external system (ex: Streams) could behave like a client, trying to
use some "external" facing feature. Such a client would learn latest
finalized features via ApiVersionsRequest. Ex: group_coordinator feature
described in the KIP.

Next, imagine that for F, the max_version_level is successfully bumped by
+1 (via Controller API). Now it is guaranteed that all brokers (i.e.
internal clients) understand max_version_level + 1. However, it is still
not guaranteed that all external clients have support for (or have
activated) the logic for the newer version. Why? Because, this is
subjective as explained next:

1. On one hand, imagine F as an internal feature only relevant to Brokers.
The binary for the internal client logic is controlled by Broker cluster
deployments. When shipping a new Broker release, we wouldn't bump max
"supported" feature version for F by 1, unless we have introduced some new
logic (with a potentially breaking change) in the Broker. Furthermore, such
feature logic in the broker should/will not be implemented in a way that it
would activate logic for an older feature version after it has migrated to
using the logic for a newer feature version (because this could break the
cluster!). For these cases, max_version_level will be very useful for
decision making.

2. On the other hand, imagine F as an external facing feature. External
clients are not within the control of Broker cluster. An external client
may not have upgraded it's code (yet) to use 'max_version_level + 1'. But,
the Kafka cluster could have been deployed with support for
'max_version_level + 1' of an external facing feature. Now, these external
clients (until an upgrade) are benefitted in learning "what is the lowest
common version for F among all brokers?". This is where the
"min_version_level" becomes useful. Using this, a client could learn the
specific supported versions that's lower than max_version_level (instead of
assuming that all brokers support the range: [1, max_version_level]). For
example, if the cluster deprecates "min_version_level", then the client
becomes aware because it periodically learns the latest "min_version_level"
via ApiVersionsRequest.


Cheers,
Kowshik




On Thu, Apr 23, 2020 at 12:07 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Kowshik,
>
> Thanks again for working on this-- it looks great.  I went over the KIP
> again and have a few more comments.
>
> ===
>
> It would be good to note that deprecating a feature version (in other
> words, increasing minVersionLevel on the broker) is an incompatible change,
> which requires a major release of Kafka.
>
> ===
>
> I think the strategy for transitioning from not having a /features znode
> to having one could use some work. The current proposal is to wait for all
> the brokers to fill in their feature znodes and then pick the highest
> common versions.  But that requires blocking in the controller startup code
> until the whole cluster is active (technically, a point in time which we
> never really know that we have reached...)
>
> Instead, I think it would be better to have a strategy like this:
> 1. If the controller comes up and there is no /features znode AND the IBP
> is less than 2.6, create a /features znode where all the features are
> disabled.
> 2. If the controller comes up and there is no /features znode AND the IBP
> is greater than or equal to 2.6, create a /features znode where all the
> features are enabled at the highest versions supported by the controller.
>
> People upgrading from the pre-KIP-584
> <https://issues.apache.org/jira/browse/KIP-584> world will end up in case
> #1 because they have to do a double roll to upgrade, and during the first
> roll, the IBP is unchanged.  People creating new clusters from scratch will
> end up in case #2, which is what we want since we don't want a brand new
> cluster to be using old feature flag versions.
>
> ===
>
> UpdateFeaturesResponse#ErrorMessage should specify nullableVersions since
> null is a valid value here
>
> Also, in the message format, the tags we use need to start at 0.
>
> ===
>
> I don't think we need the FEATURE_UPDATE_IN_PROGRESS error code.  The
> controller is basically single-threaded and will only do one of these
> operations at once.  Even if it weren't, though, we could simply block the
> second operation behind the first one.
>
> ===
>
> For updateFeatures, it would be good to specify that if a single feature
> version update in the batch can't be done, none of them are done.  I think
> this was the intention, but I wasn't able to find it spelled out (maybe i
> missed it).
>
> ===
>
> And now, something a little bit bigger (sorry).  For finalized features,
> why do we need both min_version_level and max_version_level?  Assuming that
> we want all the brokers to be on the same feature version level, we really
> only care about three numbers for each feature, right?  The minimum
> supported version level, the maximum supported version level, and the
> current active version level.
>
> We don't actually want different brokers to be on different versions of
> the same feature, right?  So we can just have one number for current
> version level, rather than two.  At least that's what I was thinking -- let
> me know if I missed something.
>
> best,
> Colin
>
>
> On Tue, Apr 21, 2020, at 13:01, Dhruvil Shah wrote:
> > Thanks for the KIP! +1 (non-binding)
> >
> > On Tue, Apr 21, 2020 at 6:09 AM David Jacot <dja...@confluent.io> wrote:
> >
> > > Great KIP, thanks! +1 (non-binding)
> > >
> > > On Fri, Apr 17, 2020 at 8:56 PM Guozhang Wang <wangg...@gmail.com>
> wrote:
> > >
> > > > Thanks for the great KIP Kowshik, +1 (binding).
> > > >
> > > > On Fri, Apr 17, 2020 at 11:22 AM Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, Kowshik,
> > > > >
> > > > > Thanks for the KIP. +1
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Apr 16, 2020 at 11:14 AM Kowshik Prakasam <
> > > > kpraka...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start a vote for KIP-584
> <https://issues.apache.org/jira/browse/KIP-584>. The link to the KIP can
> be
> > > found
> > > > > > here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > .
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>

Reply via email to