Hi, Kowshik,

Thanks for the KIP. Looks good overall. A few comments below.

100. UpdateFeaturesRequest/UpdateFeaturesResponse
100.1 Since this request waits for responses from brokers, should we add a
timeout field to the request (like CreateTopicsRequest)?
100.2 The response schema is a bit weird. Typically, the response just
returns an error code and an error message, instead of echoing the request
(see the sketch after 100.6 below).
100.3 Should we add a separate request to list/describe the existing
features?
100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
DELETE, the version field doesn't make sense. So, I guess the broker just
ignores it? An alternative is to have a separate DeleteFeaturesRequest.
100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
version of the metadata for finalized features." I am wondering why the
ordering is important?
100.6 Could you specify the required ACL for this new request?
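
For 100.2, here is a rough sketch of the more conventional response shape I
have in mind (field names are placeholders, not a concrete proposal):

  {
    "type": "response",
    "name": "UpdateFeaturesResponse",
    "validVersions": "0",
    "fields": [
      {"name": "ErrorCode", "type": "int16", "versions": "0+",
        "about": "The top-level error code, or 0 if there was no error."},
      {"name": "ErrorMessage", "type": "string", "versions": "0+",
        "nullableVersions": "0+",
        "about": "The top-level error message, or null if there was no error."}
    ]
  }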

101. For the broker registration ZK node, should we bump up the version in
the json?

102. For the /features ZK node, not sure if we need the epoch field. Each
ZK node has an internal version field that is incremented on every update.
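
For context, my understanding is that the /features node content would look
roughly like the sketch below (based on my reading of the KIP; the feature
name is just an example):

  {
    "version": 0,
    "epoch": 1,
    "features": {
      "group_coordinator": {"version": 3}
    }
  }

Here "version" is the schema version of the node and "epoch" is the field in
question; ZK's per-node version already gives us a counter that is bumped on
every update.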

103. "Enabling the actual semantics of a feature version cluster-wide is
left to the discretion of the logic implementing the feature (ex: can be
done via dynamic broker config)." Does that mean the broker registration ZK
node will be updated dynamically when this happens?

104. UpdateMetadataRequest
104.1 It would be useful to describe when the feature metadata is included
in the request. My understanding is that it's only included if (1) there is
a change to the finalized features, (2) a broker restarts, or (3) the
controller fails over.
104.2 The new fields have the following versions. Why are the versions 3+
when the top version is bumped to 6?
      "fields":  [
        {"name": "Name", "type":  "string", "versions":  "3+",
          "about": "The name of the feature."},
        {"name":  "Version", "type":  "int64", "versions":  "3+",
          "about": "The finalized version for the feature."}
      ]
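
If the intent is for these fields to appear only in the newly bumped request
version, I would have expected something like (sketch):

        {"name": "Name", "type": "string", "versions": "6+",
          "about": "The name of the feature."},
        {"name": "Version", "type": "int64", "versions": "6+",
          "about": "The finalized version for the feature."}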

105. kafka-features.sh: Instead of using update/delete, perhaps it's better
to use enable/disable?

Jun

On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kpraka...@confluent.io>
wrote:

> Hey Boyang,
>
> Thanks for the great feedback! I have updated the KIP based on your
> feedback.
> Please find my responses to your comments below; look for sentences
> starting with "(Kowshik)".
>
>
> > 1. "When is it safe for the brokers to begin handling EOS traffic" could
> be
> > converted as "When is it safe for the brokers to start serving new
> > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > context.
>
> (Kowshik): Great point! Done.
>
> > 2. In the *Explanation *section, the metadata version number part seems a
> > bit blurred. Could you add a reference to the later section explaining
> > that we are going to store it in Zookeeper and update it every time there
> > is a feature change?
>
> (Kowshik): Great point! Done. I've added a reference in the KIP.
>
>
> > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > features such as group coordinator semantics, there is no legal scenario
> > to perform a downgrade at all. So leaving the downgrade door open is
> > pretty error-prone, as human faults happen all the time. I'm assuming
> > that as new features are implemented, it's not very hard to add a flag
> > during feature creation to indicate whether this feature is
> > "downgradable". Could you explain a bit more on the extra engineering
> > effort for shipping this KIP with downgrade protection in place?
>
> (Kowshik): Great point! I'd agree and disagree here. While I agree that
> accidental downgrades can cause problems, I also think downgrades should
> sometimes be allowed for emergency reasons (not all downgrades cause
> issues). It just depends on the feature being downgraded.
>
> To be more strict about feature version downgrades, I have modified the KIP
> to propose that we mandate a `--force-downgrade` flag be used in the
> UPDATE_FEATURES API and the tooling whenever a human operator is
> downgrading a finalized feature version. Hopefully this covers the
> requirement, until we find the need for advanced downgrade support.
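>
> As a rough illustration only (the exact field name is TBD in the KIP), the
> request could carry the flag as something like:
>
>   {"name": "ForceDowngrade", "type": "bool", "versions": "0+",
>     "about": "True if a finalized feature max version may be downgraded."}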
>
> > 4. "Each broker’s supported dictionary of feature versions will be
> defined
> > in the broker code." So this means in order to restrict a certain
> feature,
> > we need to start the broker first and then send a feature gating request
> > immediately, which introduces a time gap and the intended-to-close
> feature
> > could actually serve request during this phase. Do you think we should
> also
> > support configurations as well so that admin user could freely roll up a
> > cluster with all nodes complying the same feature gating, without
> worrying
> > about the turnaround time to propagate the message only after the cluster
> > starts up?
>
> (Kowshik): This is a great point/question. One of the expectations out of
> this KIP, which is already followed in the broker, is the following.
>  - Imagine at time T1 the broker starts up and registers its presence in
>    ZK, along with advertising its supported features.
>  - Imagine at a future time T2 the broker receives the
>    UpdateMetadataRequest from the controller, which contains the latest
>    finalized features as seen by the controller. The broker validates this
>    data against its supported features to make sure there is no mismatch
>    (it will shut down if there is an incompatibility).
>
> It is expected that during the time between the 2 events T1 and T2, the
> broker is almost a silent entity in the cluster. It does not add any value
> to the cluster, nor does it carry out any important broker activities. By
> “important”, I mean it is not mutating its persistent state, not mutating
> critical in-memory state, and won’t be serving produce/fetch requests. Note
> it doesn’t even know its assigned partitions until it receives the
> UpdateMetadataRequest from the controller. Anything the broker is doing up
> until this point is neither damaging nor useful.
>
> I’ve clarified the above in the KIP, see this new section:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> .
>
> > 5. "adding a new Feature, updating or deleting an existing Feature", may
> be
> > I misunderstood something, I thought the features are defined in broker
> > code, so admin could not really create a new feature?
>
> (Kowshik): Great point! You understood this right. Here adding a feature
> means we are
> adding a cluster-wide finalized *max* version for a feature that was
> previously never finalized.
> I have clarified this in the KIP now.
>
> > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS
> > to reject a concurrent feature update request.
>
> (Kowshik): Great point! I have modified the KIP adding the above (see
> 'Tooling support -> Admin API changes').
>
> > 7. I think we haven't discussed the alternative solution to pass the
> > feature information through Zookeeper. Is that mentioned in the KIP to
> > justify why using UpdateMetadata is more favorable?
>
> (Kowshik): Nice question! The broker reads the finalized feature info
> stored in ZK only during startup, when it does a validation. When serving
> `ApiVersionsRequest`, the broker does not read this info from ZK directly.
> I'd imagine the risk is that it could increase the ZK read QPS, which can
> become a bottleneck for the system. Today, in Kafka we use the controller
> to fan out ZK updates to brokers, and we want to stick to that pattern to
> avoid the ZK read bottleneck when serving `ApiVersionsRequest`.
>
> > 8. I was under the impression that the user could configure a range of
> > supported versions; what's the trade-off for allowing only a single
> > finalized version?
>
> (Kowshik): Great question! The finalized version of a feature basically
> refers to the cluster-wide finalized feature "maximum" version. For
> example, if the 'group_coordinator' feature has the finalized version set
> to 10, then it means that cluster-wide all versions up to v10 are supported
> for this feature. However, note that if some version (ex: v0) gets
> deprecated for this feature, then we don’t convey that using this scheme
> (also, supporting deprecation is a non-goal).
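>
> To illustrate with a sketch (the feature name and shape are just examples),
> if the finalized features in ZK contained
>
>   "features": {"group_coordinator": {"version": 10}}
>
> then clients could assume that all versions up to v10 of
> 'group_coordinator' are usable cluster-wide.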
>
> (Kowshik): I’ve now modified the KIP at all points, referring to finalized
> feature "maximum" versions.
>
> > 9. One minor syntax fix: Note that here the "client" here may be a
> > producer
>
> (Kowshik): Great point! Done.
>
>
> Cheers,
> Kowshik
>
>
> On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <reluctanthero...@gmail.com>
> wrote:
>
> > Hey Kowshik,
> >
> > thanks for the revised KIP. Got a couple of questions:
> >
> > 1. "When is it safe for the brokers to begin handling EOS traffic" could
> be
> > converted as "When is it safe for the brokers to start serving new
> > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > context.
> >
> > 2. In the *Explanation *section, the metadata version number part seems a
> > bit blurred. Could you add a reference to the later section explaining
> > that we are going to store it in Zookeeper and update it every time there
> > is a feature change?
> >
> > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
> > features such as group coordinator semantics, there is no legal scenario
> > to perform a downgrade at all. So leaving the downgrade door open is
> > pretty error-prone, as human faults happen all the time. I'm assuming
> > that as new features are implemented, it's not very hard to add a flag
> > during feature creation to indicate whether this feature is
> > "downgradable". Could you explain a bit more on the extra engineering
> > effort for shipping this KIP with downgrade protection in place?
> >
> > 4. "Each broker’s supported dictionary of feature versions will be
> defined
> > in the broker code." So this means in order to restrict a certain
> feature,
> > we need to start the broker first and then send a feature gating request
> > immediately, which introduces a time gap and the intended-to-close
> feature
> > could actually serve request during this phase. Do you think we should
> also
> > support configurations as well so that admin user could freely roll up a
> > cluster with all nodes complying the same feature gating, without
> worrying
> > about the turnaround time to propagate the message only after the cluster
> > starts up?
> >
> > 5. "adding a new Feature, updating or deleting an existing Feature", may
> be
> > I misunderstood something, I thought the features are defined in broker
> > code, so admin could not really create a new feature?
> >
> > 6. I think we need a separate error code like FEATURE_UPDATE_IN_PROGRESS
> > to reject a concurrent feature update request.
> >
> > 7. I think we haven't discussed the alternative solution to pass the
> > feature information through Zookeeper. Is that mentioned in the KIP to
> > justify why using UpdateMetadata is more favorable?
> >
> > 8. I was under the impression that user could configure a range of
> > supported versions, what's the trade-off for allowing single finalized
> > version only?
> >
> > 9. One minor syntax fix: Note that here the "client" here may be a
> > producer
> >
> > Boyang
> >
> > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > Hi Colin,
> > > >
> > > > Thanks for the feedback! I've changed the KIP to address your
> > > > suggestions.
> > > > Please find below my explanation. Here is a link to KIP 584:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > >
> > > > 1. '__data_version__' is the version of the finalized feature
> > > > metadata (i.e. actual ZK node contents), while the
> > > > '__schema_version__' is the version of the schema of the data
> > > > persisted in ZK. These serve different purposes. '__data_version__'
> > > > is useful mainly to clients during reads, to differentiate between
> > > > the 2 versions of eventually consistent 'finalized features'
> > > > metadata (i.e. the larger metadata version is more recent).
> > > > '__schema_version__' provides an additional degree of flexibility,
> > > > where if we decide to change the schema for the '/features' node in
> > > > ZK (in the future), then we can manage broker rollouts suitably
> > > > (i.e. serialization/deserialization of the ZK data can be handled
> > > > safely).
> > >
> > > Hi Kowshik,
> > >
> > > If you're talking about a number that lets you know if data is more
> > > or less recent, we would typically call that an epoch, and not a
> > > version.  For the ZK data structures, the word "version" is typically
> > > reserved for describing changes to the overall schema of the data that
> > > is written to ZooKeeper.  We don't even really change the "version" of
> > > those schemas that much, since most changes are backwards-compatible.
> > > But we do include that version field just in case.
> > >
> > > I don't think we really need an epoch here, though, since we can just
> > > look at the broker epoch.  Whenever the broker registers, its epoch
> > > will be greater than the previous broker epoch.  And the newly
> > > registered data will take priority.  This will be a lot simpler than
> > > adding a separate epoch system, I think.
> > >
> > > >
> > > > 2. Regarding admin client needing min and max information - you are
> > > > right! I've changed the KIP such that the Admin API also allows the
> > > > user to read 'supported features' from a specific broker. Please
> > > > look at the section "Admin API changes".
> > >
> > > Thanks.
> > >
> > > >
> > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > > > I've improved the KIP to just use `long` in all places.
> > >
> > > Sounds good.
> > >
> > > >
> > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right! I've
> > > > updated the KIP sketching the functionality provided by this tool,
> > > > with some examples. Please look at the section "Tooling support
> > > > examples".
> > > >
> > > > Thank you!
> > >
> > >
> > > Thanks, Kowshik.
> > >
> > > cheers,
> > > Colin
> > >
> > > >
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > >
> > > > > Thanks, Kowshik, this looks good.
> > > > >
> > > > > In the "Schema" section, do we really need both __schema_version__
> > and
> > > > > __data_version__?  Can we just have a single version field here?
> > > > >
> > > > > Shouldn't the Admin(Client) function have some way to get the min
> > > > > and max information that we're exposing as well?  I guess we could
> > > > > have min, max, and current.  Unrelated: is the use of Long rather
> > > > > than long deliberate here?
> > > > >
> > > > > It would be good to describe how the command line tool
> > > > > kafka.admin.FeatureCommand will work.  For example, the flags that
> > > > > it will take and the output that it will generate to STDOUT.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I've opened KIP-584
> > > > > > <https://issues.apache.org/jira/browse/KIP-584> which is
> > > > > > intended to provide a versioning scheme for features. I'd like
> > > > > > to use this thread to discuss the same. I'd appreciate any
> > > > > > feedback on this. Here is a link to KIP-584:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kowshik
> > > > > >
> > > > >
> > > >
> > >
> >
>
