Hi David, Thanks for the great KIP! It's good to see KIP-584 is starting to get used.
A few comments below.

7001. In the UpdateFeaturesRequest definition, there is a newly introduced ForceDowngrade parameter. There is also an existing AllowDowngrade parameter. My understanding is that the AllowDowngrade flag could be set in a FeatureUpdate whenever a downgrade is attempted via the Admin.updateFeatures API. Then, the ForceDowngrade flag could be set if we would like to ask the controller to proceed with the downgrade even if it was deemed unsafe. Could we document the distinction between the two flags in the KIP? If we can just keep one of the flags, that would simplify things. But as it stands, it seems like we will need both flags. To avoid confusion, does it make sense to deprecate the dual boolean flags and instead change the updateFeatures API to use an enum parameter called DOWNGRADE_REQUEST_TYPE with three values: NONE (the default, meaning no downgrade is allowed), DOWNGRADE_SAFE (ask the controller to downgrade if the operation is safe) and DOWNGRADE_UNSAFE (ask the controller to downgrade disregarding safety)?

7002. The KIP introduces a ForceDowngrade flag into the UpdateFeaturesRequest definition. But the KIP also introduces a generic --force CLI flag in the kafka-features.sh tool. Should we instead call the CLI flag --force-downgrade so that its intent is more specific?

7003. Currently the kafka-features.sh tool does not yet implement the "Advanced CLI usage" explained in KIP-584. The associated Jira is KAFKA-10621. For the needs of this KIP, do you need the advanced CLI or would the basic version work?

7004. A KIP-584 feature flag's max version is typically incremented to indicate a breaking change, which usually means that version level downgrades would break something. Could this KIP explain why it is useful to support a lossy downgrade for the metadata.version feature flag, i.e. what are some situations where a lossy downgrade is useful?

7005. Regarding downgrades, I read the `enum MetadataVersions` type introduced in this KIP that captures the rules for backwards compatibility. For my understanding, is this enum an implementation detail on the controller specific to this feature flag? i.e. when processing the UpdateFeaturesRequest, would the controller introduce flag-specific logic (based on the enum values) to decide whether a downgrade is allowed?

Cheers,
Kowshik

On Tue, Oct 12, 2021 at 2:13 PM David Arthur <mum...@gmail.com> wrote:

> Jun and Colin, thanks very much for the comments. See below.
>
> 10. Colin, I agree that ApiVersionsRequest|Response is the most straightforward approach here.
>
> 11. This brings up an interesting point. Once an UpdateFeatures request is finished processing, a subsequent "describe features" (ApiVersionsRequest) made by the admin client will go to a random broker, and so might not reflect the feature update. We could add blocking behavior to the admin client so it would poll ApiVersions for some time until the expected version is reflected on that broker. However, this does not mean _every_ broker has finished upgrading/downgrading -- just the one handling that request. Maybe we could have the admin client poll all the brokers until the expected version is seen.
>
> If at a later time a broker comes online that needs to process an upgrade/downgrade, I don't think there will be a problem since the broker will be fenced until it catches up on the latest metadata (which will include the feature level).
>
> 12. Yes, we will need changes to the admin client for "updateFeatures". I'll update the KIP to reflect this. A rough sketch of what this looks like on the client side today is below.
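> To keep the discussion concrete, this is roughly how a finalize/downgrade call looks with the KIP-584 admin API as it stands (boolean flag and all), plus a poll loop along the lines of point 11. The feature name and target version are placeholders, and note the poll only confirms whichever broker served the describe request:
>
>     import java.util.Map;
>     import java.util.Properties;
>     import org.apache.kafka.clients.admin.Admin;
>     import org.apache.kafka.clients.admin.FeatureUpdate;
>     import org.apache.kafka.clients.admin.FinalizedVersionRange;
>     import org.apache.kafka.clients.admin.UpdateFeaturesOptions;
>
>     public class UpdateMetadataVersion {
>         public static void main(String[] args) throws Exception {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "localhost:9092");
>             try (Admin admin = Admin.create(props)) {
>                 // Existing KIP-584 shape: a max version level plus a boolean
>                 // allowDowngrade. The enum proposed in 7001 (NONE /
>                 // DOWNGRADE_SAFE / DOWNGRADE_UNSAFE) would replace this boolean.
>                 short target = 1; // placeholder version
>                 FeatureUpdate update = new FeatureUpdate(target, true /* allowDowngrade */);
>                 admin.updateFeatures(Map.of("metadata.version", update),
>                         new UpdateFeaturesOptions()).all().get();
>
>                 // Point 11: poll until the expected finalized version shows up.
>                 // This only reflects the broker that answered the request.
>                 FinalizedVersionRange range;
>                 do {
>                     Thread.sleep(1000);
>                     range = admin.describeFeatures().featureMetadata().get()
>                             .finalizedFeatures().get("metadata.version");
>                 } while (range == null || range.maxVersionLevel() != target);
>             }
>         }
>     }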
> 13. I'll expand the paragraph on the initial "metadata.version" into its own section and add some detail.
>
> 14/15. As mentioned, I think we can avoid this and rely on brokers/controllers generating their own snapshots. We will probably want some kind of manual recovery mode where we can force-load a snapshot, but that is out of scope here (I think...)
>
> 16. Automatic upgrades should be feasible, but I think we will want to start with manual upgrades (while we work out the design and fix bugs). Following the design detailed in this KIP, we could have a controller component that (as you suggest) automatically finalizes the feature to the max of all brokers' supported versions. I can include a section on this or we could defer to a future KIP. WDYT?
>
> -David
>
> On Tue, Oct 12, 2021 at 1:57 PM Colin McCabe <cmcc...@apache.org> wrote:
> > On Thu, Oct 7, 2021, at 17:19, Jun Rao wrote:
> > > Hi, David,
> > >
> > > Thanks for the KIP. A few comments below.
> > >
> > > 10. It would be useful to describe how the controller node determines the RPC version used to communicate with other controller nodes. There seems to be a bootstrap problem. A controller node can't read the log, and therefore the feature level, until a quorum leader is elected. But leader election requires an RPC.
> >
> > Hi Jun,
> >
> > I agree that we need to figure this out. I think it would be best to simply use ApiVersionsRequest and ApiVersionsResponse. That way each controller can use the appropriate RPC versions when communicating with every other controller. This would allow us to upgrade them one by one.
> >
> > > 11. For downgrades, it would be useful to describe how to determine that the downgrade process (generating a new snapshot, propagating the snapshot, etc.) has completed. We could block the UpdateFeatures request until the process is completed. However, since the process could take time, the request could time out. Another way is through DescribeFeatures, where the server only reports downgraded versions after the process is completed.
> >
> > Hmm... I think we need to avoid blocking, since we don't know how long it will take for all nodes to act on the downgrade request. After all, some nodes may be down.
> >
> > But I agree we should have some way of knowing when the upgrade is done. DescribeClusterResponse seems like the natural place to put information about each node's feature level. While we're at it, we should also add a boolean to indicate whether the given node is fenced. (This will always be false for ZK mode, of course...)
> >
> > > 12. Since we are changing UpdateFeaturesRequest, do we need to change the AdminClient API for updateFeatures too?
> > >
> > > 13. For the paragraph starting with "In the absence of an operator defined value for metadata.version", in KIP-584 we described how to finalize features with new cluster bootstrap. In that case, it's inconvenient for the users to have to run an admin tool to finalize the version for each feature. Instead, the system detects that the /features path is missing in ZK and thus automatically finalizes every feature with the latest supported version. Could we do something similar in the KRaft mode?
> >
> > Yes, I think we need to have a section describing how this ties into creating new clusters. The simplest thing is probably to have the controller notice that there are no FeatureRecords at all, and then create a record for the latest metadata.version. This is analogous to how we assume the latest IBP if no IBP is configured.
> >
> > There is also the question of how to create a cluster that starts up with something other than the latest metadata.version. We could have a config for that, like initial.metadata.version, or pass a flag to the controllers... alternately, we could pass a flag to "kafka-storage.sh format". Roughly, the bootstrap rule could look like the sketch below.
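> > A minimal sketch of that rule on the active controller; every name here is a hypothetical stand-in, not actual KRaft code:
> >
> >     import java.util.Map;
> >     import java.util.Optional;
> >
> >     // Hypothetical sketch: pick the metadata.version to finalize when a
> >     // brand new cluster has no FeatureLevelRecord in its log.
> >     final class MetadataVersionBootstrap {
> >         static final short LATEST_METADATA_VERSION = 1; // latest known to this software
> >
> >         // Called once the active controller has replayed the log. Returns the
> >         // version to write as a FeatureLevelRecord, or empty if one exists.
> >         static Optional<Short> versionToFinalize(Map<String, Short> finalizedFeatures,
> >                                                  Optional<Short> initialVersionConfig) {
> >             if (finalizedFeatures.containsKey("metadata.version")) {
> >                 return Optional.empty();
> >             }
> >             // Prefer an operator-supplied initial version (e.g. a flag passed to
> >             // "kafka-storage.sh format"); otherwise assume the latest, analogous
> >             // to assuming the latest IBP when none is configured.
> >             return Optional.of(initialVersionConfig.orElse(LATEST_METADATA_VERSION));
> >         }
> >     }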
> > > 14. After the quorum leader generates a new snapshot, how do we force other nodes to pick up the new snapshot?
> > >
> > > 15. I agree with Jose that it will be useful to describe when generating a new snapshot is needed. To me, it seems the new snapshot is only needed when incompatible changes are made.
> >
> > I think it would be good to always generate a snapshot right before the upgrade. Then, if the upgrade goes wrong, we have a metadata state we could revert back to, albeit with some effort and potential data loss. But I agree that the rationale for this should be spelled out in the KIP.
> >
> > I also think that the brokers should generate their own snapshots rather than fetching from the controller, both in the upgrade and downgrade case. Jose mentioned this earlier and I agree.
> >
> > best,
> > Colin
> >
> > > 7. Jose, what control records were you referring to?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org> wrote:
> >> Jose, thanks for the thorough review and comments!
> >>
> >> I am out of the office until next week, so I probably won't be able to update the KIP until then. Here are some replies to your questions:
> >>
> >> 1. Generate snapshot on upgrade
> >> > > Metadata snapshot is generated and sent to the other nodes
> >> > Why does the Active Controller need to generate a new snapshot and force a snapshot fetch from the replicas (inactive controllers and brokers) on an upgrade? Isn't writing the FeatureLevelRecord good enough to communicate the upgrade to the replicas?
> >>
> >> You're right, we don't necessarily need to _transmit_ a snapshot, since each node can generate its own equivalent snapshot.
> >>
> >> 2. Generate snapshot on downgrade
> >> > > Metadata snapshot is generated and sent to the other inactive controllers and to brokers (this snapshot may be lossy!)
> >> > Why do we need to send this downgraded snapshot to the brokers? The replicas have seen the FeatureLevelRecord and noticed the downgrade. Can we have the replicas each independently generate a downgraded snapshot at the offset for the downgraded FeatureLevelRecord? I assume that the active controller will guarantee that all records after the FeatureLevelRecord use the downgraded version. If so, it would be good to mention that explicitly.
> >>
> >> Similar to above, yes, a broker that detects a downgrade via FeatureLevelRecord could generate its own downgrade snapshot and reload its state from that. This does get a little fuzzy when we consider cases where brokers are on different software versions and could be generating a downgrade snapshot for version X, but using different versions of the code. It might be safer to let the controller generate the snapshot so each broker (regardless of software version) gets the same records. However, for upgrades (or downgrades) we expect the whole cluster to be running the same software version before triggering the metadata.version change, so perhaps this isn't a likely scenario. Thoughts?
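> >> To illustrate what "each replica independently generates a downgraded snapshot" could mean, here is a rough sketch with hypothetical types (none of these are actual KRaft interfaces):
> >>
> >>     import java.util.List;
> >>
> >>     // Hypothetical sketch of a replica reacting to a metadata.version downgrade.
> >>     final class DowngradeSnapshotter {
> >>         interface MetadataRecord { byte[] serialize(short metadataVersion); }
> >>         interface SnapshotWriter { void append(byte[] record); void close(); }
> >>
> >>         // Invoked when log replay reaches a FeatureLevelRecord that lowers
> >>         // metadata.version. Re-serializes the current in-memory state at the
> >>         // older version, dropping anything it cannot represent (lossy!).
> >>         static void onDowngrade(short newVersion, List<MetadataRecord> currentState,
> >>                                 SnapshotWriter writer) {
> >>             for (MetadataRecord record : currentState) {
> >>                 writer.append(record.serialize(newVersion));
> >>             }
> >>             writer.close(); // snapshot taken at the FeatureLevelRecord's offset
> >>         }
> >>     }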
> >> 3. Max metadata version
> >> > > For the first release that supports metadata.version, we can simply initialize metadata.version with the current (and only) version. For future releases, we will need a mechanism to bootstrap a particular version. This could be done using the meta.properties file or some similar mechanism. The reason we need to allow for a specific initial version is to support the use case of starting a Kafka cluster at version X with an older metadata.version.
> >> > I assume that the Active Controller will learn the metadata version of the broker through the BrokerRegistrationRequest. How will the Active Controller learn about the max metadata version of the inactive controller nodes? We currently don't send a registration request from the inactive controller to the active controller.
> >>
> >> This came up during the design, but I neglected to add it to the KIP. We will need a mechanism for determining the supported features of each controller similar to how brokers use BrokerRegistrationRequest. Perhaps controllers could write a FeatureLevelRecord (or similar) to the metadata log indicating their supported version. WDYT?
> >>
> >> > Why do you need to bootstrap a particular version? Isn't the intent that the broker will learn the active metadata version by reading the metadata before unfencing?
> >>
> >> This bootstrapping is needed when a KRaft cluster is first started. If we don't have this mechanism, the cluster can't really do anything until the operator finalizes the metadata.version with the tool. The bootstrapping will be done by the controller and the brokers will see this version as a record (like you say). I'll add some text to clarify this.
> >>
> >> 4. Reject Registration - This is related to the bullet point above.
> >> > What will be the behavior of the active controller if the broker sends a metadata version that is not compatible with the cluster-wide metadata version?
> >>
> >> If a broker starts up with a lower supported version range than the current cluster metadata.version, it should log an error and shut down. This is in line with KIP-584.
> >>
> >> 5. Discover upgrade
> >> > > This snapshot will be a convenient way to let broker and controller components rebuild their entire in-memory state following an upgrade.
> >> > Can we rely on the presence of the FeatureLevelRecord for the metadata version for this functionality? If so, it avoids having to reload the snapshot.
> >>
> >> For upgrades, yes, probably, since we won't need to "rewrite" any records in this case. For downgrades, we will need to generate the snapshot and reload everything.
> >>
> >> 6. Metadata version specification
> >> > > V4(version=4, isBackwardsCompatible=false, description="New metadata record type Bar"),
> >> > Very cool. Do you have plans to generate Apache Kafka HTML documentation for this information? It would be helpful to display this information to the user using kafka-features.sh and the feature RPC.
> >>
> >> Hm, good idea :) I'll add a brief section on documentation. This would certainly be very useful. A rough sketch of how the enum could encode the compatibility rules is below.
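> >> For illustration, the enum could look something like this (the versions, descriptions, and helper method are placeholders, not the final specification):
> >>
> >>     // Sketch of a MetadataVersions enum; entries are illustrative only.
> >>     public enum MetadataVersions {
> >>         V1(1, true, "Initial version"),
> >>         V2(2, true, "Adds an optional field to Foo"),
> >>         V3(3, true, "Adds a new field to Foo"),
> >>         V4(4, false, "New metadata record type Bar");
> >>
> >>         private final int version;
> >>         private final boolean isBackwardsCompatible;
> >>         private final String description;
> >>
> >>         MetadataVersions(int version, boolean isBackwardsCompatible, String description) {
> >>             this.version = version;
> >>             this.isBackwardsCompatible = isBackwardsCompatible;
> >>             this.description = description;
> >>         }
> >>
> >>         // A downgrade is "safe" only if every version being walked back
> >>         // through is backwards compatible; otherwise records would be lost
> >>         // or rewritten, and the downgrade needs the force/unsafe option.
> >>         public boolean canDowngradeSafelyTo(MetadataVersions target) {
> >>             for (MetadataVersions v : values()) {
> >>                 if (v.version > target.version && v.version <= this.version
> >>                         && !v.isBackwardsCompatible) {
> >>                     return false;
> >>                 }
> >>             }
> >>             return true;
> >>         }
> >>     }
> >>
> >> Under that shape, V4.canDowngradeSafelyTo(V3) is false since V4 is not backwards compatible, while V3.canDowngradeSafelyTo(V1) is true.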
> >> 7. Downgrade records
> >> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and control records. In KIP-630 we introduced two control records for snapshots.
> >>
> >> Yes, good call. Let me re-read that KIP and include some details.
> >>
> >> Thanks again for the comments!
> >> -David
> >>
> >> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio <jsan...@confluent.io.invalid> wrote:
> >> > One more comment.
> >> >
> >> > 7. Downgrade records
> >> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and control records. In KIP-630 we introduced two control records for snapshots.
>
> --
> David Arthur