On Fri, May 2, 2025, at 09:54, José Armando García Sancio wrote:
> Hi Colin,
>
> On Fri, Apr 25, 2025 at 5:42 PM Colin McCabe <cmcc...@apache.org> wrote:
>> > The "initiating the downgrade" section states "It will also check that
>> > the requested metadata version is compatible with all the other
>> > KIP-584 features that are enabled." What do you mean by this?
>>
> > So, in Kafka version 4.0, there are no KIP-584 features that rely on a 
> > specific metadata version. However, there could
> > be some in the future. This is just a statement that if we do have these 
> > kind of cross-feature dependencies in the
> > future, we will enforce them.
>
> Thanks. I took a look at KIP-584 and KIP-1022. Maybe I missed it but
> neither of them define how dependencies are specified or checked. I
> suspect that we will need a KIP if we ever add dependencies between
> features.
>

Hi José,

KIP-1022 does discuss feature dependencies:

> A Note on Feature Interdependence
>
> One aspect of compatibility is how we set features that depend on each
> other. KIP-584 was built to allow for interdependence. In other words, one
> feature may depend on another and a cluster may have valid “ranges” of
> versions for each feature depending on which features are enabled.
>
> If feature Y version 12 depends on feature X version 14, we need to handle
> setting and upgrading these features properly. Given a cluster where
> feature X is version 13, Y version 12 is not compatible and an error
> should be thrown if trying to format with X version 13 and Y version 12 OR
> when trying to upgrade to Y version 12 when X is version 13. The only way
> to get to Y version 12 is by upgrading to X version 14 first or at the
> same time as setting Y version 12. Likewise, if we wanted to downgrade
> feature X to version 12, we would need to do it at the same time as
> downgrading feature Y or by doing Y first.
>
> The tool should ensure that the features set are valid. This is done by
> marking features as dependent on others internally.
>
> For this change specifically, all features rely on at least MV 3.3-IV0
> (since this enables the writing of features to the metadata topic as per
> KIP-584). Any command to try to set a feature (either via storage or
> upgrade tool) when metadata version is below 3.3-IV0 will fail. (If an RPC
> is sent, it will also fail as the version is not supported.)
>
> The UpdateFeaturesResponse will fail fast and contain a top level error if
> any feature fails to update, and no updates will persist if a single
> feature in the request fails validation. This will be accompanied with
> UpdatableFeatureResult.ErrorMessage, an error message explaining the
> dependency.

That seems pretty clear? It's also already implemented (although not used since 
we don't have any inter-dependent features yet).
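
To make that concrete, the check is roughly of this shape (the names and the
dependency table below are made up for illustration; they are not the actual
implementation):

    import java.util.Map;

    final class FeatureDependencyCheck {
        // Hypothetical dependency table: the minimum levels of other features
        // that a feature requires once enabled. (Simplified: a real check would
        // also key off the requested level, e.g. Y v12 needs X >= 14.)
        static final Map<String, Map<String, Short>> DEPENDENCIES =
            Map.of("Y", Map.of("X", (short) 14));

        // Validates the final, post-update feature levels. Throwing here would
        // correspond to a top-level error in the UpdateFeaturesResponse, with
        // no updates persisted.
        static void validate(Map<String, Short> finalLevels) {
            for (Map.Entry<String, Short> feature : finalLevels.entrySet()) {
                Map<String, Short> required =
                    DEPENDENCIES.getOrDefault(feature.getKey(), Map.of());
                for (Map.Entry<String, Short> dep : required.entrySet()) {
                    short actual = finalLevels.getOrDefault(dep.getKey(), (short) 0);
                    if (actual < dep.getValue()) {
                        throw new IllegalArgumentException(feature.getKey() +
                            " requires " + dep.getKey() + " >= " + dep.getValue() +
                            ", but it is " + actual);
                    }
                }
            }
        }
    }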

>> > In the same section, you have "If the metadata version is too old to
>> > support KIP-919 controller registrations (pre 3.7.0-IV0), we will
>> > simply assume that the controllers do support downgrade, because we
>> > cannot check. This is similar to how metadata version upgrade is
>> > handled in the absence of controller registrations." What happens if
>> > this assumption is incorrect, the controller doesn't support
>> > downgrade? Is there something that the user can do and we can document
>> > to mitigate any issues that result from this?
> > This is the scenario where some controllers are running the 3.6.0 release 
> > or older, and some controllers are running
> > 4.2 or newer, on a very old metadata version without controller 
> > registrations. There really isn't anything we can do to
> > detect this since by definition, the pre-3.7-IV3 metadata version doesn't 
> > have controller registrations.
>
> Why are the controllers running 3.6.x? They could be running 4.0.x but
> the MV hasn't been upgraded to 3.7 (when controller registration was
> added), no?
>

I agree that running the 3.6.x software version on some nodes and 4.0.x on
others would be very odd. 3.6 isn't even one of our supported software versions
any more; right now we're supporting 4.0, 3.9, and 3.8. I was just trying to give
an example. Perhaps we can simply agree to move on, since the bad case here
relies on using unsupported software versions to do an unsupported operation.

>> Any problems that result from this should be clearable by restarting the 
>> ancient controller processes, though (since
> > they will then load the snapshot at the older MV)
>
> Are you assuming that the user is going to upgrade the software
> version when they restart the "ancient" controller processes?
>

No, I was not assuming that the software would be upgraded.

> I am trying to think of what would happen if the controller(s) on
> software version <= 4.0.x become leader after a controller downgrades
> the version. The controllers on software version <= 4.0.x will still
> have metadata records serialized using the old MV, no?

All the controllers will have metadata records using the previous MV. For 
pre-KIP-1155 controllers, they may not have a snapshot at the new MV.

>
>> > In the same section, you have "If the updateFeatures RPC specified
>> > multiple features, the metadata version downgrade record will be
>> > emitted last." Is there a reason why this is required? I think that
>> > all of the records would be in the same record batch so it is not
>> > clear to me why you need the invariant that the metadata version is
>> > last.
>> It's required because in theory the other features could depend on 
>> metadata.version. So we want to avoid
>> the potential invalid state where MV has been downgraded but the features 
>> which depend on it have not.
>
> I still don't fully understand why this is required. First, assuming
> feature dependency is supported, the dependency can be between any
> feature version. I know that you are not designing for any feature
> downgrade but the dependency could be between transaction.version and
> group.version. Do these feature level versions also need to be ordered
> if in the future we support such downgrades?
>

In general, if feature X depends on feature Y, I would expect X to be 
downgraded prior to Y. Since we don't have such dependencies right now, I 
didn't discuss that. But perhaps we should add a note.
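
To sketch what that ordering would look like (the record type and names below
are simplified stand-ins, not the real controller classes): when building the
batch for a downgrade, everything that may depend on metadata.version is placed
before the metadata.version change itself.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    final class DowngradeOrdering {
        // Simplified stand-in for a FeatureLevelRecord.
        record FeatureLevelChange(String featureName, short level) { }

        // Moves the metadata.version change to the end of the batch, so features
        // that could depend on it are downgraded first.
        static List<FeatureLevelChange> orderForDowngrade(List<FeatureLevelChange> changes) {
            List<FeatureLevelChange> ordered = new ArrayList<>(changes);
            ordered.sort(Comparator.comparingInt(
                (FeatureLevelChange c) -> c.featureName().equals("metadata.version") ? 1 : 0));
            return ordered;
        }
    }

Since List.sort is stable, the relative order of the other features in the
request is preserved.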

>> > In the same section, you have "When a broker or controller replays the
>> > new FeatureLevelRecord telling it to change the metadata.version, it
>> > will immediately trigger the writing of a __cluster_metadata snapshot with
>> > the new metadata version." I assume that a new snapshot will be
>> > generated only when the metadata version decreases, is that correct?
>> > Can you explain how a change in metadata version will be detected? For
>> > example, the FeatureLevelRecord may be in the cluster metadata
>> > checkpoint. In that case a new checkpoint doesn't need to be
>> > generated. I get the impression that this solution depends on the
>> > KRaft listener always returning the latest checkpoint in the
>> > RaftClient#handleLoadSnapshot. If so, should we make that explicit?
>> >
>>
> > Hmm, I'm a bit confused about the question. The current Raft interfaces do 
> > distinguish between records we
> > load from a snapshot and records we load from the log. So there is no risk 
> > of confusing the FeatureLevelRecord
> > we loaded from the latest snapshot with the FeatureLevelRecord we loaded 
> > from the log.
>
> Right, I was thinking of the following log and snapshot:
>
> Snapshot at end offset 100. All records between 0 and 99, inclusive,
> have been included in the snapshot. The MV in the snapshot is X.
>
> offset 100 -> metadata.version = Y  -- MV was previously upgraded to version Y
> ...        -> ...                   -- all of these records are serialized using version Y
> offset 110 -> metadata.version = X  -- MV was downgraded to X
>
> Before a snapshot that includes offset 110 (MV X) could be generated,
> the node restarts. How would the code identify that the records
> between 100 and 110 need to be snapshotted using metadata version X?
> Note that the metadata loader can batch all of the records between 100
> and 110 into one delta.

You loaded a snapshot with metadata version Y. You replayed a record changing
the metadata version to X. We already specified that this will cause us to
generate a new snapshot. The snapshot will presumably be generated at offset 110,
since that's the offset that changed the MV. I suppose it could also have a
slightly larger offset if the loader batched more records into the delta.
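
In rough code (the class and method names here are mine, not the actual
MetadataLoader or snapshot generator API), the loader-side logic amounts to:

    import java.util.function.ObjLongConsumer;

    // Hedged sketch: after applying a delta, compare the MV before and after;
    // if it changed, start a snapshot at the delta's last offset.
    final class MetadataVersionSnapshotTrigger {
        private final ObjLongConsumer<Short> startSnapshot; // (newFeatureLevel, endOffset)

        MetadataVersionSnapshotTrigger(ObjLongConsumer<Short> startSnapshot) {
            this.startSnapshot = startSnapshot;
        }

        void onDeltaApplied(short mvBefore, short mvAfter, long lastOffsetInDelta) {
            if (mvAfter != mvBefore) {
                // In the example above, lastOffsetInDelta is 110, or slightly
                // larger if the loader batched more records into the same delta.
                startSnapshot.accept(mvAfter, lastOffsetInDelta);
            }
        }
    }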

>
>> > Given that checkpoint generation is asynchronous from committing the
>> > new metadata version, should we have metrics (or a different
>> > mechanism) that the user can monitor to determine when it is safe to
>> > downgrade the software version of a node?
>> >
>> That's a fair point. We should have some documentation about waiting for a 
>> snapshot file to appear
> > with a date later than the RPC date, I guess.
>
> I was also thinking if we can expose this in a metric. E.g.
> LastSnapshottedMetadataVersion. If that metric is X, the user must run
> software versions that are greater than or equal to X. What do you
> think? The metric can expose the numerical MV and not the "string" MV.
>

Yeah, "last snapshot MV" seems like a reasonable metric.

best,
Colin

> Thanks,
> -- 
> -José
