Hi Jun,

This is a fair question. I think there's a few different scenarios to consider:

1. mixed server software versions in a single cluster

2. new client software + old server software

3. old client software + new server software

In scenario #1 and #2, we have old (pre-3.9) server software in the mix. This 
old software won't support features like group.version and kraft.version. As we 
know, there are no features supported in 3.8 and older except metadata.version 
itself. So the fact that we leave out some stuff from the ApiVersionResponse 
isn't terribly significant. We weren't going to be able to enable those 
post-3.8 features anyway, since enabling a feature requires ALL server nodes to 
support it.

Scenario #3 is more interesting. With new server software, features like 
group.version and kraft.version may be enabled. But due to the KAFKA-17011 bug, 
we cannot accurately communicate the supported feature range back to the old 
client.

What is the impact of this? It depends on what the client is. Today, the only 
client that cares about feature versions is admin client, which can surface 
them through the Admin.describeFeatures API. So if we omit the supported 
feature range, admi client won't report it. If we fudge it by reporting it as 
1-1 instead of 0-1, admin client will report the fudged version.

In theory, there could be other clients looking at the supported feature ranges 
later, but I guess those will be post-3.8, if they ever exist, and so not 
subject to this problem.

AdminClient returns a separate map for "supported features" and "finalized 
features." So leaving out the supported versions for group.version and 
kraft.version will not prevent the client from returning the finalized versions 
of those features to the old client.

So basically we have a choice between missing information in 
Admin.describeFeatures and wrong information. I would lean towards the missing 
information path, but I guess we should try out an old build of 
kafka-features.sh against a server with one of the new features enabled, to 
make sure it looks the way we want.

best,
Colin


On Thu, Jun 27, 2024, at 14:01, Jun Rao wrote:
> Hi, Colin,
>
> ApiVersionResponse includes both supported and finalized features. If we
> only suppress features in the supported field, but not in the finalized
> field, it can potentially lead to inconsistency in the older client. For
> example, if a future feature supporting V0 is finalized in the broker, an
> old client issuing V3 of ApiVersionRequest will see the feature in the
> finalized field, but not in the supported field.
>
> An alternative approach is to still include all features in the supported
> field, but replace minVersion of 0 with 1. This may still lead to
> inconsistency if a future feature is finalized at version 0. However, since
> downgrading is less frequent than upgrading, this approach seems slightly
> more consistent.
>
> No matter what approach we take, it would be useful to document this
> inconsistency to the old client.
>
> Thanks,
>
> Jun
>
> On Wed, Jun 26, 2024 at 1:18 PM Jun Rao <j...@confluent.io> wrote:
>
>> Thanks for the reply, Justine and Colin. Sounds good to me.
>>
>> Jun
>>
>> On Wed, Jun 26, 2024 at 12:54 PM Colin McCabe <cmcc...@apache.org> wrote:
>>
>>> Hi Justine,
>>>
>>> Yes, that was what I was thinking.
>>>
>>> best,
>>> Colin
>>>
>>> On Mon, Jun 24, 2024, at 11:11, Justine Olshan wrote:
>>> > My understanding is that the tools that don't rely on ApiVersions should
>>> > still return 0s when it is the correct value. I believe these commands
>>> do
>>> > not require this API and thus can show 0 as versions.
>>> >
>>> > Likewise, when the old ApiVersionsRequest is used to describe features,
>>> we
>>> > can't return 0 versions and we won't be able to see group version set.
>>> > However, the new api will return 0 and the group version correctly.
>>> >
>>> > Let me know if this is consistent with your thoughts, Colin.
>>> >
>>> > Justine
>>> >
>>> > On Mon, Jun 24, 2024 at 10:44 AM Jun Rao <j...@confluent.io.invalid>
>>> wrote:
>>> >
>>> >> Hi, Colin,
>>> >>
>>> >> Thanks for the update. The proposed change seems reasonable to me.
>>> Just one
>>> >> clarification.
>>> >>
>>> >> The KIP can show version 0 of certain features with version-mapping
>>> >> and feature-dependencies. Will that part change? For example, will the
>>> tool
>>> >> show version 0 features with --release-version 3.8 or do we exclude
>>> them.
>>> >>
>>> >> bin/kafka-storage.sh version-mapping --release-version 3.6-IV1
>>> >>     metadata.version=13 (3.6-IV1)  transaction.version=0
>>> group.version=0
>>> >>     kraft.version=0
>>> >>
>>> >> Jun
>>> >>
>>> >> On Sat, Jun 22, 2024 at 2:19 PM José Armando García Sancio
>>> >> <jsan...@confluent.io.invalid> wrote:
>>> >>
>>> >> > Thanks for the update Colin. The changes make sense to me.
>>> >> >
>>> >> > Are you planning to update the KIP to reflect this new RPC version?
>>> It
>>> >> > would be good to document the semantics explained above in the KIP.
>>> >> >
>>> >> > Thanks!
>>> >> >
>>> >> > On Fri, Jun 21, 2024 at 8:22 PM Justine Olshan
>>> >> > <jols...@confluent.io.invalid> wrote:
>>> >> > >
>>> >> > > Ok makes sense. I will update my PR.
>>> >> > >
>>> >> > > On Fri, Jun 21, 2024 at 5:09 PM Colin McCabe <cmcc...@apache.org>
>>> >> wrote:
>>> >> > >
>>> >> > > > I think it's better to suppress the response in v3. The issue
>>> with
>>> >> > > > modifying it is that there may be scenarios where [1, 1] is the
>>> >> actual
>>> >> > > > supported range, and we'd want to know that. But leaving out the
>>> >> > feature
>>> >> > > > should be OK for older clients (it will be the case with clients
>>> old
>>> >> > enough
>>> >> > > > to send a v0, v1, or v2 ApiVersionsRequest anyway)
>>> >> > > >
>>> >> > > > best,
>>> >> > > > Colin
>>> >> > > >
>>> >> > > > On Fri, Jun 21, 2024, at 16:46, Justine Olshan wrote:
>>> >> > > > > Thanks Colin,
>>> >> > > > >
>>> >> > > > > This makes sense to me. Namely in the case where we perhaps
>>> don't
>>> >> > want to
>>> >> > > > > support version 0 anymore, we need the range to be able to not
>>> >> > include 0.
>>> >> > > > > (In other words, we can't assume 0 is supported)
>>> >> > > > > It is unfortunate that this change is a bit tricky, but I think
>>> >> it's
>>> >> > the
>>> >> > > > > best option.
>>> >> > > > >
>>> >> > > > > Can you clarify
>>> >> > > > >> The server will simply leave out the features whose minimum
>>> >> > supported
>>> >> > > > > value is 0 for clients that send v3
>>> >> > > > >
>>> >> > > > > For 3.8, I planned to set the 0s in the response to 1. Is it
>>> better
>>> >> > to
>>> >> > > > > suppress the zero version features in the response so we are
>>> >> > consistent
>>> >> > > > > between trunk and 3.8?
>>> >> > > > >
>>> >> > > > > Thanks,
>>> >> > > > > Justine
>>> >> > > > >
>>> >> > > > > On Fri, Jun 21, 2024 at 4:34 PM Colin McCabe <
>>> cmcc...@apache.org>
>>> >> > wrote:
>>> >> > > > >
>>> >> > > > >> Hi all,
>>> >> > > > >>
>>> >> > > > >> It seems that there was a bug in older versions of Kafka which
>>> >> > caused
>>> >> > > > >> deserialization problems when a supported feature range
>>> included
>>> >> 0.
>>> >> > For
>>> >> > > > >> example, the range for group.version of [0, 1] would be a
>>> problem
>>> >> in
>>> >> > > > this
>>> >> > > > >> situation.
>>> >> > > > >>
>>> >> > > > >> This obviously makes supportedVersions kind of useless. Any
>>> >> feature
>>> >> > that
>>> >> > > > >> doesn't exist today is effectively at v0 today (v0 is
>>> equivalent
>>> >> to
>>> >> > > > "off").
>>> >> > > > >> But if we can't declare that the server supports [0, 1] or
>>> >> similar,
>>> >> > we
>>> >> > > > >> can't declare that it supports the feature being off.
>>> Therefore,
>>> >> no
>>> >> > > > rolling
>>> >> > > > >> upgrades are possible.
>>> >> > > > >>
>>> >> > > > >> We noticed this bug during the 3.8 release when we noticed
>>> >> problems
>>> >> > in
>>> >> > > > >> upgrade tests. As an addendum to KIP-1022, we're adding the
>>> >> > following
>>> >> > > > >> solution:
>>> >> > > > >>
>>> >> > > > >> - There will be a new v4 for ApiVersionsRequest
>>> >> > > > >>
>>> >> > > > >> - Clients that sent v4 will promise to correctly handle ranges
>>> >> that
>>> >> > > > start
>>> >> > > > >> with 0, such as [0, 1]
>>> >> > > > >>
>>> >> > > > >> - The server will simply leave out the features whose minimum
>>> >> > supported
>>> >> > > > >> value is 0 for clients that send v3
>>> >> > > > >>
>>> >> > > > >> - ApiVersionsRequest v4 will be supported in AK 3.9 and
>>> above. AK
>>> >> > 3.8
>>> >> > > > will
>>> >> > > > >> ship with ApiVersionsRequest v3 just as today.
>>> >> > > > >>
>>> >> > > > >> thanks,
>>> >> > > > >> Colin
>>> >> > > > >>
>>> >> > > > >>
>>> >> > > > >> On Mon, Apr 15, 2024, at 11:01, Justine Olshan wrote:
>>> >> > > > >> > Hey folks,
>>> >> > > > >> >
>>> >> > > > >> > Thanks everyone! I will go ahead and call it.
>>> >> > > > >> > The KIP passes with the following +1 votes:
>>> >> > > > >> >
>>> >> > > > >> > - Andrew Schofield (non-binding)
>>> >> > > > >> > - David Jacot (binding)
>>> >> > > > >> > - José Armando García Sancio (binding)
>>> >> > > > >> > - Jun Rao (binding)
>>> >> > > > >> >
>>> >> > > > >> > Thanks again,
>>> >> > > > >> > Justine
>>> >> > > > >> >
>>> >> > > > >> > On Fri, Apr 12, 2024 at 11:16 AM Jun Rao
>>> >> <j...@confluent.io.invalid
>>> >> > >
>>> >> > > > >> wrote:
>>> >> > > > >> >
>>> >> > > > >> >> Hi, Justine,
>>> >> > > > >> >>
>>> >> > > > >> >> Thanks for the KIP. +1
>>> >> > > > >> >>
>>> >> > > > >> >> Jun
>>> >> > > > >> >>
>>> >> > > > >> >> On Wed, Apr 10, 2024 at 9:13 AM José Armando García Sancio
>>> >> > > > >> >> <jsan...@confluent.io.invalid> wrote:
>>> >> > > > >> >>
>>> >> > > > >> >> > Hi Justine,
>>> >> > > > >> >> >
>>> >> > > > >> >> > +1 (binding)
>>> >> > > > >> >> >
>>> >> > > > >> >> > Thanks for the improvement.
>>> >> > > > >> >> > --
>>> >> > > > >> >> > -José
>>> >> > > > >> >> >
>>> >> > > > >> >>
>>> >> > > > >>
>>> >> > > >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > -José
>>> >> >
>>> >>
>>>
>>

Reply via email to