Hi Alexandre,

Thank you for taking a look!

100. I am not sure I fully understand what you mean by forcefully adding
tagged fields. Let's say VX does not have a flexible version,
VY allows deserialization but serializes with a non-flexible version, and
VZ introduces a new tagged field.
VX upgrade to VY then downgrade back to VX works because even if group
metadata changes VY will serialize with
the highest non-flexible version. VZ to VY back to VZ also works because
even if VY serializes with a non-flexible field
VZ will be able to deserialize it as it is a supported version. Does this
answer your question?

101. The future versioning scheme needs to be backward compatible with
older coordinators. Wouldn't segregating into 2 versions
be incompatible?

Thanks,
Jeff

On Tue, Mar 28, 2023 at 5:47 AM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi Jeff, Team,
>
> Thank you for the KIP. This is a very interesting approach. I feel it
> is simpler than the described alternative although it comes with
> tradeoffs, thanks for highlighting those. If I may, I would like to
> share two naive questions.
>
> 100. The KIP mentions that records will be serialised with the highest
> non-flexible version (e.g. 3 for GroupMetadataValue and
> OffsetCommitValue) so that records can be deserialized with earlier
> versions of Kafka. I am not sure I understand correctly: is the idea
> to forcefully add tagged fields at the end of the records while
> maintaining the existing version (3 for the two record types just
> mentioned) so that they can be deserialized by existing Kafka versions
> for which the version of these record types is not known as flexible,
> while at the same time preserving the new tagged fields to new Kafka
> versions abreast of the addition of a new flexible version for these
> record types? If so, is it "bypassing" the protocol convention which
> prescribes the use of a flexible version to allow the use of tagged
> fields?
>
> 101. After the bump of the records to a new version indicated as
> flexible, the record version is expected to never change while the
> underlying tagged fields could potentially still evolve over time. One
> potential downside is that we lose the benefits of the versioning
> scheme enforced by the serde protocol. Could this become a problem in
> the future if there is ever a need to segregate two distinct
> "versions" of the appended record structure held by the tagged fields?
>
> Thanks,
> Alexandre
>
> Le jeu. 23 mars 2023 à 18:15, Jeff Kim <jeff....@confluent.io.invalid> a
> écrit :
> >
> > Hi Yi,
> >
> > > Does it mean with a flexible version, the future
> > version of these value types will stay at the version where the
> flexibility
> > is first introduced? Will there be any need to bump the version again in
> > the future?
> >
> > Yes, there will be no need to bump the version since we will only be
> adding
> > tagged fields but in the chance the version is bumped, we will
> deserialize
> > to the highest known (flexible) version which will ignore unknown tagged
> > fields.
> >
> > > To enforce the version not bumping, is it possible to have a unit test
> to
> > gate?
> >
> > Do you have some tests in mind? I don't think we can tell whether a
> version
> > was bumped or not.
> >
> > Best,
> > Jeff
> >
> > On Thu, Mar 23, 2023 at 12:07 PM Yi Ding <yd...@confluent.io.invalid>
> wrote:
> >
> > > Hi Jeff,
> > >
> > > Thanks for the update. Does it mean with a flexible version, the future
> > > version of these value types will stay at the version where the
> flexibility
> > > is first introduced? Will there be any need to bump the version again
> in
> > > the future?
> > > To enforce the version not bumping, is it possible to have a unit test
> to
> > > gate?
> > >
> > >
> > > On Wed, Mar 22, 2023 at 3:19 PM Jeff Kim <jeff....@confluent.io.invalid
> >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > After discussing with my colleagues, I have repurposed the KIP for a
> > > > general downgrade solution for both transaction and group
> coordinators.
> > > The
> > > > KIP no longer discusses the downgrade path but instead lays out the
> > > > foundation for future downgrade solutions.
> > > >
> > > > Link:
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-915%3A+Txn+and+Group+Coordinator+Downgrade+Foundation
> > > >
> > > > Thanks,
> > > > Jeff
> > > >
> > > > On Mon, Mar 20, 2023 at 7:37 PM Jeff Kim <jeff....@confluent.io>
> wrote:
> > > >
> > > > > Hi David and Justine,
> > > > >
> > > > > Thank you both for the detailed feedback.
> > > > >
> > > > > David,
> > > > >
> > > > > 1. That makes sense. I revised the "Reading new fields" section
> with
> > > how
> > > > > we can downgrade to the highest known version and that this was
> > > confirmed
> > > > > via unit testing. I also attempted to dive deeper into using tagged
> > > > fields
> > > > > and the rejected alternative. Please let me know what you think.
> > > > >
> > > > > 2. Under "Restrictions and Guidelines" I updated the paragraph to
> > > clearly
> > > > > state that we want to use tagged fields across all record types
> > > > introduced
> > > > > in KIP-848 including OffsetCommitValue.
> > > > >
> > > > > 3. Would it be possible to bump the OffsetCommitValue record
> version to
> > > > > make it flexible along with the changes to parse with the highest
> known
> > > > > version? I'm not sure I understand why we cannot make both changes
> > > > together.
> > > > >
> > > > > 4. I completely missed this. Added some notes at the end of
> > > "Restrictions
> > > > > and Guidelines". Unfortunately I can't think of a solution at the
> > > moment.
> > > > > Will get back to you.
> > > > >
> > > > > 5. I have a section under "Identifying New Record Types" that
> discusses
> > > > > this:
> > > > >  > We can automate the cleanup by writing tombstones when the
> > > coordinator
> > > > > reads unrecognizable records. This may add duplicate work if
> tombstones
> > > > > were already added but not yet pruned by the log cleaner.
> > > > > This is a sure way to delete any unknown record types even if the
> > > > operator
> > > > > does not follow the steps.
> > > > >
> > > > > 6. Thanks, I have expanded on the section on transactional offset
> > > > commits.
> > > > > As for log compaction, my understanding was that we can control the
> > > > process
> > > > > by forcing compaction. Is my understanding incorrect?
> > > > >
> > > > > 7. We throw an exception if ConfigResource.type is unknown which is
> > > true
> > > > > in our case because KIP-848 introduces a new GROUP resource type.
> We
> > > will
> > > > > need to add some sort of safeguard if the dynamic configs are not
> > > deleted
> > > > > before the server downgrade. I'll give you an update once I update
> the
> > > > > section.
> > > > >
> > > > > Justine,
> > > > >
> > > > > On the transactional offset commits, I have updated the KIP to
> reflect
> > > > > your points after our offline discussion (much appreciated). It
> seems
> > > > that
> > > > > the work required to abort from the server side is fairly big and
> will
> > > > > require additional investigation if we are to go down this path.
> > > > >
> > > > > As for downgrade limitations, I missed that part so thanks for the
> > > catch
> > > > > (along with David). Unfortunately, the proposed design won't allow
> > > > > downgrades even after the new record types are deleted because the
> > > > existing
> > > > > OffsetCommitValue record is not flexible and will not be able to
> parse
> > > > > the topicId tagged field. I'll think more about this and get back
> to
> > > you.
> > > > >
> > > > > Best,
> > > > > Jeff
> > > > >
> > > > > On Mon, Mar 20, 2023 at 2:16 PM Justine Olshan
> > > > > <jols...@confluent.io.invalid> wrote:
> > > > >
> > > > >> Hey Jeff and David,
> > > > >>
> > > > >> Thanks for the KIP! I was also looking into this a bit since I may
> > > want
> > > > to
> > > > >> change the record format for KIP-890 as well (finally
> implementing the
> > > > >> record change from KIP-360 to better support the epoch bump. This
> will
> > > > >> potentially be helpful for me to implement that work.
> > > > >>
> > > > >> I discussed some aspects with David offline. Namely, the
> alternative
> > > > >> solution to rewrite the records in the old format upon downgrade.
> One
> > > > plus
> > > > >> to this approach is that it should trigger compaction for all the
> > > > existing
> > > > >> (new format) records.
> > > > >> The main issue we ran into was how to handle ongoing transactions.
> > > (Ie,
> > > > >> could we abort and rewrite with the old format) I think that's
> one of
> > > > the
> > > > >> main downsides. I think it could potentially be possible to do
> this,
> > > but
> > > > >> we'd definitely need a server-side mechanism to abort and rewrite
> the
> > > > >> records.
> > > > >>
> > > > >> I think the trouble I was running into with the current solution
> is
> > > that
> > > > >> we
> > > > >> can only downgrade to a version slightly before the new group
> > > > coordinator.
> > > > >> If someone was running 3.2 and they upgrade to 3.6, but find a
> bug in
> > > > 3.4
> > > > >> (or any version between 3.5 and the one they upgraded from), then
> they
> > > > can
> > > > >> only downgrade to 3.5. This puts us in a difficult spot. I guess
> in
> > > this
> > > > >> scenario, they will need to wait for the new format records to get
> > > > cleared
> > > > >> away before downgrading again. Is that correct?
> > > > >>
> > > > >> Thanks,
> > > > >> Justine
> > > > >>
> > > > >> On Thu, Mar 16, 2023 at 10:41 AM David Jacot
> > > > <dja...@confluent.io.invalid
> > > > >> >
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Jeff,
> > > > >> >
> > > > >> > Thanks for the KIP! I am really glad that we are finally
> addressing
> > > > this
> > > > >> > gap in KIP-848. I have a few general comments.
> > > > >> >
> > > > >> > 1. Overall, I feel like the important bits are not bold enough
> in
> > > the
> > > > >> KIP.
> > > > >> > I think that it is good to explain the overall upgrade/downgrade
> > > > process
> > > > >> > and to highlight where the issues are but I think that the core
> bits
> > > > >> should
> > > > >> > give more details. For instance, we should explain why relying
> on
> > > tag
> > > > >> > fields works to ignore fields added in future releases. My
> > > > >> understanding is
> > > > >> > that it works because the buffer for the tagged fields is
> serialized
> > > > at
> > > > >> the
> > > > >> > end so reading with the old version, which is a prefix of the
> new
> > > one,
> > > > >> > works.
> > > > >> >
> > > > >> > 2. Moreover, it would be great if we could make the principle
> more
> > > > >> general.
> > > > >> > My hope is that we can keep reusing the principles introduced
> in the
> > > > >> KIP in
> > > > >> > future releases as well. For instance, let's say that we need to
> > > add a
> > > > >> new
> > > > >> > field to one of the new records introduced by KIP-848 or that we
> > > will
> > > > >> have
> > > > >> > to introduce a new record type as well. Would it work for those
> > > cases
> > > > as
> > > > >> > well?
> > > > >> >
> > > > >> > 3. Regarding enabling support for tagged fields for the
> > > > >> OffsetCommitValue
> > > > >> > record, it would be great if you could give more details on the
> > > steps
> > > > to
> > > > >> > get there in the KIP. My understanding is that we would have to
> do
> > > the
> > > > >> > following: 1) Update the code which reads the records to fail
> back
> > > to
> > > > >> the
> > > > >> > highest known version if the version stored in the log is
> unknown.
> > > > Let's
> > > > >> > say that we do this in AK 3.5. 2) We need to turn on tagged
> fields
> > > for
> > > > >> the
> > > > >> > record. I think that we can only do this in AK 3.6+.
> > > > >> >
> > > > >> > 4. I may have missed this part but we should clearly explain the
> > > > >> drawback
> > > > >> > of the proposed approach as well. Say that we enable tagged
> fields
> > > for
> > > > >> > OffsetCommitValue in AK 3.6. This means that it won't be
> possible to
> > > > >> > downgrade a cluster from 3.6 to a version earlier than 3.5.
> This is
> > > a
> > > > >> > significant limitation in my opinion because, I think, users
> don't
> > > > >> > necessarily upgrade to all versions.
> > > > >> >
> > > > >> > 5. In the proposal, it is not clear about whether the old
> software
> > > > will
> > > > >> > delete unknown records or not. It is true that new records will
> be
> > > > >> deleted
> > > > >> > when the group is downgraded but this only works if the operator
> > > > >> respects
> > > > >> > the process.
> > > > >> >
> > > > >> > 6. It would be great if we could extend the rejected
> alternative.
> > > The
> > > > >> > alternative sounds clearly better when you read it so we should
> > > really
> > > > >> > explain the reason to reject it. 1) One issue that you mention
> is
> > > that
> > > > >> the
> > > > >> > log must be compacted before downgrading and we don't really
> control
> > > > >> this
> > > > >> > process. 2) Transactions may be difficult to handle. I suppose
> that
> > > it
> > > > >> is
> > > > >> > possible to handle them though. Have you thought about this?
> > > > >> >
> > > > >> > 7. For the new dynamic configs, what happens if they are kept
> and
> > > the
> > > > >> > quorum controller is downgraded?
> > > > >> >
> > > > >> > Best,
> > > > >> > David
> > > > >> >
> > > > >> > On Thu, Mar 16, 2023 at 12:56 AM Jeff Kim
> > > > <jeff....@confluent.io.invalid
> > > > >> >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi folks,
> > > > >> > >
> > > > >> > > I would like to start a discussion thread for KIP-915: Next
> Gen
> > > > Group
> > > > >> > > Coordinator Downgrade Path which proposes the downgrade
> design for
> > > > the
> > > > >> > new
> > > > >> > > group coordinator introduced in KIP-848
> > > > >> > > <
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol
> > > > >> > > >
> > > > >> > > .
> > > > >> > >
> > > > >> > > KIP:
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-915%3A+Next+Gen+Group+Coordinator+Downgrade+Path
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Jeff
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
>

Reply via email to