Hi all,

After discussing with my colleagues, I have repurposed the KIP for a
general downgrade solution for both transaction and group coordinators. The
KIP no longer discusses the downgrade path but instead lays out the
foundation for future downgrade solutions.

Link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-915%3A+Txn+and+Group+Coordinator+Downgrade+Foundation

Thanks,
Jeff

On Mon, Mar 20, 2023 at 7:37 PM Jeff Kim <jeff....@confluent.io> wrote:

> Hi David and Justine,
>
> Thank you both for the detailed feedback.
>
> David,
>
> 1. That makes sense. I revised the "Reading new fields" section with how
> we can downgrade to the highest known version and that this was confirmed
> via unit testing. I also attempted to dive deeper into using tagged fields
> and the rejected alternative. Please let me know what you think.
>
> 2. Under "Restrictions and Guidelines" I updated the paragraph to clearly
> state that we want to use tagged fields across all record types introduced
> in KIP-848 including OffsetCommitValue.
>
> 3. Would it be possible to bump the OffsetCommitValue record version to
> make it flexible along with the changes to parse with the highest known
> version? I'm not sure I understand why we cannot make both changes together.
>
> 4. I completely missed this. Added some notes at the end of "Restrictions
> and Guidelines". Unfortunately I can't think of a solution at the moment.
> Will get back to you.
>
> 5. I have a section under "Identifying New Record Types" that discusses
> this:
>  > We can automate the cleanup by writing tombstones when the coordinator
> reads unrecognizable records. This may add duplicate work if tombstones
> were already added but not yet pruned by the log cleaner.
> This is a sure way to delete any unknown record types even if the operator
> does not follow the steps.
>
> 6. Thanks, I have expanded on the section on transactional offset commits.
> As for log compaction, my understanding was that we can control the process
> by forcing compaction. Is my understanding incorrect?
>
> 7. We throw an exception if ConfigResource.type is unknown which is true
> in our case because KIP-848 introduces a new GROUP resource type. We will
> need to add some sort of safeguard if the dynamic configs are not deleted
> before the server downgrade. I'll give you an update once I update the
> section.
>
> Justine,
>
> On the transactional offset commits, I have updated the KIP to reflect
> your points after our offline discussion (much appreciated). It seems that
> the work required to abort from the server side is fairly big and will
> require additional investigation if we are to go down this path.
>
> As for downgrade limitations, I missed that part so thanks for the catch
> (along with David). Unfortunately, the proposed design won't allow
> downgrades even after the new record types are deleted because the existing
> OffsetCommitValue record is not flexible and will not be able to parse
> the topicId tagged field. I'll think more about this and get back to you.
>
> Best,
> Jeff
>
> On Mon, Mar 20, 2023 at 2:16 PM Justine Olshan
> <jols...@confluent.io.invalid> wrote:
>
>> Hey Jeff and David,
>>
>> Thanks for the KIP! I was also looking into this a bit since I may want to
>> change the record format for KIP-890 as well (finally implementing the
>> record change from KIP-360 to better support the epoch bump. This will
>> potentially be helpful for me to implement that work.
>>
>> I discussed some aspects with David offline. Namely, the alternative
>> solution to rewrite the records in the old format upon downgrade. One plus
>> to this approach is that it should trigger compaction for all the existing
>> (new format) records.
>> The main issue we ran into was how to handle ongoing transactions. (Ie,
>> could we abort and rewrite with the old format) I think that's one of the
>> main downsides. I think it could potentially be possible to do this, but
>> we'd definitely need a server-side mechanism to abort and rewrite the
>> records.
>>
>> I think the trouble I was running into with the current solution is that
>> we
>> can only downgrade to a version slightly before the new group coordinator.
>> If someone was running 3.2 and they upgrade to 3.6, but find a bug in 3.4
>> (or any version between 3.5 and the one they upgraded from), then they can
>> only downgrade to 3.5. This puts us in a difficult spot. I guess in this
>> scenario, they will need to wait for the new format records to get cleared
>> away before downgrading again. Is that correct?
>>
>> Thanks,
>> Justine
>>
>> On Thu, Mar 16, 2023 at 10:41 AM David Jacot <dja...@confluent.io.invalid
>> >
>> wrote:
>>
>> > Hi Jeff,
>> >
>> > Thanks for the KIP! I am really glad that we are finally addressing this
>> > gap in KIP-848. I have a few general comments.
>> >
>> > 1. Overall, I feel like the important bits are not bold enough in the
>> KIP.
>> > I think that it is good to explain the overall upgrade/downgrade process
>> > and to highlight where the issues are but I think that the core bits
>> should
>> > give more details. For instance, we should explain why relying on tag
>> > fields works to ignore fields added in future releases. My
>> understanding is
>> > that it works because the buffer for the tagged fields is serialized at
>> the
>> > end so reading with the old version, which is a prefix of the new one,
>> > works.
>> >
>> > 2. Moreover, it would be great if we could make the principle more
>> general.
>> > My hope is that we can keep reusing the principles introduced in the
>> KIP in
>> > future releases as well. For instance, let's say that we need to add a
>> new
>> > field to one of the new records introduced by KIP-848 or that we will
>> have
>> > to introduce a new record type as well. Would it work for those cases as
>> > well?
>> >
>> > 3. Regarding enabling support for tagged fields for the
>> OffsetCommitValue
>> > record, it would be great if you could give more details on the steps to
>> > get there in the KIP. My understanding is that we would have to do the
>> > following: 1) Update the code which reads the records to fail back to
>> the
>> > highest known version if the version stored in the log is unknown. Let's
>> > say that we do this in AK 3.5. 2) We need to turn on tagged fields for
>> the
>> > record. I think that we can only do this in AK 3.6+.
>> >
>> > 4. I may have missed this part but we should clearly explain the
>> drawback
>> > of the proposed approach as well. Say that we enable tagged fields for
>> > OffsetCommitValue in AK 3.6. This means that it won't be possible to
>> > downgrade a cluster from 3.6 to a version earlier than 3.5. This is a
>> > significant limitation in my opinion because, I think, users don't
>> > necessarily upgrade to all versions.
>> >
>> > 5. In the proposal, it is not clear about whether the old software will
>> > delete unknown records or not. It is true that new records will be
>> deleted
>> > when the group is downgraded but this only works if the operator
>> respects
>> > the process.
>> >
>> > 6. It would be great if we could extend the rejected alternative. The
>> > alternative sounds clearly better when you read it so we should really
>> > explain the reason to reject it. 1) One issue that you mention is that
>> the
>> > log must be compacted before downgrading and we don't really control
>> this
>> > process. 2) Transactions may be difficult to handle. I suppose that it
>> is
>> > possible to handle them though. Have you thought about this?
>> >
>> > 7. For the new dynamic configs, what happens if they are kept and the
>> > quorum controller is downgraded?
>> >
>> > Best,
>> > David
>> >
>> > On Thu, Mar 16, 2023 at 12:56 AM Jeff Kim <jeff....@confluent.io.invalid
>> >
>> > wrote:
>> >
>> > > Hi folks,
>> > >
>> > > I would like to start a discussion thread for KIP-915: Next Gen Group
>> > > Coordinator Downgrade Path which proposes the downgrade design for the
>> > new
>> > > group coordinator introduced in KIP-848
>> > > <
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol
>> > > >
>> > > .
>> > >
>> > > KIP:
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-915%3A+Next+Gen+Group+Coordinator+Downgrade+Path
>> > >
>> > > Thanks,
>> > > Jeff
>> > >
>> >
>>
>

Reply via email to