On Fri, Nov 11, 2022, at 08:59, David Arthur wrote:
> Thanks, Colin.
>
>> never start an upgrade without first verifying the quorum configuration on
>> the ZK-based brokers
>
> I agree that this is a pretty big benefit. I could imagine debugging
> and fixing connection problems mid-migration would be a big pain.
> Especially if you had some brokers correctly configured, and others
> not.
>
> Adding a heartbeat raises some questions about what to do if a broker
> goes into a bad state, or stops heartbeating, during a migration.
> However, I think the same is true for a registration based approach,
> so maybe it's not an increase in net complexity.
>
Hi David,

Yeah. I think the goal should be for the set of heartbeaters to match the
set of broker registrations under /brokers. Obviously, people could add or
remove brokers after the upgrade has begun, but that's unavoidable, I think.
We can at least ensure that at the time we enter the upgrade, all the
brokers are ready.

> I've replaced the ZK registration section with a new RPC and brief
> description. Please take a look.
>

Thanks, David. With these changes it looks good to me. +1 (binding)

Colin

> Thanks!
> David
>
> On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> wrote:
>>
>> Hi David,
>>
>> Thanks for the response. Replies inline.
>>
>> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
>> > Colin
>> >
>> >> Maybe zk.metadata.migration.enable ?
>> >
>> > Done. I went with "zookeeper.metadata.migration.enable" since our
>> > other ZK configs start with "zookeeper.*"
>> >
>> >> Similarly, for MigrationRecord: can we rename this to
>> >> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
>> >
>> > Sure
>> >
>> >> With ZkMigrationStateRecord, one thing to keep in mind here is that we
>> >> will eventually compact all the metadata logs into a snapshot. That
>> >> snapshot will then have to keep alive the memory of the old migration. So
>> >> it is not really a matter of replaying the old metadata logs (probably)
>> >> but a matter of checking to see what the ZkMigrationState is, which I
>> >> suppose could be Optional<ZkMigrationState>. If it's not Optional.empty,
>> >> we already migrated / are migrating.
>> >
>> > Yea, makes sense.
>> >
>> >> For the /migration ZNode, is "last_update_time_ms" necessary? I thought
>> >> ZK already tracked this information in the mzxid of the znode?
>> >
>> > Yes, Jun pointed this out previously, I missed this update in the KIP.
>> > Fixed now.
>> >
>> >> It is true that technically it is only needed in UMR, but I would still
>> >> suggest including KRaftControllerId in LeaderAndIsrRequest because it
>> >> will make debugging much easier.
>> >>
>> >> I would suggest not implementing the topic deletion state machine, but
>> >> just deleting topics eagerly when in migration mode. We can implement
>> >> this behavior change by keying off of whether KRaftControllerId is
>> >> present in LeaderAndIsrRequest. On broker startup, we'll be sent a full
>> >> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as
>> >> expected (again, this behavior change would only be for migration mode).
>> >
>> > Sounds good to me. Since this is somewhat of an implementation detail,
>> > do you think we need this included in the KIP?
>>
>> Yeah, maybe we don't need to go into the delete behavior here. But I think
>> the KIP should specify that we have KRaftControllerId in both
>> UpdateMetadataRequest and LeaderAndIsrRequest. That will allow us to
>> implement this behavior conditionally on ZK-based brokers when in dual
>> write mode.
>>
>> >> For existing KRaft controllers, will
>> >> kafka.controller:type=KafkaController,name=MigrationState show up as 4
>> >> (MigrationFinalized)? I assume this is true, but it would be good to
>> >> spell it out. Sorry if this is answered somewhere else.
>> >
>> > We discussed using 0 (None) as the value to report for original,
>> > un-migrated KRaft clusters. 4 (MigrationFinalized) would be for
>> > clusters which underwent a migration. I have some description of this
>> > in the table under "Migration Overview".
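As a quick illustration of those two reported values, here is a minimal
sketch of how they might be modeled as an enum. Only None = 0 and
MigrationFinalized = 4 come from this thread; the in-progress states live in
the KIP's "Migration Overview" table and are left out here, so treat this as
a hypothetical rendering rather than the KIP's actual definition.

    // Hypothetical illustration only; not the KIP's actual definition.
    public enum ZkMigrationState {
        NONE((byte) 0),                  // KRaft cluster that never went through a ZK migration
        MIGRATION_FINALIZED((byte) 4);   // KRaft cluster that completed a ZK-to-KRaft migration

        private final byte value;

        ZkMigrationState(byte value) {
            this.value = value;
        }

        public byte value() {
            return value;
        }
    }

A gauge reporting value() would then let an aggregate across many clusters
distinguish never-migrated KRaft clusters from migrated ones, which is the
distinction discussed next.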
>> >
>>
>> I don't feel that strongly about this, but wouldn't it be a good idea for
>> MigrationState to have a different value for ZK-based clusters and
>> KRaft-based clusters? If you have a bunch of clusters and you take an
>> aggregate of this metric, it would be good to get a report of three numbers:
>> 1. unupgraded ZK
>> 2. in progress upgrades
>> 3. kraft
>>
>> I guess we could get that from examining some other metrics too, though. Not
>> sure, what do you think?
>>
>> >> As you point out, the ZK brokers being upgraded will need to contact the
>> >> KRaft quorum in order to forward requests there, once we are in
>> >> migration mode. This raises a question: rather than changing the broker
>> >> registration, can we have those brokers send an RPC to the KRaft
>> >> controller quorum instead? This would serve to confirm that they can
>> >> reach the quorum. Then the quorum could wait for all of the brokers to
>> >> check in this way before starting the migration (it would know all the
>> >> brokers by looking at /brokers).
>> >
>> > One of the motivations I had for putting the migration details in the
>> > broker registration is that it removes ordering constraints between
>> > the brokers and controllers when starting the migration. If we set the
>> > brokers in migration mode before the KRaft quorum is available, they
>> > will just carry on until the KRaft controller takes over the
>> > controller leadership and starts sending UMR/LISR.
>> >
>> I agree that any scheme that requires complicated startup ordering is not a
>> good idea. I was more thinking about having the ZK-based brokers
>> periodically send some kind of "I'm here" upgrade heartbeat to the
>> controller quorum.
>>
>> > I wonder if we could infer reachability of the new KRaft controller by
>> > the brokers through ApiVersions? Once it takes leadership, the active
>> > KRaft controller could monitor its NodeApiVersions to see that each of
>> > the ZK brokers had reached it. Would this be enough to verify
>> > reachability?
>> >
>> If the new KRaft controller has already taken leadership, it's too late to
>> do anything about brokers that can't reach it. Unless we want to implement
>> automatic rollback to the pre-upgrade state, which sounds quite complex and
>> error-prone.
>>
>> So, to sum up: my proposal was that the ZK-based brokers periodically (like
>> maybe every 10 seconds) try to contact the quorum with an RPC containing
>> their IBP and readiness state. I guess the main negative thing is that it
>> would be more network traffic (but probably not significantly more). The
>> main positive thing is that we'd never start an upgrade without first
>> verifying the quorum configuration on the ZK-based brokers. A neutral thing
>> (neither good nor bad, I think) is that we'd have a new RPC rather than a
>> znode change.
>>
>> best,
>> Colin
>>
>> > -David
>> >
>> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> wrote:
>> >>
>> >> Hi David,
>> >>
>> >> Looks great. Some questions:
>> >>
>> >> I agree with Jun that it would be good to rename
>> >> metadata.migration.enable to something more zk-specific. Maybe
>> >> zk.metadata.migration.enable ?
>> >>
>> >> Similarly, for MigrationRecord: can we rename this to
>> >> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
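For context on the renamed flag, here is a rough sketch of the extra
settings a ZK-mode broker might carry during the migration. Only
zookeeper.metadata.migration.enable comes from this thread; the
quorum-related properties are ordinary KRaft settings, and every host,
port, and node id below is a placeholder, so check the KIP for the
authoritative list.

    import java.util.Properties;

    // Hypothetical sketch; property names other than
    // zookeeper.metadata.migration.enable should be checked against the KIP.
    public class ZkBrokerMigrationConfig {
        public static Properties sketch() {
            Properties props = new Properties();
            // The renamed flag discussed above, following the "zookeeper.*" convention.
            props.setProperty("zookeeper.metadata.migration.enable", "true");
            props.setProperty("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181");
            // The ZK brokers also need to reach the KRaft quorum once the
            // migration begins, so they carry the usual quorum settings.
            props.setProperty("controller.quorum.voters",
                "3000@controller1:9093,3001@controller2:9093,3002@controller3:9093");
            props.setProperty("controller.listener.names", "CONTROLLER");
            return props;
        }
    }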
>> >>
>> >> With ZkMigrationStateRecord, one thing to keep in mind here is that we
>> >> will eventually compact all the metadata logs into a snapshot. That
>> >> snapshot will then have to keep alive the memory of the old migration. So
>> >> it is not really a matter of replaying the old metadata logs (probably)
>> >> but a matter of checking to see what the ZkMigrationState is, which I
>> >> suppose could be Optional<ZkMigrationState>. If it's not Optional.empty,
>> >> we already migrated / are migrating.
>> >>
>> >> For the /migration ZNode, is "last_update_time_ms" necessary? I thought
>> >> ZK already tracked this information in the mzxid of the znode?
>> >>
>> >> It is true that technically it is only needed in UMR, but I would still
>> >> suggest including KRaftControllerId in LeaderAndIsrRequest because it
>> >> will make debugging much easier.
>> >>
>> >> I would suggest not implementing the topic deletion state machine, but
>> >> just deleting topics eagerly when in migration mode. We can implement
>> >> this behavior change by keying off of whether KRaftControllerId is
>> >> present in LeaderAndIsrRequest. On broker startup, we'll be sent a full
>> >> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as
>> >> expected (again, this behavior change would only be for migration mode).
>> >>
>> >> For existing KRaft controllers, will
>> >> kafka.controller:type=KafkaController,name=MigrationState show up as 4
>> >> (MigrationFinalized)? I assume this is true, but it would be good to
>> >> spell it out. Sorry if this is answered somewhere else.
>> >>
>> >> As you point out, the ZK brokers being upgraded will need to contact the
>> >> KRaft quorum in order to forward requests there, once we are in
>> >> migration mode. This raises a question: rather than changing the broker
>> >> registration, can we have those brokers send an RPC to the KRaft
>> >> controller quorum instead? This would serve to confirm that they can
>> >> reach the quorum. Then the quorum could wait for all of the brokers to
>> >> check in this way before starting the migration (it would know all the
>> >> brokers by looking at /brokers).
>> >>
>> >> best,
>> >> Colin
>> >>
>> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
>> >> > Hello everyone, I'd like to start the vote on KIP-866.
>> >> >
>> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
>> >> >
>> >> > Thanks!
>> >> > David Arthur
>> >
>> > --
>> > David Arthur
>
> --
> David Arthur
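To make the check that was agreed on in this thread concrete, here is a
minimal sketch of the controller-side gate it implies: before entering
migration mode, the active KRaft controller proceeds only once every broker
registered under /brokers in ZooKeeper has checked in via the new RPC. The
class and method names are illustrative, not the KIP's actual API, and the
heartbeat payload (IBP, readiness state) is reduced to a simple set
membership test.

    import java.util.Set;

    // Illustrative sketch only; names and structure are not taken from the KIP.
    public final class MigrationReadinessCheck {
        /**
         * Returns true once every broker registered under /brokers in ZooKeeper
         * has sent the new migration heartbeat/registration RPC to the KRaft
         * quorum, i.e. the set of heartbeaters covers the set of registrations.
         */
        public static boolean readyToBeginMigration(Set<Integer> zkRegisteredBrokerIds,
                                                    Set<Integer> heartbeatingBrokerIds) {
            return heartbeatingBrokerIds.containsAll(zkRegisteredBrokerIds);
        }
    }

Brokers added or removed after the upgrade has begun would change the
registration set, which is why this is only a gate for entering the
migration rather than an ongoing invariant.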