Hi David,

Thanks for the response. Replies inline.

On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
> Colin
>
>>  Maybe zk.metadata.migration.enable ?
>
> Done. I went with "zookeeper.metadata.migration.enable" since our
> other ZK configs start with "zookeeper.*"
>
>> SImilarly, for MigrationRecord: can we rename this to 
>> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
>
> Sure
>
>> With ZkMigrationStateRecord, one thing to keep in mind here is that we will 
>> eventually compact all the metadata logs into a snapshot. That snapshot will 
>> then have to keep alive the memory of the old migration. So it is not really 
>> a matter of replaying the old metadata logs (probably) but a matter of 
>> checking to see what the ZkMigrationState is, which I suppose could be 
>> Optional<ZkMigrationState>. If it's not Optional.empty, we already migrated 
>> / are migrating.
>
> Yea, makes sense.
>
>> For the /migration ZNode, is "last_update_time_ms" necessary? I thought ZK 
>> already tracked this information in the mzxid of the znode?
>
> Yes, Jun pointed this out previously, I missed this update in the KIP.
> Fixed now.
>
>> It is true that technically it is only needed in UMR, but I would still 
>> suggest including KRaftControllerId in LeaderAndIsrRequest because it will 
>> make debugging much easier.
>>
>> I would suggest not implementing the topic deletion state machine, but just 
>> deleting topics eagerly when in migration mode. We can implement this 
>> behavior change by keying off of whether KRaftControllerId is present in 
>> LeaderAndIsrRequest. On broker startup, we'll be sent a full 
>> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as 
>> expected (again, this behavior change would only be for migration mode)
>
> Sounds good to me. Since this is somewhat of an implementation detail,
> do you think we need this included in the KIP?

Yeah, maybe we don't need to go into the delete behavior here. But I think the 
KIP should specify that we have KRaftControllerId in both LeaderAndIsrRequest. 
That will allow us to implement this behavior conditionally on zk-based brokers 
when in dual write mode.

>
>> For existing KRaft controllers, will 
>> kafka.controller:type=KafkaController,name=MigrationState show up as 4 
>> (MigrationFinalized)? I assume this is true, but it would be good to spell 
>> it out. Sorry if this is answered somewhere else.
>
> We discussed using 0 (None) as the value to report for original,
> un-migrated KRaft clusters. 4 (MigrationFinalized) would be for
> clusters which underwent a migration. I have some description of this
> in the table under "Migration Overview"
>

I don't feel that strongly about this, but wouldn't it be a good idea for 
MigrationState to have a different value for ZK-based clusters and KRaft-based 
clusters? If you have a bunch of clusters and you take an aggregate of this 
metric, it would be good to get a report of three numbers:
1. unupgraded ZK
2. in progress upgrades
3. kraft

I guess we could get that from examining some other metrics too, though. Not 
sure, what do you think?

>> As you point out, the ZK brokers being upgraded will need to contact the 
>> KRaft quorum in order to forward requests to there, once we are in migration 
>> mode. This raises a question: rather than changing the broker registration, 
>> can we have those brokers send an RPC to the kraft controller quorum 
>> instead? This would serve to confirm that they can reach the quorum. Then 
>> the quorum could wait for all of the brokers to check in this way before 
>> starting the migration (It would know all the brokers by looking at /brokers)
>
> One of the motivations I had for putting the migration details in the
> broker registration is that it removes ordering constraints between
> the brokers and controllers when starting the migration. If we set the
> brokers in migration mode before the KRaft quorum is available, they
> will just carry on until the KRaft controller takes over the
> controller leadership and starts sending UMR/LISR.
>

I agree that any scheme that requires complicated startup ordering is not a 
good idea. I was more thinking about having the zk-based brokers periodically 
send some kind of "I'm here" upgrade heartbeat to the controller quorum.

> I wonder if we could infer reachability of the new KRaft controller by
> the brokers through ApiVersions? Once taking leadership, the active
> KRaft controller could monitor its NodeApiVersions to see that each of
> the ZK brokers had reached it. Would this be enough to verify
> reachability?
>

If the new KRaft controller has already taken leadership, it's too late to do 
anything about brokers that can't reach it. Unless we want to implement 
automatic rollback to the pre-upgrade state, which sounds quite complex and 
error prone.

So, to sum up: my proposal was that the ZK based brokers periodically (like 
maybe every 10 seconds) try to contact the quorum with an RPC containing their 
IBP and readiness state. I guess the main negative thing is that it would be 
more network traffic (but probably not significantly more.) The main positive 
thing is that we'd never start an upgrade without first verifying the quorum 
configuration on the ZK-based brokers. A neutral thing (neither good nor bad, I 
think) is that we'd have a new RPC rather than a znode change.

best,
Colin


> -David
>
>
> On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> wrote:
>>
>> Hi David,
>>
>> Looks great. Some questions:
>>
>> I agree with Jun that it would be good to rename metadata.migration.enable 
>> to something more zk-specific. Maybe zk.metadata.migration.enable ?
>>
>> SImilarly, for MigrationRecord: can we rename this to 
>> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
>>
>> With ZkMigrationStateRecord, one thing to keep in mind here is that we will 
>> eventually compact all the metadata logs into a snapshot. That snapshot will 
>> then have to keep alive the memory of the old migration. So it is not really 
>> a matter of replaying the old metadata logs (probably) but a matter of 
>> checking to see what the ZkMigrationState is, which I suppose could be 
>> Optional<ZkMigrationState>. If it's not Optional.empty, we already migrated 
>> / are migrating.
>>
>> For the /migration ZNode, is "last_update_time_ms" necessary? I thought ZK 
>> already tracked this information in the mzxid of the znode?
>>
>> It is true that technically it is only needed in UMR, but I would still 
>> suggest including KRaftControllerId in LeaderAndIsrRequest because it will 
>> make debugging much easier.
>>
>> I would suggest not implementing the topic deletion state machine, but just 
>> deleting topics eagerly when in migration mode. We can implement this 
>> behavior change by keying off of whether KRaftControllerId is present in 
>> LeaderAndIsrRequest. On broker startup, we'll be sent a full 
>> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as 
>> expected (again, this behavior change would only be for migration mode)
>>
>> For existing KRaft controllers, will 
>> kafka.controller:type=KafkaController,name=MigrationState show up as 4 
>> (MigrationFinalized)? I assume this is true, but it would be good to spell 
>> it out. Sorry if this is answered somewhere else.
>>
>> As you point out, the ZK brokers being upgraded will need to contact the 
>> KRaft quorum in order to forward requests to there, once we are in migration 
>> mode. This raises a question: rather than changing the broker registration, 
>> can we have those brokers send an RPC to the kraft controller quorum 
>> instead? This would serve to confirm that they can reach the quorum. Then 
>> the quorum could wait for all of the brokers to check in this way before 
>> starting the migration (It would know all the brokers by looking at /brokers)
>>
>> best,
>> Colin
>>
>> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
>> > Hello everyone, I'd like to start the vote on KIP-866.
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
>> >
>> > Thanks!
>> > David Arthur
>
>
>
> --
> David Arthur

Reply via email to