Thanks, Colin.

> never start an upgrade without first verifying the quorum configuration on 
> the ZK-based brokers

I agree that this is a pretty big benefit. I could imagine debugging
and fixing connection problems mid-migration would be a big pain.
Especially if you had some brokers correctly configured, and others
not.

Adding a heartbeat raises some questions about what to do if a broker
goes into a bad state, or stops heartbeating, during a migration.
However, I think the same is true for a registration based approach,
so maybe it's not an increase in net complexity.

I've replaced the ZK registration section with a new RPC and brief
description. Please take a look.

Thanks!
David

On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> Hi David,
>
> Thanks for the response. Replies inline.
>
> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
> > Colin
> >
> >>  Maybe zk.metadata.migration.enable ?
> >
> > Done. I went with "zookeeper.metadata.migration.enable" since our
> > other ZK configs start with "zookeeper.*"
> >
> >> SImilarly, for MigrationRecord: can we rename this to 
> >> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
> >
> > Sure
> >
> >> With ZkMigrationStateRecord, one thing to keep in mind here is that we 
> >> will eventually compact all the metadata logs into a snapshot. That 
> >> snapshot will then have to keep alive the memory of the old migration. So 
> >> it is not really a matter of replaying the old metadata logs (probably) 
> >> but a matter of checking to see what the ZkMigrationState is, which I 
> >> suppose could be Optional<ZkMigrationState>. If it's not Optional.empty, 
> >> we already migrated / are migrating.
> >
> > Yea, makes sense.
> >
> >> For the /migration ZNode, is "last_update_time_ms" necessary? I thought ZK 
> >> already tracked this information in the mzxid of the znode?
> >
> > Yes, Jun pointed this out previously, I missed this update in the KIP.
> > Fixed now.
> >
> >> It is true that technically it is only needed in UMR, but I would still 
> >> suggest including KRaftControllerId in LeaderAndIsrRequest because it will 
> >> make debugging much easier.
> >>
> >> I would suggest not implementing the topic deletion state machine, but 
> >> just deleting topics eagerly when in migration mode. We can implement this 
> >> behavior change by keying off of whether KRaftControllerId is present in 
> >> LeaderAndIsrRequest. On broker startup, we'll be sent a full 
> >> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as 
> >> expected (again, this behavior change would only be for migration mode)
> >
> > Sounds good to me. Since this is somewhat of an implementation detail,
> > do you think we need this included in the KIP?
>
> Yeah, maybe we don't need to go into the delete behavior here. But I think 
> the KIP should specify that we have KRaftControllerId in both 
> LeaderAndIsrRequest. That will allow us to implement this behavior 
> conditionally on zk-based brokers when in dual write mode.
>
> >
> >> For existing KRaft controllers, will 
> >> kafka.controller:type=KafkaController,name=MigrationState show up as 4 
> >> (MigrationFinalized)? I assume this is true, but it would be good to spell 
> >> it out. Sorry if this is answered somewhere else.
> >
> > We discussed using 0 (None) as the value to report for original,
> > un-migrated KRaft clusters. 4 (MigrationFinalized) would be for
> > clusters which underwent a migration. I have some description of this
> > in the table under "Migration Overview"
> >
>
> I don't feel that strongly about this, but wouldn't it be a good idea for 
> MigrationState to have a different value for ZK-based clusters and 
> KRaft-based clusters? If you have a bunch of clusters and you take an 
> aggregate of this metric, it would be good to get a report of three numbers:
> 1. unupgraded ZK
> 2. in progress upgrades
> 3. kraft
>
> I guess we could get that from examining some other metrics too, though. Not 
> sure, what do you think?
>
> >> As you point out, the ZK brokers being upgraded will need to contact the 
> >> KRaft quorum in order to forward requests to there, once we are in 
> >> migration mode. This raises a question: rather than changing the broker 
> >> registration, can we have those brokers send an RPC to the kraft 
> >> controller quorum instead? This would serve to confirm that they can reach 
> >> the quorum. Then the quorum could wait for all of the brokers to check in 
> >> this way before starting the migration (It would know all the brokers by 
> >> looking at /brokers)
> >
> > One of the motivations I had for putting the migration details in the
> > broker registration is that it removes ordering constraints between
> > the brokers and controllers when starting the migration. If we set the
> > brokers in migration mode before the KRaft quorum is available, they
> > will just carry on until the KRaft controller takes over the
> > controller leadership and starts sending UMR/LISR.
> >
>
> I agree that any scheme that requires complicated startup ordering is not a 
> good idea. I was more thinking about having the zk-based brokers periodically 
> send some kind of "I'm here" upgrade heartbeat to the controller quorum.
>
> > I wonder if we could infer reachability of the new KRaft controller by
> > the brokers through ApiVersions? Once taking leadership, the active
> > KRaft controller could monitor its NodeApiVersions to see that each of
> > the ZK brokers had reached it. Would this be enough to verify
> > reachability?
> >
>
> If the new KRaft controller has already taken leadership, it's too late to do 
> anything about brokers that can't reach it. Unless we want to implement 
> automatic rollback to the pre-upgrade state, which sounds quite complex and 
> error prone.
>
> So, to sum up: my proposal was that the ZK based brokers periodically (like 
> maybe every 10 seconds) try to contact the quorum with an RPC containing 
> their IBP and readiness state. I guess the main negative thing is that it 
> would be more network traffic (but probably not significantly more.) The main 
> positive thing is that we'd never start an upgrade without first verifying 
> the quorum configuration on the ZK-based brokers. A neutral thing (neither 
> good nor bad, I think) is that we'd have a new RPC rather than a znode change.
>
> best,
> Colin
>
>
> > -David
> >
> >
> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> wrote:
> >>
> >> Hi David,
> >>
> >> Looks great. Some questions:
> >>
> >> I agree with Jun that it would be good to rename metadata.migration.enable 
> >> to something more zk-specific. Maybe zk.metadata.migration.enable ?
> >>
> >> SImilarly, for MigrationRecord: can we rename this to 
> >> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
> >>
> >> With ZkMigrationStateRecord, one thing to keep in mind here is that we 
> >> will eventually compact all the metadata logs into a snapshot. That 
> >> snapshot will then have to keep alive the memory of the old migration. So 
> >> it is not really a matter of replaying the old metadata logs (probably) 
> >> but a matter of checking to see what the ZkMigrationState is, which I 
> >> suppose could be Optional<ZkMigrationState>. If it's not Optional.empty, 
> >> we already migrated / are migrating.
> >>
> >> For the /migration ZNode, is "last_update_time_ms" necessary? I thought ZK 
> >> already tracked this information in the mzxid of the znode?
> >>
> >> It is true that technically it is only needed in UMR, but I would still 
> >> suggest including KRaftControllerId in LeaderAndIsrRequest because it will 
> >> make debugging much easier.
> >>
> >> I would suggest not implementing the topic deletion state machine, but 
> >> just deleting topics eagerly when in migration mode. We can implement this 
> >> behavior change by keying off of whether KRaftControllerId is present in 
> >> LeaderAndIsrRequest. On broker startup, we'll be sent a full 
> >> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as 
> >> expected (again, this behavior change would only be for migration mode)
> >>
> >> For existing KRaft controllers, will 
> >> kafka.controller:type=KafkaController,name=MigrationState show up as 4 
> >> (MigrationFinalized)? I assume this is true, but it would be good to spell 
> >> it out. Sorry if this is answered somewhere else.
> >>
> >> As you point out, the ZK brokers being upgraded will need to contact the 
> >> KRaft quorum in order to forward requests to there, once we are in 
> >> migration mode. This raises a question: rather than changing the broker 
> >> registration, can we have those brokers send an RPC to the kraft 
> >> controller quorum instead? This would serve to confirm that they can reach 
> >> the quorum. Then the quorum could wait for all of the brokers to check in 
> >> this way before starting the migration (It would know all the brokers by 
> >> looking at /brokers)
> >>
> >> best,
> >> Colin
> >>
> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
> >> > Hello everyone, I'd like to start the vote on KIP-866.
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> >> >
> >> > Thanks!
> >> > David Arthur
> >
> >
> >
> > --
> > David Arthur



-- 
David Arthur

Reply via email to