Thanks, Colin.

> never start an upgrade without first verifying the quorum configuration on
> the ZK-based brokers

I agree that this is a pretty big benefit. I could imagine debugging and
fixing connection problems mid-migration would be a big pain, especially if
you had some brokers correctly configured and others not.

Adding a heartbeat raises some questions about what to do if a broker goes
into a bad state, or stops heartbeating, during a migration. However, I
think the same is true for a registration-based approach, so maybe it's not
an increase in net complexity.

I've replaced the ZK registration section with a new RPC and a brief
description. Please take a look.
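To make that concrete, here is a rough sketch of the broker-side behavior I
have in mind. The class, field names, and interval below are illustrative
only -- they are not the actual RPC schema or class names in the KIP:

    // Illustrative sketch only -- not Kafka's internal classes or the KIP's
    // final schema. A ZK-mode broker periodically reports that it is
    // configured for the migration and can reach the KRaft controller quorum.
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class ZkBrokerMigrationHeartbeat {

        // Roughly the fields discussed in this thread: who we are, which IBP
        // we are running, and whether migration mode is enabled on this broker.
        record MigrationReadinessRequest(int brokerId, String ibp,
                                         boolean migrationEnabled) {}

        // Hypothetical channel; a real implementation would go through the
        // broker-to-controller channel to reach the active KRaft controller.
        interface ControllerQuorumChannel {
            void send(MigrationReadinessRequest request);
        }

        private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

        public void start(ControllerQuorumChannel channel, int brokerId, String ibp) {
            // Roughly every 10 seconds, as suggested earlier in the thread.
            scheduler.scheduleAtFixedRate(
                () -> channel.send(new MigrationReadinessRequest(brokerId, ibp, true)),
                0, 10, TimeUnit.SECONDS);
        }

        public void shutdown() {
            scheduler.shutdownNow();
        }
    }

The idea, as Colin described, is that the quorum would wait until it has seen
this request from every broker registered under /brokers before starting the
migration (I've put a similarly rough sketch of that controller-side check at
the very bottom of this mail).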
Thanks!
David

On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> Hi David,
>
> Thanks for the response. Replies inline.
>
> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
> > Colin
> >
> >> Maybe zk.metadata.migration.enable ?
> >
> > Done. I went with "zookeeper.metadata.migration.enable" since our
> > other ZK configs start with "zookeeper.*"
> >
> >> Similarly, for MigrationRecord: can we rename this to
> >> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
> >
> > Sure
> >
> >> With ZkMigrationStateRecord, one thing to keep in mind here is that we
> >> will eventually compact all the metadata logs into a snapshot. That
> >> snapshot will then have to keep alive the memory of the old migration. So
> >> it is not really a matter of replaying the old metadata logs (probably)
> >> but a matter of checking to see what the ZkMigrationState is, which I
> >> suppose could be Optional<ZkMigrationState>. If it's not Optional.empty,
> >> we already migrated / are migrating.
> >
> > Yea, makes sense.
> >
> >> For the /migration ZNode, is "last_update_time_ms" necessary? I thought ZK
> >> already tracked this information in the mzxid of the znode?
> >
> > Yes, Jun pointed this out previously, I missed this update in the KIP.
> > Fixed now.
> >
> >> It is true that technically it is only needed in UMR, but I would still
> >> suggest including KRaftControllerId in LeaderAndIsrRequest because it will
> >> make debugging much easier.
> >>
> >> I would suggest not implementing the topic deletion state machine, but
> >> just deleting topics eagerly when in migration mode. We can implement this
> >> behavior change by keying off of whether KRaftControllerId is present in
> >> LeaderAndIsrRequest. On broker startup, we'll be sent a full
> >> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as
> >> expected (again, this behavior change would only be for migration mode)
> >
> > Sounds good to me. Since this is somewhat of an implementation detail,
> > do you think we need this included in the KIP?
>
> Yeah, maybe we don't need to go into the delete behavior here. But I think
> the KIP should specify that we have KRaftControllerId in both
> LeaderAndIsrRequest. That will allow us to implement this behavior
> conditionally on zk-based brokers when in dual write mode.
>
> >> For existing KRaft controllers, will
> >> kafka.controller:type=KafkaController,name=MigrationState show up as 4
> >> (MigrationFinalized)? I assume this is true, but it would be good to spell
> >> it out. Sorry if this is answered somewhere else.
> >
> > We discussed using 0 (None) as the value to report for original,
> > un-migrated KRaft clusters. 4 (MigrationFinalized) would be for
> > clusters which underwent a migration.
> > I have some description of this in the table under "Migration Overview"
> >
>
> I don't feel that strongly about this, but wouldn't it be a good idea for
> MigrationState to have a different value for ZK-based clusters and
> KRaft-based clusters? If you have a bunch of clusters and you take an
> aggregate of this metric, it would be good to get a report of three numbers:
> 1. unupgraded ZK
> 2. in progress upgrades
> 3. kraft
>
> I guess we could get that from examining some other metrics too, though. Not
> sure, what do you think?
>
> >> As you point out, the ZK brokers being upgraded will need to contact the
> >> KRaft quorum in order to forward requests to there, once we are in
> >> migration mode. This raises a question: rather than changing the broker
> >> registration, can we have those brokers send an RPC to the kraft
> >> controller quorum instead? This would serve to confirm that they can reach
> >> the quorum. Then the quorum could wait for all of the brokers to check in
> >> this way before starting the migration (It would know all the brokers by
> >> looking at /brokers)
> >
> > One of the motivations I had for putting the migration details in the
> > broker registration is that it removes ordering constraints between
> > the brokers and controllers when starting the migration. If we set the
> > brokers in migration mode before the KRaft quorum is available, they
> > will just carry on until the KRaft controller takes over the
> > controller leadership and starts sending UMR/LISR.
> >
>
> I agree that any scheme that requires complicated startup ordering is not a
> good idea. I was more thinking about having the zk-based brokers periodically
> send some kind of "I'm here" upgrade heartbeat to the controller quorum.
>
> > I wonder if we could infer reachability of the new KRaft controller by
> > the brokers through ApiVersions? Once taking leadership, the active
> > KRaft controller could monitor its NodeApiVersions to see that each of
> > the ZK brokers had reached it. Would this be enough to verify
> > reachability?
> >
>
> If the new KRaft controller has already taken leadership, it's too late to do
> anything about brokers that can't reach it. Unless we want to implement
> automatic rollback to the pre-upgrade state, which sounds quite complex and
> error prone.
>
> So, to sum up: my proposal was that the ZK based brokers periodically (like
> maybe every 10 seconds) try to contact the quorum with an RPC containing
> their IBP and readiness state. I guess the main negative thing is that it
> would be more network traffic (but probably not significantly more.) The main
> positive thing is that we'd never start an upgrade without first verifying
> the quorum configuration on the ZK-based brokers. A neutral thing (neither
> good nor bad, I think) is that we'd have a new RPC rather than a znode change.
>
> best,
> Colin
>
> > -David
> >
> >
> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> wrote:
> >>
> >> Hi David,
> >>
> >> Looks great. Some questions:
> >>
> >> I agree with Jun that it would be good to rename metadata.migration.enable
> >> to something more zk-specific. Maybe zk.metadata.migration.enable ?
> >>
> >> Similarly, for MigrationRecord: can we rename this to
> >> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
> >>
> >> With ZkMigrationStateRecord, one thing to keep in mind here is that we
> >> will eventually compact all the metadata logs into a snapshot.
> >> That snapshot will then have to keep alive the memory of the old migration.
> >> So it is not really a matter of replaying the old metadata logs (probably)
> >> but a matter of checking to see what the ZkMigrationState is, which I
> >> suppose could be Optional<ZkMigrationState>. If it's not Optional.empty,
> >> we already migrated / are migrating.
> >>
> >> For the /migration ZNode, is "last_update_time_ms" necessary? I thought ZK
> >> already tracked this information in the mzxid of the znode?
> >>
> >> It is true that technically it is only needed in UMR, but I would still
> >> suggest including KRaftControllerId in LeaderAndIsrRequest because it will
> >> make debugging much easier.
> >>
> >> I would suggest not implementing the topic deletion state machine, but
> >> just deleting topics eagerly when in migration mode. We can implement this
> >> behavior change by keying off of whether KRaftControllerId is present in
> >> LeaderAndIsrRequest. On broker startup, we'll be sent a full
> >> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as
> >> expected (again, this behavior change would only be for migration mode)
> >>
> >> For existing KRaft controllers, will
> >> kafka.controller:type=KafkaController,name=MigrationState show up as 4
> >> (MigrationFinalized)? I assume this is true, but it would be good to spell
> >> it out. Sorry if this is answered somewhere else.
> >>
> >> As you point out, the ZK brokers being upgraded will need to contact the
> >> KRaft quorum in order to forward requests to there, once we are in
> >> migration mode. This raises a question: rather than changing the broker
> >> registration, can we have those brokers send an RPC to the kraft
> >> controller quorum instead? This would serve to confirm that they can reach
> >> the quorum. Then the quorum could wait for all of the brokers to check in
> >> this way before starting the migration (It would know all the brokers by
> >> looking at /brokers)
> >>
> >> best,
> >> Colin
> >>
> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
> >> > Hello everyone, I'd like to start the vote on KIP-866.
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> >> >
> >> > Thanks!
> >> > David Arthur
> >
> >
> > --
> > David Arthur

--
David Arthur
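P.S. As promised above, here is an equally rough sketch of the controller-side
readiness check -- again, the class and method names are illustrative only and
not part of the KIP:

    // Illustrative sketch only. The KRaft controller learns the expected broker
    // set from /brokers and does not begin the migration until every one of
    // those brokers has sent a recent readiness heartbeat.
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class MigrationReadinessTracker {
        // Treat a broker as unready if we haven't heard from it in a while,
        // e.g. three missed 10-second heartbeats.
        private static final long STALENESS_MS = 30_000;

        private final Map<Integer, Long> lastHeartbeatMs = new ConcurrentHashMap<>();

        // Called whenever a ZK-mode broker's readiness RPC arrives.
        public void onHeartbeat(int brokerId, long nowMs) {
            lastHeartbeatMs.put(brokerId, nowMs);
        }

        // The migration may begin only once every broker listed in /brokers
        // has checked in recently.
        public boolean readyToMigrate(Set<Integer> brokersInZk, long nowMs) {
            return brokersInZk.stream().allMatch(id -> {
                Long last = lastHeartbeatMs.get(id);
                return last != null && nowMs - last < STALENESS_MS;
            });
        }
    }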