Thanks, +1 from me. I suspect we might be in for at least one surprise with the re-implemented controller RPCs, but I agree the alternative has risks as well.
Best,
Jason

On Mon, Nov 14, 2022 at 12:00 PM Colin McCabe <cmcc...@apache.org> wrote:
> On Fri, Nov 11, 2022, at 08:59, David Arthur wrote:
> > Thanks, Colin.
> >
> >> never start an upgrade without first verifying the quorum
> >> configuration on the ZK-based brokers
> >
> > I agree that this is a pretty big benefit. I could imagine debugging
> > and fixing connection problems mid-migration would be a big pain,
> > especially if you had some brokers correctly configured and others
> > not.
> >
> > Adding a heartbeat raises some questions about what to do if a broker
> > goes into a bad state, or stops heartbeating, during a migration.
> > However, I think the same is true for a registration-based approach,
> > so maybe it's not an increase in net complexity.
>
> Hi David,
>
> Yeah. I think the goal should be for the set of heartbeaters to match
> the set of broker registrations under /brokers.
>
> Obviously, people could add or remove brokers after the upgrade has
> begun, but that's unavoidable, I think. We can at least ensure that at
> the time we enter upgrade, all the brokers are ready.
>
> > I've replaced the ZK registration section with a new RPC and brief
> > description. Please take a look.
>
> Thanks, David. With these changes it LGTM.
>
> +1 (binding)
>
> Colin
>
> > Thanks!
> > David
> >
> > On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> wrote:
> >>
> >> Hi David,
> >>
> >> Thanks for the response. Replies inline.
> >>
> >> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
> >> > Colin,
> >> >
> >> >> Maybe zk.metadata.migration.enable?
> >> >
> >> > Done. I went with "zookeeper.metadata.migration.enable" since our
> >> > other ZK configs start with "zookeeper.*"
> >> >
> >> >> Similarly, for MigrationRecord: can we rename this to
> >> >> ZkMigrationStateRecord? Then change MigrationState ->
> >> >> ZkMigrationState.
> >> >
> >> > Sure
> >> >
> >> >> With ZkMigrationStateRecord, one thing to keep in mind here is
> >> >> that we will eventually compact all the metadata logs into a
> >> >> snapshot. That snapshot will then have to keep alive the memory
> >> >> of the old migration. So it is not really a matter of replaying
> >> >> the old metadata logs (probably) but a matter of checking to see
> >> >> what the ZkMigrationState is, which I suppose could be
> >> >> Optional<ZkMigrationState>. If it's not Optional.empty, we
> >> >> already migrated / are migrating.
> >> >
> >> > Yeah, makes sense.
> >> >
> >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I
> >> >> thought ZK already tracked this information in the mzxid of the
> >> >> znode?
> >> >
> >> > Yes, Jun pointed this out previously; I missed this update in the
> >> > KIP. Fixed now.
> >> >
> >> >> It is true that technically it is only needed in UMR, but I would
> >> >> still suggest including KRaftControllerId in LeaderAndIsrRequest
> >> >> because it will make debugging much easier.
> >> >>
> >> >> I would suggest not implementing the topic deletion state
> >> >> machine, but just deleting topics eagerly when in migration mode.
> >> >> We can implement this behavior change by keying off of whether
> >> >> KRaftControllerId is present in LeaderAndIsrRequest. On broker
> >> >> startup, we'll be sent a full LeaderAndIsrRequest and can delete
> >> >> stray partitions whose IDs are not as expected (again, this
> >> >> behavior change would only be for migration mode).
> >> >
> >> > Sounds good to me. Since this is somewhat of an implementation
> >> > detail, do you think we need this included in the KIP?
> >>
> >> Yeah, maybe we don't need to go into the delete behavior here. But I
> >> think the KIP should specify that we have KRaftControllerId in both
> >> UMR and LeaderAndIsrRequest. That will allow us to implement this
> >> behavior conditionally on zk-based brokers when in dual write mode.
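To make the stray-partition idea above concrete, here is a rough Java sketch of the check being described. Every name below is invented for illustration; the real implementation would live in the broker's log/replica management code and use Kafka's own Uuid type.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Optional;
    import java.util.Set;
    import java.util.UUID;

    // Hypothetical sketch, not the KIP's actual API.
    public class StrayPartitionSketch {

        // Eager deletion only applies while a KRaft controller is driving
        // the cluster, which the broker detects by KRaftControllerId being
        // present in the LeaderAndIsrRequest.
        static boolean migrationMode(Optional<Integer> kraftControllerId) {
            return kraftControllerId.isPresent();
        }

        // Compare local topic IDs against the IDs carried by the full
        // LeaderAndIsrRequest sent at broker startup; anything the
        // controller doesn't know, or knows under a different ID, is stray.
        static Set<String> findStrays(Map<String, UUID> localTopicIds,
                                      Map<String, UUID> controllerTopicIds) {
            Set<String> strays = new HashSet<>();
            for (Map.Entry<String, UUID> entry : localTopicIds.entrySet()) {
                UUID expected = controllerTopicIds.get(entry.getKey());
                if (expected == null || !expected.equals(entry.getValue())) {
                    strays.add(entry.getKey());
                }
            }
            return strays;
        }
    }

The appeal over the ZK topic-deletion state machine is that no extra coordination is needed: the full LeaderAndIsrRequest the broker already receives at startup carries everything required.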
> >>
> >> >> For existing KRaft controllers, will
> >> >> kafka.controller:type=KafkaController,name=MigrationState show up
> >> >> as 4 (MigrationFinalized)? I assume this is true, but it would be
> >> >> good to spell it out. Sorry if this is answered somewhere else.
> >> >
> >> > We discussed using 0 (None) as the value to report for original,
> >> > un-migrated KRaft clusters. 4 (MigrationFinalized) would be for
> >> > clusters which underwent a migration. I have some description of
> >> > this in the table under "Migration Overview".
> >>
> >> I don't feel that strongly about this, but wouldn't it be a good
> >> idea for MigrationState to have a different value for ZK-based
> >> clusters and KRaft-based clusters? If you have a bunch of clusters
> >> and you take an aggregate of this metric, it would be good to get a
> >> report of three numbers:
> >> 1. unupgraded ZK
> >> 2. in progress upgrades
> >> 3. kraft
> >>
> >> I guess we could get that from examining some other metrics too,
> >> though. Not sure, what do you think?
> >>
> >> >> As you point out, the ZK brokers being upgraded will need to
> >> >> contact the KRaft quorum in order to forward requests there, once
> >> >> we are in migration mode. This raises a question: rather than
> >> >> changing the broker registration, can we have those brokers send
> >> >> an RPC to the kraft controller quorum instead? This would serve
> >> >> to confirm that they can reach the quorum. Then the quorum could
> >> >> wait for all of the brokers to check in this way before starting
> >> >> the migration. (It would know all the brokers by looking at
> >> >> /brokers.)
> >> >
> >> > One of the motivations I had for putting the migration details in
> >> > the broker registration is that it removes ordering constraints
> >> > between the brokers and controllers when starting the migration.
> >> > If we set the brokers in migration mode before the KRaft quorum is
> >> > available, they will just carry on until the KRaft controller
> >> > takes over the controller leadership and starts sending UMR/LISR.
> >>
> >> I agree that any scheme that requires complicated startup ordering
> >> is not a good idea. I was more thinking about having the zk-based
> >> brokers periodically send some kind of "I'm here" upgrade heartbeat
> >> to the controller quorum.
> >>
> >> > I wonder if we could infer reachability of the new KRaft
> >> > controller by the brokers through ApiVersions? Once taking
> >> > leadership, the active KRaft controller could monitor its
> >> > NodeApiVersions to see that each of the ZK brokers had reached it.
> >> > Would this be enough to verify reachability?
> >>
> >> If the new KRaft controller has already taken leadership, it's too
> >> late to do anything about brokers that can't reach it. Unless we
> >> want to implement automatic rollback to the pre-upgrade state, which
> >> sounds quite complex and error-prone.
> >>
> >> So, to sum up: my proposal was that the ZK-based brokers
> >> periodically (like maybe every 10 seconds) try to contact the quorum
> >> with an RPC containing their IBP and readiness state. I guess the
> >> main negative thing is that it would be more network traffic (but
> >> probably not significantly more). The main positive thing is that
> >> we'd never start an upgrade without first verifying the quorum
> >> configuration on the ZK-based brokers. A neutral thing (neither good
> >> nor bad, I think) is that we'd have a new RPC rather than a znode
> >> change.
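For illustration, here is a minimal sketch of that heartbeat loop, assuming an invented QuorumChannel interface; nothing below is the RPC the KIP actually defines.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch of a ZK-mode broker's upgrade heartbeat.
    public class UpgradeHeartbeatSketch {

        interface QuorumChannel {
            // Invented call carrying the broker's id, IBP, and readiness
            // state to the KRaft controller quorum.
            void sendMigrationHeartbeat(int brokerId, String ibp,
                                        boolean ready) throws Exception;
        }

        public static void start(QuorumChannel channel, int brokerId,
                                 String ibp) {
            ScheduledExecutorService scheduler =
                    Executors.newSingleThreadScheduledExecutor();
            // "like maybe every 10 seconds"
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    channel.sendMigrationHeartbeat(brokerId, ibp, true);
                } catch (Exception e) {
                    // Repeated failures here expose a bad quorum
                    // configuration before the migration begins.
                    System.err.println("Cannot reach KRaft quorum: " + e);
                }
            }, 0, 10, TimeUnit.SECONDS);
        }
    }

The safety property Colin describes lives in the catch block: a misconfigured broker fails loudly here, before any migration state has changed, rather than after the KRaft controller has taken leadership.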
> >>
> >> best,
> >> Colin
> >>
> >> > -David
> >> >
> >> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> wrote:
> >> >>
> >> >> Hi David,
> >> >>
> >> >> Looks great. Some questions:
> >> >>
> >> >> I agree with Jun that it would be good to rename
> >> >> metadata.migration.enable to something more zk-specific. Maybe
> >> >> zk.metadata.migration.enable?
> >> >>
> >> >> Similarly, for MigrationRecord: can we rename this to
> >> >> ZkMigrationStateRecord? Then change MigrationState ->
> >> >> ZkMigrationState.
> >> >>
> >> >> With ZkMigrationStateRecord, one thing to keep in mind here is
> >> >> that we will eventually compact all the metadata logs into a
> >> >> snapshot. That snapshot will then have to keep alive the memory
> >> >> of the old migration. So it is not really a matter of replaying
> >> >> the old metadata logs (probably) but a matter of checking to see
> >> >> what the ZkMigrationState is, which I suppose could be
> >> >> Optional<ZkMigrationState>. If it's not Optional.empty, we
> >> >> already migrated / are migrating.
> >> >>
> >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I
> >> >> thought ZK already tracked this information in the mzxid of the
> >> >> znode?
> >> >>
> >> >> It is true that technically it is only needed in UMR, but I would
> >> >> still suggest including KRaftControllerId in LeaderAndIsrRequest
> >> >> because it will make debugging much easier.
> >> >>
> >> >> I would suggest not implementing the topic deletion state
> >> >> machine, but just deleting topics eagerly when in migration mode.
> >> >> We can implement this behavior change by keying off of whether
> >> >> KRaftControllerId is present in LeaderAndIsrRequest. On broker
> >> >> startup, we'll be sent a full LeaderAndIsrRequest and can delete
> >> >> stray partitions whose IDs are not as expected (again, this
> >> >> behavior change would only be for migration mode).
> >> >>
> >> >> For existing KRaft controllers, will
> >> >> kafka.controller:type=KafkaController,name=MigrationState show up
> >> >> as 4 (MigrationFinalized)? I assume this is true, but it would be
> >> >> good to spell it out. Sorry if this is answered somewhere else.
> >> >>
> >> >> As you point out, the ZK brokers being upgraded will need to
> >> >> contact the KRaft quorum in order to forward requests there, once
> >> >> we are in migration mode. This raises a question: rather than
> >> >> changing the broker registration, can we have those brokers send
> >> >> an RPC to the kraft controller quorum instead? This would serve
> >> >> to confirm that they can reach the quorum. Then the quorum could
> >> >> wait for all of the brokers to check in this way before starting
> >> >> the migration. (It would know all the brokers by looking at
> >> >> /brokers.)
> >> >>
> >> >> best,
> >> >> Colin
> >> >>
> >> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
> >> >> > Hello everyone, I'd like to start the vote on KIP-866.
> >> >> >
> >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> >> >> >
> >> >> > Thanks!
> >> >> > David Arthur
> >> >
> >> > --
> >> > David Arthur
> >
> > --
> > David Arthur
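One last illustration, this time of the Optional<ZkMigrationState> idea from the snapshot discussion above. The enum values and accessor below are invented; the KIP defines the real record and states.

    import java.util.Optional;

    // Hypothetical sketch of deriving "did we migrate?" from a snapshot.
    public class ZkMigrationStateSketch {

        enum ZkMigrationState { MIGRATING, MIGRATION_FINALIZED }

        // The most recent ZkMigrationStateRecord retained by snapshot
        // compaction, or empty if this cluster never ran a migration.
        private final Optional<ZkMigrationState> stateFromSnapshot;

        ZkMigrationStateSketch(Optional<ZkMigrationState> stateFromSnapshot) {
            this.stateFromSnapshot = stateFromSnapshot;
        }

        // Optional.empty: a cluster created as KRaft, never migrated.
        // Present: we already migrated, or are migrating now.
        boolean migratedOrMigrating() {
            return stateFromSnapshot.isPresent();
        }
    }

Because the snapshot keeps the last ZkMigrationStateRecord alive, this check never needs to replay old metadata log segments.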