Hey everyone, I'm going to close out the vote. There were three +1 votes from PMC members and no committer or community votes.
+1 PMC votes:
 * Colin McCabe
 * Jason Gustafson
 * Jun Rao

With that, the vote passes. Thanks to everyone who reviewed this KIP!

Cheers,
David

On Tue, Nov 29, 2022 at 7:48 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
> Hi, David,
>
> Thanks for the KIP. +1
>
> Jun
>
> On Tue, Nov 22, 2022 at 3:43 PM Jason Gustafson
> <ja...@confluent.io.invalid> wrote:
>
> > Thanks, +1 from me. I suspect we might be in for at least one
> > surprise with the re-implemented controller RPCs, but I agree the
> > alternative has risks as well.
> >
> > Best,
> > Jason
> >
> > On Mon, Nov 14, 2022 at 12:00 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > On Fri, Nov 11, 2022, at 08:59, David Arthur wrote:
> > > > Thanks, Colin.
> > > >
> > > >> never start an upgrade without first verifying the quorum
> > > >> configuration on the ZK-based brokers
> > > >
> > > > I agree that this is a pretty big benefit. I could imagine that
> > > > debugging and fixing connection problems mid-migration would be a
> > > > big pain, especially if you had some brokers correctly configured
> > > > and others not.
> > > >
> > > > Adding a heartbeat raises some questions about what to do if a
> > > > broker goes into a bad state, or stops heartbeating, during a
> > > > migration. However, I think the same is true for a
> > > > registration-based approach, so maybe it's not an increase in net
> > > > complexity.
> > >
> > > Hi David,
> > >
> > > Yeah. I think the goal should be for the set of heartbeaters to
> > > match the set of broker registrations under /brokers.
> > >
> > > Obviously, people could add or remove brokers after the upgrade has
> > > begun, but that's unavoidable, I think. We can at least ensure that
> > > at the time we enter the upgrade, all the brokers are ready.
> > >
> > > > I've replaced the ZK registration section with a new RPC and a
> > > > brief description. Please take a look.
> > >
> > > Thanks, David. With these changes it looks good to me.
> > >
> > > +1 (binding)
> > >
> > > Colin
> > >
> > > > Thanks!
> > > > David
> > > >
> > > > On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > >>
> > > >> Hi David,
> > > >>
> > > >> Thanks for the response. Replies inline.
> > > >>
> > > >> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
> > > >> > Colin
> > > >> >
> > > >> >> Maybe zk.metadata.migration.enable ?
> > > >> >
> > > >> > Done. I went with "zookeeper.metadata.migration.enable" since
> > > >> > our other ZK configs start with "zookeeper.*"
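(For anyone following along, the broker-side configuration coming out of
this naming discussion would look something like the sketch below.
"zookeeper.metadata.migration.enable" is the name settled on in this
thread; the quorum connection settings are the standard KRaft configs,
and the values shown are purely illustrative.)

    # ZK-mode broker opting into the migration (illustrative values)
    zookeeper.metadata.migration.enable=true
    zookeeper.connect=localhost:2181
    # Standard KRaft configs so the broker can reach the controller quorum
    controller.quorum.voters=3000@localhost:9093
    controller.listener.names=CONTROLLER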
> > > >> >> Similarly, for MigrationRecord: can we rename this to
> > > >> >> ZkMigrationStateRecord? Then change MigrationState ->
> > > >> >> ZkMigrationState.
> > > >> >
> > > >> > Sure
> > > >> >
> > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is
> > > >> >> that we will eventually compact all the metadata logs into a
> > > >> >> snapshot. That snapshot will then have to keep alive the
> > > >> >> memory of the old migration. So it is not really a matter of
> > > >> >> replaying the old metadata logs (probably) but a matter of
> > > >> >> checking to see what the ZkMigrationState is, which I suppose
> > > >> >> could be Optional<ZkMigrationState>. If it's not
> > > >> >> Optional.empty, we already migrated / are migrating.
> > > >> >
> > > >> > Yea, makes sense.
> > > >> >
> > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary?
> > > >> >> I thought ZK already tracked this information in the mzxid of
> > > >> >> the znode?
> > > >> >
> > > >> > Yes, Jun pointed this out previously; I missed this update in
> > > >> > the KIP. Fixed now.
> > > >> >
> > > >> >> It is true that technically it is only needed in UMR, but I
> > > >> >> would still suggest including KRaftControllerId in
> > > >> >> LeaderAndIsrRequest because it will make debugging much
> > > >> >> easier.
> > > >> >>
> > > >> >> I would suggest not implementing the topic deletion state
> > > >> >> machine, but just deleting topics eagerly when in migration
> > > >> >> mode. We can implement this behavior change by keying off of
> > > >> >> whether KRaftControllerId is present in LeaderAndIsrRequest.
> > > >> >> On broker startup, we'll be sent a full LeaderAndIsrRequest
> > > >> >> and can delete stray partitions whose IDs are not as expected
> > > >> >> (again, this behavior change would only be for migration
> > > >> >> mode).
> > > >> >
> > > >> > Sounds good to me. Since this is somewhat of an implementation
> > > >> > detail, do you think we need this included in the KIP?
> > > >>
> > > >> Yeah, maybe we don't need to go into the delete behavior here.
> > > >> But I think the KIP should specify that we have KRaftControllerId
> > > >> in both UMR and LeaderAndIsrRequest. That will allow us to
> > > >> implement this behavior conditionally on ZK-based brokers when in
> > > >> dual-write mode.
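(A rough sketch, in Java, of the eager stray-partition cleanup described
above. All type and method names below are hypothetical stand-ins, not
the KIP's actual classes; the point is just the shape of the check:
dual-write mode is signaled by a populated KRaftControllerId in the full
LeaderAndIsrRequest, and any local partition the request does not mention
is a stray that can be deleted right away.)

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch, not the KIP's actual code: a ZK-mode broker keys
    // off the presence of a KRaftControllerId in a full LeaderAndIsrRequest
    // to eagerly delete stray partitions instead of running the old topic
    // deletion state machine.
    public class StrayPartitionSketch {

        // Stand-in for the relevant fields of a full LeaderAndIsrRequest.
        // In this sketch a negative id means "no KRaft controller", i.e.
        // the cluster is not in dual-write (migration) mode.
        record FullLeaderAndIsr(int kraftControllerId, Set<String> expectedPartitions) {
            boolean migrationMode() { return kraftControllerId >= 0; }
        }

        // localPartitions maps partition name -> log directory, standing in
        // for whatever the broker actually finds on disk at startup.
        static void deleteStrays(FullLeaderAndIsr request, Map<String, String> localPartitions) {
            if (!request.migrationMode()) {
                return; // outside migration mode, keep the existing deletion path
            }
            localPartitions.keySet().removeIf(partition -> {
                boolean stray = !request.expectedPartitions().contains(partition);
                if (stray) {
                    System.out.println("Deleting stray partition: " + partition);
                }
                return stray;
            });
        }

        public static void main(String[] args) {
            FullLeaderAndIsr request = new FullLeaderAndIsr(3000, Set.of("foo-0", "foo-1"));
            Map<String, String> onDisk = new HashMap<>(Map.of(
                    "foo-0", "/data/foo-0",
                    "foo-1", "/data/foo-1",
                    "bar-0", "/data/bar-0")); // topic deleted while broker was down
            deleteStrays(request, onDisk);    // prints: Deleting stray partition: bar-0
        }
    }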
> > > >> >> For existing KRaft controllers, will
> > > >> >> kafka.controller:type=KafkaController,name=MigrationState show
> > > >> >> up as 4 (MigrationFinalized)? I assume this is true, but it
> > > >> >> would be good to spell it out. Sorry if this is answered
> > > >> >> somewhere else.
> > > >> >
> > > >> > We discussed using 0 (None) as the value to report for
> > > >> > original, un-migrated KRaft clusters. 4 (MigrationFinalized)
> > > >> > would be for clusters which underwent a migration. I have some
> > > >> > description of this in the table under "Migration Overview".
> > > >>
> > > >> I don't feel that strongly about this, but wouldn't it be a good
> > > >> idea for MigrationState to have a different value for ZK-based
> > > >> clusters and KRaft-based clusters? If you have a bunch of
> > > >> clusters and you take an aggregate of this metric, it would be
> > > >> good to get a report of three numbers:
> > > >> 1. unupgraded ZK
> > > >> 2. in-progress upgrades
> > > >> 3. KRaft
> > > >>
> > > >> I guess we could get that from examining some other metrics too,
> > > >> though. Not sure, what do you think?
> > > >>
> > > >> >> As you point out, the ZK brokers being upgraded will need to
> > > >> >> contact the KRaft quorum in order to forward requests there,
> > > >> >> once we are in migration mode. This raises a question: rather
> > > >> >> than changing the broker registration, can we have those
> > > >> >> brokers send an RPC to the KRaft controller quorum instead?
> > > >> >> This would serve to confirm that they can reach the quorum.
> > > >> >> Then the quorum could wait for all of the brokers to check in
> > > >> >> this way before starting the migration. (It would know all the
> > > >> >> brokers by looking at /brokers.)
> > > >> >
> > > >> > One of the motivations I had for putting the migration details
> > > >> > in the broker registration is that it removes ordering
> > > >> > constraints between the brokers and controllers when starting
> > > >> > the migration. If we set the brokers in migration mode before
> > > >> > the KRaft quorum is available, they will just carry on until
> > > >> > the KRaft controller takes over the controller leadership and
> > > >> > starts sending UMR/LISR.
> > > >>
> > > >> I agree that any scheme that requires complicated startup
> > > >> ordering is not a good idea. I was more thinking about having the
> > > >> ZK-based brokers periodically send some kind of "I'm here"
> > > >> upgrade heartbeat to the controller quorum.
> > > >>
> > > >> > I wonder if we could infer reachability of the new KRaft
> > > >> > controller by the brokers through ApiVersions? Once taking
> > > >> > leadership, the active KRaft controller could monitor its
> > > >> > NodeApiVersions to see that each of the ZK brokers had reached
> > > >> > it. Would this be enough to verify reachability?
> > > >>
> > > >> If the new KRaft controller has already taken leadership, it's
> > > >> too late to do anything about brokers that can't reach it. Unless
> > > >> we want to implement automatic rollback to the pre-upgrade state,
> > > >> which sounds quite complex and error-prone.
> > > >>
> > > >> So, to sum up: my proposal was that the ZK-based brokers
> > > >> periodically (maybe every 10 seconds) try to contact the quorum
> > > >> with an RPC containing their IBP and readiness state. I guess the
> > > >> main negative thing is that it would be more network traffic (but
> > > >> probably not significantly more). The main positive thing is that
> > > >> we'd never start an upgrade without first verifying the quorum
> > > >> configuration on the ZK-based brokers. A neutral thing (neither
> > > >> good nor bad, I think) is that we'd have a new RPC rather than a
> > > >> znode change.
> > > >>
> > > >> best,
> > > >> Colin
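(A minimal sketch of the heartbeat loop proposed above, assuming an
imaginary MigrationHeartbeat RPC; the 10-second cadence is the one
floated in the thread, and only the periodic-send structure is real
here. The safety property falls out naturally: a broker with a bad
quorum config never checks in, so the controller never starts the
migration.)

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch of the "upgrade heartbeat" idea: each ZK-mode
    // broker periodically tells the KRaft quorum its id, IBP, and readiness
    // state. The RPC itself is imaginary; only the send loop is shown.
    public class MigrationHeartbeatSketch {

        interface QuorumChannel {
            // Imaginary RPC carrying the broker id, its IBP, and readiness.
            void sendMigrationHeartbeat(int brokerId, String ibp, boolean ready);
        }

        static ScheduledExecutorService start(QuorumChannel channel, int brokerId, String ibp) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Every 10 seconds, per the cadence suggested in this thread. If
            // the quorum is unreachable, the controller simply never sees
            // this broker check in, so the migration does not begin.
            scheduler.scheduleAtFixedRate(
                    () -> channel.sendMigrationHeartbeat(brokerId, ibp, true),
                    0, 10, TimeUnit.SECONDS);
            return scheduler;
        }

        public static void main(String[] args) throws InterruptedException {
            ScheduledExecutorService scheduler = start(
                    (id, ibp, ready) -> System.out.println(
                            "heartbeat: broker=" + id + " ibp=" + ibp + " ready=" + ready),
                    1, "3.4");
            Thread.sleep(25_000); // watch a few heartbeats, then stop
            scheduler.shutdownNow();
        }
    }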
> > > >> >
> > > >> > -David
> > > >> >
> > > >> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > >> >>
> > > >> >> Hi David,
> > > >> >>
> > > >> >> Looks great. Some questions:
> > > >> >>
> > > >> >> I agree with Jun that it would be good to rename
> > > >> >> metadata.migration.enable to something more ZK-specific. Maybe
> > > >> >> zk.metadata.migration.enable ?
> > > >> >>
> > > >> >> Similarly, for MigrationRecord: can we rename this to
> > > >> >> ZkMigrationStateRecord? Then change MigrationState ->
> > > >> >> ZkMigrationState.
> > > >> >>
> > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is
> > > >> >> that we will eventually compact all the metadata logs into a
> > > >> >> snapshot. That snapshot will then have to keep alive the
> > > >> >> memory of the old migration. So it is not really a matter of
> > > >> >> replaying the old metadata logs (probably) but a matter of
> > > >> >> checking to see what the ZkMigrationState is, which I suppose
> > > >> >> could be Optional<ZkMigrationState>. If it's not
> > > >> >> Optional.empty, we already migrated / are migrating.
> > > >> >>
> > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary?
> > > >> >> I thought ZK already tracked this information in the mzxid of
> > > >> >> the znode?
> > > >> >>
> > > >> >> It is true that technically it is only needed in UMR, but I
> > > >> >> would still suggest including KRaftControllerId in
> > > >> >> LeaderAndIsrRequest because it will make debugging much
> > > >> >> easier.
> > > >> >>
> > > >> >> I would suggest not implementing the topic deletion state
> > > >> >> machine, but just deleting topics eagerly when in migration
> > > >> >> mode. We can implement this behavior change by keying off of
> > > >> >> whether KRaftControllerId is present in LeaderAndIsrRequest.
> > > >> >> On broker startup, we'll be sent a full LeaderAndIsrRequest
> > > >> >> and can delete stray partitions whose IDs are not as expected
> > > >> >> (again, this behavior change would only be for migration
> > > >> >> mode).
> > > >> >>
> > > >> >> For existing KRaft controllers, will
> > > >> >> kafka.controller:type=KafkaController,name=MigrationState show
> > > >> >> up as 4 (MigrationFinalized)? I assume this is true, but it
> > > >> >> would be good to spell it out. Sorry if this is answered
> > > >> >> somewhere else.
> > > >> >>
> > > >> >> As you point out, the ZK brokers being upgraded will need to
> > > >> >> contact the KRaft quorum in order to forward requests there,
> > > >> >> once we are in migration mode. This raises a question: rather
> > > >> >> than changing the broker registration, can we have those
> > > >> >> brokers send an RPC to the KRaft controller quorum instead?
> > > >> >> This would serve to confirm that they can reach the quorum.
> > > >> >> Then the quorum could wait for all of the brokers to check in
> > > >> >> this way before starting the migration. (It would know all the
> > > >> >> brokers by looking at /brokers.)
> > > >> >>
> > > >> >> best,
> > > >> >> Colin
> > > >> >>
> > > >> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
> > > >> >> > Hello everyone, I'd like to start the vote on KIP-866.
> > > >> >> >
> > > >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > > >> >> >
> > > >> >> > Thanks!
> > > >> >> > David Arthur
> > > >> >
> > > >> >
> > > >> > --
> > > >> > David Arthur
> > > >
> > > >
> > > > --
> > > > David Arthur


--
David Arthur