Hi, David, Thanks for the KIP. +1
Jun On Tue, Nov 22, 2022 at 3:43 PM Jason Gustafson <ja...@confluent.io.invalid> wrote: > Thanks, +1 from me. I suspect we might be in for at least one surprise with > the re-implemented controller RPCs, but I agree the alternative has risks > as well. > > Best, > Jason > > On Mon, Nov 14, 2022 at 12:00 PM Colin McCabe <cmcc...@apache.org> wrote: > > > On Fri, Nov 11, 2022, at 08:59, David Arthur wrote: > > > Thanks, Colin. > > > > > >> never start an upgrade without first verifying the quorum > configuration > > on the ZK-based brokers > > > > > > I agree that this is a pretty big benefit. I could imagine debugging > > > and fixing connection problems mid-migration would be a big pain. > > > Especially if you had some brokers correctly configured, and others > > > not. > > > > > > Adding a heartbeat raises some questions about what to do if a broker > > > goes into a bad state, or stops heartbeating, during a migration. > > > However, I think the same is true for a registration based approach, > > > so maybe it's not an increase in net complexity. > > > > > > > Hi David, > > > > Yeah. I think the goal should be for the set of heartbeaters to match the > > set of broker registrations under /brokers > > > > Obviously, people could add or remove brokers after the upgrade has > begun, > > but that's unavoidable, I think. We can at least ensure that at the time > we > > enter upgrade, all the brokers are ready. > > > > > I've replaced the ZK registration section with a new RPC and brief > > > description. Please take a look. > > > > > > > Thanks, David. With these changes it LGTM to me. > > > > +1 (binding) > > > > Colin > > > > > Thanks! > > > David > > > > > > On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> > wrote: > > >> > > >> Hi David, > > >> > > >> Thanks for the response. Replies inline. > > >> > > >> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote: > > >> > Colin > > >> > > > >> >> Maybe zk.metadata.migration.enable ? > > >> > > > >> > Done. I went with "zookeeper.metadata.migration.enable" since our > > >> > other ZK configs start with "zookeeper.*" > > >> > > > >> >> SImilarly, for MigrationRecord: can we rename this to > > ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState. > > >> > > > >> > Sure > > >> > > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is that > > we will eventually compact all the metadata logs into a snapshot. That > > snapshot will then have to keep alive the memory of the old migration. So > > it is not really a matter of replaying the old metadata logs (probably) > but > > a matter of checking to see what the ZkMigrationState is, which I suppose > > could be Optional<ZkMigrationState>. If it's not Optional.empty, we > already > > migrated / are migrating. > > >> > > > >> > Yea, makes sense. > > >> > > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I > > thought ZK already tracked this information in the mzxid of the znode? > > >> > > > >> > Yes, Jun pointed this out previously, I missed this update in the > KIP. > > >> > Fixed now. > > >> > > > >> >> It is true that technically it is only needed in UMR, but I would > > still suggest including KRaftControllerId in LeaderAndIsrRequest because > it > > will make debugging much easier. > > >> >> > > >> >> I would suggest not implementing the topic deletion state machine, > > but just deleting topics eagerly when in migration mode. We can implement > > this behavior change by keying off of whether KRaftControllerId is > present > > in LeaderAndIsrRequest. On broker startup, we'll be sent a full > > LeaderAndIsrRequest and can delete stray partitions whose IDs are not as > > expected (again, this behavior change would only be for migration mode) > > >> > > > >> > Sounds good to me. Since this is somewhat of an implementation > detail, > > >> > do you think we need this included in the KIP? > > >> > > >> Yeah, maybe we don't need to go into the delete behavior here. But I > > think the KIP should specify that we have KRaftControllerId in both > > LeaderAndIsrRequest. That will allow us to implement this behavior > > conditionally on zk-based brokers when in dual write mode. > > >> > > >> > > > >> >> For existing KRaft controllers, will > > kafka.controller:type=KafkaController,name=MigrationState show up as 4 > > (MigrationFinalized)? I assume this is true, but it would be good to > spell > > it out. Sorry if this is answered somewhere else. > > >> > > > >> > We discussed using 0 (None) as the value to report for original, > > >> > un-migrated KRaft clusters. 4 (MigrationFinalized) would be for > > >> > clusters which underwent a migration. I have some description of > this > > >> > in the table under "Migration Overview" > > >> > > > >> > > >> I don't feel that strongly about this, but wouldn't it be a good idea > > for MigrationState to have a different value for ZK-based clusters and > > KRaft-based clusters? If you have a bunch of clusters and you take an > > aggregate of this metric, it would be good to get a report of three > numbers: > > >> 1. unupgraded ZK > > >> 2. in progress upgrades > > >> 3. kraft > > >> > > >> I guess we could get that from examining some other metrics too, > > though. Not sure, what do you think? > > >> > > >> >> As you point out, the ZK brokers being upgraded will need to > contact > > the KRaft quorum in order to forward requests to there, once we are in > > migration mode. This raises a question: rather than changing the broker > > registration, can we have those brokers send an RPC to the kraft > controller > > quorum instead? This would serve to confirm that they can reach the > quorum. > > Then the quorum could wait for all of the brokers to check in this way > > before starting the migration (It would know all the brokers by looking > at > > /brokers) > > >> > > > >> > One of the motivations I had for putting the migration details in > the > > >> > broker registration is that it removes ordering constraints between > > >> > the brokers and controllers when starting the migration. If we set > the > > >> > brokers in migration mode before the KRaft quorum is available, they > > >> > will just carry on until the KRaft controller takes over the > > >> > controller leadership and starts sending UMR/LISR. > > >> > > > >> > > >> I agree that any scheme that requires complicated startup ordering is > > not a good idea. I was more thinking about having the zk-based brokers > > periodically send some kind of "I'm here" upgrade heartbeat to the > > controller quorum. > > >> > > >> > I wonder if we could infer reachability of the new KRaft controller > by > > >> > the brokers through ApiVersions? Once taking leadership, the active > > >> > KRaft controller could monitor its NodeApiVersions to see that each > of > > >> > the ZK brokers had reached it. Would this be enough to verify > > >> > reachability? > > >> > > > >> > > >> If the new KRaft controller has already taken leadership, it's too > late > > to do anything about brokers that can't reach it. Unless we want to > > implement automatic rollback to the pre-upgrade state, which sounds quite > > complex and error prone. > > >> > > >> So, to sum up: my proposal was that the ZK based brokers periodically > > (like maybe every 10 seconds) try to contact the quorum with an RPC > > containing their IBP and readiness state. I guess the main negative thing > > is that it would be more network traffic (but probably not significantly > > more.) The main positive thing is that we'd never start an upgrade > without > > first verifying the quorum configuration on the ZK-based brokers. A > neutral > > thing (neither good nor bad, I think) is that we'd have a new RPC rather > > than a znode change. > > >> > > >> best, > > >> Colin > > >> > > >> > > >> > -David > > >> > > > >> > > > >> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> > > wrote: > > >> >> > > >> >> Hi David, > > >> >> > > >> >> Looks great. Some questions: > > >> >> > > >> >> I agree with Jun that it would be good to rename > > metadata.migration.enable to something more zk-specific. Maybe > > zk.metadata.migration.enable ? > > >> >> > > >> >> SImilarly, for MigrationRecord: can we rename this to > > ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState. > > >> >> > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is that > > we will eventually compact all the metadata logs into a snapshot. That > > snapshot will then have to keep alive the memory of the old migration. So > > it is not really a matter of replaying the old metadata logs (probably) > but > > a matter of checking to see what the ZkMigrationState is, which I suppose > > could be Optional<ZkMigrationState>. If it's not Optional.empty, we > already > > migrated / are migrating. > > >> >> > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I > > thought ZK already tracked this information in the mzxid of the znode? > > >> >> > > >> >> It is true that technically it is only needed in UMR, but I would > > still suggest including KRaftControllerId in LeaderAndIsrRequest because > it > > will make debugging much easier. > > >> >> > > >> >> I would suggest not implementing the topic deletion state machine, > > but just deleting topics eagerly when in migration mode. We can implement > > this behavior change by keying off of whether KRaftControllerId is > present > > in LeaderAndIsrRequest. On broker startup, we'll be sent a full > > LeaderAndIsrRequest and can delete stray partitions whose IDs are not as > > expected (again, this behavior change would only be for migration mode) > > >> >> > > >> >> For existing KRaft controllers, will > > kafka.controller:type=KafkaController,name=MigrationState show up as 4 > > (MigrationFinalized)? I assume this is true, but it would be good to > spell > > it out. Sorry if this is answered somewhere else. > > >> >> > > >> >> As you point out, the ZK brokers being upgraded will need to > contact > > the KRaft quorum in order to forward requests to there, once we are in > > migration mode. This raises a question: rather than changing the broker > > registration, can we have those brokers send an RPC to the kraft > controller > > quorum instead? This would serve to confirm that they can reach the > quorum. > > Then the quorum could wait for all of the brokers to check in this way > > before starting the migration (It would know all the brokers by looking > at > > /brokers) > > >> >> > > >> >> best, > > >> >> Colin > > >> >> > > >> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote: > > >> >> > Hello everyone, I'd like to start the vote on KIP-866. > > >> >> > > > >> >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration > > >> >> > > > >> >> > Thanks! > > >> >> > David Arthur > > >> > > > >> > > > >> > > > >> > -- > > >> > David Arthur > > > > > > > > > > > > -- > > > David Arthur > > >