Hey everyone, I'm going to close out the vote. There were three +1 votes from PMC members and no committer or community votes.
+1 PMC votes:
 * Colin McCabe
 * Jason Gustafson
 * Jun Rao

With that, the vote passes. Thanks to everyone who reviewed this KIP!

Cheers,
David

On Tue, Nov 29, 2022 at 7:48 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
> Hi, David,
>
> Thanks for the KIP. +1
>
> Jun
>
> On Tue, Nov 22, 2022 at 3:43 PM Jason Gustafson
> <ja...@confluent.io.invalid> wrote:
>
> > Thanks, +1 from me. I suspect we might be in for at least one
> > surprise with the re-implemented controller RPCs, but I agree the
> > alternative has risks as well.
> >
> > Best,
> > Jason
> >
> > On Mon, Nov 14, 2022 at 12:00 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > On Fri, Nov 11, 2022, at 08:59, David Arthur wrote:
> > > > Thanks, Colin.
> > > >
> > > >> never start an upgrade without first verifying the quorum
> > > >> configuration on the ZK-based brokers
> > > >
> > > > I agree that this is a pretty big benefit. I could imagine that
> > > > debugging and fixing connection problems mid-migration would be a
> > > > big pain, especially if you had some brokers correctly configured
> > > > and others not.
> > > >
> > > > Adding a heartbeat raises some questions about what to do if a
> > > > broker goes into a bad state, or stops heartbeating, during a
> > > > migration. However, I think the same is true for a
> > > > registration-based approach, so maybe it's not an increase in net
> > > > complexity.
> > >
> > > Hi David,
> > >
> > > Yeah. I think the goal should be for the set of heartbeaters to
> > > match the set of broker registrations under /brokers.
> > >
> > > Obviously, people could add or remove brokers after the upgrade has
> > > begun, but that's unavoidable, I think. We can at least ensure that
> > > at the time we enter the upgrade, all the brokers are ready.
> > >
> > > > I've replaced the ZK registration section with a new RPC and a
> > > > brief description. Please take a look.
> > >
> > > Thanks, David. With these changes it looks good to me.
> > >
> > > +1 (binding)
> > >
> > > Colin
> > >
> > > > Thanks!
> > > > David
> > > >
> > > > On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > >>
> > > >> Hi David,
> > > >>
> > > >> Thanks for the response. Replies inline.
> > > >>
> > > >> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
> > > >> > Colin
> > > >> >
> > > >> >> Maybe zk.metadata.migration.enable ?
> > > >> >
> > > >> > Done. I went with "zookeeper.metadata.migration.enable" since
> > > >> > our other ZK configs start with "zookeeper.*"
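(For anyone following along, the broker-side configuration coming out of
this naming discussion would look something like the sketch below.
"zookeeper.metadata.migration.enable" is the name settled on in this
thread; the quorum connection settings are the standard KRaft configs,
and the values shown are purely illustrative.)

    # ZK-mode broker opting into the migration (illustrative values)
    zookeeper.metadata.migration.enable=true
    zookeeper.connect=localhost:2181
    # Standard KRaft configs so the broker can reach the controller quorum
    controller.quorum.voters=3000@localhost:9093
    controller.listener.names=CONTROLLER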
> > > >> >> Similarly, for MigrationRecord: can we rename this to
> > > >> >> ZkMigrationStateRecord? Then change MigrationState ->
> > > >> >> ZkMigrationState.
> > > >> >
> > > >> > Sure
> > > >> >
> > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is
> > > >> >> that we will eventually compact all the metadata logs into a
> > > >> >> snapshot. That snapshot will then have to keep alive the
> > > >> >> memory of the old migration. So it is not really a matter of
> > > >> >> replaying the old metadata logs (probably) but a matter of
> > > >> >> checking to see what the ZkMigrationState is, which I suppose
> > > >> >> could be Optional<ZkMigrationState>. If it's not
> > > >> >> Optional.empty, we already migrated / are migrating.
> > > >> >
> > > >> > Yea, makes sense.
> > > >> >
> > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary?
> > > >> >> I thought ZK already tracked this information in the mzxid of
> > > >> >> the znode?
> > > >> >
> > > >> > Yes, Jun pointed this out previously; I missed this update in
> > > >> > the KIP. Fixed now.
> > > >> >
> > > >> >> It is true that technically it is only needed in UMR, but I
> > > >> >> would still suggest including KRaftControllerId in
> > > >> >> LeaderAndIsrRequest because it will make debugging much
> > > >> >> easier.
> > > >> >>
> > > >> >> I would suggest not implementing the topic deletion state
> > > >> >> machine, but just deleting topics eagerly when in migration
> > > >> >> mode. We can implement this behavior change by keying off of
> > > >> >> whether KRaftControllerId is present in LeaderAndIsrRequest.
> > > >> >> On broker startup, we'll be sent a full LeaderAndIsrRequest
> > > >> >> and can delete stray partitions whose IDs are not as expected
> > > >> >> (again, this behavior change would only be for migration
> > > >> >> mode).
> > > >> >
> > > >> > Sounds good to me. Since this is somewhat of an implementation
> > > >> > detail, do you think we need this included in the KIP?
> > > >>
> > > >> Yeah, maybe we don't need to go into the delete behavior here.
> > > >> But I think the KIP should specify that we have KRaftControllerId
> > > >> in both UMR and LeaderAndIsrRequest. That will allow us to
> > > >> implement this behavior conditionally on ZK-based brokers when in
> > > >> dual-write mode.
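(A rough sketch, in Java, of the eager stray-partition cleanup described
above. All type and method names below are hypothetical stand-ins, not
the KIP's actual classes; the point is just the shape of the check:
dual-write mode is signaled by a populated KRaftControllerId in the full
LeaderAndIsrRequest, and any local partition the request does not mention
is a stray that can be deleted right away.)

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch, not the KIP's actual code: a ZK-mode broker keys
    // off the presence of a KRaftControllerId in a full LeaderAndIsrRequest
    // to eagerly delete stray partitions instead of running the old topic
    // deletion state machine.
    public class StrayPartitionSketch {

        // Stand-in for the relevant fields of a full LeaderAndIsrRequest.
        // In this sketch a negative id means "no KRaft controller", i.e.
        // the cluster is not in dual-write (migration) mode.
        record FullLeaderAndIsr(int kraftControllerId, Set<String> expectedPartitions) {
            boolean migrationMode() { return kraftControllerId >= 0; }
        }

        // localPartitions maps partition name -> log directory, standing in
        // for whatever the broker actually finds on disk at startup.
        static void deleteStrays(FullLeaderAndIsr request, Map<String, String> localPartitions) {
            if (!request.migrationMode()) {
                return; // outside migration mode, keep the existing deletion path
            }
            localPartitions.keySet().removeIf(partition -> {
                boolean stray = !request.expectedPartitions().contains(partition);
                if (stray) {
                    System.out.println("Deleting stray partition: " + partition);
                }
                return stray;
            });
        }

        public static void main(String[] args) {
            FullLeaderAndIsr request = new FullLeaderAndIsr(3000, Set.of("foo-0", "foo-1"));
            Map<String, String> onDisk = new HashMap<>(Map.of(
                    "foo-0", "/data/foo-0",
                    "foo-1", "/data/foo-1",
                    "bar-0", "/data/bar-0")); // topic deleted while broker was down
            deleteStrays(request, onDisk);    // prints: Deleting stray partition: bar-0
        }
    }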
> > > >> >> For existing KRaft controllers, will
> > > >> >> kafka.controller:type=KafkaController,name=MigrationState show
> > > >> >> up as 4 (MigrationFinalized)? I assume this is true, but it
> > > >> >> would be good to spell it out. Sorry if this is answered
> > > >> >> somewhere else.
> > > >> >
> > > >> > We discussed using 0 (None) as the value to report for
> > > >> > original, un-migrated KRaft clusters. 4 (MigrationFinalized)
> > > >> > would be for clusters which underwent a migration. I have some
> > > >> > description of this in the table under "Migration Overview".
> > > >>
> > > >> I don't feel that strongly about this, but wouldn't it be a good
> > > >> idea for MigrationState to have a different value for ZK-based
> > > >> clusters and KRaft-based clusters? If you have a bunch of
> > > >> clusters and you take an aggregate of this metric, it would be
> > > >> good to get a report of three numbers:
> > > >> 1. unupgraded ZK
> > > >> 2. in-progress upgrades
> > > >> 3. KRaft
> > > >>
> > > >> I guess we could get that from examining some other metrics too,
> > > >> though. Not sure, what do you think?
> > > >>
> > > >> >> As you point out, the ZK brokers being upgraded will need to
> > > >> >> contact the KRaft quorum in order to forward requests there,
> > > >> >> once we are in migration mode. This raises a question: rather
> > > >> >> than changing the broker registration, can we have those
> > > >> >> brokers send an RPC to the KRaft controller quorum instead?
> > > >> >> This would serve to confirm that they can reach the quorum.
> > > >> >> Then the quorum could wait for all of the brokers to check in
> > > >> >> this way before starting the migration. (It would know all the
> > > >> >> brokers by looking at /brokers.)
> > > >> >
> > > >> > One of the motivations I had for putting the migration details
> > > >> > in the broker registration is that it removes ordering
> > > >> > constraints between the brokers and controllers when starting
> > > >> > the migration. If we set the brokers in migration mode before
> > > >> > the KRaft quorum is available, they will just carry on until
> > > >> > the KRaft controller takes over the controller leadership and
> > > >> > starts sending UMR/LISR.
> > > >>
> > > >> I agree that any scheme that requires complicated startup
> > > >> ordering is not a good idea. I was more thinking about having the
> > > >> ZK-based brokers periodically send some kind of "I'm here"
> > > >> upgrade heartbeat to the controller quorum.
> > > >>
> > > >> > I wonder if we could infer reachability of the new KRaft
> > > >> > controller by the brokers through ApiVersions? Once taking
> > > >> > leadership, the active KRaft controller could monitor its
> > > >> > NodeApiVersions to see that each of the ZK brokers had reached
> > > >> > it. Would this be enough to verify reachability?
> > > >>
> > > >> If the new KRaft controller has already taken leadership, it's
> > > >> too late to do anything about brokers that can't reach it. Unless
> > > >> we want to implement automatic rollback to the pre-upgrade state,
> > > >> which sounds quite complex and error-prone.
> > > >>
> > > >> So, to sum up: my proposal was that the ZK-based brokers
> > > >> periodically (maybe every 10 seconds) try to contact the quorum
> > > >> with an RPC containing their IBP and readiness state. I guess the
> > > >> main negative thing is that it would be more network traffic (but
> > > >> probably not significantly more). The main positive thing is that
> > > >> we'd never start an upgrade without first verifying the quorum
> > > >> configuration on the ZK-based brokers. A neutral thing (neither
> > > >> good nor bad, I think) is that we'd have a new RPC rather than a
> > > >> znode change.
> > > >>
> > > >> best,
> > > >> Colin
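(A minimal sketch of the heartbeat loop proposed above, assuming an
imaginary MigrationHeartbeat RPC; the 10-second cadence is the one
floated in the thread, and only the periodic-send structure is real
here. The safety property falls out naturally: a broker with a bad
quorum config never checks in, so the controller never starts the
migration.)

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch of the "upgrade heartbeat" idea: each ZK-mode
    // broker periodically tells the KRaft quorum its id, IBP, and readiness
    // state. The RPC itself is imaginary; only the send loop is shown.
    public class MigrationHeartbeatSketch {

        interface QuorumChannel {
            // Imaginary RPC carrying the broker id, its IBP, and readiness.
            void sendMigrationHeartbeat(int brokerId, String ibp, boolean ready);
        }

        static ScheduledExecutorService start(QuorumChannel channel, int brokerId, String ibp) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Every 10 seconds, per the cadence suggested in this thread. If
            // the quorum is unreachable, the controller simply never sees
            // this broker check in, so the migration does not begin.
            scheduler.scheduleAtFixedRate(
                    () -> channel.sendMigrationHeartbeat(brokerId, ibp, true),
                    0, 10, TimeUnit.SECONDS);
            return scheduler;
        }

        public static void main(String[] args) throws InterruptedException {
            ScheduledExecutorService scheduler = start(
                    (id, ibp, ready) -> System.out.println(
                            "heartbeat: broker=" + id + " ibp=" + ibp + " ready=" + ready),
                    1, "3.4");
            Thread.sleep(25_000); // watch a few heartbeats, then stop
            scheduler.shutdownNow();
        }
    }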
> > > >> >
> > > >> > -David
> > > >> >
> > > >> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > >> >>
> > > >> >> Hi David,
> > > >> >>
> > > >> >> Looks great. Some questions:
> > > >> >>
> > > >> >> I agree with Jun that it would be good to rename
> > > >> >> metadata.migration.enable to something more ZK-specific. Maybe
> > > >> >> zk.metadata.migration.enable ?
> > > >> >>
> > > >> >> Similarly, for MigrationRecord: can we rename this to
> > > >> >> ZkMigrationStateRecord? Then change MigrationState ->
> > > >> >> ZkMigrationState.
> > > >> >>
> > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is
> > > >> >> that we will eventually compact all the metadata logs into a
> > > >> >> snapshot. That snapshot will then have to keep alive the
> > > >> >> memory of the old migration. So it is not really a matter of
> > > >> >> replaying the old metadata logs (probably) but a matter of
> > > >> >> checking to see what the ZkMigrationState is, which I suppose
> > > >> >> could be Optional<ZkMigrationState>. If it's not
> > > >> >> Optional.empty, we already migrated / are migrating.
> > > >> >>
> > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary?
> > > >> >> I thought ZK already tracked this information in the mzxid of
> > > >> >> the znode?
> > > >> >>
> > > >> >> It is true that technically it is only needed in UMR, but I
> > > >> >> would still suggest including KRaftControllerId in
> > > >> >> LeaderAndIsrRequest because it will make debugging much
> > > >> >> easier.
> > > >> >>
> > > >> >> I would suggest not implementing the topic deletion state
> > > >> >> machine, but just deleting topics eagerly when in migration
> > > >> >> mode. We can implement this behavior change by keying off of
> > > >> >> whether KRaftControllerId is present in LeaderAndIsrRequest.
> > > >> >> On broker startup, we'll be sent a full LeaderAndIsrRequest
> > > >> >> and can delete stray partitions whose IDs are not as expected
> > > >> >> (again, this behavior change would only be for migration
> > > >> >> mode).
> > > >> >>
> > > >> >> For existing KRaft controllers, will
> > > >> >> kafka.controller:type=KafkaController,name=MigrationState show
> > > >> >> up as 4 (MigrationFinalized)? I assume this is true, but it
> > > >> >> would be good to spell it out. Sorry if this is answered
> > > >> >> somewhere else.
> > > >> >>
> > > >> >> As you point out, the ZK brokers being upgraded will need to
> > > >> >> contact the KRaft quorum in order to forward requests there,
> > > >> >> once we are in migration mode. This raises a question: rather
> > > >> >> than changing the broker registration, can we have those
> > > >> >> brokers send an RPC to the KRaft controller quorum instead?
> > > >> >> This would serve to confirm that they can reach the quorum.
> > > >> >> Then the quorum could wait for all of the brokers to check in
> > > >> >> this way before starting the migration. (It would know all the
> > > >> >> brokers by looking at /brokers.)
> > > >> >>
> > > >> >> best,
> > > >> >> Colin
> > > >> >>
> > > >> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
> > > >> >> > Hello everyone, I'd like to start the vote on KIP-866.
> > > >> >> >
> > > >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > > >> >> >
> > > >> >> > Thanks!
> > > >> >> > David Arthur
> > > >> >
> > > >> >
> > > >> > --
> > > >> > David Arthur
> > > >
> > > >
> > > > --
> > > > David Arthur


--
David Arthur