Thanks, +1 from me. I suspect we might be in for at least one surprise with
the re-implemented controller RPCs, but I agree the alternative has risks
as well.

Best,
Jason

On Mon, Nov 14, 2022 at 12:00 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Fri, Nov 11, 2022, at 08:59, David Arthur wrote:
> > Thanks, Colin.
> >
> >> never start an upgrade without first verifying the quorum configuration
> on the ZK-based brokers
> >
> > I agree that this is a pretty big benefit. I could imagine debugging
> > and fixing connection problems mid-migration would be a big pain.
> > Especially if you had some brokers correctly configured, and others
> > not.
> >
> > Adding a heartbeat raises some questions about what to do if a broker
> > goes into a bad state, or stops heartbeating, during a migration.
> > However, I think the same is true for a registration based approach,
> > so maybe it's not an increase in net complexity.
> >
>
> Hi David,
>
> Yeah. I think the goal should be for the set of heartbeaters to match the
> set of broker registrations under /brokers
>
> Obviously, people could add or remove brokers after the upgrade has begun,
> but that's unavoidable, I think. We can at least ensure that at the time we
> enter upgrade, all the brokers are ready.
>
> > I've replaced the ZK registration section with a new RPC and brief
> > description. Please take a look.
> >
>
> Thanks, David. With these changes it LGTM to me.
>
> +1 (binding)
>
> Colin
>
> > Thanks!
> > David
> >
> > On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> wrote:
> >>
> >> Hi David,
> >>
> >> Thanks for the response. Replies inline.
> >>
> >> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote:
> >> > Colin
> >> >
> >> >>  Maybe zk.metadata.migration.enable ?
> >> >
> >> > Done. I went with "zookeeper.metadata.migration.enable" since our
> >> > other ZK configs start with "zookeeper.*"
> >> >
> >> >> SImilarly, for MigrationRecord: can we rename this to
> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
> >> >
> >> > Sure
> >> >
> >> >> With ZkMigrationStateRecord, one thing to keep in mind here is that
> we will eventually compact all the metadata logs into a snapshot. That
> snapshot will then have to keep alive the memory of the old migration. So
> it is not really a matter of replaying the old metadata logs (probably) but
> a matter of checking to see what the ZkMigrationState is, which I suppose
> could be Optional<ZkMigrationState>. If it's not Optional.empty, we already
> migrated / are migrating.
> >> >
> >> > Yea, makes sense.
> >> >
> >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I
> thought ZK already tracked this information in the mzxid of the znode?
> >> >
> >> > Yes, Jun pointed this out previously, I missed this update in the KIP.
> >> > Fixed now.
> >> >
> >> >> It is true that technically it is only needed in UMR, but I would
> still suggest including KRaftControllerId in LeaderAndIsrRequest because it
> will make debugging much easier.
> >> >>
> >> >> I would suggest not implementing the topic deletion state machine,
> but just deleting topics eagerly when in migration mode. We can implement
> this behavior change by keying off of whether KRaftControllerId is present
> in LeaderAndIsrRequest. On broker startup, we'll be sent a full
> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as
> expected (again, this behavior change would only be for migration mode)
> >> >
> >> > Sounds good to me. Since this is somewhat of an implementation detail,
> >> > do you think we need this included in the KIP?
> >>
> >> Yeah, maybe we don't need to go into the delete behavior here. But I
> think the KIP should specify that we have KRaftControllerId in both
> LeaderAndIsrRequest. That will allow us to implement this behavior
> conditionally on zk-based brokers when in dual write mode.
> >>
> >> >
> >> >> For existing KRaft controllers, will
> kafka.controller:type=KafkaController,name=MigrationState show up as 4
> (MigrationFinalized)? I assume this is true, but it would be good to spell
> it out. Sorry if this is answered somewhere else.
> >> >
> >> > We discussed using 0 (None) as the value to report for original,
> >> > un-migrated KRaft clusters. 4 (MigrationFinalized) would be for
> >> > clusters which underwent a migration. I have some description of this
> >> > in the table under "Migration Overview"
> >> >
> >>
> >> I don't feel that strongly about this, but wouldn't it be a good idea
> for MigrationState to have a different value for ZK-based clusters and
> KRaft-based clusters? If you have a bunch of clusters and you take an
> aggregate of this metric, it would be good to get a report of three numbers:
> >> 1. unupgraded ZK
> >> 2. in progress upgrades
> >> 3. kraft
> >>
> >> I guess we could get that from examining some other metrics too,
> though. Not sure, what do you think?
> >>
> >> >> As you point out, the ZK brokers being upgraded will need to contact
> the KRaft quorum in order to forward requests to there, once we are in
> migration mode. This raises a question: rather than changing the broker
> registration, can we have those brokers send an RPC to the kraft controller
> quorum instead? This would serve to confirm that they can reach the quorum.
> Then the quorum could wait for all of the brokers to check in this way
> before starting the migration (It would know all the brokers by looking at
> /brokers)
> >> >
> >> > One of the motivations I had for putting the migration details in the
> >> > broker registration is that it removes ordering constraints between
> >> > the brokers and controllers when starting the migration. If we set the
> >> > brokers in migration mode before the KRaft quorum is available, they
> >> > will just carry on until the KRaft controller takes over the
> >> > controller leadership and starts sending UMR/LISR.
> >> >
> >>
> >> I agree that any scheme that requires complicated startup ordering is
> not a good idea. I was more thinking about having the zk-based brokers
> periodically send some kind of "I'm here" upgrade heartbeat to the
> controller quorum.
> >>
> >> > I wonder if we could infer reachability of the new KRaft controller by
> >> > the brokers through ApiVersions? Once taking leadership, the active
> >> > KRaft controller could monitor its NodeApiVersions to see that each of
> >> > the ZK brokers had reached it. Would this be enough to verify
> >> > reachability?
> >> >
> >>
> >> If the new KRaft controller has already taken leadership, it's too late
> to do anything about brokers that can't reach it. Unless we want to
> implement automatic rollback to the pre-upgrade state, which sounds quite
> complex and error prone.
> >>
> >> So, to sum up: my proposal was that the ZK based brokers periodically
> (like maybe every 10 seconds) try to contact the quorum with an RPC
> containing their IBP and readiness state. I guess the main negative thing
> is that it would be more network traffic (but probably not significantly
> more.) The main positive thing is that we'd never start an upgrade without
> first verifying the quorum configuration on the ZK-based brokers. A neutral
> thing (neither good nor bad, I think) is that we'd have a new RPC rather
> than a znode change.
> >>
> >> best,
> >> Colin
> >>
> >>
> >> > -David
> >> >
> >> >
> >> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> >> >>
> >> >> Hi David,
> >> >>
> >> >> Looks great. Some questions:
> >> >>
> >> >> I agree with Jun that it would be good to rename
> metadata.migration.enable to something more zk-specific. Maybe
> zk.metadata.migration.enable ?
> >> >>
> >> >> SImilarly, for MigrationRecord: can we rename this to
> ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState.
> >> >>
> >> >> With ZkMigrationStateRecord, one thing to keep in mind here is that
> we will eventually compact all the metadata logs into a snapshot. That
> snapshot will then have to keep alive the memory of the old migration. So
> it is not really a matter of replaying the old metadata logs (probably) but
> a matter of checking to see what the ZkMigrationState is, which I suppose
> could be Optional<ZkMigrationState>. If it's not Optional.empty, we already
> migrated / are migrating.
> >> >>
> >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I
> thought ZK already tracked this information in the mzxid of the znode?
> >> >>
> >> >> It is true that technically it is only needed in UMR, but I would
> still suggest including KRaftControllerId in LeaderAndIsrRequest because it
> will make debugging much easier.
> >> >>
> >> >> I would suggest not implementing the topic deletion state machine,
> but just deleting topics eagerly when in migration mode. We can implement
> this behavior change by keying off of whether KRaftControllerId is present
> in LeaderAndIsrRequest. On broker startup, we'll be sent a full
> LeaderAndIsrRequest and can delete stray partitions whose IDs are not as
> expected (again, this behavior change would only be for migration mode)
> >> >>
> >> >> For existing KRaft controllers, will
> kafka.controller:type=KafkaController,name=MigrationState show up as 4
> (MigrationFinalized)? I assume this is true, but it would be good to spell
> it out. Sorry if this is answered somewhere else.
> >> >>
> >> >> As you point out, the ZK brokers being upgraded will need to contact
> the KRaft quorum in order to forward requests to there, once we are in
> migration mode. This raises a question: rather than changing the broker
> registration, can we have those brokers send an RPC to the kraft controller
> quorum instead? This would serve to confirm that they can reach the quorum.
> Then the quorum could wait for all of the brokers to check in this way
> before starting the migration (It would know all the brokers by looking at
> /brokers)
> >> >>
> >> >> best,
> >> >> Colin
> >> >>
> >> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote:
> >> >> > Hello everyone, I'd like to start the vote on KIP-866.
> >> >> >
> >> >> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> >> >> >
> >> >> > Thanks!
> >> >> > David Arthur
> >> >
> >> >
> >> >
> >> > --
> >> > David Arthur
> >
> >
> >
> > --
> > David Arthur
>

Reply via email to