Hi, David, Thanks for the reply.
20/21. Thanks for the explanation. The suggestion sounds good. Could you include that in the doc? 40. metadata.migration.enable: We may do future metadata migrations within KRaft. Could we make the name more specific to ZK migration like zookeeper.metadata.migration.enable? Thanks, Jun On Tue, Nov 8, 2022 at 9:47 AM David Arthur <mum...@gmail.com> wrote: > Ah, sorry about that, you're right. Since we won't support ZK > migrations in combined mode, is this issue avoided? > > Essentially, we would only set ZkMigrationReady in ApiVersionsResponse if > > * process.roles=controller > * kafka.metadata.migration.enabled=true > > Also, the following would be an invalid config that should prevent startup: > > * process.roles=broker,controller > * kafka.metadata.migration.enabled=true > > Does this seem reasonable? > > Thanks! > David > > On Tue, Nov 8, 2022 at 11:12 AM Jun Rao <j...@confluent.io.invalid> wrote: > > > > Hi, David, > > > > I am not sure that we are fully settled on the following. > > > > 20/21. Since separate listeners are optional, it seems that the broker > > can't distinguish between ApiVersion requests coming from the client or > > other brokers. This means the clients will get ZkMigrationReady in the > > ApiVersion response, which is weird. > > > > Thanks, > > > > Jun > > > > On Tue, Nov 8, 2022 at 7:18 AM David Arthur <mum...@gmail.com> wrote: > > > > > Thanks for the discussion everyone, I'm going to move ahead with the > > > vote for this KIP. > > > > > > -David > > > > > > On Thu, Nov 3, 2022 at 1:20 PM Jun Rao <j...@confluent.io.invalid> > wrote: > > > > > > > > Hi, David, > > > > > > > > Thanks for the reply. > > > > > > > > 20/21 Yes, but separate listeners are optional. It's possible for the > > > nodes > > > > to use a single port for both client and server side communications. > > > > > > > > Thanks, > > > > > > > > Jun > > > > > > > > On Thu, Nov 3, 2022 at 9:59 AM David Arthur > > > > <david.art...@confluent.io.invalid> wrote: > > > > > > > > > 20/21, in combined mode we still have a separate listener for the > > > > > controller APIs, e.g., > > > > > > > > > > listeners=PLAINTEXT://:9092,CONTROLLER://:9093 > > > > > > > > > > inter.broker.listener.name=PLAINTEXT > > > > > > > > > > controller.listener.names=CONTROLLER > > > > > > > > > > advertised.listeners=PLAINTEXT://localhost:9092 > > > > > > > > > > > > > > > > > > > > Clients still talk to the broker through the advertised listener, > and > > > only > > > > > brokers and other controllers will communicate over the controller > > > > > listener. > > > > > > > > > > 40. Sounds good, I updated the KIP > > > > > > > > > > Thanks! > > > > > David > > > > > > > > > > > > > > > > > > > > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao <j...@confluent.io.invalid> > > > wrote: > > > > > > > > > > > Hi, David, > > > > > > > > > > > > Thanks for the reply. > > > > > > > > > > > > 20/21. When KRaft runs in the combined mode, does a controller > know > > > > > whether > > > > > > an ApiRequest is from a client or another broker? > > > > > > > > > > > > 40. Adding a "None" state sounds reasonable. > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Jun > > > > > > > > > > > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur > > > > > > <david.art...@confluent.io.invalid> wrote: > > > > > > > > > > > > > Jun, > > > > > > > > > > > > > > 20/21 If we use a tagged field, then I don't think clients > need to > > > be > > > > > > > concerned with it, right?. In ApiVersionsResponse sent by > brokers > > > to > > > > > > > clients, this field would be omitted. Clients can't connect > > > directly to > > > > > > the > > > > > > > KRaft controller nodes. Also, we have a precedent of sending > > > controller > > > > > > > node state between controllers through ApiVersions > > > ("metadata.version" > > > > > > > feature), so I think it fits well. > > > > > > > > > > > > > > 40. For new KRaft clusters, we could add another state to > indicate > > > it > > > > > was > > > > > > > not migrated from ZK, but created as a KRaft cluster. Maybe > > > > > > > "kafka.controller:type=KafkaController,name=MigrationState" => > > > "None" ? > > > > > > We > > > > > > > could also omit that metric for unmigrated clusters, but I'm > not a > > > fan > > > > > of > > > > > > > using the absence of a value to signal something. > > > > > > > > > > > > > > ----- > > > > > > > > > > > > > > Akhilesh, thanks for reviewing the KIP! > > > > > > > > > > > > > > 1. MigrationState and MetadataType are mostly the same on the > > > > > controller, > > > > > > > but we do have the "MigratingZkData" state that seems useful to > > > report > > > > > > as a > > > > > > > metric. Aside from looking at logs, observing the controller in > > > this > > > > > > state > > > > > > > is the only way to see how long its taking to copy data from > ZK. > > > > > > > > > > > > > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's > > > > > question > > > > > > > about non-migrated clusters. I think it's useful to have a > distinct > > > > > > > MigrationState for clusters that have been migrated and those > that > > > were > > > > > > > never migrated. This does mean we'll report the MigrationState > long > > > > > after > > > > > > > the migration is complete, but we can drop these metrics in 4.0 > > > once ZK > > > > > > is > > > > > > > removed. > > > > > > > > > > > > > > 2. The "ZkMigrationReady" will indicate that the controller has > > > > > > > "kafka.metadata.migration.enable" _and_ the ZK configs set. We > need > > > > > some > > > > > > > way to indicate that the whole quorum is correctly configured > to > > > handle > > > > > > the > > > > > > > migration so we don't failover to a controller that's not > > > configured > > > > > for > > > > > > > ZK. Did I understand your question correctly? > > > > > > > > > > > > > > 3. Yea, good idea. While the KRaft controller has > > > > > > > MigrationState=MigrationIneligible, we could also report > > > > > > > > > > > > > > > > > > > > > > "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount". > > > > > > > It might be useful to report ineligible controllers as well > since > > > that > > > > > > can > > > > > > > prevent the migration from starting. > > > > > > > > > > > > > > 4. I think I covered this in "Incompatible Brokers". We > effectively > > > > > fence > > > > > > > these brokers by not sending them metadata RPCs. > > > > > > > > > > > > > > Thanks! > > > > > > > David > > > > > > > > > > > > > > > > > > > > > On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti < > > > > > akhilesh....@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the KIP. I have some questions/suggestions. > > > > > > > > > > > > > > > > > > > > > > > > 1) I see two new metrics: > > > > > > > > kafka.controller:type=KafkaController,name=MetadataType and > > > > > > > > kafka.controller:type=KafkaController,name=MigrationState. > Won't > > > the > > > > > > > second > > > > > > > > metric already cover the cases of the first metric? Also, > > > instead of > > > > > > > > MigrationFinalized, we could directly say the state is > > > KRaftMode. So > > > > > we > > > > > > > can > > > > > > > > use the same value for default KRaft clusters. > > > > > > > > > > > > > > > > > > > > > > > > 2) ZkMigrationReady in ApiVersionsResponse from KRaft > > > Controller. By > > > > > > > > default, we plan to start the Controller quorum in " > > > > > > > > *kafka.metadata.migration.enable*" config set to true. Then > do we > > > > > need > > > > > > > this > > > > > > > > additional information again to make sure The controllers are > > > ready > > > > > for > > > > > > > > migration? What would happen if the Controller assumes it is > > > ready > > > > > for > > > > > > > > migration from 3.4 by default if it doesn't see both > > > > > MigrationMetadata > > > > > > > > records? > > > > > > > > > > > > > > > > > > > > > > > > 3) I see that we do not impose order on rolling the brokers > with > > > > > > > migration > > > > > > > > flags and provisioning the controller quorum. Along with the > > > KRaft > > > > > > > > controller emitting migration state metrics, it may be > better to > > > emit > > > > > > the > > > > > > > > broker count for the brokers not ready for migration yet. > This > > > will > > > > > > give > > > > > > > us > > > > > > > > more insight into any roll issues. > > > > > > > > > > > > > > > > > > > > > > > > 4) Once the KRaft controller is in migration mode, we should > also > > > > > > > > prevent/handle ZkBrokerRegistrations that don't enable > migration > > > > > mode. > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > Akhilesh > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Nov 1, 2022 at 10:49 AM Jun Rao > <j...@confluent.io.invalid > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi, David, > > > > > > > > > > > > > > > > > > Thanks for the reply. > > > > > > > > > > > > > > > > > > 20/21. Regarding the new ZkMigrationReady field in > > > > > > ApiVersionsResponse, > > > > > > > > it > > > > > > > > > seems that this is a bit intrusive since it exposes > unneeded > > > info > > > > > to > > > > > > > the > > > > > > > > > clients. Another option is to add that field as part of the > > > Fetch > > > > > > > > request. > > > > > > > > > We can choose to only set that field in the very first > Fetch > > > > > request > > > > > > > > from a > > > > > > > > > Quorum follower. > > > > > > > > > > > > > > > > > > 40. For > > > kafka.controller:type=KafkaController,name=MigrationState, > > > > > > what > > > > > > > > is > > > > > > > > > the value for a brand new KRaft cluster? > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > On Mon, Oct 31, 2022 at 2:35 PM David Arthur > > > > > > > > > <david.art...@confluent.io.invalid> wrote: > > > > > > > > > > > > > > > > > > > 30. I think we can keep the single ControllerId field in > > > those > > > > > > > requests > > > > > > > > > > since they are only used for fencing (as far as I know). > > > > > > Internally, > > > > > > > > the > > > > > > > > > > broker components that handle those requests will > compare the > > > > > > > > > ControllerId > > > > > > > > > > with that of MetadataCache (which is updated via UMR). > > > > > > > > > > > > > > > > > > > > The reason we need the separate KRaftControllerId in the > > > > > > > UpdateMetadata > > > > > > > > > > code path so that we can have different connection > behavior > > > for a > > > > > > > KRaft > > > > > > > > > > controller vs ZK controller. > > > > > > > > > > > > > > > > > > > > 31. It seems reasonable to keep the MigrationRecord in > the > > > > > > snapshot. > > > > > > > I > > > > > > > > > was > > > > > > > > > > thinking the same thing in terms of understanding the > loss > > > for a > > > > > > > > > > migration-after-finalization. However, once a snapshot > has > > > been > > > > > > taken > > > > > > > > > that > > > > > > > > > > includes the final MigrationRecord, we can't easily see > which > > > > > > records > > > > > > > > > came > > > > > > > > > > after it. > > > > > > > > > > > > > > > > > > > > 32. You're correct, we can just use the modify time from > the > > > > > Stat. > > > > > > > The > > > > > > > > > > other two fields are primarily informational and are > there > > > for > > > > > > > > operators > > > > > > > > > > who want to inspect the state of the migration. They > aren't > > > > > > required > > > > > > > > for > > > > > > > > > > correctness > > > > > > > > > > > > > > > > > > > > 33. Yes that's right. I detail that in "Controller > > > Leadership" > > > > > > > section > > > > > > > > > > > > > > > > > > > > 34. Right, I'll fix that. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > On Mon, Oct 31, 2022 at 2:55 PM Jun Rao > > > <j...@confluent.io.invalid > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi, David, > > > > > > > > > > > > > > > > > > > > > > Thanks for the updated KIP. A few more comments. > > > > > > > > > > > > > > > > > > > > > > 30. LeaderAndIsrRequest/StopReplicaRequest both have a > > > > > > controllerId > > > > > > > > > > field. > > > > > > > > > > > Should we add a KRaftControllerId field like > > > UpdateMetadata? > > > > > > > > > > > > > > > > > > > > > > 31. "If a migration has been finalized, but the KRaft > > > quroum > > > > > > comes > > > > > > > up > > > > > > > > > > with > > > > > > > > > > > kafka.metadata.migration.enable, we must not re-enter > the > > > > > > migration > > > > > > > > > mode. > > > > > > > > > > > In this case, while replaying the log, the controller > can > > > see > > > > > the > > > > > > > > > second > > > > > > > > > > > MigrationRecord and know that the migration is > finalized > > > and > > > > > > should > > > > > > > > not > > > > > > > > > > be > > > > > > > > > > > resumed. " Hmm, do we want to keep the MigrationRecord > in > > > the > > > > > > > > snapshot > > > > > > > > > > and > > > > > > > > > > > the metadata log forever after migration is finalized? > If > > > not, > > > > > we > > > > > > > > can't > > > > > > > > > > > know for sure whether a migration has happened or not. > > > Also, it > > > > > > > might > > > > > > > > > be > > > > > > > > > > > useful to support switching back to ZK mode after the > > > migration > > > > > > is > > > > > > > > > > > finalized, with the understanding of potential metadata > > > loss. > > > > > In > > > > > > > that > > > > > > > > > > case, > > > > > > > > > > > we could just trim all metadata log and recopy the ZK > > > metadata > > > > > > > back. > > > > > > > > > > > > > > > > > > > > > > 32. The /migration node in ZK: Do we need > > > last_update_time_ms > > > > > > since > > > > > > > > ZK > > > > > > > > > > > Stats already has an MTime? Also, how do we plan to > use the > > > > > > > > > > > kraft_controller_id and kraft_controller_epoch fields? > > > > > > > > > > > > > > > > > > > > > > 33. Controller migration: We will force a write to the > > > > > > > "/controller" > > > > > > > > > and > > > > > > > > > > > "/controller_epoch" ZNodes before copying ZK data, > right? > > > > > > > > > > > > > > > > > > > > > > 34. "Operator can remove the persistent "/controller" > and > > > > > > > > > > > "/controller_epoch" nodes allowing for ZK controller > > > election > > > > > to > > > > > > > take > > > > > > > > > > > place". I guess the operator only needs to remove the > > > > > /controller > > > > > > > > path? > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 31, 2022 at 7:17 AM David Arthur > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote: > > > > > > > > > > > > > > > > > > > > > > > Happy Monday, everyone! I've updated the KIP with the > > > > > following > > > > > > > > > > changes: > > > > > > > > > > > > > > > > > > > > > > > > * Clarified MetadataType metric usages (broker vs > > > controller) > > > > > > > > > > > > * Added ZkMigrationReady tagged field to > > > ApiVersionsResponse > > > > > > (for > > > > > > > > use > > > > > > > > > > by > > > > > > > > > > > > KRaft controller quorum) > > > > > > > > > > > > * Added MigrationRecord with two states: Started and > > > Finished > > > > > > > > > > > > * Documented ZK configs for KRaft controller > > > > > > > > > > > > * Simplified state machine description (internally, > more > > > > > states > > > > > > > > will > > > > > > > > > > > exist, > > > > > > > > > > > > but only the four documented are interesting to > > > operators) > > > > > > > > > > > > * Clarified some things in Controller Migration > section > > > > > > > > > > > > * Removed KRaft -> ZK parts of Broker Registration > > > > > > > > > > > > * Added Misconfigurations section to Failure Modes > > > > > > > > > > > > > > > > > > > > > > > > Let me know if I've missed anything from the past two > > > weeks > > > > > of > > > > > > > > > > > discussion. > > > > > > > > > > > > > > > > > > > > > > > > Thanks again to everyone who has reviewed this KIP so > > > far! > > > > > > > > > > > > > > > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 28, 2022 at 2:26 PM Jun Rao > > > > > > <j...@confluent.io.invalid > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hi, David, > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reply. > > > > > > > > > > > > > > > > > > > > > > > > > > 20/21. Sounds good. > > > > > > > > > > > > > > > > > > > > > > > > > > Could you update the doc with all the changes being > > > > > > discussed? > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 28, 2022 at 10:11 AM David Arthur > > > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun, > > > > > > > > > > > > > > > > > > > > > > > > > > > > 20/21. I was also wondering about a "migration" > > > record. > > > > > In > > > > > > > > > addition > > > > > > > > > > > to > > > > > > > > > > > > > the > > > > > > > > > > > > > > scenario you mentioned, we also need a way to > > > prevent the > > > > > > > > cluster > > > > > > > > > > > from > > > > > > > > > > > > > > re-entering the dual write mode after the > migration > > > has > > > > > > been > > > > > > > > > > > > finalized. I > > > > > > > > > > > > > > could see this happening inadvertently via a > change > > > in > > > > > some > > > > > > > > > > > > configuration > > > > > > > > > > > > > > management system. How about we add a record that > > > marks > > > > > the > > > > > > > > > > beginning > > > > > > > > > > > > and > > > > > > > > > > > > > > end of the dual-write mode. The first occurrence > of > > > the > > > > > > > record > > > > > > > > > > could > > > > > > > > > > > be > > > > > > > > > > > > > > included in the metadata transaction when we > migrate > > > data > > > > > > > from > > > > > > > > > ZK. > > > > > > > > > > > > > > > > > > > > > > > > > > > > With this, the active controller would decide > > > whether to > > > > > > > enter > > > > > > > > > dual > > > > > > > > > > > > write > > > > > > > > > > > > > > mode, finalize the migration based, or fail > based on: > > > > > > > > > > > > > > > > > > > > > > > > > > > > * Metadata log state > > > > > > > > > > > > > > * It's own configuration > > > > > > ("kafka.metadata.migration.enable", > > > > > > > > > > > > > > "zookeeper.connect", etc) > > > > > > > > > > > > > > * The other controllers configuration (via > > > > > > > ApiVersionsResponse) > > > > > > > > > > > > > > > > > > > > > > > > > > > > WDYT? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 22. Since we will need the fencing anyways as a > > > > > safe-guard, > > > > > > > > then > > > > > > > > > I > > > > > > > > > > > > agree > > > > > > > > > > > > > > would could skip the registration of KRaft > brokers > > > in ZK > > > > > to > > > > > > > > > simply > > > > > > > > > > > > > things a > > > > > > > > > > > > > > bit. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 27, 2022 at 5:11 PM Jun Rao > > > > > > > > <j...@confluent.io.invalid > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, David, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reply. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 20/21. Relying upon the presence of ZK configs > to > > > > > > determine > > > > > > > > > > whether > > > > > > > > > > > > the > > > > > > > > > > > > > > > KRaft controller is in a dual write mode seems > a > > > bit > > > > > > error > > > > > > > > > prone. > > > > > > > > > > > If > > > > > > > > > > > > > > > someone accidentally adds a ZK configuration > to a > > > brand > > > > > > new > > > > > > > > > KRaft > > > > > > > > > > > > > > cluster, > > > > > > > > > > > > > > > ideally it shouldn't cause the controller to > get > > > into a > > > > > > > weird > > > > > > > > > > > state. > > > > > > > > > > > > > Have > > > > > > > > > > > > > > > we considered storing the migration state in a > > > metadata > > > > > > > > record? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 22. If we have the broker fencing logic, do we > > > need to > > > > > > > write > > > > > > > > > the > > > > > > > > > > > > broker > > > > > > > > > > > > > > > registration path in ZK for KRaft brokers at > all? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 27, 2022 at 1:02 PM David Arthur > > > > > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 20/21. A KRaft controller will recover the > > > migration > > > > > > > state > > > > > > > > by > > > > > > > > > > > > reading > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > "/migration" ZNode. If the migration enable > > > config is > > > > > > > set, > > > > > > > > > and > > > > > > > > > > > the > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > migration is complete, it will enter the > > > dual-write > > > > > > mode. > > > > > > > > > > Before > > > > > > > > > > > an > > > > > > > > > > > > > > > > operator can decommission ZK, they will need > to > > > > > > finalize > > > > > > > > the > > > > > > > > > > > > > migration > > > > > > > > > > > > > > > > which involves removing the migration config > and > > > the > > > > > ZK > > > > > > > > > config. > > > > > > > > > > > > I'll > > > > > > > > > > > > > > > > clarify this in the KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 22. Yea, we could see an incorrect broker ID > > > during > > > > > > that > > > > > > > > > > window. > > > > > > > > > > > > If > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > ended up with a state where we saw a ZK > broker ID > > > > > that > > > > > > > > > > conflicted > > > > > > > > > > > > > with > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > KRaft broker ID, we would need to fence one > of > > > them. > > > > > I > > > > > > > > would > > > > > > > > > > > > probably > > > > > > > > > > > > > > opt > > > > > > > > > > > > > > > > to fence the KRaft broker in that case since > > > broker > > > > > > > > > > registration > > > > > > > > > > > > and > > > > > > > > > > > > > > > > fencing is more robust in KRaft. Hopefully > this > > > is a > > > > > > rare > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 26. Sounds good. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 27, 2022 at 1:34 PM Jun Rao > > > > > > > > > > <j...@confluent.io.invalid > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, David, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reply. A few more comments. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 20/21. Using a tagged field in > > > ApiVersionRequest > > > > > > could > > > > > > > > > work. > > > > > > > > > > > > > Related > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > this, how does a KRaft controller know that > > > it's in > > > > > > the > > > > > > > > > dual > > > > > > > > > > > > write > > > > > > > > > > > > > > > mode? > > > > > > > > > > > > > > > > > Does it need to read the /controller path > from > > > ZK? > > > > > > > After > > > > > > > > > the > > > > > > > > > > > > > > migration, > > > > > > > > > > > > > > > > > people may have the ZK cluster > decommissioned, > > > but > > > > > > > still > > > > > > > > > have > > > > > > > > > > > the > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > configs left in the KRaft controller. Will > this > > > > > cause > > > > > > > the > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > to be stuck because it doesn't know which > mode > > > it > > > > > is > > > > > > > in? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 22. Using the ephemeral node matches the > > > current > > > > > > > ZK-based > > > > > > > > > > > broker > > > > > > > > > > > > > > > behavior > > > > > > > > > > > > > > > > > better. However, it leaves a window for > > > incorrect > > > > > > > broker > > > > > > > > > > > > > registration > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > sneak in during KRaft controller failover. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 26. Then, we could just remove Broker > > > Registration > > > > > in > > > > > > > > that > > > > > > > > > > > > section. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 26, 2022 at 2:21 PM David > Arthur > > > > > > > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 20/21 It could definitely cause problems > if > > > we > > > > > > > failover > > > > > > > > > to > > > > > > > > > > a > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > without > "kafka.metadata.migration.enable". > > > The > > > > > only > > > > > > > > > > > mechanism I > > > > > > > > > > > > > > know > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > controllers to learn things about one > > > another is > > > > > > > > > > ApiVersions. > > > > > > > > > > > > We > > > > > > > > > > > > > > > > > currently > > > > > > > > > > > > > > > > > > use this for checking support for > > > > > > "metadata.version" > > > > > > > > (in > > > > > > > > > > > KRaft > > > > > > > > > > > > > > mode). > > > > > > > > > > > > > > > > We > > > > > > > > > > > > > > > > > > could add a "zk.migration" feature flag > > > that's > > > > > > > enabled > > > > > > > > > on a > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > only > > > > > > > > > > > > > > > > > > if the config is set. Another possibility > > > would > > > > > be > > > > > > a > > > > > > > > > tagged > > > > > > > > > > > > field > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > ApiVersionResponse that indicated if the > > > config > > > > > was > > > > > > > > set. > > > > > > > > > > Both > > > > > > > > > > > > > seem > > > > > > > > > > > > > > > > > somewhat > > > > > > > > > > > > > > > > > > inelegant. I think a tagged field would > be a > > > bit > > > > > > > > simpler > > > > > > > > > > (and > > > > > > > > > > > > > > > arguably > > > > > > > > > > > > > > > > > less > > > > > > > > > > > > > > > > > > hacky). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For 20, we could avoid entering the > migration > > > > > state > > > > > > > > > unless > > > > > > > > > > > the > > > > > > > > > > > > > > whole > > > > > > > > > > > > > > > > > quorum > > > > > > > > > > > > > > > > > > had the field present in their > > > NodeApiVersions. > > > > > For > > > > > > > 21, > > > > > > > > > we > > > > > > > > > > > > could > > > > > > > > > > > > > > > avoid > > > > > > > > > > > > > > > > > > leaving the migration state unless the > whole > > > > > quorum > > > > > > > did > > > > > > > > > not > > > > > > > > > > > > have > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > field > > > > > > > > > > > > > > > > > > in their NodeApiVersions. Do you think > this > > > would > > > > > > be > > > > > > > > > > > > sufficient? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 22. Right, we need to write the broker > info > > > back > > > > > to > > > > > > > ZK > > > > > > > > > just > > > > > > > > > > > as > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > safeguard > > > > > > > > > > > > > > > > > > against incorrect broker IDs getting > > > registered > > > > > > into > > > > > > > > ZK. > > > > > > > > > I > > > > > > > > > > > was > > > > > > > > > > > > > > > thinking > > > > > > > > > > > > > > > > > > these would be persistent nodes, but it's > > > > > probably > > > > > > > fine > > > > > > > > > to > > > > > > > > > > > make > > > > > > > > > > > > > > them > > > > > > > > > > > > > > > > > > ephemeral and have the active KRaft > > > controller > > > > > keep > > > > > > > > them > > > > > > > > > up > > > > > > > > > > > to > > > > > > > > > > > > > > date. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 23. Right. When the broker comes up as a > > > KRaft > > > > > > > broker, > > > > > > > > > it's > > > > > > > > > > > old > > > > > > > > > > > > > > > > > > /brokers/ids ZNode will be gone and it > will > > > > > > register > > > > > > > > > itself > > > > > > > > > > > > with > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > controller. The controller will know > that it > > > is > > > > > now > > > > > > > in > > > > > > > > > > KRaft > > > > > > > > > > > > mode > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > stop sending it the old RPCs. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 24. Ok, I'll add these > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 25. I realize now the "/controller_epoch" > > > node is > > > > > > > > already > > > > > > > > > > > > > > persistent. > > > > > > > > > > > > > > > > It > > > > > > > > > > > > > > > > > > should be sufficient to remove the > > > "/controller" > > > > > > node > > > > > > > > to > > > > > > > > > > > > trigger > > > > > > > > > > > > > an > > > > > > > > > > > > > > > > > > election. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 26. Hmm, not sure, but I don't think it > uses > > > > > > watches. > > > > > > > > > > > > KafkaServer > > > > > > > > > > > > > > > > > registers > > > > > > > > > > > > > > > > > > the broker info and later loads all > brokers > > > in > > > > > the > > > > > > > > > cluster > > > > > > > > > > to > > > > > > > > > > > > > check > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > the endpoints don't conflict. Things > that do > > > use > > > > > > > > watches > > > > > > > > > > are > > > > > > > > > > > > > > dynamic > > > > > > > > > > > > > > > > > > configs, ACLs, and some others (listed > in the > > > > > KIP). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 26, 2022 at 4:26 PM David > Arthur > > > < > > > > > > > > > > > > > > > > david.art...@confluent.io> > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Luke and Andrew, thanks for taking a > look! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think the names of the state machine > in > > > the > > > > > KIP > > > > > > > > > aren't > > > > > > > > > > > the > > > > > > > > > > > > > > best. > > > > > > > > > > > > > > > > I'll > > > > > > > > > > > > > > > > > > > try to improve that section in general. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. "MigrationReady" is probably better > > > named > > > > > > > > > > > > "MigratingFromZk" > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > > > > something. It's meant to be the state > when > > > the > > > > > > > KRaft > > > > > > > > > > > > controller > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > actively > > > > > > > > > > > > > > > > > > > migrating the data out of ZK. This > happens > > > > > after > > > > > > we > > > > > > > > > have > > > > > > > > > > > > > detected > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > cluster is eligible and before we > enter the > > > > > dual > > > > > > > > write > > > > > > > > > > > mode. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. "MigrationActive" should probably be > > > called > > > > > > > > > > > > > "BrokerMigration". > > > > > > > > > > > > > > > > > > > Transitioning to "MigrationFinished" > can > > > happen > > > > > > > > > > > automatically > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > > all > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > known ZK brokers have been migrated. > Since > > > we > > > > > > don't > > > > > > > > > have > > > > > > > > > > > > > > permanent > > > > > > > > > > > > > > > > > > > registrations of ZK brokers, we can > use the > > > > > > > partition > > > > > > > > > > > > > assignments > > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > proxy for this. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. A metric for unready brokers makes > > > sense. We > > > > > > can > > > > > > > > > also > > > > > > > > > > > log > > > > > > > > > > > > a > > > > > > > > > > > > > > > > message > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > > the controller when it tries to start > the > > > > > > migration > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. The MetadataType metric is meant to > > > help see > > > > > > > this. > > > > > > > > > > E.g., > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > brokers > > > > > > > > > > > > > > > > > > > would have MetadataType=ZK, some would > have > > > > > > > > > > > MetadataType=Dual > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. On ZK brokers, not having this > config > > > set > > > > > > would > > > > > > > > > > prevent > > > > > > > > > > > > the > > > > > > > > > > > > > > new > > > > > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > > > > registration data from being written > to ZK. > > > > > This > > > > > > > > means > > > > > > > > > > the > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > > won't send RPCs to it. If a broker has > been > > > > > > > migrated > > > > > > > > to > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > already, > > > > > > > > > > > > > > > > > > I'm > > > > > > > > > > > > > > > > > > > not sure there is any harm in removing > this > > > > > > config. > > > > > > > > If > > > > > > > > > we > > > > > > > > > > > > > decide > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > need to guarantee that KRaft brokers > have > > > this > > > > > > > config > > > > > > > > > set > > > > > > > > > > > > > during > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > migration, we can include it in the > broker > > > > > > > > registration > > > > > > > > > > > > that's > > > > > > > > > > > > > > sent > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > KRaft. This would let the controller > keep > > > that > > > > > > > broker > > > > > > > > > > > fenced. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. Once the KRaft controller takes > > > leadership, > > > > > > the > > > > > > > ZK > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > won't > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > active any more and will stop > reporting the > > > > > > > > > > MigrationState > > > > > > > > > > > > > > metric. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 7. The "Dual" MetadataType is reported > by > > > > > brokers > > > > > > > > > running > > > > > > > > > > > in > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > mode > > > > > > > > > > > > > > > > > > > when the migration enable config is > set. I > > > > > think > > > > > > > the > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > should > > > > > > > > > > > > > > > > > > also > > > > > > > > > > > > > > > > > > > report this, not just the brokers. I'll > > > clarify > > > > > > in > > > > > > > > the > > > > > > > > > > KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Andrew: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > How will the code get all the > ZooKeeper > > > > > config? > > > > > > > > Will > > > > > > > > > it > > > > > > > > > > > do > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > sort > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > scan of the ZooKeeper data store? ... > Do we > > > > > need > > > > > > to > > > > > > > > do > > > > > > > > > > > > anything > > > > > > > > > > > > > > > > special > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > make sure we get all data from > ZooKeeper? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > With a few exceptions, all data > written to > > > ZK > > > > > by > > > > > > > > Kafka > > > > > > > > > > > > happens > > > > > > > > > > > > > on > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > controller (single writer principle). > In > > > our > > > > > > > > migration > > > > > > > > > > > code, > > > > > > > > > > > > we > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > > unconditionally update the > "/controller" > > > and > > > > > > > > > > > > > "/controller_epoch" > > > > > > > > > > > > > > > > ZNodes > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > effectively force the migration > component > > > to > > > > > gain > > > > > > > > > > > leadership > > > > > > > > > > > > of > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > > controller. Once this happens, we don't > > > expect > > > > > > any > > > > > > > > data > > > > > > > > > > to > > > > > > > > > > > be > > > > > > > > > > > > > > > written > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > ZK, so we can read it iteratively > without > > > > > > worrying > > > > > > > > > about > > > > > > > > > > > any > > > > > > > > > > > > > > > > concurrent > > > > > > > > > > > > > > > > > > > writes. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As for the linearizable thing, I > believe > > > this > > > > > > just > > > > > > > > > means > > > > > > > > > > > that > > > > > > > > > > > > > > reads > > > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > served by a quorum follower which has > stale > > > > > data. > > > > > > > > We'll > > > > > > > > > > be > > > > > > > > > > > > > > reading > > > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > > the same way the ZK controller does, > so I > > > think > > > > > > we > > > > > > > > will > > > > > > > > > > be > > > > > > > > > > > > fine > > > > > > > > > > > > > > > > > regarding > > > > > > > > > > > > > > > > > > > consistency. If we wanted to be extra > > > careful, > > > > > we > > > > > > > > could > > > > > > > > > > > add a > > > > > > > > > > > > > > delay > > > > > > > > > > > > > > > > > prior > > > > > > > > > > > > > > > > > > > to iterating through the znodes to give > > > > > > partitioned > > > > > > > > ZK > > > > > > > > > > > > > followers > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > chance > > > > > > > > > > > > > > > > > > > to get kicked out of the quorum. I > don't > > > think > > > > > > > we'll > > > > > > > > > need > > > > > > > > > > > > that > > > > > > > > > > > > > > > though > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What happens if we commit the > transaction > > > > > then > > > > > > > fail > > > > > > > > > > right > > > > > > > > > > > > > after > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > restart? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we commit the "migration" > transaction > > > to the > > > > > > > > > metadata > > > > > > > > > > > log, > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > won't > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > a problem if we failover or restart. > We can > > > > > > recover > > > > > > > > our > > > > > > > > > > > state > > > > > > > > > > > > > > based > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > metadata log and what exists in ZK. If > we > > > see a > > > > > > > > > migration > > > > > > > > > > > > > > > transaction > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > the kraft log, but no "/migration" > ZNode, > > > we'll > > > > > > > know > > > > > > > > we > > > > > > > > > > > > failed > > > > > > > > > > > > > > > before > > > > > > > > > > > > > > > > > > > writing to ZK. If we see a partial > > > transaction > > > > > in > > > > > > > the > > > > > > > > > > log, > > > > > > > > > > > > then > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > > abort it and restart the migration. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This sort of leads to me wondering > if we > > > > > > > > will/should > > > > > > > > > > > check > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > metadata > > > > > > > > > > > > > > > > > > > log state before doing the migration? > That > > > > > could > > > > > > > also > > > > > > > > > > catch > > > > > > > > > > > > > > > mistakes > > > > > > > > > > > > > > > > > such > > > > > > > > > > > > > > > > > > > as if a KRaft quorum is used that > already > > > had > > > > > > some > > > > > > > > > > metadata > > > > > > > > > > > > > > which I > > > > > > > > > > > > > > > > > > assume > > > > > > > > > > > > > > > > > > > we want to prevent. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, we should check that the metadata > log > > > is > > > > > > empty > > > > > > > > > > before > > > > > > > > > > > > > > > > attempting a > > > > > > > > > > > > > > > > > > > migration (perhaps excepting the > bootstrap > > > > > > metadata > > > > > > > > -- > > > > > > > > > > need > > > > > > > > > > > > to > > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > > that one still). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 24, 2022 at 8:57 AM Andrew > > > Grant > > > > > > > > > > > > > > > > > <agr...@confluent.io.invalid > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> Hey David, > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> Thanks for the KIP. I had a few small > > > > > questions. > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> “The ZK data migration will copy the > > > existing > > > > > ZK > > > > > > > > data > > > > > > > > > > into > > > > > > > > > > > > the > > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> metadata log and establish the new > KRaft > > > > > active > > > > > > > > > > controller > > > > > > > > > > > > as > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > active > > > > > > > > > > > > > > > > > > >> controller from a ZK perspective.” > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> How will the code get all the > ZooKeeper > > > > > config? > > > > > > > Will > > > > > > > > > it > > > > > > > > > > do > > > > > > > > > > > > > some > > > > > > > > > > > > > > > sort > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > >> scan of the ZooKeeper data store? > Also I’m > > > > > not a > > > > > > > > > > ZooKeeper > > > > > > > > > > > > > > expert > > > > > > > > > > > > > > > > but > > > > > > > > > > > > > > > > > > per > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > https://zookeeper.apache.org/doc/current/zookeeperInternals.html > > > > > > > > > > > > > > > > > “Read > > > > > > > > > > > > > > > > > > >> operations in ZooKeeper are *not > > > linearizable* > > > > > > > since > > > > > > > > > > they > > > > > > > > > > > > can > > > > > > > > > > > > > > > return > > > > > > > > > > > > > > > > > > >> potentially stale data.” Do we need > to do > > > > > > anything > > > > > > > > > > special > > > > > > > > > > > > to > > > > > > > > > > > > > > make > > > > > > > > > > > > > > > > > sure > > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > >> get all data from ZooKeeper? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> “For the initial migration, the > controller > > > > > will > > > > > > > > > utilize > > > > > > > > > > > > > KIP-868 > > > > > > > > > > > > > > > > > Metadata > > > > > > > > > > > > > > > > > > >> Transactions > > > > > > > > > > > > > > > > > > >> < > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> to > > > > > > > > > > > > > > > > > > >> write all of the ZK metadata in a > single > > > > > > > > transaction. > > > > > > > > > If > > > > > > > > > > > the > > > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > >> fails before this transaction is > > > finalized, > > > > > the > > > > > > > next > > > > > > > > > > > active > > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > >> will > > > > > > > > > > > > > > > > > > >> abort the transaction and restart the > > > > > migration > > > > > > > > > > process.” > > > > > > > > > > > > What > > > > > > > > > > > > > > > > happens > > > > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > > >> we commit the transaction then fail > right > > > > > after > > > > > > > that > > > > > > > > > and > > > > > > > > > > > > > > restart? > > > > > > > > > > > > > > > > This > > > > > > > > > > > > > > > > > > >> sort > > > > > > > > > > > > > > > > > > >> of leads to me wondering if we > will/should > > > > > check > > > > > > > the > > > > > > > > > > > > metadata > > > > > > > > > > > > > > log > > > > > > > > > > > > > > > > > state > > > > > > > > > > > > > > > > > > >> before doing the migration? That could > > > also > > > > > > catch > > > > > > > > > > mistakes > > > > > > > > > > > > > such > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > >> KRaft quorum is used that already had > some > > > > > > > metadata > > > > > > > > > > which > > > > > > > > > > > I > > > > > > > > > > > > > > assume > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > want > > > > > > > > > > > > > > > > > > >> to prevent. > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> For the Test Plan section do you think > > > it’s > > > > > > worth > > > > > > > > > > calling > > > > > > > > > > > > out > > > > > > > > > > > > > > > > > > performance > > > > > > > > > > > > > > > > > > >> testing migrations of ZooKeeper > > > deployments > > > > > that > > > > > > > > have > > > > > > > > > > > > “large” > > > > > > > > > > > > > > > > amounts > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > >> metadata? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> Thanks, > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> Andrew > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> On Mon, Oct 24, 2022 at 3:20 AM Luke > Chen > > > < > > > > > > > > > > > > show...@gmail.com> > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > Hi David, > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > Thanks for the good and complicated > > > > > proposal! > > > > > > :) > > > > > > > > > > > > > > > > > > >> > Some questions: > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > 1. "MigrationReady" state: The KIP > said: > > > > > > > > > > > > > > > > > > >> > The KRaft quorum has been started > > > > > > > > > > > > > > > > > > >> > I don't think we'll automatically > enter > > > this > > > > > > > state > > > > > > > > > > after > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > quorum > > > > > > > > > > > > > > > > > > >> > started, do we? > > > > > > > > > > > > > > > > > > >> > I think KRaft active controller > should > > > take > > > > > > over > > > > > > > > > > > > leadership > > > > > > > > > > > > > by > > > > > > > > > > > > > > > > > > writing a > > > > > > > > > > > > > > > > > > >> > znode in /controller path before we > > > entering > > > > > > > this > > > > > > > > > > sate, > > > > > > > > > > > is > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > correct? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > 2. "MigrationActive" state: the KIP > > > said: > > > > > > > > > > > > > > > > > > >> > ZK state has been migrated, > controller > > > is in > > > > > > > > > > dual-write > > > > > > > > > > > > > mode, > > > > > > > > > > > > > > > > > brokers > > > > > > > > > > > > > > > > > > >> are > > > > > > > > > > > > > > > > > > >> > being restarted in KRaft mode > > > > > > > > > > > > > > > > > > >> > What confuses me is the last part: > > > "brokers > > > > > > are > > > > > > > > > being > > > > > > > > > > > > > > restarted > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> > mode". > > > > > > > > > > > > > > > > > > >> > How could we detect brokers are > being > > > > > > restarted > > > > > > > in > > > > > > > > > > KRaft > > > > > > > > > > > > > mode? > > > > > > > > > > > > > > > Old > > > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > >> > broker is removed and new KRaft > broker > > > is up > > > > > > > > within > > > > > > > > > N > > > > > > > > > > > > > minutes? > > > > > > > > > > > > > > > > > > >> > I think we don't have to rely on the > > > > > condition > > > > > > > > > > "brokers > > > > > > > > > > > > are > > > > > > > > > > > > > > > being > > > > > > > > > > > > > > > > > > >> restarted > > > > > > > > > > > > > > > > > > >> > in KRaft mode" to enter this state. > > > > > > > > > > > > > > > > > > >> > "brokers are being restarted in > KRaft > > > mode" > > > > > > > should > > > > > > > > > be > > > > > > > > > > a > > > > > > > > > > > > > > process > > > > > > > > > > > > > > > > > > happened > > > > > > > > > > > > > > > > > > >> > between "MigrationActive" and > > > > > > > "MigrationFinished". > > > > > > > > > > Does > > > > > > > > > > > > that > > > > > > > > > > > > > > > make > > > > > > > > > > > > > > > > > > sense? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > 3. When "Zookeeper" mode trying to > enter > > > > > > > > > > > > > "MigrationEligible", > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > there > > > > > > > > > > > > > > > > > > >> is a > > > > > > > > > > > > > > > > > > >> > cluster with tens of brokers, how > could > > > > > users > > > > > > > know > > > > > > > > > why > > > > > > > > > > > > this > > > > > > > > > > > > > > > > cluster > > > > > > > > > > > > > > > > > > >> cannot > > > > > > > > > > > > > > > > > > >> > be in "MigrationEligible" state, > yet? > > > Check > > > > > > that > > > > > > > > > znode > > > > > > > > > > > > > > manually > > > > > > > > > > > > > > > > one > > > > > > > > > > > > > > > > > by > > > > > > > > > > > > > > > > > > >> one? > > > > > > > > > > > > > > > > > > >> > Or do we plan to create a tool to > help > > > them? > > > > > > Or > > > > > > > > > maybe > > > > > > > > > > > > expose > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> "unready > > > > > > > > > > > > > > > > > > >> > ZK brokers" metrics? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > 4. Same for "MigrationActive" > entering > > > > > > > > > > > "MigrationFinished" > > > > > > > > > > > > > > > state. > > > > > > > > > > > > > > > > > > Since > > > > > > > > > > > > > > > > > > >> > that will be some process for > > > restarting ZK > > > > > > > broker > > > > > > > > > > into > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > > > >> one by > > > > > > > > > > > > > > > > > > >> > one, could we expose the "remaining > ZK > > > > > > brokers" > > > > > > > as > > > > > > > > > > > > metrics? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > 5. When users are in > > > > > > > > > > > > > > > > > > >> > > > > > > > > "MigrationReady"/"MigrationActive"/"MigrationFinished" > > > > > > > > > > > > > > > > > > >> > states, and they accidentally > change the > > > > > KRaft > > > > > > > > > > > controller > > > > > > > > > > > > > > > config: > > > > > > > > > > > > > > > > > > >> > "kafka.metadata.migration.enable" > > > > > > > > > > > > > > > > > > >> > to false. What will happen to this > > > cluster? > > > > > No > > > > > > > > > > > dual-write > > > > > > > > > > > > > for > > > > > > > > > > > > > > > it? > > > > > > > > > > > > > > > > > Will > > > > > > > > > > > > > > > > > > >> we > > > > > > > > > > > > > > > > > > >> > have any protection for it? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > 6. About the "MigrationState" > metric: > > > > > > > > > > > > > > > > > > >> > The "ZooKeeper" and > "MigrationEligible" > > > is > > > > > > > > reported > > > > > > > > > by > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > controller, > > > > > > > > > > > > > > > > > > >> and > > > > > > > > > > > > > > > > > > >> > the rest of states are reported by > KRaft > > > > > > > > controller > > > > > > > > > > --> > > > > > > > > > > > > > makes > > > > > > > > > > > > > > > > sense > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > >> me. > > > > > > > > > > > > > > > > > > >> > One question from it is, when KRaft > > > > > controller > > > > > > > > takes > > > > > > > > > > > over > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > leadership > > > > > > > > > > > > > > > > > > >> > from ZK controller, what will the > > > > > > > "MigrationState" > > > > > > > > > > value > > > > > > > > > > > > in > > > > > > > > > > > > > > old > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > >> > controller? keep in > "MigrationEligible" > > > > > > doesn't > > > > > > > > make > > > > > > > > > > > > sense. > > > > > > > > > > > > > > Will > > > > > > > > > > > > > > > > > there > > > > > > > > > > > > > > > > > > >> be a > > > > > > > > > > > > > > > > > > >> > empty or null state? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > 7. About the "MetadataType" metric: > > > > > > > > > > > > > > > > > > >> > An enumeration of: ZooKeeper (1), > Dual > > > (2), > > > > > > > KRaft > > > > > > > > > (3). > > > > > > > > > > > > Each > > > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > > > >> reports > > > > > > > > > > > > > > > > > > >> > this. > > > > > > > > > > > > > > > > > > >> > I don't know how we could map the > > > migration > > > > > > > state > > > > > > > > to > > > > > > > > > > > > these 3 > > > > > > > > > > > > > > > > types. > > > > > > > > > > > > > > > > > > >> > What is the metadataType when > cluster in > > > > > > > > > > > "MigrationReady" > > > > > > > > > > > > > > state? > > > > > > > > > > > > > > > > > Still > > > > > > > > > > > > > > > > > > >> > Zookeeper? > > > > > > > > > > > > > > > > > > >> > When will brokers enter Dual type? > > > > > > > > > > > > > > > > > > >> > This is unclear in the KIP. > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > Thank you. > > > > > > > > > > > > > > > > > > >> > Luke > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > On Thu, Oct 20, 2022 at 11:33 PM > David > > > > > Arthur > > > > > > > > > > > > > > > > > > >> > <david.art...@confluent.io.invalid> > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > Igor, thanks for taking a look! > Since > > > JBOD > > > > > > in > > > > > > > > > KRaft > > > > > > > > > > is > > > > > > > > > > > > > still > > > > > > > > > > > > > > > > under > > > > > > > > > > > > > > > > > > >> > > discussion and not likely to land > > > before > > > > > the > > > > > > > ZK > > > > > > > > > > > > > migration, I > > > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > > > >> we'll > > > > > > > > > > > > > > > > > > >> > > need to defer it. For migrating > JBOD > > > > > > clusters > > > > > > > > from > > > > > > > > > > ZK > > > > > > > > > > > to > > > > > > > > > > > > > > > KRaft, > > > > > > > > > > > > > > > > > > we'll > > > > > > > > > > > > > > > > > > >> > also > > > > > > > > > > > > > > > > > > >> > > need to address the log dir > failure > > > > > > mechanism > > > > > > > > > which > > > > > > > > > > > > > > currently > > > > > > > > > > > > > > > > > uses a > > > > > > > > > > > > > > > > > > >> > > special ZNode written to by the > > > brokers. > > > > > > There > > > > > > > > is > > > > > > > > > an > > > > > > > > > > > old > > > > > > > > > > > > > KIP > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller > > > > > > > > > > > > > > > > > > >> > > which proposes a new RPC to move > that > > > ZK > > > > > > write > > > > > > > > to > > > > > > > > > > the > > > > > > > > > > > > > > > > controller, > > > > > > > > > > > > > > > > > > but > > > > > > > > > > > > > > > > > > >> I'm > > > > > > > > > > > > > > > > > > >> > > not sure if that's the approach > we'll > > > want > > > > > > to > > > > > > > > > take. > > > > > > > > > > I > > > > > > > > > > > > read > > > > > > > > > > > > > > > > through > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > > discussion over in KIP-858 and it > > > sounds > > > > > > like > > > > > > > > > there > > > > > > > > > > > are > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > good > > > > > > > > > > > > > > > > > > >> ideas > > > > > > > > > > > > > > > > > > >> > > there. > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > To answer your question more > directly, > > > > > > > migrating > > > > > > > > > ZK > > > > > > > > > > > > > clusters > > > > > > > > > > > > > > > > using > > > > > > > > > > > > > > > > > > >> JBOD > > > > > > > > > > > > > > > > > > >> > is > > > > > > > > > > > > > > > > > > >> > > out of scope for this KIP. It > might be > > > > > > > possible > > > > > > > > to > > > > > > > > > > > stop > > > > > > > > > > > > > > using > > > > > > > > > > > > > > > > JBOD > > > > > > > > > > > > > > > > > > >> using > > > > > > > > > > > > > > > > > > >> > > reassignments, but I'm not sure > about > > > > > that. > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > -David > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > On Tue, Oct 18, 2022 at 12:17 PM > Igor > > > > > > Soarez < > > > > > > > > > > > > i...@soarez.me > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > Hi David, > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > Thanks for the KIP, this is very > > > > > exciting! > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > How does JBOD relate to this > work? > > > KRaft > > > > > > > mode > > > > > > > > > > > doesn't > > > > > > > > > > > > > yet > > > > > > > > > > > > > > > > > support > > > > > > > > > > > > > > > > > > >> > > > configuring brokers with > multiple > > > log > > > > > > > > > directories. > > > > > > > > > > > If > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > brokers > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > >> > the > > > > > > > > > > > > > > > > > > >> > > > existing cluster are configured > with > > > > > > > multiple > > > > > > > > > log > > > > > > > > > > > > dirs, > > > > > > > > > > > > > > does > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > > migration > > > > > > > > > > > > > > > > > > >> > > > imply that the existing brokers > > > need to > > > > > > drop > > > > > > > > use > > > > > > > > > > of > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > feature? > > > > > > > > > > > > > > > > > > >> Or is > > > > > > > > > > > > > > > > > > >> > > > there some way to upgrade them > > > later? > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > Thanks, > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > -- > > > > > > > > > > > > > > > > > > >> > > > Igor > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > On Mon, Oct 17, 2022, at 10:07 > PM, > > > David > > > > > > > > Arthur > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > > I've updated the KIP with the > > > > > following > > > > > > > > > changes > > > > > > > > > > > (the > > > > > > > > > > > > > > > > > confluence > > > > > > > > > > > > > > > > > > >> diff > > > > > > > > > > > > > > > > > > >> > is > > > > > > > > > > > > > > > > > > >> > > > not > > > > > > > > > > > > > > > > > > >> > > > > helpful here since i > rearranged > > > some > > > > > > > things) > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > * Added > > > ZooKeeperBlockingKRaftMillis > > > > > > > metric > > > > > > > > > > > > > > > > > > >> > > > > * Added section on new broker > > > > > > registration > > > > > > > > > JSON > > > > > > > > > > > > > > > > > > >> > > > > * Removed section on > > > MigrationCheck > > > > > RPC > > > > > > > > > > > > > > > > > > >> > > > > * Added change to > > > > > UpdateMetadataRequest > > > > > > > > > > > > > > > > > > >> > > > > * Added section "Additional ZK > > > Broker > > > > > > > > Configs" > > > > > > > > > > > > > (includes > > > > > > > > > > > > > > > > > configs > > > > > > > > > > > > > > > > > > >> to > > > > > > > > > > > > > > > > > > >> > > > connect > > > > > > > > > > > > > > > > > > >> > > > > to KRaft quorum) > > > > > > > > > > > > > > > > > > >> > > > > * Added section on > "Incompatible > > > > > > Brokers" > > > > > > > > > under > > > > > > > > > > > > > Failure > > > > > > > > > > > > > > > > Modes > > > > > > > > > > > > > > > > > > >> > > > > * Clarified many things per > this > > > > > > > discussion > > > > > > > > > > thread > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > I realized we need the KRaft > > > > > controller > > > > > > to > > > > > > > > > pick > > > > > > > > > > > the > > > > > > > > > > > > > > > correct > > > > > > > > > > > > > > > > > > >> > > > > "metadata.version" when > > > initializing > > > > > the > > > > > > > > > > > migration. > > > > > > > > > > > > I > > > > > > > > > > > > > > > > included > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > IBP > > > > > > > > > > > > > > > > > > >> > > > of a > > > > > > > > > > > > > > > > > > >> > > > > broker in its registration > data > > > so the > > > > > > > KRaft > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > verify > > > > > > > > > > > > > > > > > > >> > the > > > > > > > > > > > > > > > > > > >> > > > IBP > > > > > > > > > > > > > > > > > > >> > > > > and pick the correct > > > > > "metadata.version" > > > > > > > when > > > > > > > > > > > > starting > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> migration. > > > > > > > > > > > > > > > > > > >> > > > > Otherwise, we could > inadvertently > > > > > > > downgrade > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> IBP/metadata.version > > > > > > > > > > > > > > > > > > >> > as > > > > > > > > > > > > > > > > > > >> > > > > part of the migration. > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > I also added "clusterId" to > the > > > broker > > > > > > > > > > > registration > > > > > > > > > > > > > data > > > > > > > > > > > > > > > so > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> KRaft > > > > > > > > > > > > > > > > > > >> > > > could > > > > > > > > > > > > > > > > > > >> > > > > verify it. > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > -David > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > On Mon, Oct 17, 2022 at 12:14 > PM > > > Colin > > > > > > > > McCabe > > > > > > > > > < > > > > > > > > > > > > > > > > > > cmcc...@apache.org > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > >> On Fri, Oct 14, 2022, at > 11:39, > > > Jun > > > > > Rao > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > >> > Hi, Colin, > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > > >> > > > >> > 10. "That all goes away in > the > > > new > > > > > > > mode, > > > > > > > > > and > > > > > > > > > > we > > > > > > > > > > > > > just > > > > > > > > > > > > > > > have > > > > > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > > >> > code > > > > > > > > > > > > > > > > > > >> > > > which > > > > > > > > > > > > > > > > > > >> > > > >> > analyzes > __cluster_metadata and > > > > > > > reflects > > > > > > > > it > > > > > > > > > > in > > > > > > > > > > > 1) > > > > > > > > > > > > > > > updates > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > >> > and > > > > > > > > > > > > > > > > > > >> > > 2) > > > > > > > > > > > > > > > > > > >> > > > >> > messages sent out to > brokers." > > > > > > > > > > > > > > > > > > >> > > > >> > Hmm, I am not sure it's > that > > > > > simple. > > > > > > > Some > > > > > > > > > of > > > > > > > > > > > the > > > > > > > > > > > > > > > > complexity > > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > >> the > > > > > > > > > > > > > > > > > > >> > > > >> ZK-based > > > > > > > > > > > > > > > > > > >> > > > >> > controller are (1) using > the > > > > > > pipelining > > > > > > > > > > > approach > > > > > > > > > > > > to > > > > > > > > > > > > > > > write > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > >> > for > > > > > > > > > > > > > > > > > > >> > > > >> better > > > > > > > > > > > > > > > > > > >> > > > >> > throughput and using > > > conditional > > > > > > writes > > > > > > > > for > > > > > > > > > > > > > > > correctness; > > > > > > > > > > > > > > > > > (2) > > > > > > > > > > > > > > > > > > >> > sending > > > > > > > > > > > > > > > > > > >> > > > the > > > > > > > > > > > > > > > > > > >> > > > >> > proper LeaderAndIsr and > > > > > > UpdateMetadata > > > > > > > > > > > requests. > > > > > > > > > > > > > For > > > > > > > > > > > > > > > > > example, > > > > > > > > > > > > > > > > > > >> > during > > > > > > > > > > > > > > > > > > >> > > > >> > controller failover, the > full > > > > > > metadata > > > > > > > > > needs > > > > > > > > > > to > > > > > > > > > > > > be > > > > > > > > > > > > > > sent > > > > > > > > > > > > > > > > > while > > > > > > > > > > > > > > > > > > >> > during > > > > > > > > > > > > > > > > > > >> > > > >> > individual broker failure, > only > > > > > some > > > > > > of > > > > > > > > the > > > > > > > > > > > > > metadata > > > > > > > > > > > > > > > > needs > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > >> be > > > > > > > > > > > > > > > > > > >> > > > updated. > > > > > > > > > > > > > > > > > > >> > > > >> > The controlled shutdown > > > handling > > > > > > > > sometimes > > > > > > > > > > uses > > > > > > > > > > > > > > > > > > >> StopReplicaRequest > > > > > > > > > > > > > > > > > > >> > > > and > > > > > > > > > > > > > > > > > > >> > > > >> > some other times uses > > > > > > > > LeaderAndIsrRequest. > > > > > > > > > > (3) > > > > > > > > > > > > > > > triggering > > > > > > > > > > > > > > > > > new > > > > > > > > > > > > > > > > > > >> > events > > > > > > > > > > > > > > > > > > >> > > > >> based > > > > > > > > > > > > > > > > > > >> > > > >> > on the responses of > > > LeaderAndIsr > > > > > > (e.g. > > > > > > > > for > > > > > > > > > > > topic > > > > > > > > > > > > > > > > deletion). > > > > > > > > > > > > > > > > > > >> Some > > > > > > > > > > > > > > > > > > >> > of > > > > > > > > > > > > > > > > > > >> > > > those > > > > > > > > > > > > > > > > > > >> > > > >> > complexity could be > > > re-implemented > > > > > > in a > > > > > > > > > more > > > > > > > > > > > > > > efficient > > > > > > > > > > > > > > > > way, > > > > > > > > > > > > > > > > > > >> but we > > > > > > > > > > > > > > > > > > >> > > > need > > > > > > > > > > > > > > > > > > >> > > > >> to > > > > > > > > > > > > > > > > > > >> > > > >> > be really careful not to > > > generate > > > > > > > > > regression. > > > > > > > > > > > > Some > > > > > > > > > > > > > of > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > other > > > > > > > > > > > > > > > > > > >> > > > >> complexity > > > > > > > > > > > > > > > > > > >> > > > >> > just won't go away. > > > Reimplementing > > > > > > all > > > > > > > > > those > > > > > > > > > > > > logic > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > 30 > > > > > > > > > > > > > > > > > > >> or > > > > > > > > > > > > > > > > > > >> > so > > > > > > > > > > > > > > > > > > >> > > > >> events > > > > > > > > > > > > > > > > > > >> > > > >> > in the ZK-based controller > is > > > > > > possible, > > > > > > > > but > > > > > > > > > > > > seems a > > > > > > > > > > > > > > bit > > > > > > > > > > > > > > > > > > >> daunting > > > > > > > > > > > > > > > > > > >> > and > > > > > > > > > > > > > > > > > > >> > > > >> risky. > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> Hi Jun, > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> Thanks for the response. I > agree > > > > > that > > > > > > > > there > > > > > > > > > is > > > > > > > > > > > > some > > > > > > > > > > > > > > work > > > > > > > > > > > > > > > > > here > > > > > > > > > > > > > > > > > > >> but I > > > > > > > > > > > > > > > > > > >> > > > don't > > > > > > > > > > > > > > > > > > >> > > > >> think it's as bad as it might > > > seem. > > > > > > Most > > > > > > > of > > > > > > > > > the > > > > > > > > > > > > code > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > writing > > > > > > > > > > > > > > > > > > >> to > > > > > > > > > > > > > > > > > > >> > ZK > > > > > > > > > > > > > > > > > > >> > > > can > > > > > > > > > > > > > > > > > > >> > > > >> be reused from the old > > > controller. We > > > > > > > > should > > > > > > > > > at > > > > > > > > > > > > least > > > > > > > > > > > > > > > > > evaluate > > > > > > > > > > > > > > > > > > >> using > > > > > > > > > > > > > > > > > > >> > > the > > > > > > > > > > > > > > > > > > >> > > > >> old ControllerChannelManager > as > > > > > well. I > > > > > > > > don't > > > > > > > > > > > know > > > > > > > > > > > > if > > > > > > > > > > > > > > > we'll > > > > > > > > > > > > > > > > > end > > > > > > > > > > > > > > > > > > >> up > > > > > > > > > > > > > > > > > > >> > > > doing it > > > > > > > > > > > > > > > > > > >> > > > >> or not but it's a > possibility. > > > (The > > > > > > main > > > > > > > > > reason > > > > > > > > > > > to > > > > > > > > > > > > > not > > > > > > > > > > > > > > do > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > >> that > > > > > > > > > > > > > > > > > > >> > > the > > > > > > > > > > > > > > > > > > >> > > > >> response handling will be a > bit > > > > > > > different) > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> The question of what to do > > > during a > > > > > > > > > controller > > > > > > > > > > > > > failover > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > an > > > > > > > > > > > > > > > > > > >> > > > interesting > > > > > > > > > > > > > > > > > > >> > > > >> one. Technically, we should > be > > > able > > > > > to > > > > > > > > > continue > > > > > > > > > > > > > sending > > > > > > > > > > > > > > > > > > >> incremental > > > > > > > > > > > > > > > > > > >> > > > updates > > > > > > > > > > > > > > > > > > >> > > > >> to the legacy nodes, for the > same > > > > > > reason > > > > > > > we > > > > > > > > > can > > > > > > > > > > > in > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > mode. > > > > > > > > > > > > > > > > > > >> > > However, > > > > > > > > > > > > > > > > > > >> > > > >> probably we should just copy > > > what ZK > > > > > > mode > > > > > > > > > does > > > > > > > > > > > and > > > > > > > > > > > > > send > > > > > > > > > > > > > > > > them > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > >> full > > > > > > > > > > > > > > > > > > >> > > > >> metadata update. This will > allow > > > us > > > > > to > > > > > > > have > > > > > > > > > an > > > > > > > > > > > easy > > > > > > > > > > > > > way > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > handle > > > > > > > > > > > > > > > > > > >> > > > >> divergences caused by lost > RPCs > > > by > > > > > > > changing > > > > > > > > > the > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > (just > > > > > > > > > > > > > > > > > > >> as > > > > > > > > > > > > > > > > > > >> > we > > > > > > > > > > > > > > > > > > >> > > > do > > > > > > > > > > > > > > > > > > >> > > > >> in ZK mode, unfortunately). I > > > suppose > > > > > > we > > > > > > > > > should > > > > > > > > > > > > > > document > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > >> the > > > > > > > > > > > > > > > > > > >> > > > KIP... > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> I agree controlled shutdown > is > > > tricky > > > > > > to > > > > > > > > get > > > > > > > > > > just > > > > > > > > > > > > > > right. > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > > > >> suppose > > > > > > > > > > > > > > > > > > >> > > this > > > > > > > > > > > > > > > > > > >> > > > is > > > > > > > > > > > > > > > > > > >> > > > >> a case where the RPCs we send > > > out are > > > > > > not > > > > > > > > > > purely > > > > > > > > > > > > > "fire > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > >> forget"; > > > > > > > > > > > > > > > > > > >> > we > > > > > > > > > > > > > > > > > > >> > > > have > > > > > > > > > > > > > > > > > > >> > > > >> to listen for the response. > But > > > that > > > > > > can > > > > > > > be > > > > > > > > > > done > > > > > > > > > > > in > > > > > > > > > > > > > an > > > > > > > > > > > > > > > > > > >> event-based > > > > > > > > > > > > > > > > > > >> > > way. > > > > > > > > > > > > > > > > > > >> > > > >> Controlled shutdown is also > > > probably > > > > > > the > > > > > > > > last > > > > > > > > > > > thing > > > > > > > > > > > > > > we'll > > > > > > > > > > > > > > > > > > >> implement > > > > > > > > > > > > > > > > > > >> > > > once we > > > > > > > > > > > > > > > > > > >> > > > >> have the basic lift and > shift. > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> Topic deletion is another > > > annoying > > > > > > > thing. I > > > > > > > > > > > wonder > > > > > > > > > > > > if > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > > > > >> just > > > > > > > > > > > > > > > > > > >> > > > delete > > > > > > > > > > > > > > > > > > >> > > > >> immediately here and skip the > > > > > > complexity > > > > > > > of > > > > > > > > > > > > > > implementing > > > > > > > > > > > > > > > > > > >> "deleting > > > > > > > > > > > > > > > > > > >> > > > state." > > > > > > > > > > > > > > > > > > >> > > > >> Topic IDs will exist in IBP > 3.4, > > > in > > > > > > both > > > > > > > ZK > > > > > > > > > and > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > mode, > > > > > > > > > > > > > > > > > so > > > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > >> > > > should be > > > > > > > > > > > > > > > > > > >> > > > >> OK, right? > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> best, > > > > > > > > > > > > > > > > > > >> > > > >> Colin > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > > >> > > > >> > Thanks, > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > > >> > > > >> > Jun > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > > >> > > > >> > On Fri, Oct 14, 2022 at > 9:29 AM > > > > > Colin > > > > > > > > > McCabe > > > > > > > > > > < > > > > > > > > > > > > > > > > > > >> cmcc...@apache.org> > > > > > > > > > > > > > > > > > > >> > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> On Thu, Oct 13, 2022, at > > > 11:44, > > > > > Jun > > > > > > > Rao > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > >> >> > Hi, Colin, > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > Thanks for the reply. > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > 10. This is a bit on the > > > > > > > > implementation > > > > > > > > > > > side. > > > > > > > > > > > > If > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > look > > > > > > > > > > > > > > > > > > at > > > > > > > > > > > > > > > > > > >> > the > > > > > > > > > > > > > > > > > > >> > > > >> existing > > > > > > > > > > > > > > > > > > >> > > > >> >> > ZK-based controller, > most > > > of the > > > > > > > logic > > > > > > > > > is > > > > > > > > > > > > around > > > > > > > > > > > > > > > > > > >> maintaining an > > > > > > > > > > > > > > > > > > >> > > > >> in-memory > > > > > > > > > > > > > > > > > > >> > > > >> >> > state of all the > resources > > > > > > (broker, > > > > > > > > > topic, > > > > > > > > > > > > > > > partition, > > > > > > > > > > > > > > > > > > etc), > > > > > > > > > > > > > > > > > > >> > > > >> >> reading/writing > > > > > > > > > > > > > > > > > > >> > > > >> >> > to ZK, sending > LeaderAndIsr > > > and > > > > > > > > > > > UpdateMetadata > > > > > > > > > > > > > > > > requests > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > >> > > > handling > > > > > > > > > > > > > > > > > > >> > > > >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> > responses to brokers. > So we > > > need > > > > > > all > > > > > > > > > that > > > > > > > > > > > > logic > > > > > > > > > > > > > in > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > dual > > > > > > > > > > > > > > > > > > >> > write > > > > > > > > > > > > > > > > > > >> > > > >> mode. > > > > > > > > > > > > > > > > > > >> > > > >> >> One > > > > > > > > > > > > > > > > > > >> > > > >> >> > option is to duplicate > all > > > that > > > > > > > logic > > > > > > > > in > > > > > > > > > > > some > > > > > > > > > > > > > new > > > > > > > > > > > > > > > > code. > > > > > > > > > > > > > > > > > > This > > > > > > > > > > > > > > > > > > >> > can > > > > > > > > > > > > > > > > > > >> > > > be a > > > > > > > > > > > > > > > > > > >> > > > >> bit > > > > > > > > > > > > > > > > > > >> > > > >> >> > error prone and makes > the > > > code a > > > > > > bit > > > > > > > > > > harder > > > > > > > > > > > to > > > > > > > > > > > > > > > > maintain > > > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > > >> we > > > > > > > > > > > > > > > > > > >> > > need > > > > > > > > > > > > > > > > > > >> > > > to > > > > > > > > > > > > > > > > > > >> > > > >> fix > > > > > > > > > > > > > > > > > > >> > > > >> >> > some critical issues in > > > ZK-based > > > > > > > > > > > controllers. > > > > > > > > > > > > > > > Another > > > > > > > > > > > > > > > > > > >> option is > > > > > > > > > > > > > > > > > > >> > > to > > > > > > > > > > > > > > > > > > >> > > > try > > > > > > > > > > > > > > > > > > >> > > > >> >> > reusing the existing > code > > > in the > > > > > > > > > ZK-based > > > > > > > > > > > > > > > controller. > > > > > > > > > > > > > > > > > For > > > > > > > > > > > > > > > > > > >> > > example, > > > > > > > > > > > > > > > > > > >> > > > we > > > > > > > > > > > > > > > > > > >> > > > >> >> could > > > > > > > > > > > > > > > > > > >> > > > >> >> > start the EventManager > in > > > the > > > > > > > ZK-based > > > > > > > > > > > > > controller, > > > > > > > > > > > > > > > but > > > > > > > > > > > > > > > > > let > > > > > > > > > > > > > > > > > > >> the > > > > > > > > > > > > > > > > > > >> > > > KRaft > > > > > > > > > > > > > > > > > > >> > > > >> >> > controller ingest new > > > events. > > > > > This > > > > > > > has > > > > > > > > > its > > > > > > > > > > > own > > > > > > > > > > > > > > > > > challenges: > > > > > > > > > > > > > > > > > > >> (1) > > > > > > > > > > > > > > > > > > >> > > the > > > > > > > > > > > > > > > > > > >> > > > >> >> existing > > > > > > > > > > > > > > > > > > >> > > > >> >> > logic only logs ZK > failures > > > and > > > > > > > > doesn't > > > > > > > > > > > expose > > > > > > > > > > > > > > them > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > > caller; > > > > > > > > > > > > > > > > > > >> > > > (2) > > > > > > > > > > > > > > > > > > >> > > > >> >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> > existing logic may add > new > > > > > events > > > > > > to > > > > > > > > the > > > > > > > > > > > queue > > > > > > > > > > > > > > > itself > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > >> > > > probably > > > > > > > > > > > > > > > > > > >> > > > >> >> need > > > > > > > > > > > > > > > > > > >> > > > >> >> > to think through how > this is > > > > > > > > coordinated > > > > > > > > > > > with > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> > > controller; > > > > > > > > > > > > > > > > > > >> > > > >> (3) > > > > > > > > > > > > > > > > > > >> > > > >> >> it > > > > > > > > > > > > > > > > > > >> > > > >> >> > registers some ZK > listeners > > > > > > > > > unnecessarily > > > > > > > > > > > (may > > > > > > > > > > > > > not > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > big > > > > > > > > > > > > > > > > > > >> > > > concern). > > > > > > > > > > > > > > > > > > >> > > > >> So > > > > > > > > > > > > > > > > > > >> > > > >> >> we > > > > > > > > > > > > > > > > > > >> > > > >> >> > need to get around those > > > issues > > > > > > > > > somehow. I > > > > > > > > > > > am > > > > > > > > > > > > > > > > wondering > > > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > > >> we > > > > > > > > > > > > > > > > > > >> > > have > > > > > > > > > > > > > > > > > > >> > > > >> >> > considered both options > and > > > > > which > > > > > > > > > approach > > > > > > > > > > > we > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > > leaning > > > > > > > > > > > > > > > > > > >> > towards > > > > > > > > > > > > > > > > > > >> > > > for > > > > > > > > > > > > > > > > > > >> > > > >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> > implementation. > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> Yes, this is a good > question. > > > My > > > > > > take > > > > > > > is > > > > > > > > > > that > > > > > > > > > > > a > > > > > > > > > > > > > big > > > > > > > > > > > > > > > part > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > >> the > > > > > > > > > > > > > > > > > > >> > > > >> complexity > > > > > > > > > > > > > > > > > > >> > > > >> >> in the old controller code > > > results > > > > > > > from > > > > > > > > > the > > > > > > > > > > > fact > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > use > > > > > > > > > > > > > > > > > > >> ZK > > > > > > > > > > > > > > > > > > >> > as > > > > > > > > > > > > > > > > > > >> > > a > > > > > > > > > > > > > > > > > > >> > > > >> >> multi-writer database for > > > > > > propagating > > > > > > > > > > > > information > > > > > > > > > > > > > > > > between > > > > > > > > > > > > > > > > > > >> > different > > > > > > > > > > > > > > > > > > >> > > > >> >> components. So in the old > > > > > > controller, > > > > > > > > > every > > > > > > > > > > > > write > > > > > > > > > > > > > to > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > needs > > > > > > > > > > > > > > > > > > >> to > > > > > > > > > > > > > > > > > > >> > be > > > > > > > > > > > > > > > > > > >> > > > >> >> structured as a compare > and > > > swap > > > > > to > > > > > > be > > > > > > > > > fully > > > > > > > > > > > > > > correct. > > > > > > > > > > > > > > > > > Every > > > > > > > > > > > > > > > > > > >> time > > > > > > > > > > > > > > > > > > >> > we > > > > > > > > > > > > > > > > > > >> > > > get > > > > > > > > > > > > > > > > > > >> > > > >> >> notified about something, > it's > > > > > > usually > > > > > > > > in > > > > > > > > > > the > > > > > > > > > > > > form > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > "this > > > > > > > > > > > > > > > > > > >> znode > > > > > > > > > > > > > > > > > > >> > > > >> changed" > > > > > > > > > > > > > > > > > > >> > > > >> >> which prompts a full > reload of > > > > > part > > > > > > of > > > > > > > > the > > > > > > > > > > > data > > > > > > > > > > > > in > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > (which > > > > > > > > > > > > > > > > > > >> > itself > > > > > > > > > > > > > > > > > > >> > > > has > > > > > > > > > > > > > > > > > > >> > > > >> >> multiple parts, loading, > > > > > > > deserializing, > > > > > > > > > > > > > reconciling, > > > > > > > > > > > > > > > > etc.) > > > > > > > > > > > > > > > > > > >> That > > > > > > > > > > > > > > > > > > >> > all > > > > > > > > > > > > > > > > > > >> > > > goes > > > > > > > > > > > > > > > > > > >> > > > >> >> away in the new mode, and > we > > > just > > > > > > have > > > > > > > > > some > > > > > > > > > > > code > > > > > > > > > > > > > > which > > > > > > > > > > > > > > > > > > >> analyzes > > > > > > > > > > > > > > > > > > >> > > > >> >> __cluster_metadata and > > > reflects it > > > > > > in > > > > > > > 1) > > > > > > > > > > > updates > > > > > > > > > > > > > to > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > 2) > > > > > > > > > > > > > > > > > > >> > > > messages > > > > > > > > > > > > > > > > > > >> > > > >> sent > > > > > > > > > > > > > > > > > > >> > > > >> >> out to brokers. > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> This is pretty decoupled > from > > > the > > > > > > > other > > > > > > > > > > logic > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > >> QuorumController > > > > > > > > > > > > > > > > > > >> > > and > > > > > > > > > > > > > > > > > > >> > > > >> >> should be easy to unit > test, > > > since > > > > > > the > > > > > > > > > same > > > > > > > > > > > > inputs > > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> log > > > > > > > > > > > > > > > > > > >> > > > always > > > > > > > > > > > > > > > > > > >> > > > >> >> produce the same output > in ZK. > > > > > > > > Basically, > > > > > > > > > ZK > > > > > > > > > > > is > > > > > > > > > > > > > > > > write-only > > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > >> > us, > > > > > > > > > > > > > > > > > > >> > > > we do > > > > > > > > > > > > > > > > > > >> > > > >> >> not read it (with the big > > > > > exception > > > > > > of > > > > > > > > > > broker > > > > > > > > > > > > > > > > registration > > > > > > > > > > > > > > > > > > >> > znodes) > > > > > > > > > > > > > > > > > > >> > > > and I > > > > > > > > > > > > > > > > > > >> > > > >> >> think that will greatly > > > simplify > > > > > > > things. > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> So I think dual-write > mode as > > > > > > > described > > > > > > > > > here > > > > > > > > > > > > will > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > >> > substantially > > > > > > > > > > > > > > > > > > >> > > > >> simpler > > > > > > > > > > > > > > > > > > >> > > > >> >> than trying to run part or > > > all of > > > > > > the > > > > > > > > old > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > >> > parallel. I > > > > > > > > > > > > > > > > > > >> > > > do > > > > > > > > > > > > > > > > > > >> > > > >> >> think we will reuse a > bunch > > > of the > > > > > > > > > > > > serialization / > > > > > > > > > > > > > > > > > > >> > deserialization > > > > > > > > > > > > > > > > > > >> > > > code > > > > > > > > > > > > > > > > > > >> > > > >> for > > > > > > > > > > > > > > > > > > >> > > > >> >> znodes and possibly the > code > > > for > > > > > > > > > > communicating > > > > > > > > > > > > > with > > > > > > > > > > > > > > > ZK. > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> best, > > > > > > > > > > > > > > > > > > >> > > > >> >> Colin > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > 14. Good point and make > > > sense. > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > Thanks, > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > Jun > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> > On Wed, Oct 12, 2022 at > > > 3:27 PM > > > > > > > Colin > > > > > > > > > > > McCabe < > > > > > > > > > > > > > > > > > > >> > cmcc...@apache.org > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > >> wrote: > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> Hi Jun, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> Thanks for taking a > look. > > > I can > > > > > > > > answer > > > > > > > > > > some > > > > > > > > > > > > > > > questions > > > > > > > > > > > > > > > > > > here > > > > > > > > > > > > > > > > > > >> > > > because I > > > > > > > > > > > > > > > > > > >> > > > >> >> >> collaborated on this a > > > bit, and > > > > > > > David > > > > > > > > > is > > > > > > > > > > on > > > > > > > > > > > > > > > vacation > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > >> few > > > > > > > > > > > > > > > > > > >> > > > days. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> On Wed, Oct 12, 2022, > at > > > 14:41, > > > > > > Jun > > > > > > > > Rao > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > Hi, David, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > Thanks for the KIP. > A few > > > > > > > comments > > > > > > > > > > below. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > 10. It's still not > very > > > clear > > > > > > to > > > > > > > me > > > > > > > > > how > > > > > > > > > > > the > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> controller > > > > > > > > > > > > > > > > > > >> > > > works > > > > > > > > > > > > > > > > > > >> > > > >> in > > > > > > > > > > > > > > > > > > >> > > > >> >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > dual writes mode to > > > KRaft log > > > > > > and > > > > > > > > ZK > > > > > > > > > > when > > > > > > > > > > > > the > > > > > > > > > > > > > > > > brokers > > > > > > > > > > > > > > > > > > >> still > > > > > > > > > > > > > > > > > > >> > > run > > > > > > > > > > > > > > > > > > >> > > > in > > > > > > > > > > > > > > > > > > >> > > > >> ZK > > > > > > > > > > > > > > > > > > >> > > > >> >> >> mode. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > Does the KRaft > controller > > > > > run a > > > > > > > ZK > > > > > > > > > > based > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > >> > > parallel > > > > > > > > > > > > > > > > > > >> > > > or > > > > > > > > > > > > > > > > > > >> > > > >> do > > > > > > > > > > > > > > > > > > >> > > > >> >> we > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > derive what needs to > be > > > > > written > > > > > > > to > > > > > > > > ZK > > > > > > > > > > > based > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> > > controller > > > > > > > > > > > > > > > > > > >> > > > >> logic? > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> We derive what needs > to be > > > > > > written > > > > > > > to > > > > > > > > > ZK > > > > > > > > > > > > based > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> > > controller > > > > > > > > > > > > > > > > > > >> > > > >> >> logic. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > I am also not sure > how > > > the > > > > > > KRaft > > > > > > > > > > > controller > > > > > > > > > > > > > > > handles > > > > > > > > > > > > > > > > > > >> broker > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > registration/deregistration, > > > > > > > since > > > > > > > > > > > brokers > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > still > > > > > > > > > > > > > > > > > > >> running > > > > > > > > > > > > > > > > > > >> > > in > > > > > > > > > > > > > > > > > > >> > > > ZK > > > > > > > > > > > > > > > > > > >> > > > >> >> mode > > > > > > > > > > > > > > > > > > >> > > > >> >> >> and > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > are not heartbeating > to > > > the > > > > > > KRaft > > > > > > > > > > > > controller. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> The new controller will > > > listen > > > > > > for > > > > > > > > > broker > > > > > > > > > > > > > > > > registrations > > > > > > > > > > > > > > > > > > >> under > > > > > > > > > > > > > > > > > > >> > > > >> /brokers. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> This is the only znode > > > watch > > > > > that > > > > > > > the > > > > > > > > > new > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > >> do. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> We did consider > changing > > > how > > > > > > > ZK-based > > > > > > > > > > > broker > > > > > > > > > > > > > > > > > registration > > > > > > > > > > > > > > > > > > >> > > worked, > > > > > > > > > > > > > > > > > > >> > > > >> but it > > > > > > > > > > > > > > > > > > >> > > > >> >> >> just ended up being too > > > much > > > > > work > > > > > > > for > > > > > > > > > not > > > > > > > > > > > > > enough > > > > > > > > > > > > > > > > gain. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > 12. "A new set of > nodes > > > will > > > > > be > > > > > > > > > > > provisioned > > > > > > > > > > > > > to > > > > > > > > > > > > > > > host > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > > > controller > > > > > > > > > > > > > > > > > > >> > > > >> >> >> quorum." > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > I guess we don't > support > > > > > > starting > > > > > > > > the > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > >> > quorum > > > > > > > > > > > > > > > > > > >> > > on > > > > > > > > > > > > > > > > > > >> > > > >> >> existing > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > brokers. It would be > > > useful > > > > > to > > > > > > > make > > > > > > > > > > that > > > > > > > > > > > > > clear. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> Agreed > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > 13. "Once the quorum > is > > > > > > > established > > > > > > > > > > and a > > > > > > > > > > > > > > leader > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > >> elected, > > > > > > > > > > > > > > > > > > >> > > the > > > > > > > > > > > > > > > > > > >> > > > >> >> >> controller > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > will check the state > of > > > the > > > > > > > cluster > > > > > > > > > > using > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> MigrationCheck > > > > > > > > > > > > > > > > > > >> > > > RPC." > > > > > > > > > > > > > > > > > > >> > > > >> How > > > > > > > > > > > > > > > > > > >> > > > >> >> >> does > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > the quorum controller > > > detect > > > > > > > other > > > > > > > > > > > brokers? > > > > > > > > > > > > > > Does > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > > controller > > > > > > > > > > > > > > > > > > >> > > > >> node > > > > > > > > > > > > > > > > > > >> > > > >> >> need > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > to be configured > with ZK > > > > > > > connection > > > > > > > > > > > string? > > > > > > > > > > > > > If > > > > > > > > > > > > > > > so, > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > >> would > > > > > > > > > > > > > > > > > > >> > be > > > > > > > > > > > > > > > > > > >> > > > >> useful > > > > > > > > > > > > > > > > > > >> > > > >> >> to > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > document the > additional > > > > > configs > > > > > > > > that > > > > > > > > > > the > > > > > > > > > > > > > quorum > > > > > > > > > > > > > > > > > > >> controller > > > > > > > > > > > > > > > > > > >> > > > needs to > > > > > > > > > > > > > > > > > > >> > > > >> >> set. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> Yes, the controllers > > > monitor ZK > > > > > > for > > > > > > > > > > broker > > > > > > > > > > > > > > > > > registrations, > > > > > > > > > > > > > > > > > > >> as I > > > > > > > > > > > > > > > > > > >> > > > >> mentioned > > > > > > > > > > > > > > > > > > >> > > > >> >> >> above. So they need > > > zk.connect > > > > > > and > > > > > > > > the > > > > > > > > > > > other > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > connection > > > > > > > > > > > > > > > > > > >> > > > >> >> configurations. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > 14. "In order to > prevent > > > > > > further > > > > > > > > > writes > > > > > > > > > > > to > > > > > > > > > > > > > ZK, > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > first > > > > > > > > > > > > > > > > > > >> > thing > > > > > > > > > > > > > > > > > > >> > > > the > > > > > > > > > > > > > > > > > > >> > > > >> new > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > KRaft quorum must do > is > > > take > > > > > > over > > > > > > > > > > > > leadership > > > > > > > > > > > > > of > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > > >> > > > controller. > > > > > > > > > > > > > > > > > > >> > > > >> " > > > > > > > > > > > > > > > > > > >> > > > >> >> The > > > > > > > > > > > > > > > > > > >> > > > >> >> >> ZK > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > controller processing > > > changes > > > > > > to > > > > > > > > > > > > /controller > > > > > > > > > > > > > > > update > > > > > > > > > > > > > > > > > > >> > > > asynchronously. > > > > > > > > > > > > > > > > > > >> > > > >> >> How > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > does the KRaft > controller > > > > > know > > > > > > > when > > > > > > > > > the > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > has > > > > > > > > > > > > > > > > > > >> > > > resigned > > > > > > > > > > > > > > > > > > >> > > > >> >> before > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > it can safely copy > the ZK > > > > > data? > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> This should be done > through > > > > > > > > > > > > > > > > > > >> expectedControllerEpochZkVersion, > > > > > > > > > > > > > > > > > > >> > > just > > > > > > > > > > > > > > > > > > >> > > > >> like > > > > > > > > > > > > > > > > > > >> > > > >> >> in > > > > > > > > > > > > > > > > > > >> > > > >> >> >> ZK mode, right? We > should > > > bump > > > > > > this > > > > > > > > > epoch > > > > > > > > > > > > value > > > > > > > > > > > > > > so > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > any > > > > > > > > > > > > > > > > > > >> > > writes > > > > > > > > > > > > > > > > > > >> > > > >> from > > > > > > > > > > > > > > > > > > >> > > > >> >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> >> old controller will > not go > > > > > > > through. I > > > > > > > > > > agree > > > > > > > > > > > > we > > > > > > > > > > > > > > > should > > > > > > > > > > > > > > > > > > spell > > > > > > > > > > > > > > > > > > >> > this > > > > > > > > > > > > > > > > > > >> > > > out > > > > > > > > > > > > > > > > > > >> > > > >> in > > > > > > > > > > > > > > > > > > >> > > > >> >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> >> KIP. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > 15. We have the > following > > > > > > > > sentences. > > > > > > > > > > One > > > > > > > > > > > > says > > > > > > > > > > > > > > > > > > >> ControllerId > > > > > > > > > > > > > > > > > > >> > is > > > > > > > > > > > > > > > > > > >> > > a > > > > > > > > > > > > > > > > > > >> > > > >> random > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > KRaft broker and the > > > other > > > > > says > > > > > > > > it's > > > > > > > > > > the > > > > > > > > > > > > > active > > > > > > > > > > > > > > > > > > >> controller. > > > > > > > > > > > > > > > > > > >> > > > Which > > > > > > > > > > > > > > > > > > >> > > > >> one > > > > > > > > > > > > > > > > > > >> > > > >> >> is > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > correct? > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > "UpdateMetadata: for > > > certain > > > > > > > > metadata > > > > > > > > > > > > > changes, > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> > > > controller > > > > > > > > > > > > > > > > > > >> > > > >> >> will > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > need to send > > > > > > > UpdateMetadataRequests > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > > ZK > > > > > > > > > > > > > > > > > brokers. > > > > > > > > > > > > > > > > > > >> For > > > > > > > > > > > > > > > > > > >> > the > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > “ControllerId” field > in > > > this > > > > > > > > request, > > > > > > > > > > the > > > > > > > > > > > > > > > > controller > > > > > > > > > > > > > > > > > > >> should > > > > > > > > > > > > > > > > > > >> > > > >> specify a > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > random KRaft broker." > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > "In the > > > UpdateMetadataRequest > > > > > > > sent > > > > > > > > by > > > > > > > > > > the > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > > > > >> controller > > > > > > > > > > > > > > > > > > >> > to > > > > > > > > > > > > > > > > > > >> > > > the > > > > > > > > > > > > > > > > > > >> > > > >> ZK > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > brokers, the > ControllerId > > > > > will > > > > > > > > point > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > > > > > active > > > > > > > > > > > > > > > > > > >> > controller > > > > > > > > > > > > > > > > > > >> > > > which > > > > > > > > > > > > > > > > > > >> > > > >> >> will > > > > > > > > > > > > > > > > > > >> > > > >> >> >> be > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > used for the > inter-broker > > > > > > > > requests." > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> Yeah, this seems like > an > > > error > > > > > to > > > > > > > me > > > > > > > > as > > > > > > > > > > > > well. A > > > > > > > > > > > > > > > > random > > > > > > > > > > > > > > > > > > >> value > > > > > > > > > > > > > > > > > > >> > is > > > > > > > > > > > > > > > > > > >> > > > not > > > > > > > > > > > > > > > > > > >> > > > >> >> really > > > > > > > > > > > > > > > > > > >> > > > >> >> >> useful. Plus the text > here > > > is > > > > > > > > > > > > > self-contradictory, > > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > > >> > pointed > > > > > > > > > > > > > > > > > > >> > > > out. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> I suspect what we > should do > > > > > here > > > > > > is > > > > > > > > > add a > > > > > > > > > > > new > > > > > > > > > > > > > > > field, > > > > > > > > > > > > > > > > > > >> > > > >> KRaftControllerId, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> and populate it with > the > > > real > > > > > > > > > controller > > > > > > > > > > > ID, > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > leave > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> old > > > > > > > > > > > > > > > > > > >> > > > >> >> controllerId > > > > > > > > > > > > > > > > > > >> > > > >> >> >> field as -1. A ZK-based > > > broker > > > > > > that > > > > > > > > > sees > > > > > > > > > > > this > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > then > > > > > > > > > > > > > > > > > > >> consult > > > > > > > > > > > > > > > > > > >> > > its > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > controller.quorum.voters > > > > > > > > configuration > > > > > > > > > to > > > > > > > > > > > see > > > > > > > > > > > > > > where > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > >> should > > > > > > > > > > > > > > > > > > >> > > send > > > > > > > > > > > > > > > > > > >> > > > >> >> >> controller-bound RPCs. > That > > > > > > > (static) > > > > > > > > > > > > > > configuration > > > > > > > > > > > > > > > > lets > > > > > > > > > > > > > > > > > > us > > > > > > > > > > > > > > > > > > >> map > > > > > > > > > > > > > > > > > > >> > > > >> between > > > > > > > > > > > > > > > > > > >> > > > >> >> >> controller ID and > > > host:port. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> We should still keep > our > > > > > existing > > > > > > > > epoch > > > > > > > > > > > logic > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > deciding > > > > > > > > > > > > > > > > > > >> > when > > > > > > > > > > > > > > > > > > >> > > > >> >> >> UpdateMetadataRequest / > > > > > > > > > > > LeaderAndIsrRequests > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > stale, > > > > > > > > > > > > > > > > > > >> with > > > > > > > > > > > > > > > > > > >> > the > > > > > > > > > > > > > > > > > > >> > > > >> caveat > > > > > > > > > > > > > > > > > > >> > > > >> >> >> that any kraft-based > epoch > > > > > should > > > > > > > be > > > > > > > > > > > treated > > > > > > > > > > > > as > > > > > > > > > > > > > > > > greater > > > > > > > > > > > > > > > > > > >> than > > > > > > > > > > > > > > > > > > >> > any > > > > > > > > > > > > > > > > > > >> > > > >> >> ZK-based > > > > > > > > > > > > > > > > > > >> > > > >> >> >> epoch. After all, the > kraft > > > > > epoch > > > > > > > is > > > > > > > > > > coming > > > > > > > > > > > > > from > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > epoch > > > > > > > > > > > > > > > > > > >> of > > > > > > > > > > > > > > > > > > >> > > > >> >> >> __cluster_metadata, > > > whereas the > > > > > > ZK > > > > > > > > > epoch > > > > > > > > > > > > comes > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > ZK. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > 16. "Additionally, > the > > > > > > controller > > > > > > > > > must > > > > > > > > > > > > > specify > > > > > > > > > > > > > > > if a > > > > > > > > > > > > > > > > > > >> broker > > > > > > > > > > > > > > > > > > >> > in > > > > > > > > > > > > > > > > > > >> > > > >> >> >> “LiveBrokers” > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > is KRaft or ZK." Does > > > that > > > > > > > require > > > > > > > > > any > > > > > > > > > > > > > protocol > > > > > > > > > > > > > > > > > changes > > > > > > > > > > > > > > > > > > >> to > > > > > > > > > > > > > > > > > > >> > > > >> >> >> UpdateMetadata? > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> Yeah, I am also curious > > > why the > > > > > > we > > > > > > > > need > > > > > > > > > > to > > > > > > > > > > > > care > > > > > > > > > > > > > > > > whether > > > > > > > > > > > > > > > > > > >> > brokers > > > > > > > > > > > > > > > > > > >> > > > are > > > > > > > > > > > > > > > > > > >> > > > >> ZK > > > > > > > > > > > > > > > > > > >> > > > >> >> or > > > > > > > > > > > > > > > > > > >> > > > >> >> >> KRaft in > > > UpdateMetadataRequest. > > > > > > We > > > > > > > > > don't > > > > > > > > > > > > reveal > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > >> > clients, > > > > > > > > > > > > > > > > > > >> > > > so > > > > > > > > > > > > > > > > > > >> > > > >> can > > > > > > > > > > > > > > > > > > >> > > > >> >> we > > > > > > > > > > > > > > > > > > >> > > > >> >> >> just leave this out? > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> best, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> Colin > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > Thanks, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > Jun > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > On Wed, Oct 5, 2022 > at > > > 10:07 > > > > > AM > > > > > > > > > Mickael > > > > > > > > > > > > > Maison > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > >> > > > >> >> mickael.mai...@gmail.com > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > wrote: > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Hi David, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Thanks for starting > this > > > > > > > important > > > > > > > > > > KIP. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> I've just taken a > quick > > > look > > > > > > so > > > > > > > > far > > > > > > > > > > but > > > > > > > > > > > > I've > > > > > > > > > > > > > > > got a > > > > > > > > > > > > > > > > > > >> couple > > > > > > > > > > > > > > > > > > >> > of > > > > > > > > > > > > > > > > > > >> > > > >> initial > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> questions: > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> 1) What happens if > a non > > > > > KRaft > > > > > > > > > > > compatible > > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > (or > > > > > > > > > > > > > > > > > > >> with > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > kafka.metadata.migration.enable > > > > > > > > set > > > > > > > > > to > > > > > > > > > > > > > false) > > > > > > > > > > > > > > > > joins > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > > cluster > > > > > > > > > > > > > > > > > > >> > > > >> after > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> the migration is > > > triggered? > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> 2) In the Failure > Modes > > > > > > section > > > > > > > > you > > > > > > > > > > > > mention > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > scenario > > > > > > > > > > > > > > > > > > >> > where > > > > > > > > > > > > > > > > > > >> > > a > > > > > > > > > > > > > > > > > > >> > > > >> write > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> to ZK fails. What > > > happens > > > > > when > > > > > > > the > > > > > > > > > > > > > divergence > > > > > > > > > > > > > > > > limit > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > >> > > > reached? Is > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> this a fatal > condition? > > > How > > > > > > much > > > > > > > > > > > > divergence > > > > > > > > > > > > > > > should > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > >> > allow? > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Thanks, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Mickael > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> On Wed, Oct 5, 2022 > at > > > 12:20 > > > > > > AM > > > > > > > > > David > > > > > > > > > > > > > Arthur < > > > > > > > > > > > > > > > > > > >> > > mum...@gmail.com > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > >> >> wrote: > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > Hey folks, I > wanted > > > to get > > > > > > the > > > > > > > > > ball > > > > > > > > > > > > > rolling > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > >> > > discussion > > > > > > > > > > > > > > > > > > >> > > > >> for > > > > > > > > > > > > > > > > > > >> > > > >> >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > ZooKeeper > migration > > > KIP. > > > > > > This > > > > > > > > KIP > > > > > > > > > > > > details > > > > > > > > > > > > > > how > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > plan > > > > > > > > > > > > > > > > > > >> to > > > > > > > > > > > > > > > > > > >> > do > > > > > > > > > > > > > > > > > > >> > > > an > > > > > > > > > > > > > > > > > > >> > > > >> >> online > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > migration of > metadata > > > from > > > > > > > > > ZooKeeper > > > > > > > > > > > to > > > > > > > > > > > > > > KRaft > > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > > > well > > > > > > > > > > > > > > > > > > >> as > > > > > > > > > > > > > > > > > > >> > a > > > > > > > > > > > > > > > > > > >> > > > >> rolling > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > upgrade of > brokers to > > > > > KRaft > > > > > > > > mode. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > The general idea > is to > > > > > keep > > > > > > > > KRaft > > > > > > > > > > and > > > > > > > > > > > > > > > ZooKeeper > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > >> sync > > > > > > > > > > > > > > > > > > >> > > > during > > > > > > > > > > > > > > > > > > >> > > > >> the > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > migration, so both > > > types > > > > > of > > > > > > > > > brokers > > > > > > > > > > > can > > > > > > > > > > > > > > exist > > > > > > > > > > > > > > > > > > >> > > simultaneously. > > > > > > > > > > > > > > > > > > >> > > > >> Then, > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > once everything is > > > > > migrated > > > > > > > and > > > > > > > > > > > updated, > > > > > > > > > > > > > we > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > turn > > > > > > > > > > > > > > > > > > >> off > > > > > > > > > > > > > > > > > > >> > > > >> ZooKeeper > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > writes. > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > This is a pretty > > > complex > > > > > > KIP, > > > > > > > so > > > > > > > > > > > please > > > > > > > > > > > > > > take a > > > > > > > > > > > > > > > > > look > > > > > > > > > > > > > > > > > > :) > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > Thanks! > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > David > > > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> >> > > > > > > > > > > > > > > > > > > >> > > > >> >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > >> > > > > -- > > > > > > > > > > > > > > > > > > >> > > > > -David > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > -- > > > > > > > > > > > > > > > > > > >> > > -David > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > -David > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > -David > > > > > > > > > > > > > > > > > -- > > > David Arthur > > > > > > > -- > David Arthur >