[
https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805548#comment-17805548
]
Luke Chen commented on KAFKA-16101:
-----------------------------------
It seems we didn't consider this case during our KIP design. I'm thinking we
might need another flag to let brokers know that a rollback is in progress,
and to update the controller node in ZK for the brokers. Thoughts?
> Kafka cluster unavailable during KRaft migration rollback procedure
> -------------------------------------------------------------------
>
> Key: KAFKA-16101
> URL: https://issues.apache.org/jira/browse/KAFKA-16101
> Project: Kafka
> Issue Type: Bug
> Components: kraft
> Affects Versions: 3.6.1
> Reporter: Paolo Patierno
> Priority: Major
>
> Hello,
> I was trying the KRaft migration rollback procedure locally and came across
> a potential bug, or at least a situation where the cluster is not
> usable/available for a certain amount of time.
> In order to test the procedure, I start with a cluster of one broker
> (broker ID = 0) and one ZooKeeper node. Then I start the migration with one
> KRaft controller node (broker ID = 1). The migration runs fine and reaches
> the "dual write" state.
> From this point, I try to run the rollback procedure as described in the
> documentation.
> As a first step, this involves:
> * stopping the broker
> * removing the __cluster_metadata folder
> * removing the ZooKeeper migration flag and the controller-related
> configuration from the broker
> * restarting the broker
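The configuration-removal step above can be sketched as a small script. This is a minimal sketch, not part of the documented procedure: the helper name `strip_migration_config` is hypothetical, and the three property names are assumed to be the migration-related broker settings, so check them against the migration documentation for your Kafka version.

```python
import re

# Assumption: these are the broker properties added for the ZK-to-KRaft
# migration. Verify the names for the Kafka version in use.
MIGRATION_KEYS = {
    "zookeeper.metadata.migration.enable",
    "controller.quorum.voters",
    "controller.listener.names",
}


def strip_migration_config(properties_text: str) -> str:
    """Return the server.properties text without the migration-related keys."""
    kept = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Keep blank lines and comments untouched.
        if not stripped or stripped.startswith(("#", "!")):
            kept.append(line)
            continue
        # Java properties accept '=' or ':' as the key/value separator.
        key = re.split(r"[=:]", stripped, maxsplit=1)[0].strip()
        if key not in MIGRATION_KEYS:
            kept.append(line)
    trailing_newline = "\n" if properties_text.endswith("\n") else ""
    return "\n".join(kept) + trailing_newline
```

The function only filters text; stopping the broker, deleting the __cluster_metadata folder, and restarting are still manual steps. Other settings that may mention the controller listener (for example an entry in `listener.security.protocol.map`) are left for manual review.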
> With the above steps done, the broker starts in ZooKeeper mode (no migration,
> no knowledge of the KRaft controllers) and keeps logging the following
> messages at DEBUG level:
> {code:java}
> [2024-01-08 11:51:20,608] DEBUG
> [zk-broker-0-to-controller-forwarding-channel-manager]: Controller isn't
> cached, looking for local metadata changes
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,608] DEBUG
> [zk-broker-0-to-controller-forwarding-channel-manager]: No controller
> provided, retrying after backoff
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG
> [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't
> cached, looking for local metadata changes
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG
> [zk-broker-0-to-controller-alter-partition-channel-manager]: No controller
> provided, retrying after backoff
> (kafka.server.BrokerToControllerRequestThread) {code}
> What's happening should be clear.
> The /controller znode in ZooKeeper still reports the KRaft controller (broker
> ID = 1) as the controller. The broker reads it from the znode but doesn't
> know how to reach it.
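To see which node the /controller znode currently points to, its payload can be inspected. This is a minimal sketch, assuming the payload is JSON with a `brokerid` field and that a KRaft controller adds a `kraftControllerEpoch` field during migration. The helper name `describe_controller` and the sample payload in the usage note are illustrative and not taken from this report.

```python
import json


def describe_controller(znode_payload: bytes, zk_broker_ids: set) -> dict:
    """Summarize a /controller znode payload (the field names are assumptions)."""
    data = json.loads(znode_payload.decode("utf-8"))
    controller_id = int(data["brokerid"])
    return {
        "controller_id": controller_id,
        # Assumption: only a KRaft controller writes this field.
        "written_by_kraft": "kraftControllerEpoch" in data,
        # False means the ID is not in the given set of ZK-mode broker IDs.
        "is_known_zk_broker": controller_id in zk_broker_ids,
    }
```

For the setup described here, a payload such as `b'{"version":2,"brokerid":1,"kraftControllerEpoch":5}'` checked against the broker set `{0}` would report controller ID 1 as written by KRaft and not among the ZK-mode brokers, which matches the "No controller provided" retries in the log.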
> The issue is that until the procedure is fully completed with the next
> steps (shutting down the KRaft controller, deleting the /controller znode),
> the cluster is unusable. Any admin or client operation against the broker
> just hangs; the broker doesn't reply.
> Extending this scenario to a more complex one with 10, 20, or 50 brokers and
> partition replicas spread across them: as the brokers are rolled one by one
> (in ZK mode) and report the above error, the topics become unavailable one
> after the other, until all brokers are in that state and nothing works. This
> is because, from the perspective of the KRaft controller (still running),
> the brokers are no longer available and the partition replicas are out of
> sync.
> Of course, as soon as you complete the rollback procedure by deleting the
> /controller znode, the brokers are able to elect a new controller among
> themselves and everything recovers.
> My first question: isn't the cluster supposed to keep working and remain
> available during the rollback, even while the procedure is not yet
> complete? Or is cluster unavailability an accepted assumption during the
> rollback, until it is fully completed?
> This "unavailability" time window could be reduced by deleting the
> /controller znode before shutting down the KRaft controllers, to allow the
> brokers to elect a new controller among themselves. But in this case, could
> there be a race condition where the KRaft controllers, still running, steal
> leadership again?
> Or is something missing from the documentation that leads to this problem?