Hi Sanaa,
from my experience running migrations, this has never happened to me, and it
should not happen anyway.

When a (ZooKeeper-based) broker initially registers itself as the controller,
you can see that the corresponding /controller znode has -1 as the KRaft
controller epoch.
Something like:
{"version":2,"brokerid":0,"timestamp":"1710845218527","kraftControllerEpoch":-1}
When you deploy the KRaft controller quorum and roll the brokers to
register and start the migration, the controller role is taken by one of the
KRaft controllers, and its epoch will certainly be greater than -1.
Something like:
{"version":2,"brokerid":4,"timestamp":"1710844690234","kraftControllerEpoch":10}
A KRaft controller is able to "steal" the controller role precisely because
its epoch is guaranteed to be greater than -1.
So during or after a migration, a broker cannot take the controller role,
because only a claimant with an epoch greater than the current one can do so
(and for a ZooKeeper-based broker the epoch is always -1).
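If you want to double-check this on your cluster, here is a minimal sketch
that reads the /controller znode and prints the epoch. It uses the kazoo
Python client and a placeholder connection string (both assumptions on my
side, any ZooKeeper client works just as well):

import json
from kazoo.client import KazooClient

# Connect to ZooKeeper (adjust the connection string to your ensemble)
zk = KazooClient(hosts="localhost:2181")
zk.start()

# Read the raw JSON stored in the /controller znode
data, stat = zk.get("/controller")
controller = json.loads(data.decode("utf-8"))

print("controller broker id :", controller["brokerid"])
print("kraftControllerEpoch :", controller.get("kraftControllerEpoch"))

zk.stop()

A ZooKeeper-based controller shows -1 there, while after the migration you
should only see values greater than -1.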
A broker can become the controller again only when you roll back the
migration: you remove the KRaft controllers and you have to delete the
/controller znode (as the procedure describes). Only in this case is a broker
able to "win" /controller again with -1 as epoch (because the /controller
znode doesn't exist anymore).
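Just for reference, that znode deletion in the rollback scenario can be done
with any ZooKeeper client; a minimal sketch with kazoo (again an assumption
on my side, and only to be run as part of the documented rollback procedure,
after the KRaft controllers are gone):

from kazoo.client import KazooClient
from kazoo.exceptions import NoNodeError

zk = KazooClient(hosts="localhost:2181")  # placeholder connection string
zk.start()

try:
    # Removing /controller lets a ZooKeeper-based broker win the election again
    zk.delete("/controller")
    print("/controller deleted")
except NoNodeError:
    print("/controller does not exist, nothing to do")

zk.stop()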
I'm not sure whether, in your case, something went wrong during the migration
or while rolling the brokers.

Thanks,
Paolo.


On Mon, 18 Mar 2024 at 21:22, Sanaa Syed <sanaa.s...@shopify.com.invalid>
wrote:

> Hello,
>
> I've begun migrating some of my Zookeeper Kafka clusters to KRaft. A
> behaviour I've noticed twice across two different kafka cluster
> environments is after provisioning a kraft controller quorum in migration
> mode, it is possible for a kafka broker to become an active controller
> alongside a kraft controller broker.
>
> For example, here are the steps I follow and the behaviour I notice (I'm
> currently using Kafka v3.6):
> 1. Enable the KRaft migration on the existing Kafka brokers (set the
> `controller.quorum.voters`, `controller.listener.names` and
> `zookeeper.metadata.migration.enable` configs in the server.properties
> file).
> 2. Deploy a kraft controller statefulset and service with the migration
> enabled so that data is copied over from Zookeeper and we enter a
> dual-write mode.
> 3. After a few minutes, I see that the migration has completed (it's a
> pretty small cluster). At this point, the kraft controller pod has been
> elected to be the controller (and I see this in zookeeper when I run `get
> /controller`). If the kafka brokers or kraft controller pods are restarted
> at any point after the migration is completed, a kafka broker is elected to
> be the controller and is reflected in zookeeper as well. Now, I have two
> active controllers - 1 is a kafka broker and 1 is a kraft controller
> broker.
>
> A couple questions I have:
> 1. Is this the expected behaviour? If so, how long after a migration has
> been completed should we hold off on restarting kafka brokers to avoid this
> situation?
> 2. Why is it possible for a kafka broker to be a controller again
> post-migration?
> 3. How do we come back to a state where a kraft controller broker is the
> only controller again in the least disruptive way possible?
>
> Thank you,
> Sanaa
>


-- 
Paolo Patierno

Senior Principal Software Engineer @ Red Hat
Microsoft MVP on Azure

Twitter : @ppatierno <http://twitter.com/ppatierno>
Linkedin : paolopatierno <http://it.linkedin.com/in/paolopatierno>
GitHub : ppatierno <https://github.com/ppatierno>
