Re: KRaft Migration and Kafka Controller behaviour

2024-03-19 Thread Paolo Patierno
Hi Sanaa,
from my experience running migrations this has never happened to me, and it
should not happen anyway.

When a (ZooKeeper-based) broker registers as the controller at the
beginning, you can see that the corresponding /controller znode has -1 as
the KRaft controller epoch.
Something like:
{"version":2,"brokerid":0,"timestamp":"1710845218527","kraftControllerEpoch":-1}
When you deploy the KRaft controller quorum and roll the brokers to
register and start the migration, the controller role is taken by one of
the KRaft controllers, and its epoch will always be greater than -1.
Something like:
{"version":2,"brokerid":4,"timestamp":"1710844690234","kraftControllerEpoch":10}
A KRaft controller is able to "steal" the controller role precisely because
its epoch will always be greater than -1.
So during or after a migration, a broker cannot take back the controller
role, because only a claimant with an epoch greater than the current one can
do that (and for a ZooKeeper-based broker the epoch will always be -1).
A broker can become the controller again only when you roll back the
migration: you delete the KRaft controllers and you also have to delete the
/controller znode (as the rollback procedure describes). Only in this case
is a broker able to "win" /controller with a -1 epoch (because the znode
doesn't exist anymore).
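If it helps to double check, the znode can be read with the ZooKeeper shell
that ships with Kafka, something like (host and port here are just
placeholders for your environment):

bin/zookeeper-shell.sh localhost:2181 get /controller

and only as part of a rollback, once the KRaft controllers are gone, the
same shell can be used to remove the znode:

bin/zookeeper-shell.sh localhost:2181 delete /controller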
I'm not sure whether in your case something went wrong during the migration
or while rolling the brokers.

Thanks,
Paolo.


On Mon, 18 Mar 2024 at 21:22, Sanaa Syed wrote:

> Hello,
>
> I've begun migrating some of my Zookeeper Kafka clusters to KRaft. A
> behaviour I've noticed twice across two different kafka cluster
> environments is after provisioning a kraft controller quorum in migration
> mode, it is possible for a kafka broker to become an active controller
> alongside a kraft controller broker.
>
> For example, here are the steps I follow and the behaviour I notice (I'm
> currently using Kafka v3.6):
> 1. Enable the KRaft migration on the existing Kafka brokers (set the
> `controller.quorum.voters`, `controller.listener.names` and
> `zookeeper.metadata.migration.enable` configs in the server.properties
> file).
> 2. Deploy a kraft controller statefulset and service with the migration
> enabled so that data is copied over from Zookeeper and we enter a
> dual-write mode.
> 3. After a few minutes, I see that the migration has completed (it's a
> pretty small cluster). At this point, the kraft controller pod has been
> elected to be the controller (and I see this in zookeeper when I run `get
> /controller`). If the kafka brokers or kraft controller pods are restarted
> at any point after the migration is completed, a kafka broker is elected to
> be the controller and is reflected in zookeeper as well. Now, I have two
> active controllers - 1 is a kafka broker and 1 is a kraft controller
> broker.
>
> A couple questions I have:
> 1. Is this the expected behaviour? If so, how long after a migration has
> been completed should we hold off on restarting kafka brokers to avoid this
> situation?
> 2. Why is it possible for a kafka broker to be a controller again
> post-migration?
> 3. How do we come back to a state where a kraft controller broker is the
> only controller again in the least disruptive way possible?
>
> Thank you,
> Sanaa
>


-- 
Paolo Patierno

Senior Principal Software Engineer @ Red Hat | Microsoft MVP on Azure

Twitter : @ppatierno 
Linkedin : paolopatierno 
GitHub : ppatierno 


Kafka followers with higher leader epoch than leader

2024-03-19 Thread Karl Sorensen
Hi,

I have an unusual situation: a cluster running Kafka 3.5.1 on Strimzi where
4 of the __consumer_offsets partitions have dropped below min ISR.

Everything else appears to be working fine.
Upon investigating, I've found that the partition followers appear to be
out of sync with the leader in terms of leader epoch.

For example, the leader-epoch-checkpoint file on the partition leader is:
0
4
0 0
1 4
4 6
27 10

while on the followers it is:
0
5
0 0
1 4
4 6
5 7
6 9

which looks to me like the followers are two elections ahead of the leader,
and I'm not sure how they got into this situation.
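For reference, the way I'm reading these files (correct me if this is wrong)
is: the first line is the file version, the second line is the number of
entries, and each remaining line is a "leaderEpoch startOffset" pair.
Annotated, the follower file above would be:

0      <- version
5      <- number of entries
0 0    <- leader epoch 0 starts at offset 0
1 4    <- leader epoch 1 starts at offset 4
4 6    <- leader epoch 4 starts at offset 6
5 7    <- leader epoch 5 starts at offset 7
6 9    <- leader epoch 6 starts at offset 9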
I've attempted to force a new leader election via kafka-leader-election.sh,
but it refused for both PREFERRED and UNCLEAN.
I've also tried a manual partition reassignment to move the leader to
another broker, but it won't do it.
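(For what it's worth, the election attempts were along these lines, with
the bootstrap address being a placeholder here:

bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type PREFERRED --topic __consumer_offsets --partition 18

and the same again with --election-type UNCLEAN.)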

What is even stranger is that if I watch the leader-epoch-checkpoint file
on one of the followers, I can see it constantly changing as it tries to
sort itself out.
[kafka@internal-001-kafka-0 __consumer_offsets-18]$ cat
leader-epoch-checkpoint
0
3
0 0
1 4
4 6
[kafka@internal-001-kafka-0 __consumer_offsets-18]$ cat
leader-epoch-checkpoint
0
5
0 0
1 4
4 6
5 7
6 9

I have tried manually removing the follower's partition files on disk in an
attempt to get it to re-sync from the leader, but it keeps returning to the
inconsistent state.
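(What I did there, roughly, was: stop the follower broker, remove the
partition directory under the log dir, e.g.

rm -rf /var/lib/kafka/data-0/kafka-log2/__consumer_offsets-18

and start the broker again so it re-replicates from the leader. It comes
back, but in the same inconsistent state.)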

Restarting the broker with the partition leader on it doesn't seem to move
leadership either.

The follower keeps logging the following constantly
2024-03-19 09:23:11,169 INFO [ReplicaFetcher replicaId=2, leaderId=1,
fetcherId=0] Truncating partition __consumer_offsets-18 with
TruncationState(offset=7, completed=true) due to leader epoch and offset
EpochEndOffset(errorCode=0, partition=18, leaderEpoch=4, endOffset=10)
(kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-1]
2024-03-19 09:23:11,169 INFO [UnifiedLog partition=__consumer_offsets-18,
dir=/var/lib/kafka/data-0/kafka-log2] Truncating to offset 7
(kafka.log.UnifiedLog) [ReplicaFetcherThread-0-1]
2024-03-19 09:23:11,174 INFO [UnifiedLog partition=__consumer_offsets-18,
dir=/var/lib/kafka/data-0/kafka-log2] Loading producer state till offset 7
with message format version 2 (kafka.log.UnifiedLog$)
[ReplicaFetcherThread-0-1]
2024-03-19 09:23:11,174 INFO [UnifiedLog partition=__consumer_offsets-18,
dir=/var/lib/kafka/data-0/kafka-log2] Reloading from producer snapshot and
rebuilding producer state from offset 7 (kafka.log.UnifiedLog$)
[ReplicaFetcherThread-0-1]
2024-03-19 09:23:11,174 INFO [ProducerStateManager
partition=__consumer_offsets-18]Loading producer state from snapshot file
'SnapshotFile(offset=7,
file=/var/lib/kafka/data-0/kafka-log2/__consumer_offsets-18/0007.snapshot)'
(org.apache.kafka.storage.internals.log.ProducerStateManager)
[ReplicaFetcherThread-0-1]
2024-03-19 09:23:11,175 INFO [UnifiedLog partition=__consumer_offsets-18,
dir=/var/lib/kafka/data-0/kafka-log2] Producer state recovery took 1ms for
snapshot load and 0ms for segment recovery from offset 7
(kafka.log.UnifiedLog$) [ReplicaFetcherThread-0-1]
2024-03-19 09:23:11,175 WARN [UnifiedLog partition=__consumer_offsets-18,
dir=/var/lib/kafka/data-0/kafka-log2] Non-monotonic update of high
watermark from (offset=10segment=[0:4083]) to (offset=7segment=[0:3607])
(kafka.log.UnifiedLog) [ReplicaFetcherThread-0-1]

Any ideas on how to look into this further?
Thanks
Karl



Re: KRaft Migration and Kafka Controller behaviour

2024-03-19 Thread Sanaa Syed
Hi Paolo,

Thank you for your response! I tested out a different theory today where I
deployed the kraft controller statefulset and waited to see which brokers
would be elected as controllers.

Here is an example of my migration right after I have provisioned the kraft
controller brokers/statefulset. At this point, the brokers haven't been
restarted.

get /controller
{"version":2,"brokerid":1,"timestamp":"1710876891432","kraftControllerEpoch":-1}

get /migration
{"version":0,"kraft_metadata_offset":-1,"kraft_controller_id":-1,"kraft_metadata_epoch":-1,"kraft_controller_epoch":-1}

At this point, on a dashboard I have, I see that a kafka broker is a
controller and a kraft controller broker is also a controller (although
that's not what I see in zookeeper, as shown above). One thing to note is
that I am doing this migration on a stretched cluster, so this may alter the
way the quorum is set up (I have three kraft controller brokers across three
regions).
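For context, the migration-related settings on the existing brokers look
roughly like this (node IDs, hostnames and ports below are placeholders
rather than my real values):

# server.properties on the ZooKeeper-based brokers (sketch)
zookeeper.metadata.migration.enable=true
controller.quorum.voters=100@controller-0.region-a.example.com:9090,101@controller-1.region-b.example.com:9090,102@controller-2.region-c.example.com:9090
controller.listener.names=CONTROLLER
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT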

After I roll the brokers, I find that this is still the case (the kraft
controller epoch has not increased in either znode). If you don't mind
sharing, what were the steps you followed to migrate to KRaft?

Thank you,
Sanaa

On Tue, Mar 19, 2024 at 7:05 AM Paolo Patierno wrote:

> Hi Sanaa,
> from my experience running migrations this has never happened to me, and
> it should not happen anyway.
>
> When a (ZooKeeper-based) broker registers as the controller at the
> beginning, you can see that the corresponding /controller znode has -1 as
> the KRaft controller epoch.
> Something like:
>
> {"version":2,"brokerid":0,"timestamp":"1710845218527","kraftControllerEpoch":-1}
> When you deploy the KRaft controller quorum and roll the brokers to
> register and start the migration, the controller role is taken by one of
> the KRaft controllers, and its epoch will always be greater than -1.
> Something like:
> {"version":2,"brokerid":4,"timestamp":"1710844690234","kraftControllerEpoch":10}
> A KRaft controller is able to "steal" the controller role precisely
> because its epoch will always be greater than -1.
> So during or after a migration, a broker cannot take back the controller
> role, because only a claimant with an epoch greater than the current one
> can do that (and for a ZooKeeper-based broker the epoch will always be -1).
> A broker can become the controller again only when you roll back the
> migration: you delete the KRaft controllers and you also have to delete
> the /controller znode (as the rollback procedure describes). Only in this
> case is a broker able to "win" /controller with a -1 epoch (because the
> znode doesn't exist anymore).
> I'm not sure whether in your case something went wrong during the
> migration or while rolling the brokers.
>
> Thanks,
> Paolo.
>
>
> On Mon, 18 Mar 2024 at 21:22, Sanaa Syed wrote:
>
> > Hello,
> >
> > I've begun migrating some of my Zookeeper Kafka clusters to KRaft. A
> > behaviour I've noticed twice across two different kafka cluster
> > environments is after provisioning a kraft controller quorum in migration
> > mode, it is possible for a kafka broker to become an active controller
> > alongside a kraft controller broker.
> >
> > For example, here are the steps I follow and the behaviour I notice (I'm
> > currently using Kafka v3.6):
> > 1. Enable the KRaft migration on the existing Kafka brokers (set the
> > `controller.quorum.voters`, `controller.listener.names` and
> > `zookeeper.metadata.migration.enable` configs in the server.properties
> > file).
> > 2. Deploy a kraft controller statefulset and service with the migration
> > enabled so that data is copied over from Zookeeper and we enter a
> > dual-write mode.
> > 3. After a few minutes, I see that the migration has completed (it's a
> > pretty small cluster). At this point, the kraft controller pod has been
> > elected to be the controller (and I see this in zookeeper when I run `get
> > /controller`). If the kafka brokers or kraft controller pods are restarted
> > at any point after the migration is completed, a kafka broker is elected to
> > be the controller and is reflected in zookeeper as well. Now, I have two
> > active controllers - 1 is a kafka broker and 1 is a kraft controller
> > broker.
> >
> > A couple questions I have:
> > 1. Is this the expected behaviour? If so, how long after a migration has
> > been completed should we hold off on restarting kafka brokers to avoid this
> > situation?
> > 2. Why is it possible for a kafka broker to be a controller again
> > post-migration?
> > 3. How do we come back to a state where a kraft controller broker is the
> > only controller again in the least disruptive way possible?
> >
> > Thank you,
> > Sanaa
> >
>
>
> --
> Paolo Patierno
>
> Senior Principal Software Engineer @ Red Hat | Microsoft MVP on Azure
>
> Twitter : @ppatierno 
> Linkedin : paolopatierno 
> GitHub : ppatierno 
>


-- 
Sanaa Syed