Karnam,

I think the combination of preferred leaders with `auto.leader.rebalance.enable`, together with the hardware issue in broker-3, might be giving you the opposite of the effect you are expecting. If broker-3 happens to be the preferred leader for a given partition (because it was the broker hosting the original leader when the partition was created), then the controller will keep trying to hand leadership for that partition back to that broker -- but, as you say, the broker is having hardware failures, so that attempt keeps failing.
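
As a quick check (a rough sketch only; the topic name is inferred from the partition name in your log, the host is a placeholder, and exact flags can differ slightly between Kafka versions), you can confirm which broker is the preferred leader and which broker settings drive the rebalance behaviour:

    # Describe the topic: the preferred leader for each partition is the
    # first broker listed under "Replicas".
    bin/kafka-topics.sh --bootstrap-server <broker-host>:9092 \
        --describe --topic object-xxx-xxx-xx-na4

    # Broker settings (server.properties) that control this behaviour,
    # shown with their defaults:
    #   auto.leader.rebalance.enable=true            # controller moves leadership back to the preferred replica
    #   leader.imbalance.check.interval.seconds=300  # how often the controller checks
    #   leader.imbalance.per.broker.percentage=10    # imbalance threshold that triggers the move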

Here are some things you can try (command sketches follow the list):

- Move the preferred leader for the affected partitions to another broker. The preferred leader is simply the first replica in the partition's assignment, so this means reordering the replica list with `bin/kafka-reassign-partitions.sh` and then triggering the election with the `bin/kafka-preferred-replica-election.sh` tool.

- Decrease `min.insync.replicas` from 2 to 1 so that producers using `acks=all` can keep going while one replica is down.

- Enable unclean leader election (`unclean.leader.election.enable=true`), which allows out-of-sync replicas to become leaders (but opens the door to data loss).

- Solve the hardware issue in broker-3 =)
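
Here is a sketch of the commands for the first three options (assuming the ZooKeeper-based tooling that ships with 2.3, placeholder ZooKeeper hosts, and the topic/partition and broker ids taken from your message; please double-check the flags against your version before running anything):

    # 1) Change the preferred leader of partition 93 by putting another broker
    #    first in the replica list (here broker-2, keeping 4 and 3 as followers),
    #    then ask the controller to elect the preferred replica.
    cat > reassign.json <<'EOF'
    {"version":1,"partitions":[
      {"topic":"object-xxx-xxx-xx-na4","partition":93,"replicas":[2,4,3]}
    ]}
    EOF
    bin/kafka-reassign-partitions.sh --zookeeper <zk-host>:2181 \
        --reassignment-json-file reassign.json --execute

    cat > election.json <<'EOF'
    {"partitions":[{"topic":"object-xxx-xxx-xx-na4","partition":93}]}
    EOF
    bin/kafka-preferred-replica-election.sh --zookeeper <zk-host>:2181 \
        --path-to-json-file election.json

    # 2) Relax min.insync.replicas on the affected topic.
    bin/kafka-configs.sh --zookeeper <zk-host>:2181 --alter \
        --entity-type topics --entity-name object-xxx-xxx-xx-na4 \
        --add-config min.insync.replicas=1

    # 3) Allow out-of-sync replicas to take over leadership (risk of data loss).
    bin/kafka-configs.sh --zookeeper <zk-host>:2181 --alter \
        --entity-type topics --entity-name object-xxx-xxx-xx-na4 \
        --add-config unclean.leader.election.enable=true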

Nevertheless, it is never a good idea to keep automatic preferred leader election enabled if the cluster's health is not constantly monitored and you are not willing to move preferred leaders around the cluster from time to time. Keeping the cluster well balanced increases the Ops workload. This is why Confluent created the Auto Data Balancing feature <https://docs.confluent.io/current/kafka/rebalancer/index.html>, which keeps partition leaders automatically and continuously spread over the cluster for you.

Thanks,

-- Ricardo

On 6/17/20 8:16 AM, Karnam, Sudheer wrote:
Team,
We are using Kafka version 2.3.0 and we are facing an issue with broker replication
<https://support.d2iq.com/s/feed/0D53Z00007KdrfHSAR>
1. Kafka has 6 brokers.
2. Mainly 7 topics exist in the Kafka cluster and each topic has 128 partitions.
3. Each partition has 3 in-sync replicas and these are distributed among the 6 Kafka brokers.
4. All partitions have a preferred leader and the "Auto Leader Rebalance Enable" configuration is enabled.
Issue:
We had a Kafka broker-3 failure because of hardware issues and partitions having broker-3 as leader were disrupted.
As per the Kafka official documentation, partitions should elect a new leader once the preferred leader fails.

[2020-06-01 14:02:25,029] ERROR [ReplicaManager broker=3] Error processing 
append operation on partition object-xxx-xxx-xx-na4-93 
(kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync 
replicas for partition object-xxx-xxx-xx-na4-93 is [1], below required minimum 
[2]

The above error message was found in the Kafka logs.
The "object-xxx-xxx-xx-na4-93" topic has 128 partitions and the 93rd partition has 3 replicas, distributed among broker-3, broker-2 and broker-4.
Broker-3 is the preferred leader.
When broker-3 failed, the leader position should have moved to one of broker-2 or broker-4, but it didn't happen.
As per the error message, whenever the leader fails it throws an error stating that only one in-sync replica is available.

Please help us find the root cause for the new leader not being elected.


Thanks,
Sudheer
