Hi Kafka community,
   I would like to propose a small change to
OfflinePartitionLeaderElectionStrategy.
   In our system, we usually have RF = 3, min.insync.replicas = 2, and
unclean.leader.election.enable = false, and clients usually set acks=all
when publishing. We have observed that occasionally, when a disk goes bad,
a partition goes offline and stays offline. This, of course, causes an
availability issue, and we have to manually set
unclean.leader.election.enable = true to bring the partition back online.
   These offline partitions caused by disk failures have become a huge
operational pain for us.

   Looking into it, the sequence of events is:
   1. First, the ISR for that partition shrinks to 1 (perhaps because the
bad disk makes the broker respond to fetch requests more slowly. Note that
a dying disk doesn't cause this every time, only occasionally).
   2. Then the disk fails completely, and the failure takes the leader
replica offline.
   3. Because the ISR now contains only the failed leader, there is no live
in-sync replica, so OfflinePartitionLeaderElectionStrategy won't choose a
leader while unclean.leader.election.enable = false.
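
   As a rough illustration of the sequence above, the following Python
sketch models the current behaviour as described here: pick the first live
replica that is in the ISR, and fall back to any live replica only when
unclean election is enabled. It is a simplified model, not Kafka's actual
code; the function name and parameters are hypothetical.

```python
from typing import Optional, Sequence, Set


def elect_offline_leader(
    assignment: Sequence[int],
    isr: Set[int],
    live_replicas: Set[int],
    unclean_enabled: bool,
) -> Optional[int]:
    """Simplified model of the current offline-partition election (hypothetical helper)."""
    # Prefer the first live replica, in assignment order, that is still in the ISR.
    for replica in assignment:
        if replica in live_replicas and replica in isr:
            return replica
    # With unclean election enabled, fall back to the first live replica.
    if unclean_enabled:
        for replica in assignment:
            if replica in live_replicas:
                return replica
    # Otherwise the partition stays offline.
    return None


# Scenario from the steps above: RF = 3, ISR shrank to the leader (broker 1),
# then broker 1 went offline.
stuck = elect_offline_leader([1, 2, 3], isr={1}, live_replicas={2, 3}, unclean_enabled=False)
forced = elect_offline_leader([1, 2, 3], isr={1}, live_replicas={2, 3}, unclean_enabled=True)
```

   In this model, `stuck` is None (the partition stays offline) and
`forced` is broker 2, which matches the manual workaround of enabling
unclean election.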

   The observation here is that, in this case, even though the replica
that most recently dropped out of the ISR is no longer in the ISR, it
should still have the same HW as the failed leader replica. So
OfflinePartitionLeaderElectionStrategy should select that replica as the
leader, especially if it has the same HW.

   So the proposal is:
   1. Choose a replica as the leader if it has the same HW as the failed
leader, even if it is not in the ISR.
   2. Further, when unclean.leader.election.enable = true, choose the
replica with the highest HW as the leader.
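
   The two proposed rules could be sketched roughly as follows, again in
Python rather than Kafka's own code. This assumes the election logic can
see each live replica's HW and the failed leader's last known HW; the
function name, `high_watermarks`, and `last_leader_hw` are hypothetical and
only serve the illustration.

```python
from typing import Dict, Optional, Sequence, Set


def elect_offline_leader_proposed(
    assignment: Sequence[int],
    isr: Set[int],
    live_replicas: Set[int],
    high_watermarks: Dict[int, int],  # assumed: HW known per live replica
    last_leader_hw: int,              # assumed: last known HW of the failed leader
    unclean_enabled: bool,
) -> Optional[int]:
    """Sketch of the proposed election rules (hypothetical helper)."""
    live = [r for r in assignment if r in live_replicas]
    # Unchanged: prefer a live replica that is still in the ISR.
    for replica in live:
        if replica in isr:
            return replica
    # Proposal 1: accept a live non-ISR replica whose HW matches the failed leader's.
    for replica in live:
        if high_watermarks.get(replica) == last_leader_hw:
            return replica
    # Proposal 2: with unclean election enabled, pick the live replica with the
    # highest HW (ties resolved by assignment order, since max keeps the first maximum).
    if unclean_enabled and live:
        return max(live, key=lambda r: high_watermarks.get(r, -1))
    return None


# Broker 1 was the leader and is now offline; brokers 2 and 3 are out of the ISR.
same_hw = elect_offline_leader_proposed(
    [1, 2, 3], {1}, {2, 3}, {2: 90, 3: 100}, last_leader_hw=100, unclean_enabled=False
)
behind = elect_offline_leader_proposed(
    [1, 2, 3], {1}, {2, 3}, {2: 90, 3: 95}, last_leader_hw=100, unclean_enabled=False
)
unclean = elect_offline_leader_proposed(
    [1, 2, 3], {1}, {2, 3}, {2: 90, 3: 95}, last_leader_hw=100, unclean_enabled=True
)
```

   In this sketch, `same_hw` is broker 3 (elected without unclean
election), `behind` is None because no live replica has caught up to the
failed leader's HW, and `unclean` is broker 3 because it has the highest
HW among the live replicas.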

   Let me know if this makes sense, or if you have any suggestions. If it
does, I will create a JIRA and work on it.

   Thanks!
   Ming
