Dong Lin created KAFKA-7152:
-------------------------------
Summary: replica should be in-sync if its LEO equals leader's LEO
Key: KAFKA-7152
URL: https://issues.apache.org/jira/browse/KAFKA-7152
Project: Kafka
Issue Type: Improvement
Reporter: Dong Lin
Assignee: Dong Lin
Currently a replica will be moved out of ISR if follower has not fetched from
leader for 10 sec (default replica.lag.time.max.ms). This cases problem in the
following scenario:
Say follower's ReplicaFetchThread needs to fetch 2k partitions from the leader
broker. Only 100 out of 2k partitions are actively being produced to and
therefore the total bytes in rate for those 2k partitions are small. The
following will happen:
1) The follower's ReplicaFetcherThread sends FetchRequest for those 2k
partitions.
2) Because the total bytes-in-rate for those 2k partitions is very small,
follower is able to catch up and leader broker adds these 2k partitions to ISR.
Follower's lastCaughtUpTimeMs for all partitions are updated to the current
time T0.
3) Since follower has caught up for all 2k partitions, leader updates 2k
partition znodes to include the follower in the ISR. It may take 20 seconds to
write 2k partition znodes if each znode write operation takes 10 ms.
4) At T0 + 15, maybeShrinkIsr() is invoked on leader broker. Since there is no
FetchRequet from the follower for more than 10 seconds after T0, all those 2k
partitions will be considered as out of syn and the follower will be removed
from ISR.
5) The follower receives FetchResponse at least 20 seconds after T0. That means
the next FetchRequest from follower to leader will be after T0 + 20.
The sequence of events described above will loop over time. There will be
constant churn of URP in the cluster even if follower can catch up with
leader's byte-in-rate. This reduces the cluster availability.
In order to address this problem, one simple approach is to keep follower in
the ISR as long as follower's LEO equals leader's LEO regardless of follower's
lastCaughtUpTimeMs. This is particularly useful if there are a lot of inactive
partitions in the cluster.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)