[ https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020743#comment-17020743 ]
Satish Duggana commented on KAFKA-8733:
---------------------------------------

[~flavr] Thanks for letting us know that you are also encountering this issue. There is a discussion thread [1] on the dev@kafka mailing list. I am waiting for others to comment and for the discussion to close before starting the vote thread.

1. [https://lists.apache.org/thread.html/243dcc267f7ba79f508bcc4cbaa77a41d2454cba9359173bb08e875e%40%3Cdev.kafka.apache.org%3E]

> Offline partitions occur when leader's disk is slow in reads while responding to follower fetch requests.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8733
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8733
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.1.2, 2.4.0
>            Reporter: Satish Duggana
>            Assignee: Satish Duggana
>            Priority: Critical
>         Attachments: weighted-io-time-2.png, wio-time.png
>
>
> We have seen partitions go offline multiple times on some of the hosts in our
> clusters. The broker logs and the hosts' disk stats suggest that this happens
> whenever read/write operations on the affected disk become slow. In
> particular, when a read takes longer than replica.lag.time.max.ms, follower
> replicas fall out of sync: their earlier fetch requests are stuck reading the
> local log, so their fetch status is not updated in time, as shown in the
> `ReplicaManager` code below. If reads from the log stall for longer than
> replica.lag.time.max.ms, all follower replicas fall out of sync, and the
> partition goes offline if min.insync.replicas > 1 and
> unclean.leader.election.enable is false.
>
> {code:java}
> def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
>   // This call took longer than `replica.lag.time.max.ms`.
>   val result = readFromLocalLog(
>     replicaId = replicaId,
>     fetchOnlyFromLeader = fetchOnlyFromLeader,
>     readOnlyCommitted = fetchOnlyCommitted,
>     fetchMaxBytes = fetchMaxBytes,
>     hardMaxBytesLimit = hardMaxBytesLimit,
>     readPartitionInfo = fetchInfos,
>     quota = quota,
>     isolationLevel = isolationLevel)
>   // The fetch time is updated here, but by now maybeShrinkIsr has probably
>   // already run and removed the replica from the ISR.
>   if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
>   else result
> }
> val logReadResults = readFromLog()
> {code}
> I have attached graphs of the disks' weighted I/O time from when this issue occurred.
> I will raise [KIP-501|https://s.apache.org/jhbpn] describing options for
> handling this scenario.
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
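
The timing described in the issue can be sketched as a small model. This is an illustration only, not Kafka code: `IsrLagSketch`, `isOutOfSync`, `removedFromIsrDuringRead`, the 5-second check period, and the 10-second lag limit are all assumptions made for the example. It assumes that a follower fetch arrives at t = 0 and that the follower's fetch state is refreshed only after the leader's blocked log read returns.

```java
public class IsrLagSketch {

    /** Simplified lag test: true when the follower has not caught up within the allowed lag. */
    public static boolean isOutOfSync(long lastCaughtUpMs, long nowMs, long lagMaxMs) {
        return nowMs - lastCaughtUpMs > lagMaxMs;
    }

    /**
     * Models one follower fetch that arrives at t = 0 while the leader's log read
     * blocks for readDurationMs. Every periodic ISR check that fires before the
     * read returns still sees lastCaughtUpMs = 0, because the fetch state is only
     * refreshed after the read completes.
     */
    public static boolean removedFromIsrDuringRead(long readDurationMs,
                                                   long lagMaxMs,
                                                   long checkIntervalMs) {
        if (checkIntervalMs <= 0) {
            throw new IllegalArgumentException("checkIntervalMs must be positive");
        }
        long lastCaughtUpMs = 0L; // stale for the whole duration of the read
        for (long now = checkIntervalMs; now < readDurationMs; now += checkIntervalMs) {
            if (isOutOfSync(lastCaughtUpMs, now, lagMaxMs)) {
                return true; // an ISR check dropped the follower mid-read
            }
        }
        return false; // the read finished before any check saw too much lag
    }

    public static void main(String[] args) {
        long lagMaxMs = 10_000L;       // example value for replica.lag.time.max.ms
        long checkIntervalMs = 5_000L; // assumed period of the ISR check
        System.out.println(removedFromIsrDuringRead(3_000L, lagMaxMs, checkIntervalMs));  // false
        System.out.println(removedFromIsrDuringRead(16_000L, lagMaxMs, checkIntervalMs)); // true
    }
}
```

With these example values, a 3-second read leaves the follower in the ISR, while a 16-second read gets it removed even though the follower itself sent its fetch request on time.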