[ 
https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020743#comment-17020743
 ] 

Satish Duggana commented on KAFKA-8733:
---------------------------------------

[~flavr] Thanks for letting us know that you are also encountering the same 
issue. There is a discussion thread [1] on the dev@kafka mailing list. We are 
waiting for others to comment on or conclude that discussion before the 
voting thread is started.

[1] https://lists.apache.org/thread.html/243dcc267f7ba79f508bcc4cbaa77a41d2454cba9359173bb08e875e%40%3Cdev.kafka.apache.org%3E

 

> Offline partitions occur when leader's disk is slow in reads while responding 
> to follower fetch requests.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8733
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8733
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.1.2, 2.4.0
>            Reporter: Satish Duggana
>            Assignee: Satish Duggana
>            Priority: Critical
>         Attachments: weighted-io-time-2.png, wio-time.png
>
>
> We have seen partitions go offline multiple times on some of the hosts in 
> our clusters. After going through the broker logs and the hosts' disk stats, 
> it looks like this issue occurs whenever read/write operations on that disk 
> take longer than usual. In the particular case where a read takes longer 
> than replica.lag.time.max.ms, follower replicas fall out of sync because 
> their earlier fetch requests are stuck reading the local log and their fetch 
> status has not yet been updated, as shown in the `ReplicaManager` code below. 
> If reading data from the log stalls for longer than replica.lag.time.max.ms, 
> all the replicas fall out of sync and the partition goes offline if 
> min.insync.replicas > 1 and unclean.leader.election.enable is false.
>  
> {code:java}
> def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
>   // This call took more than `replica.lag.time.max.ms`.
>   val result = readFromLocalLog(
>     replicaId = replicaId,
>     fetchOnlyFromLeader = fetchOnlyFromLeader,
>     readOnlyCommitted = fetchOnlyCommitted,
>     fetchMaxBytes = fetchMaxBytes,
>     hardMaxBytesLimit = hardMaxBytesLimit,
>     readPartitionInfo = fetchInfos,
>     quota = quota,
>     isolationLevel = isolationLevel)
>   // The fetch time gets updated here, but by then maybeShrinkIsr has
>   // already been called and the replica has been removed from the ISR.
>   if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
>   else result
> }
> val logReadResults = readFromLog()
> {code}
> I have attached graphs of the disk's weighted I/O time from when this issue 
> occurred.
> I will raise [KIP-501|https://s.apache.org/jhbpn] describing options on how 
> to handle this scenario.
>  
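The timing described in the issue above can be illustrated with a toy model. 
This is only a sketch under stated assumptions, not Kafka's implementation: 
`IsrLagSketch`, `Follower` and `outOfSync` are hypothetical names, the 10 s 
lag limit is an illustrative stand-in for replica.lag.time.max.ms, and the 
timestamps are made up. It shows that a follower whose fetch is blocked on a 
slow leader-side read still carries its old caught-up time when the periodic 
ISR check runs, so it is treated as lagging.

```scala
// Toy model of the ISR lag check. All names are hypothetical and this is
// not Kafka's actual code.
object IsrLagSketch {
  // lastCaughtUpTimeMs is only refreshed once the leader finishes serving a fetch.
  final case class Follower(id: Int, lastCaughtUpTimeMs: Long)

  // Ids of followers whose last caught-up time is older than maxLagMs.
  def outOfSync(followers: Seq[Follower], nowMs: Long, maxLagMs: Long): Seq[Int] =
    followers.filter(f => nowMs - f.lastCaughtUpTimeMs > maxLagMs).map(_.id)

  def main(args: Array[String]): Unit = {
    val maxLagMs = 10000L // illustrative stand-in for replica.lag.time.max.ms

    // Both followers were caught up at t = 0, when their fetch requests arrived.
    val followers = Seq(Follower(1, 0L), Follower(2, 0L))

    // Assume the leader's log read for those fetches is still blocked on a slow
    // disk at t = 11 s, so neither follower's state has been refreshed when the
    // periodic ISR check runs.
    val isrCheckAtMs = 11000L
    val removed = outOfSync(followers, isrCheckAtMs, maxLagMs)
    println(s"Followers seen as lagging at t=${isrCheckAtMs}ms: ${removed.mkString(", ")}")
  }
}
```

In this model the followers did nothing wrong: the delay is entirely on the 
leader's read path, yet the check only sees stale timestamps.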



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
