[ https://issues.apache.org/jira/browse/KAFKA-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804684#comment-16804684 ]
ASF GitHub Bot commented on KAFKA-8175:
---------------------------------------

huangyiminghappy commented on pull request #6522: KAFKA-8175: Remove streams overrides on repartition topics
URL: https://github.com/apache/kafka/pull/6522

As described in https://issues.apache.org/jira/browse/KAFKA-8175, if one node in the cluster is blocked and the client cannot send an updateMetaData request to another node, the client prints many log messages like `org.apache.kafka.common.errors.TimeoutException: Expiring 1062 record(s) for kafka_test_111-8: 23967 ms has passed since batch creation plus linger time Fri Mar 29 11:34:14 CST 2019`.

Sometimes the controller does not detect quickly that the broker is offline. The client's batches then cannot be sent to the offline node, and no metadata update is triggered either. So we need to check the connection's ready state: if the connection is not ready within the configured time, close the channel and trigger a metadata update.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

> The broker block some minutes may occur expired error message for a period of time
> ------------------------------------------------------------------------------------
>
> Key: KAFKA-8175
> URL: https://issues.apache.org/jira/browse/KAFKA-8175
> Project: Kafka
> Issue Type: Improvement
> Reporter: huangyiming
> Priority: Minor
>
> When a broker is blocked for some minutes, the producer may report expired-batch errors, and these errors may continue for a period of time.
> If the broker cluster's IPs are 100, 101, and 102, and the controller is 100, then when 101 is blocked for 2 minutes (you can simulate this with gdb: attach to the broker's pid for 2 minutes, then quit), the controller may not detect in time that machine 101 is offline. For example, if the controller finds 101 offline only 60 seconds later, it sends LeaderAndIsr only 60 seconds later, and in the meantime many batches in the RecordAccumulator may hit the delivery timeout. The topic partitions whose leader is on 101 may report expired errors, and the client cannot send an update metadata request to 100 or 102, because the records for 101 cannot be sent and no timeout triggers a metadata update.
> so i use
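
For readers following the proposal, the idea of "check the connection's ready state, and if it is not ready within the configured time, close the channel and trigger a metadata update" can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in pull request #6522 and not the Kafka client's actual API: the class `ConnectionReadyWatchdog`, the method `observe`, and the `readyTimeoutMs` parameter are all hypothetical names, and the "close" and "metadata update" actions are modelled as simple flags.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a "not ready for too long" check.
 * It does not use or mirror any real Kafka client class.
 */
public class ConnectionReadyWatchdog {
    // Assumed config: how long a node may stay not-ready before we give up on it.
    private final long readyTimeoutMs;
    // Node id -> timestamp at which the node was first observed as not ready.
    private final Map<String, Long> notReadySince = new HashMap<>();
    // Nodes whose channel this sketch has decided to close.
    private final Set<String> closedNodes = new HashSet<>();
    // Stands in for "ask the client to refresh cluster metadata".
    private boolean metadataUpdateRequested = false;

    public ConnectionReadyWatchdog(long readyTimeoutMs) {
        this.readyTimeoutMs = readyTimeoutMs;
    }

    /** Record the readiness of a node at time nowMs and act on a timeout. */
    public void observe(String nodeId, boolean ready, long nowMs) {
        if (ready) {
            // The node recovered: forget any earlier not-ready timestamp.
            notReadySince.remove(nodeId);
            return;
        }
        // Remember the first time the node was seen as not ready.
        long since = notReadySince.computeIfAbsent(nodeId, id -> nowMs);
        if (nowMs - since >= readyTimeoutMs) {
            // Not ready for the whole configured window:
            // close the channel and request a metadata update.
            closedNodes.add(nodeId);
            notReadySince.remove(nodeId);
            metadataUpdateRequested = true;
        }
    }

    public boolean isClosed(String nodeId) {
        return closedNodes.contains(nodeId);
    }

    public boolean isMetadataUpdateRequested() {
        return metadataUpdateRequested;
    }
}
```

With a 30-second timeout, calling `observe("101", false, 0)` and then `observe("101", false, 30000)` marks node 101 as closed and sets the metadata-update flag, while a `observe("101", true, ...)` call in between would reset the timer. In the scenario from the issue, this would let the client stop waiting on 101 and learn the new leaders from 100 or 102 once the controller has moved them.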