[ https://issues.apache.org/jira/browse/KAFKA-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-901:
--------------------------------

    Attachment: kafka-901-followup.patch

Attaching a follow-up patch that fixes two minor things:
1. Added update metadata request handling to the state change log. This makes it much easier to troubleshoot any issue with metadata cache refreshing.
2. Fixed a minor bug in the controller channel manager so that it reads an UpdateMetadataResponse properly.

> Kafka server can become unavailable if clients send several metadata requests
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-901
>                 URL: https://issues.apache.org/jira/browse/KAFKA-901
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>         Attachments: kafka-901-followup.patch, kafka-901.patch,
> kafka-901-v2.patch, kafka-901-v4.patch, kafka-901-v5.patch,
> metadata-request-improvement.patch
>
>
> Currently, if a broker is bounced without controlled shutdown and there are
> several clients talking to the Kafka cluster, each of the clients realizes
> that the leaders for some partitions are unavailable. This leads to several
> metadata requests being sent to the Kafka brokers. Since metadata requests
> are pretty slow, all the I/O threads quickly become busy serving the metadata
> requests. This leads to a full request queue, which stalls the handling of
> finished responses, since the same network thread handles requests as well as
> responses. In this situation, clients time out on metadata requests and send
> more metadata requests. This quickly makes the Kafka cluster unavailable.
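
The failure mode described in the issue can be illustrated with a small discrete-time model. This is only a sketch under stated assumptions, not Kafka code: the function `simulate`, its parameters, and the tick-based timing are all hypothetical. It assumes a single network thread that blocks while the bounded request queue is full and therefore cannot send finished responses during that time.

```python
from collections import deque


def simulate(ticks, arrivals_per_tick, io_threads, service_ticks, queue_capacity):
    """Toy model: one network thread, a bounded request queue, N I/O threads.

    All names and numbers are illustrative assumptions, not Kafka settings.
    """
    request_queue = deque()
    busy = []               # remaining ticks for each request on an I/O thread
    pending_responses = 0   # finished by I/O threads, not yet sent to clients
    responses_sent = 0
    backlog = 0             # requests read from sockets but not yet enqueued
    stalled_ticks = 0

    for _ in range(ticks):
        # I/O threads make progress; finished requests become pending responses.
        busy = [t - 1 for t in busy]
        pending_responses += sum(1 for t in busy if t == 0)
        busy = [t for t in busy if t > 0]

        # Idle I/O threads pick up queued requests.
        while len(busy) < io_threads and request_queue:
            request_queue.popleft()
            busy.append(service_ticks)

        # The network thread first tries to enqueue newly arrived requests.
        backlog += arrivals_per_tick
        while backlog and len(request_queue) < queue_capacity:
            request_queue.append(1)
            backlog -= 1

        if backlog:
            # Queue is full: the network thread is stuck on the enqueue,
            # so finished responses are not sent during this tick.
            stalled_ticks += 1
        else:
            responses_sent += pending_responses
            pending_responses = 0

    return {"responses_sent": responses_sent, "stalled_ticks": stalled_ticks}


# Fast requests: the queue never fills and responses flow normally.
fast = simulate(ticks=50, arrivals_per_tick=2, io_threads=4,
                service_ticks=1, queue_capacity=10)

# Slow (metadata-like) requests: the queue fills and responses stall.
slow = simulate(ticks=50, arrivals_per_tick=2, io_threads=4,
                service_ticks=20, queue_capacity=10)

print("fast:", fast)
print("slow:", slow)
```

In the slow case the I/O threads still finish some requests, but the network thread never gets past the full queue to send them, which matches the stall described above. The model leaves out client timeouts and retries, which in the reported scenario add even more load.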