[ https://issues.apache.org/jira/browse/KAFKA-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663063#comment-13663063 ]
Jun Rao commented on KAFKA-901:
-------------------------------

Thanks for the followup patch. Some comments:

60. KafkaApis: The following logging logs the whole request for each partition. This will probably pollute the log. Is it enough just to log the whole request once?

    if(stateChangeLogger.isTraceEnabled)
      updateMetadataRequest.partitionStateInfos.foreach(p => stateChangeLogger.trace(("Broker %d handling " +
        "UpdateMetadata request %s correlation id %d received from controller %d epoch %d for partition %s")
        .format(brokerId, p._2, updateMetadataRequest.correlationId, updateMetadataRequest.controllerId,
        updateMetadataRequest.controllerEpoch, p._1)))

Is the following logging necessary? If we know a request, we already know what should be in the cache after processing the request.

    if(stateChangeLogger.isTraceEnabled)
      stateChangeLogger.trace(("Broker %d caching leader info %s for partition %s in response to UpdateMetadata request sent by controller %d" +
        " epoch %d with correlation id %d").format(brokerId, partitionState._2, partitionState._1,
        updateMetadataRequest.controllerId, updateMetadataRequest.controllerEpoch, updateMetadataRequest.correlationId))
    }

> Kafka server can become unavailable if clients send several metadata requests
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-901
>                 URL: https://issues.apache.org/jira/browse/KAFKA-901
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>         Attachments: kafka-901-followup.patch, kafka-901.patch,
> kafka-901-v2.patch, kafka-901-v4.patch, kafka-901-v5.patch,
> metadata-request-improvement.patch
>
>
> Currently, if a broker is bounced without controlled shutdown and there are
> several clients talking to the Kafka cluster, each of the clients realizes
> that leaders are unavailable for some partitions. This leads to several
> metadata requests sent to the Kafka brokers. Since metadata requests are
> pretty slow, all the I/O threads quickly become busy serving the metadata
> requests. This leads to a full request queue, which stalls handling of
> finished responses, since the same network thread handles requests as well
> as responses. In this situation, clients time out on metadata requests and
> send more metadata requests. This quickly makes the Kafka cluster
> unavailable.
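
For reference, a minimal sketch of what comment 60 appears to suggest: emit one trace line per UpdateMetadata request instead of one per partition. The case classes, the TraceLogger class and the LogOnce object are hypothetical stand-ins so the snippet is self-contained; they are not the actual Kafka 0.8 classes, and the cache update is left out.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the types named in the comment (not the real Kafka classes).
case class TopicAndPartition(topic: String, partition: Int)
case class PartitionStateInfo(leader: Int, leaderEpoch: Int)
case class UpdateMetadataRequest(
  correlationId: Int,
  controllerId: Int,
  controllerEpoch: Int,
  partitionStateInfos: Map[TopicAndPartition, PartitionStateInfo])

// Hypothetical logger that records trace messages so the effect can be inspected.
class TraceLogger(val isTraceEnabled: Boolean) {
  val messages: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def trace(msg: => String): Unit = if (isTraceEnabled) messages += msg
}

object LogOnce {
  // Logs the whole request once, regardless of how many partitions it carries.
  def handleUpdateMetadata(brokerId: Int,
                           request: UpdateMetadataRequest,
                           stateChangeLogger: TraceLogger): Unit = {
    if (stateChangeLogger.isTraceEnabled)
      stateChangeLogger.trace(
        ("Broker %d handling UpdateMetadata request %s correlation id %d " +
          "received from controller %d epoch %d")
          .format(brokerId, request, request.correlationId,
            request.controllerId, request.controllerEpoch))
    // Updating the metadata cache per partition would follow here (omitted in this sketch).
  }
}
```

With this shape, a request carrying N partitions produces one trace entry rather than N, and the per-partition state is still recoverable from the logged request itself.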