[
https://issues.apache.org/jira/browse/KAFKA-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663063#comment-13663063
]
Jun Rao commented on KAFKA-901:
-------------------------------
Thanks for the followup patch. Some comments:
60. KafkaApis: The following statement logs the request once per partition.
This will probably pollute the log. Would it be enough to log the whole request
just once?
    if(stateChangeLogger.isTraceEnabled)
      updateMetadataRequest.partitionStateInfos.foreach(p =>
        stateChangeLogger.trace(("Broker %d handling " +
          "UpdateMetadata request %s correlation id %d received from controller %d epoch %d for partition %s")
          .format(brokerId, p._2, updateMetadataRequest.correlationId,
            updateMetadataRequest.controllerId,
            updateMetadataRequest.controllerEpoch, p._1)))
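One way the "log once" idea could look is sketched below. This is only an illustration, not the patch itself: `UpdateMetadataRequestSketch`, `requestTraceLine`, and `handle` are hypothetical names, the request type is a simplified stand-in for the real class, and the `trace` callback stands in for `stateChangeLogger.trace`.

```scala
object LogOncePerRequestSketch {
  // Hypothetical, simplified stand-in for the real UpdateMetadataRequest.
  // Keys are (topic, partition); values stand in for the partition state info.
  final case class UpdateMetadataRequestSketch(
    correlationId: Int,
    controllerId: Int,
    controllerEpoch: Int,
    partitionStateInfos: Map[(String, Int), String])

  // Builds a single trace line that covers the whole request.
  def requestTraceLine(brokerId: Int, request: UpdateMetadataRequestSketch): String =
    ("Broker %d handling UpdateMetadata request %s correlation id %d " +
      "received from controller %d epoch %d")
      .format(brokerId, request.partitionStateInfos, request.correlationId,
        request.controllerId, request.controllerEpoch)

  // Emits at most one trace line per request, regardless of partition count.
  def handle(brokerId: Int,
             request: UpdateMetadataRequestSketch,
             traceEnabled: Boolean,
             trace: String => Unit): Unit = {
    if (traceEnabled)
      trace(requestTraceLine(brokerId, request))
    // Per-partition processing would follow here without further trace calls.
  }
}
```

With this shape the log volume grows with the number of requests rather than with the number of partitions per request.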
Is the following logging necessary? If we know the request, we already know what
should be in the cache after processing it.
    if(stateChangeLogger.isTraceEnabled)
      stateChangeLogger.trace(("Broker %d caching leader info %s for partition %s in response to " +
        "UpdateMetadata request sent by controller %d" +
        " epoch %d with correlation id %d").format(brokerId,
        partitionState._2, partitionState._1,
        updateMetadataRequest.controllerId,
        updateMetadataRequest.controllerEpoch, updateMetadataRequest.correlationId))
    }
> Kafka server can become unavailable if clients send several metadata requests
> -----------------------------------------------------------------------------
>
> Key: KAFKA-901
> URL: https://issues.apache.org/jira/browse/KAFKA-901
> Project: Kafka
> Issue Type: Bug
> Components: replication
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Assignee: Neha Narkhede
> Priority: Blocker
> Attachments: kafka-901-followup.patch, kafka-901.patch,
> kafka-901-v2.patch, kafka-901-v4.patch, kafka-901-v5.patch,
> metadata-request-improvement.patch
>
>
> Currently, if a broker is bounced without controlled shutdown and there are
> several clients talking to the Kafka cluster, each of the clients notices that
> the leaders for some partitions are unavailable. This leads to several metadata
> requests being sent to the Kafka brokers. Since metadata requests are pretty
> slow, all the I/O threads quickly become busy serving them. This leads to a
> full request queue, which stalls the handling of finished responses, since the
> same network thread handles requests as well as responses. In this situation,
> clients time out on metadata requests and send more metadata requests. This
> quickly makes the Kafka cluster unavailable.
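
The full-queue effect described in the issue can be illustrated with a small sketch, under stated assumptions. It only models a bounded queue that nothing drains, which mimics I/O threads that are all busy. `FullRequestQueueSketch`, `Request`, and `enqueueAll` are hypothetical names, and the queue is a plain `java.util.concurrent.ArrayBlockingQueue`, not Kafka's actual request channel.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

object FullRequestQueueSketch {
  // Hypothetical stand-in for a request handed from a network thread to the I/O threads.
  final case class Request(id: Int)

  // Tries to enqueue `count` requests while no consumer drains the queue.
  // Returns the ids that could not be enqueued within the timeout. A thread
  // using a blocking put instead would stall at the first of these ids and
  // could not do any other work, such as sending finished responses.
  def enqueueAll(queue: ArrayBlockingQueue[Request], count: Int, timeoutMs: Long): Seq[Int] =
    (1 to count).filterNot { id =>
      queue.offer(Request(id), timeoutMs, TimeUnit.MILLISECONDS)
    }
}
```

For example, with a queue capacity of 2 and 5 incoming requests, only the first two fit; the remaining three would each block the enqueuing thread for as long as the queue stays full.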