[ https://issues.apache.org/jira/browse/KAFKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932125#comment-13932125 ]
Jay Kreps commented on KAFKA-1303:
----------------------------------

I don't think this is actually a problem. A metadata request could be slow for any number of reasons, in which case this will happen. There is no guarantee that failover and metadata refresh complete before the retries are exhausted.

I would be opposed to having special connections for metadata requests. However, if we want to try to improve this, we could include a smarter heuristic in Sender.selectMetadataDestination. Currently we prefer a node that we already have a connection to and which has no requests currently being sent; however, many requests could have been sent but not yet processed. We could instead prefer the node we have a connection to that has the fewest in-flight requests.

> metadata request in the new producer can be delayed
> ---------------------------------------------------
>
> Key: KAFKA-1303
> URL: https://issues.apache.org/jira/browse/KAFKA-1303
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.2
> Reporter: Jun Rao
>
> While debugging a system test, I observed the following.
> 1. A broker-side configuration (replica.fetch.wait.max.ms=500,replica.fetch.min.bytes=4096) made produce requests slow to complete (each taking about 500ms with ack=-1).
> 2. The producer client has a bunch of outstanding produce requests queued up on the brokers.
> 3. One of the brokers fails and we force a metadata update.
> 4. The metadata request is queued up behind those outstanding producer requests.
> 5. By the time the metadata response comes back, some messages have failed all retries because of stale metadata.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
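The node-selection heuristic suggested in the comment (among nodes we already have a connection to, prefer the one with the fewest in-flight requests) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the producer's actual code: the class name `MetadataNodeSelector`, the use of plain integer node ids, and the connected-set and in-flight-count inputs are hypothetical stand-ins for whatever state `Sender` really tracks. The fallback to the first known node when nothing is connected is also an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch of a "fewest in-flight requests" metadata destination
 * heuristic. Node ids, the connected set and the in-flight counts are
 * illustrative stand-ins, not Kafka's real Sender/NetworkClient API.
 */
public class MetadataNodeSelector {

    /**
     * Returns the connected node with the fewest in-flight requests.
     * If no node is connected, falls back to the first known node (an
     * assumption) so the caller can open a new connection to it.
     */
    public static Optional<Integer> selectMetadataDestination(
            List<Integer> nodes,
            Set<Integer> connected,
            Map<Integer, Integer> inFlight) {
        Integer best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Integer node : nodes) {
            if (!connected.contains(node)) {
                continue; // only consider nodes we already have a connection to
            }
            int count = inFlight.getOrDefault(node, 0);
            if (count < bestCount) { // strict '<' keeps the earlier node on ties
                best = node;
                bestCount = count;
            }
        }
        if (best != null) {
            return Optional.of(best);
        }
        // No live connection: fall back to the first known node, if any.
        return nodes.isEmpty() ? Optional.empty() : Optional.of(nodes.get(0));
    }

    public static void main(String[] args) {
        List<Integer> nodes = List.of(0, 1, 2);
        Set<Integer> connected = Set.of(0, 1);
        // Node 0 has several produce requests outstanding; node 1 has one.
        Map<Integer, Integer> inFlight = Map.of(0, 5, 1, 1, 2, 0);
        // Prints 1: node 0 is connected but busier, node 2 is idle but not connected.
        System.out.println(selectMetadataDestination(nodes, connected, inFlight).orElse(-1));
    }
}
```

Ties go to the earlier node in list order, which keeps the choice deterministic; a real client might instead randomize among tied nodes to spread metadata load. Note that an in-flight count only approximates how long a new request will wait, since it ignores how slow each outstanding request is.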