[ 
https://issues.apache.org/jira/browse/KAFKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940748#comment-13940748
 ] 

Jun Rao commented on KAFKA-1303:
--------------------------------

The old producer works as follows. It always uses the brokers specified in
metadata.broker.list for issuing metadata requests. The socket connections for
sending metadata requests are separate from those used for sending produce
requests. metadata.broker.list can be configured with a VIP or a list of
brokers. In either case, it is the client's responsibility to make sure that
at least one broker in metadata.broker.list is live. The benefit of this
approach is that metadata requests are never blocked behind produce requests,
which reduces the probability of produce requests failing due to stale
metadata.
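
For illustration, the two ways of setting metadata.broker.list (a list of
brokers or a single VIP) could look like the sketch below. It only builds
java.util.Properties objects and does not touch any Kafka class. The host
names and the class name OldProducerBootstrapConfig are made up for the
example.

```java
import java.util.Properties;

public class OldProducerBootstrapConfig {

    // Explicit broker list: at least one of these must be live.
    // Host names are placeholders, not real brokers.
    public static Properties withBrokerList() {
        Properties props = new Properties();
        props.put("metadata.broker.list",
                  "broker1.example.com:9092,broker2.example.com:9092");
        return props;
    }

    // Single VIP in front of the cluster: the VIP must route to a live broker.
    public static Properties withVip() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "kafka-vip.example.com:9092");
        return props;
    }

    // Number of host:port entries the producer can try for metadata requests.
    public static int entryCount(Properties props) {
        return props.getProperty("metadata.broker.list").split(",").length;
    }

    public static void main(String[] args) {
        System.out.println("list entries: " + entryCount(withBrokerList()));
        System.out.println("vip entries:  " + entryCount(withVip()));
    }
}
```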

The new producer uses metadata.broker.list only when sending the very first
metadata request. After that, it uses the cluster info returned in the
metadata response for issuing subsequent metadata and produce requests. The
client is still responsible for making sure that at least one broker in
metadata.broker.list is live; otherwise, the producer won't work after a
restart. This approach has the potential benefit of using fewer socket
connections and of balancing the metadata requests among more brokers after
cluster expansion. However, the current implementation has the downside that
metadata requests can be queued behind produce requests.
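
To make the queuing cost concrete, here is a rough back-of-the-envelope
sketch. It assumes that requests on one connection are served strictly in
order, one at a time, and that each produce request takes about 500ms as in
the issue description quoted below. The queue depth of 5 and the class and
method names are hypothetical, chosen only for the example.

```java
public class MetadataQueueingDelay {

    // Wait time for a metadata request sent on a connection that already has
    // produce requests outstanding, assuming strict in-order processing.
    public static long sharedConnectionDelayMs(int outstandingProduceRequests,
                                               long produceLatencyMs) {
        return (long) outstandingProduceRequests * produceLatencyMs;
    }

    // On a dedicated metadata connection nothing is queued ahead of the request.
    public static long dedicatedConnectionDelayMs() {
        return 0L;
    }

    // True if the metadata response would arrive only after the (assumed)
    // window in which the producer is still retrying with stale metadata.
    public static boolean arrivesTooLate(long delayMs, long retryWindowMs) {
        return delayMs > retryWindowMs;
    }

    public static void main(String[] args) {
        int outstanding = 5;       // assumed queue depth
        long produceLatency = 500; // ms, from the issue description
        long shared = sharedConnectionDelayMs(outstanding, produceLatency);
        System.out.println("shared connection delay (ms):    " + shared);
        System.out.println("dedicated connection delay (ms): "
                + dedicatedConnectionDelayMs());
    }
}
```

The model ignores network and broker processing time for the metadata request
itself; it only shows how the delay grows with the number of queued produce
requests.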

My feeling is that the approach in the old producer gives a better tradeoff.
Metadata requests are cheap and cluster expansion is rare, so load balancing
metadata requests among new brokers is not that critical. To implement this
behavior in the new producer, we can keep the metadata brokers in Cluster and
use only those brokers for issuing metadata requests. To reduce the number of
sockets, we can close the socket either immediately after the metadata
response is received or after it has been idle for some time.
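
A minimal sketch of what this could look like, not the actual producer code:
the class MetadataBrokerPolicy and all of its methods are hypothetical. It
picks brokers only from the configured metadata list and tells the caller
when the metadata socket may be closed. An idle timeout of 0 corresponds to
closing right after the response; a positive value corresponds to closing
after the socket has been idle for some time.

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataBrokerPolicy {
    private final List<String> metadataBrokers; // from metadata.broker.list only
    private final long idleTimeoutMs;           // 0 = close right after each response
    private int next = 0;
    private long lastUsedMs = -1;               // -1 means no metadata socket is open

    public MetadataBrokerPolicy(List<String> metadataBrokers, long idleTimeoutMs) {
        if (metadataBrokers.isEmpty()) {
            throw new IllegalArgumentException("need at least one metadata broker");
        }
        this.metadataBrokers = new ArrayList<>(metadataBrokers);
        this.idleTimeoutMs = idleTimeoutMs;
    }

    // Round-robin over the configured metadata brokers, never over the
    // brokers learned from metadata responses.
    public String nextMetadataBroker(long nowMs) {
        String broker = metadataBrokers.get(next);
        next = (next + 1) % metadataBrokers.size();
        lastUsedMs = nowMs;
        return broker;
    }

    // Whether the metadata socket has been idle long enough to be closed.
    public boolean shouldCloseSocket(long nowMs) {
        return lastUsedMs >= 0 && nowMs - lastUsedMs >= idleTimeoutMs;
    }

    // Caller reports that it closed the socket.
    public void markClosed() {
        lastUsedMs = -1;
    }

    public static void main(String[] args) {
        List<String> brokers = new ArrayList<>();
        brokers.add("broker1.example.com:9092"); // placeholder host names
        brokers.add("broker2.example.com:9092");
        MetadataBrokerPolicy policy = new MetadataBrokerPolicy(brokers, 1000);
        System.out.println("send metadata request to " + policy.nextMetadataBroker(0));
        System.out.println("close at t=500?  " + policy.shouldCloseSocket(500));
        System.out.println("close at t=1000? " + policy.shouldCloseSocket(1000));
    }
}
```

Timestamps are passed in explicitly so that the policy can be exercised
without a real clock or real sockets.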

> metadata request in the new producer can be delayed
> ---------------------------------------------------
>
>                 Key: KAFKA-1303
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1303
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>
> While debugging a system test, I observed the following.
> 1. A broker side configuration 
> (replica.fetch.wait.max.ms=500,replica.fetch.min.bytes=4096) made the time to 
> complete a produce request long (each taking about 500ms with ack=-1).
> 2. The producer client has a bunch of outstanding produce requests queued up 
> on the brokers.
> 3. One of the brokers fails and we force updating the metadata.
> 4. The metadata request is queued up behind those outstanding produce 
> requests.
> 5. By the time the metadata response comes back, some messages have failed 
> all retries because of stale metadata.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
