[ https://issues.apache.org/jira/browse/KAFKA-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527293#comment-14527293 ]

Ewen Cheslack-Postava commented on KAFKA-2135:
----------------------------------------------

[~dhay] is this a duplicate of KAFKA-1788? If you're sending the message after 
the broker has been taken down and it's a single-node cluster, you shouldn't even 
hit that code in Sender, since the disconnection will have already occurred. The 
request will just sit in the RecordAccumulator until you're able to reconnect. 
(Eventually you should also start trying to refresh metadata, but with the only 
node of a one-node cluster unavailable, you obviously can't do that either.)

As noted in that issue, there's a fix in the works via another patch that's 
adding timeouts, so even if you're disconnected (or network partitioned, etc.), 
the send request will time out.
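Until that patch lands, a caller-side mitigation is to bound the wait on the returned future yourself. A minimal sketch (plain {{java.util.concurrent}} as a stand-in, not the Kafka API) of how {{Future.get(timeout, unit)}} turns an indefinite hang into a {{TimeoutException}} the application can handle:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedSendWait {
    public static void main(String[] args) throws Exception {
        // Stand-in for producer.send(record): a future that never completes,
        // like a send that sits buffered while the only broker is down.
        Future<Long> pendingSend = new CompletableFuture<>();
        try {
            // Bound the wait instead of calling the indefinitely-blocking get().
            pendingSend.get(500, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up waiting and let the application decide (retry, log, drop).
            System.out.println("send timed out");
        }
    }
}
```

This only bounds the caller's wait; the record itself still sits in the accumulator until the in-flight timeout work actually fails the request.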

> New Kafka Producer Client: Send requests wait indefinitely if no broker is 
> available.
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2135
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2135
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.8.2.0
>            Reporter: David Hay
>            Assignee: Jun Rao
>            Priority: Critical
>
> I'm seeing issues when sending a message with the new producer client API.  
> The future returned from Producer.send() will block indefinitely if the 
> cluster is unreachable for some reason.  Here are the steps:
> # Start up a single node kafka cluster locally.
> # Start up application and create a KafkaProducer with the following config:
> {noformat}
> KafkaProducerWrapper values: 
>       compression.type = snappy
>       metric.reporters = []
>       metadata.max.age.ms = 300000
>       metadata.fetch.timeout.ms = 60000
>       acks = all
>       batch.size = 16384
>       reconnect.backoff.ms = 10
>       bootstrap.servers = [localhost:9092]
>       receive.buffer.bytes = 32768
>       retry.backoff.ms = 100
>       buffer.memory = 33554432
>       timeout.ms = 30000
>       key.serializer = class com.mycompany.kafka.serializer.ToStringEncoder
>       retries = 3
>       max.request.size = 1048576
>       block.on.buffer.full = true
>       value.serializer = class com.mycompany.kafka.serializer.JsonEncoder
>       metrics.sample.window.ms = 30000
>       send.buffer.bytes = 131072
>       max.in.flight.requests.per.connection = 5
>       metrics.num.samples = 2
>       linger.ms = 0
>       client.id = site-json
> {noformat}
> # Send some messages; they are successfully sent.
> # Shut down the kafka broker.
> # Send another message.
> At this point, calling {{get()}} on the returned Future will block 
> indefinitely until the broker is restarted.
> It appears that there is some logic in 
> {{org.apache.kafka.clients.producer.internals.Sender}} that is supposed to 
> mark the Future as "done" in response to a disconnect event (towards the end 
> of the {{run(long)}} method).  However, the earlier while loop in that method 
> appears to remove the broker from consideration entirely, so the final loop 
> over {{ClientResponse}} objects is never executed.
> It seems like the {{timeout.ms}} configuration should be honored in this case, 
> or perhaps another timeout should be introduced to indicate that we should 
> give up waiting for the cluster to return.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
