We noticed something similar. When one broker node (out of 3) went down,
metadata requests kept going to the failed node and the producer kept
failing. We were able to make it work by increasing reconnect.backoff.ms
to 1 second (1000 ms).
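For reference, a minimal sketch of that setting. It builds the producer Properties only and does not construct a KafkaProducer, so it compiles without the Kafka jar. The host names and the buildProps helper are hypothetical; bootstrap.servers and reconnect.backoff.ms are the real config keys discussed in this thread.

```java
import java.util.Properties;

// Sketch only: builds producer settings; creating the KafkaProducer itself
// (and choosing key/value serializers) is intentionally left out.
public class ProducerConfigSketch {

    // Hypothetical helper: producer properties with a 1-second reconnect backoff.
    static Properties buildProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Wait 1000 ms (instead of the 10 ms default) before reconnecting to the
        // same node, giving metadata requests a chance to go to another broker.
        props.put("reconnect.backoff.ms", "1000");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker list; substitute your own hosts and ports.
        Properties props = buildProps("host1:9092,host2:9092,host3:9092");
        System.out.println(props.getProperty("reconnect.backoff.ms"));
    }
}
```

These properties would then be passed, together with serializer settings, to the KafkaProducer constructor.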

Something similar was discussed earlier:
http://qnalist.com/questions/6002514/new-producer-metadata-update-problem-on-2-node-cluster



On Mon, Aug 31, 2015 at 11:00 PM, Steve Tian <steve.cs.t...@gmail.com>
wrote:

> Hi everyone,
>
> Are there any concerns with setting a long reconnect.backoff.ms for the new
> Java Kafka producer (0.8.2.0/0.8.2.1)?
>
> Assume we have bootstrap.servers=host1:port1,host2:port2,host3:port3 and
> host1 is *down* from the very beginning. If a newly created Kafka producer
> chooses host1 as the first node to connect to for a metadata update, then
> that producer will keep retrying host1 *only*, because the default TCP
> timeout is surely longer than the default reconnect.backoff.ms of 10 ms.
>
> I am thinking of setting reconnect.backoff.ms longer than N * T, where N is
> the number of nodes in bootstrap.servers and T is the default TCP timeout.
> Are there any concerns with a reconnect.backoff.ms that long? Is there a
> better solution?
>
> Cheers, Steve
>
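The N * T rule of thumb from the quoted message can be sketched as below. T is an assumed input here: the actual TCP connect timeout depends on the OS and its SYN retry settings, and the 20 s used in main is purely illustrative. The +1 makes the result strictly longer than N * T, as the message proposes.

```java
// Sketch of the "backoff > N * T" heuristic; the timeout value is an assumption.
public class BackoffEstimate {

    // N: number of non-empty entries in a comma-separated bootstrap.servers string.
    static int countNodes(String bootstrapServers) {
        int n = 0;
        for (String s : bootstrapServers.split(",")) {
            if (!s.trim().isEmpty()) {
                n++;
            }
        }
        return n;
    }

    // Smallest backoff (ms) strictly greater than N * T, with T in milliseconds.
    static long suggestedBackoffMs(String bootstrapServers, long tcpTimeoutMs) {
        return (long) countNodes(bootstrapServers) * tcpTimeoutMs + 1;
    }

    public static void main(String[] args) {
        // Hypothetical values: 3 brokers and an assumed 20 s connect timeout.
        System.out.println(suggestedBackoffMs("host1:9092,host2:9092,host3:9092", 20_000L));
    }
}
```

With a realistic OS timeout, N * T can run to minutes, which is why the reply above settles for a much smaller value (1 second) that was enough in practice.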
