Thanks, Rahul! In my environment I need reconnect.backoff.ms to be longer than the OS default TCP timeout so that NetworkClient gives the second node a try.
I believe this is related to KAFKA-2459: https://issues.apache.org/jira/browse/KAFKA-2459

Cheers,
Steve

On Tue, Sep 1, 2015, 5:24 PM Rahul Jain <rahul...@gmail.com> wrote:

> We did notice something similar. When a broker node (out of 3) went down,
> metadata calls continued to go to the failed node and the producer kept
> failing. We were able to make it work by increasing
> reconnect.backoff.ms to 1 second.
>
> Something similar was discussed earlier:
>
> http://qnalist.com/questions/6002514/new-producer-metadata-update-problem-on-2-node-cluster
>
> On Mon, Aug 31, 2015 at 11:00 PM, Steve Tian <steve.cs.t...@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > Are there any concerns about setting a long reconnect.backoff.ms for the
> > new Java Kafka producer (0.8.2.0/0.8.2.1)?
> >
> > Assume we have bootstrap.servers=host1:port1,host2:port2,host3:port3 and
> > host1 is *down* from the very beginning. If a newly created Kafka producer
> > chooses host1 as the first node to connect to for a metadata update, then
> > that producer will keep trying host1 *only*, because the default TCP
> > timeout is surely longer than the default value of reconnect.backoff.ms,
> > which is 10 ms.
> >
> > I am thinking of setting reconnect.backoff.ms longer than N * T, where N
> > is the number of nodes in bootstrap.servers and T is the default TCP
> > timeout. Are there any concerns about a reconnect.backoff.ms that long?
> > Any better solutions?
> >
> > Cheers,
> > Steve
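To make the two settings discussed in this thread concrete, here is a minimal Java sketch that only builds the producer `Properties`, using the two config keys named above (`bootstrap.servers` and `reconnect.backoff.ms`). It shows Rahul's fixed 1 second backoff and Steve's proposed "longer than N * T" value. The helper `backoffAboveNTimesT` and the 20 s figure for T are hypothetical placeholders: the real OS TCP connect timeout depends on the environment and would have to be measured. A real producer would also need its serializer settings before these properties are passed to it.

```java
import java.util.Properties;

public class ProducerBackoffConfig {

    // Hypothetical helper: returns a reconnect.backoff.ms value strictly greater than
    // N * T, where N is the number of comma-separated entries in bootstrap.servers and
    // T is an assumed OS TCP connect timeout in milliseconds (environment-specific).
    static long backoffAboveNTimesT(String bootstrapServers, long tcpConnectTimeoutMs) {
        int n = bootstrapServers.split(",").length;
        return (long) n * tcpConnectTimeoutMs + 1;
    }

    // Builds only the two settings discussed in the thread; serializers and any other
    // required producer settings are intentionally left out of this sketch.
    static Properties producerProps(String bootstrapServers, long reconnectBackoffMs) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("reconnect.backoff.ms", Long.toString(reconnectBackoffMs));
        return props;
    }

    public static void main(String[] args) {
        String servers = "host1:port1,host2:port2,host3:port3";

        // Rahul's workaround: a fixed 1 second backoff.
        Properties fixed = producerProps(servers, 1000L);

        // Steve's proposal: backoff longer than N * T (T = 20 s is a placeholder assumption).
        Properties scaled = producerProps(servers, backoffAboveNTimesT(servers, 20_000L));

        System.out.println(fixed.getProperty("reconnect.backoff.ms"));  // 1000
        System.out.println(scaled.getProperty("reconnect.backoff.ms")); // 60001
    }
}
```

The trade-off behind the question is visible here: a backoff of roughly a minute keeps the producer from hammering a dead bootstrap node, but it also means the producer waits that long before reconnecting to any node that was only briefly unavailable.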