Got it. Thanks a lot, Ewen!

Cheers, Steve

On Thu, Sep 3, 2015, 10:06 AM Ewen Cheslack-Postava <e...@confluent.io> wrote:

> Steve,
>
> I don't think there is a better solution at the moment. This is an easy
> issue to miss in unit testing because generally connections to localhost
> will be rejected immediately if there isn't anything listening on the port.
> If you're running in an environment where this happens normally, then for
> now you'll need to wait for the long timeout.
>
> https://issues.apache.org/jira/browse/KAFKA-2120 may also alleviate the
> problem by at least reducing the amount of time for the request to fail.
> Depending on how adventurous you are, you could try using a version with
> that patch and maybe adjust the setting lower than its default.
>
> -Ewen
>
> On Wed, Sep 2, 2015 at 10:46 AM, Steve Tian <steve.cs.t...@gmail.com>
> wrote:
>
> > Would kafka dev kindly give us some advice on this?
> >
> > Cheers, Steve
> >
> > On Tue, Sep 1, 2015, 11:20 PM Steve Tian <steve.cs.t...@gmail.com> wrote:
> >
> > > Thanks, Rahul! In my environment I need to have reconnect.backoff.ms
> > > longer than OS default tcp timeout so that NetworkClient can give second
> > > node a try.
> > >
> > > I believe this is related to
> > > https://issues.apache.org/jira/browse/KAFKA-2459 .
> > >
> > > Cheers, Steve
> > >
> > > On Tue, Sep 1, 2015, 5:24 PM Rahul Jain <rahul...@gmail.com> wrote:
> > >
> > >> We did notice something similar. When a broker node (out of 3) went down,
> > >> metadata calls continued to go to the failed node and producer kept
> > >> failing. We were able to make it work by increasing the
> > >> reconnect.backoff.ms to 1 second.
> > >>
> > >> Something similar was discussed earlier -
> > >> http://qnalist.com/questions/6002514/new-producer-metadata-update-problem-on-2-node-cluster
> > >>
> > >> On Mon, Aug 31, 2015 at 11:00 PM, Steve Tian <steve.cs.t...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi everyone,
> > >> >
> > >> > Is there any concerns to have a long reconnect.backoff.ms for new java
> > >> > Kafka producer (0.8.2.0/0.8.2.1)?
> > >> >
> > >> > Assuming we have bootstrap.servers=host1:port1,host2:port2,host3:port3 and
> > >> > host1 is *down* in the very beginning. If a newly created Kafka producer
> > >> > decide to choose host1 as first node to connect for metadata update, then
> > >> > that producer will keep trying on host1 *only* as default tcp timeout is
> > >> > surely longer than default value of reconnect.backoff.ms, which is 10 ms.
> > >> >
> > >> > I am thinking to have reconnect.backoff.ms longer than N * T where N is
> > >> > the number of nodes in bootstrap.servers and T is the default tcp timeout.
> > >> > Is there any concerns to have a long reconnect.backoff.ms like that?
> > >> > Any better solutions?
> > >> >
> > >> > Cheers, Steve
>
> --
> Thanks,
> Ewen
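
A minimal sketch of the N * T rule of thumb from the original question, written in Python for brevity (the producer discussed in the thread is the Java one). It only computes a value and builds the two settings named in the thread, bootstrap.servers and reconnect.backoff.ms; it does not call any Kafka API. The function names, the port 9092, the 60-second TCP connect timeout, and the 1-second safety margin are all assumptions for illustration, so measure the real connect timeout on your own hosts before relying on a number like this.

```python
def suggested_reconnect_backoff_ms(bootstrap_servers: str,
                                   tcp_connect_timeout_ms: int,
                                   margin_ms: int = 1000) -> int:
    """Return a backoff longer than N * T.

    N is the number of entries in bootstrap.servers and T is the OS TCP
    connect timeout in milliseconds. margin_ms is a hypothetical safety
    margin so the result is strictly greater than N * T.
    """
    hosts = [h.strip() for h in bootstrap_servers.split(",") if h.strip()]
    if not hosts:
        raise ValueError("bootstrap_servers must list at least one host:port")
    return len(hosts) * tcp_connect_timeout_ms + margin_ms


def producer_properties(bootstrap_servers: str,
                        tcp_connect_timeout_ms: int) -> dict:
    """Build the two producer settings discussed in the thread as strings."""
    backoff = suggested_reconnect_backoff_ms(bootstrap_servers,
                                             tcp_connect_timeout_ms)
    return {
        "bootstrap.servers": bootstrap_servers,
        "reconnect.backoff.ms": str(backoff),
    }


if __name__ == "__main__":
    # Assumed values: three brokers on port 9092, 60 s TCP connect timeout.
    props = producer_properties("host1:9092,host2:9092,host3:9092", 60_000)
    for key, value in props.items():
        print(f"{key}={value}")
```

The printed key=value lines can be pasted into a producer properties file. The likely trade-off is that a backoff this long also delays reconnecting to a broker that has just come back, which is why the thread treats it as a workaround rather than a fix.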