Can you clarify: is this issue specific to the "new" producer?  With
the "old" producer, we routinely construct a new producer which makes a
fresh metadata request (via a VIP connected to all nodes in the cluster).
Would this approach work with the new producer?

Jason


On Tue, May 5, 2015 at 1:12 PM, Rahul Jain <rahul...@gmail.com> wrote:

> Mayuresh,
> I was testing this in a development environment and manually brought down a
> node to simulate this. So the dead node never came back up.
>
> My colleague and I were able to consistently see this behaviour several
> times during the testing.
> On 5 May 2015 20:32, "Mayuresh Gharat" <gharatmayures...@gmail.com> wrote:
>
> > I agree that, to find the least-loaded node, the producer should fall
> > back to the bootstrap nodes if it's not able to connect to any nodes in
> > the current metadata. That should resolve this.
> >
> > Rahul, I suppose the problem went away because the dead node in your
> > case might have come back up and allowed for a metadata update. Can you
> > confirm this?
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Tue, May 5, 2015 at 5:10 AM, Rahul Jain <rahul...@gmail.com> wrote:
> >
> > > We observed the exact same error. The root cause is not very clear,
> > > although it appears to be related to the leastLoadedNode implementation.
> > > Interestingly, the problem went away by increasing the value of
> > > reconnect.backoff.ms to 1000ms.
> > > On 29 Apr 2015 00:32, "Ewen Cheslack-Postava" <e...@confluent.io> wrote:
> > >
> > > > > Ok, all of that makes sense. The only way to possibly recover from
> > > > > that state is either for K2 to come back up allowing the metadata
> > > > > refresh to eventually succeed or to eventually try some other node in
> > > > > the cluster. Reusing the bootstrap nodes is one possibility. Another
> > > > > would be for the client to get more metadata than is required for the
> > > > > topics it needs in order to ensure it has more nodes to use as options
> > > > > when looking for a node to fetch metadata from. I added your
> > > > > description to KAFKA-1843, although it might also make sense as a
> > > > > separate bug since fixing it could be considered incremental progress
> > > > > towards resolving 1843.
> > > >
> > > > > On Tue, Apr 28, 2015 at 9:18 AM, Manikumar Reddy <ku...@nmsworks.co.in>
> > > > > wrote:
> > > >
> > > > > Hi Ewen,
> > > > >
> > > > > >  Thanks for the response.  I agree with you; in some cases we should
> > > > > > use bootstrap servers.
> > > > >
> > > > >
> > > > > >
> > > > > > > If you have logs at debug level, are you seeing this message in
> > > > > > > between the connection attempts:
> > > > > >
> > > > > > Give up sending metadata request since no node is available
> > > > > >
> > > > >
> > > > > >  Yes, this log message appeared a couple of times.
> > > > >
> > > > >
> > > > > >
> > > > > > Also, if you let it continue running, does it recover after the
> > > > > > metadata.max.age.ms timeout?
> > > > > >
> > > > >
> > > > > >  It does not reconnect.  It keeps trying to connect to the dead
> > > > > > node.
> > > > >
> > > > >
> > > > > -Manikumar
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Ewen
> > > >
> > >
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>

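The fallback proposed in this thread (try the nodes in the current metadata first, then fall back to the bootstrap nodes) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Kafka client's actual leastLoadedNode code: the class `MetadataNodeSelector`, the method `pick`, the `canConnect` predicate, and the host names are all hypothetical, and the sketch picks the first connectable node rather than the least-loaded one.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of the Kafka client API.
final class MetadataNodeSelector {
    private MetadataNodeSelector() {}

    /**
     * Chooses a node to send a metadata request to.
     *
     * @param metadataNodes  nodes known from the current cluster metadata
     * @param bootstrapNodes nodes from the original bootstrap configuration
     * @param canConnect     assumed connectivity check supplied by the caller
     * @return the first connectable node, or empty if none is reachable
     */
    static Optional<String> pick(List<String> metadataNodes,
                                 List<String> bootstrapNodes,
                                 Predicate<String> canConnect) {
        // First preference: a connectable node from the current metadata.
        for (String node : metadataNodes) {
            if (canConnect.test(node)) {
                return Optional.of(node);
            }
        }
        // Fallback: the bootstrap nodes, so that a dead node in stale
        // metadata does not block every future metadata refresh.
        for (String node : bootstrapNodes) {
            if (canConnect.test(node)) {
                return Optional.of(node);
            }
        }
        // Nothing reachable; the caller would back off and retry later.
        return Optional.empty();
    }
}
```

In the scenario described above, where the current metadata contains only the dead node K2, calling `pick` with K2 as the only metadata node and a bootstrap list that also names a live node (say, a hypothetical K1) would select K1 instead of retrying K2 indefinitely.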