Hmm....

This issue continues to emerge occasionally, albeit less often than in the 
past. 

If I hit it after several days or months of uptime, that would be okay, but 
today I have hit it twice within the first hour of 2 separate load tests.

I've cleaned up the code in my application to ensure I do not start / stop 
consumers rapidly.  In the most recent case, a consumer had been in use for 
several minutes before being shut down, and this stack trace still emerged.

For me, it's not harmless, because this exception occurs on a background 
thread that keeps spinning (hitting the same exception over and over rather 
than aborting) long after I've shut down and disposed of my consumer.  I never 
get a chance to intercept it, because the exception never reaches my code.
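
A minimal Java sketch of the behavior described above: a worker loop that 
checks its own shutdown flag when the work fails, so that a failure caused by 
shutdown ends the thread instead of being retried forever. GuardedWorker and 
all of its members are hypothetical names for illustration only; this is not 
Kafka's ShutdownableThread or LeaderFinderThread code, and it makes no claim 
about how Kafka actually handles the race.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker thread, NOT taken from Kafka.
class GuardedWorker extends Thread {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final AtomicInteger failures = new AtomicInteger();
    private final Runnable work;

    GuardedWorker(Runnable work) {
        super("guarded-worker");
        this.work = work;
    }

    @Override
    public void run() {
        while (running.get()) {
            try {
                work.run();
            } catch (RuntimeException e) {
                failures.incrementAndGet();
                if (!running.get()) {
                    // Failure arrived after shutdown was requested: stop here
                    // rather than retrying against a closed resource.
                    break;
                }
                try {
                    // Still running: back off briefly so retries do not spin.
                    Thread.sleep(10);
                } catch (InterruptedException ie) {
                    // Interrupted during back-off (for example by shutdown()).
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
    }

    // Requests shutdown and waits up to one second for the thread to exit.
    // Returns true if the thread has terminated.
    boolean shutdown() {
        running.set(false);
        interrupt();
        try {
            join(1000);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return !isAlive();
    }

    int failureCount() {
        return failures.get();
    }
}
```

With a work item that always throws (simulating a closed ZooKeeper 
connection), calling shutdown() should let the thread exit on its next 
failure rather than loop on it.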

The only remedy is to restart my application, which seems very undesirable.

I'm using a recent build of Kafka 0.8 pulled from the 0.8 branch within the 
last month; actually, I built it on June 25, the date of this original thread.

Thoughts?
________________________________________
From: Jun Rao [jun...@gmail.com]
Sent: Tuesday, June 25, 2013 11:58 PM
To: users@kafka.apache.org
Subject: Re: 0.8 throwing exception "Failed to find leader" and high-level 
consumer fails to make progress

The exception is likely due to a race condition between the logic in the ZK
watcher and the closing of the ZK connection. It's harmless, except for the
weird exception.

Thanks,

Jun


On Tue, Jun 25, 2013 at 10:07 AM, Hargett, Phil <
phil.harg...@mirror-image.com> wrote:

> Possibly.
>
> I see evidence that it's being stopped / started every 30 seconds in some
> cases (due to my code). It's entirely possible that I have a race, too, in
> that 2 separate pieces of code could be triggering such a stop / start.
>
> Gives me something to track down. Thank you!!
>
> On Jun 25, 2013, at 12:45 PM, "Jun Rao" <jun...@gmail.com> wrote:
>
> > This typically only happens when the consumerConnector is shut down. Are
> > you restarting the consumerConnector often?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Jun 25, 2013 at 9:40 AM, Hargett, Phil <
> > phil.harg...@mirror-image.com> wrote:
> >
> >> Seeing this exception a LOT (3-4 times per second, same log topic).
> >>
> >> I'm using external code to feed data to about 50 different log topics
> >> over a cluster of 3 Kafka 0.8 brokers.  There are 3 ZooKeeper instances
> >> as well; all of this is running on EC2.  My application creates a
> >> high-level consumer (1 per topic) to consume data from each and do
> >> further processing.
> >>
> >> The problem is this exception is in the high-level consumer, so my code
> >> has no way of knowing that it's become stuck.
> >>
> >> This exception does not always appear, but as far as I can tell, once
> >> this happens, the only cure is to restart my application's process.
> >>
> >> I saw this in 0.8 built from source about 1 week ago, and also am seeing
> >> it today after pulling the latest 0.8 sources and rebuilding Kafka.
> >>
> >> Thoughts?
> >>
> >> Failed to find leader for Set([topic6,0]): java.lang.NullPointerException
> >>        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
> >>        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
> >>        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> >>        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
> >>        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
> >>        at kafka.utils.ZkUtils$.getChildrenParentMayNotExist(ZkUtils.scala:438)
> >>        at kafka.utils.ZkUtils$.getAllBrokersInCluster(ZkUtils.scala:75)
> >>        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:63)
> >>        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> >>
>
