Hmm... This issue continues to emerge occasionally, albeit less often than in the past.
If I hit it after several days or months of uptime, that would be okay, but today I have hit it twice within the first hour of 2 separate load tests.

I've cleaned up the code in my application to ensure I do not start / stop consumers rapidly. In the most recent case, a consumer had been in use for several minutes before being shut down, and this stack trace still emerged.

For me, it's not harmless, because this exception is on a background thread that continues to spin wildly (continually hitting this exception rather than aborting) long after I've shut down and disposed of my consumer. I never have a chance to intercept it, because I never receive the exception in my code. The only remedy is to restart my application, which seems very undesirable.

I'm using a recent build of Kafka 0.8 pulled from the 0.8 branch within the last month; actually, I built it on June 25, the date of this original thread.

Thoughts?

________________________________________
From: Jun Rao [jun...@gmail.com]
Sent: Tuesday, June 25, 2013 11:58 PM
To: users@kafka.apache.org
Subject: Re: 0.8 throwing exception "Failed to find leader" and high-level consumer fails to make progress

The exception is likely due to a race condition between the logic in the ZK watcher and the closing of the ZK connection. It's harmless, except for the weird exception.

Thanks,

Jun


On Tue, Jun 25, 2013 at 10:07 AM, Hargett, Phil <phil.harg...@mirror-image.com> wrote:

> Possibly.
>
> I see evidence that it's being stopped / started every 30 seconds in some
> cases (due to my code). It's entirely possible that I have a race, too, in
> that 2 separate pieces of code could be triggering such a stop / start.
>
> Gives me something to track down. Thank you!!
>
> On Jun 25, 2013, at 12:45 PM, "Jun Rao" <jun...@gmail.com> wrote:
>
> > This typically only happens when the consumerConnector is shut down. Are
> > you restarting the consumerConnector often?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Jun 25, 2013 at 9:40 AM, Hargett, Phil <phil.harg...@mirror-image.com> wrote:
> >
> >> Seeing this exception a LOT (3-4 times per second, same log topic).
> >>
> >> I'm using external code to feed data to about 50 different log topics
> >> over a cluster of 3 Kafka 0.8 brokers. There are 3 ZooKeeper instances
> >> as well; all of this is running on EC2. My application creates a
> >> high-level consumer (1 per topic) to consume data from each and do
> >> further processing.
> >>
> >> The problem is this exception is in the high-level consumer, so my code
> >> has no way of knowing that it's become stuck.
> >>
> >> This exception does not always appear, but as far as I can tell, once
> >> this happens, the only cure is to restart my application's process.
> >>
> >> I saw this in 0.8 built from source about 1 week ago, and also am seeing
> >> it today after pulling the latest 0.8 sources and rebuilding Kafka.
> >>
> >> Thoughts?
> >>
> >> Failed to find leader for Set([topic6,0]): java.lang.NullPointerException
> >>     at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
> >>     at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
> >>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> >>     at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
> >>     at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
> >>     at kafka.utils.ZkUtils$.getChildrenParentMayNotExist(ZkUtils.scala:438)
> >>     at kafka.utils.ZkUtils$.getAllBrokersInCluster(ZkUtils.scala:75)
> >>     at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:63)
> >>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)