Do you see "zookeeper state changed (Expired)" in your logs?

On Fri, Feb 27, 2015 at 10:12 AM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

> Can you paste the error log for each rebalance try?
> You may search for the keyword "exception during rebalance".
>
> On 2/26/15, 7:41 PM, "Ashwin Jayaprakash" <ashwin.jayaprak...@gmail.com>
> wrote:
>
> >Just to give you some more debugging context, we noticed that the "consumers"
> >path becomes empty after all the JVMs have exited because of this error.
> >So, when we restart, there are no visible entries in ZK.
> >
> >On Thu, Feb 26, 2015 at 6:04 PM, Ashwin Jayaprakash <
> >ashwin.jayaprak...@gmail.com> wrote:
> >
> >> Hello, we have a set of JVMs that consume messages from Kafka topics. Each
> >> JVM creates 4 ConsumerConnectors that are used by 4 separate threads.
> >> These JVMs also create and use the CuratorFramework's Path children cache
> >> to watch and keep a sub-tree of the ZooKeeper in sync with other JVMs. This
> >> path has several thousand children elements.
> >>
> >> Everything was working perfectly until one fine day we decided to restart
> >> these JVMs. We restart these JVMs to roll in new code every few weeks or
> >> so. We never had any problems until suddenly the Kafka consumers on these
> >> JVMs were unable to rebalance partitions among themselves. We have bounced
> >> these JVMs before with no issues.
> >>
> >> The exception:
> >> Caused by: kafka.common.ConsumerRebalanceFailedException:
> >> group1-system01-27422-kafka-787 can't rebalance after 12 retries
> >>   at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:432)
> >>   at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:722)
> >>   at kafka.consumer.ZookeeperConsumerConnector$WildcardStreamsHandler.<init>(ZookeeperConsumerConnector.scala:756)
> >>   at kafka.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:145)
> >>   at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:96)
> >>   at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:100)
> >>
> >> We then set rebalance.max.retries=16 and rebalance.backoff.ms=10000. I've
> >> seen the Spark-Kafka issue
> >> https://issues.apache.org/jira/browse/SPARK-5505 and Jun's recommendation
> >> to increase the backoff property.
> >>
> >> We must've tried restarting these JVMs about 20 times now both with and
> >> without the "rebalance.xx" properties. Every time it is the same issue.
> >> Except for the first time we applied the "rebalance.backoff.ms=10000"
> >> property when all 4 JVMs started! We thought that solved everything and
> >> then we tried restarting it just to make sure and then we were back to
> >> square one.
> >>
> >> If we have only 1 thread create 1 ConsumerConnector instead of 4 it works.
> >> This way we can have any number of JVMs running 1 ConsumerConnector and
> >> they all behave well and rebalance partitions. It is only when we try to
> >> start multiple ConsumerConnectors on the same JVM does this problem occur.
> >> I'd like to remind you that 4 ConsumerConnectors was working for several
> >> months. The ZK sub-tree for our non-Kafka part of the code was small when
> >> we started.
> >>
> >> Does anybody have any thoughts on this? What could be causing this issue?
> >> Could there be a Curator/ZK client conflict with the High level Kafka
> >> consumer? Or is the number of nodes that we have on ZK from our code
> >> causing problems with partition assignment in the Kafka code? Because the
> >> Curator framework keeps syncing data in the background while the Kafka code
> >> is creating ConsumerConnectors and rebalancing topics.
> >>
> >> Thanks,
> >> Ashwin Jayaprakash.
> >>
> >

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
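
For reference, a minimal sketch (not from the thread) of how the two rebalance settings quoted above relate to the ZooKeeper session timeout that the "Expired" question is probing. A rule of thumb often cited for the 0.8 high-level consumer is that rebalance.max.retries * rebalance.backoff.ms should exceed zookeeper.session.timeout.ms, so that a stale consumer's ephemeral nodes can expire before the retries run out. The class and method names below are hypothetical, and the 6000 ms session timeout is an assumption, since the thread does not state the value in use. The sketch only builds a java.util.Properties object and does the arithmetic; it does not connect to Kafka or ZooKeeper.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
public class RebalanceWindow {

    // Collects the three consumer settings discussed in the thread.
    static Properties consumerProps(int maxRetries, long backoffMs, long zkSessionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("rebalance.max.retries", Integer.toString(maxRetries));
        props.setProperty("rebalance.backoff.ms", Long.toString(backoffMs));
        props.setProperty("zookeeper.session.timeout.ms", Long.toString(zkSessionTimeoutMs));
        return props;
    }

    // Total time the consumer spends backing off before it gives up on rebalancing.
    static long retryWindowMs(Properties props) {
        long retries = Long.parseLong(props.getProperty("rebalance.max.retries"));
        long backoff = Long.parseLong(props.getProperty("rebalance.backoff.ms"));
        return retries * backoff;
    }

    // True when the retry window is longer than the ZooKeeper session timeout.
    static boolean coversSessionTimeout(Properties props) {
        long timeout = Long.parseLong(props.getProperty("zookeeper.session.timeout.ms"));
        return retryWindowMs(props) > timeout;
    }

    public static void main(String[] args) {
        // 16 retries and 10000 ms backoff are from the thread; 6000 ms is assumed.
        Properties props = consumerProps(16, 10_000L, 6_000L);
        System.out.println("retry window (ms): " + retryWindowMs(props));
        System.out.println("covers session timeout: " + coversSessionTimeout(props));
    }
}
```

With the values from the thread the retry window is already far larger than any typical session timeout, which suggests that raising the retries further is unlikely to help on its own and that the session-expiry question above is the more useful lead.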