Do you see "zookeeper state changed (Expired)" in your logs?
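
If it helps, here is a rough sketch of a script that scans consumer log files for the two messages mentioned in this thread. The log file paths are whatever you pass on the command line, and the case-insensitive match is an assumption about how the messages appear in your logs; the helper name find_matches is made up for this example.

```python
import sys
from typing import Iterable, List, Tuple

# Phrases mentioned in this thread. Matching is case-insensitive because
# the exact casing in the consumer logs is an assumption.
PATTERNS = [
    "zookeeper state changed (Expired)",
    "exception during rebalance",
]


def find_matches(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, pattern, line) for each line containing a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for pattern in PATTERNS:
            if pattern.lower() in lowered:
                hits.append((number, pattern, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical file names): python scan_logs.py consumer1.log consumer2.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, pattern, line in find_matches(handle):
                print(f"{path}:{number}: [{pattern}] {line}")
```

Comparing the timestamps of any session-expiry hits with the rebalance failures should show whether the two line up.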

On Fri, Feb 27, 2015 at 10:12 AM, Jiangjie Qin <j...@linkedin.com.invalid>
wrote:

> Can you paste the error log for each rebalance attempt?
> You can search for the keyword "exception during rebalance".
>
> On 2/26/15, 7:41 PM, "Ashwin Jayaprakash" <ashwin.jayaprak...@gmail.com>
> wrote:
>
> >Just to give you some more debugging context, we noticed that the
> >"consumers" path becomes empty after all the JVMs have exited because of
> >this error. So, when we restart, there are no visible entries in ZK.
> >
> >On Thu, Feb 26, 2015 at 6:04 PM, Ashwin Jayaprakash <
> >ashwin.jayaprak...@gmail.com> wrote:
> >
> >> Hello, we have a set of JVMs that consume messages from Kafka topics.
> >> Each JVM creates 4 ConsumerConnectors that are used by 4 separate
> >> threads. These JVMs also create and use the CuratorFramework's
> >> PathChildrenCache to watch a ZooKeeper sub-tree and keep it in sync
> >> with other JVMs. This path has several thousand child nodes.
> >>
> >> Everything was working perfectly until one fine day we decided to
> >> restart these JVMs. We restart these JVMs to roll in new code every few
> >> weeks or so, and we have bounced them before with no issues. This time,
> >> the Kafka consumers on these JVMs were suddenly unable to rebalance
> >> partitions among themselves.
> >>
> >> The exception:
> >> Caused by: kafka.common.ConsumerRebalanceFailedException: group1-system01-27422-kafka-787 can't rebalance after 12 retries
> >>   at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:432)
> >>   at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:722)
> >>   at kafka.consumer.ZookeeperConsumerConnector$WildcardStreamsHandler.<init>(ZookeeperConsumerConnector.scala:756)
> >>   at kafka.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:145)
> >>   at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:96)
> >>   at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:100)
> >>
> >> We then set rebalance.max.retries=16 and rebalance.backoff.ms=10000.
> >> I've seen the Spark-Kafka issue
> >> https://issues.apache.org/jira/browse/SPARK-5505 and Jun's
> >> recommendation to increase the backoff property.
> >>
> >> We must've tried restarting these JVMs about 20 times now, both with
> >> and without the "rebalance.xx" properties. Every time it is the same
> >> issue. The one exception was the first time we applied the
> >> "rebalance.backoff.ms=10000" property, when all 4 JVMs started. We
> >> thought that had solved everything, but when we restarted them again
> >> just to make sure, we were back to square one.
> >>
> >> If we have only 1 thread create 1 ConsumerConnector instead of 4, it
> >> works. This way we can have any number of JVMs running 1
> >> ConsumerConnector and they all behave well and rebalance partitions.
> >> The problem occurs only when we try to start multiple
> >> ConsumerConnectors in the same JVM. I'd like to remind you that 4
> >> ConsumerConnectors were working for several months. The ZK sub-tree for
> >> the non-Kafka part of our code was small when we started.
> >>
> >> Does anybody have any thoughts on this? What could be causing this
> >> issue? Could there be a Curator/ZK client conflict with the high-level
> >> Kafka consumer? Or is the number of nodes that we have on ZK from our
> >> code causing problems with partition assignment in the Kafka code? We
> >> ask because the Curator framework keeps syncing data in the background
> >> while the Kafka code is creating ConsumerConnectors and rebalancing
> >> topics.
> >>
> >> Thanks,
> >> Ashwin Jayaprakash.
> >>
>
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125
