Also of note: this is all running from within a Storm topology, so when I kill a JVM, another is started very quickly.
Could this be a problem with a consumer leaving and rejoining within a small window?

On Mon, Nov 18, 2013 at 2:52 PM, Drew Goya <d...@gradientx.com> wrote:

> Hey Guozhang, I just forced the error by killing one of my consumer JVMs
> and I am getting a consumer rebalance failure:
>
> 2013-11-18 22:46:54 k.c.ZookeeperConsumerConnector [ERROR]
> [bridgeTopology_host-1384493092466-7099d843], error during syncedRebalance
> kafka.common.ConsumerRebalanceFailedException:
> bridgeTopology_host-1384493092466-7099d843 can't rebalance after 10 retries
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:428) ~[stormjar.jar:na]
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:355) ~[stormjar.jar:na]
>
> These are the relevant lines in my consumer properties file:
>
> rebalance.max.retries=10
> rebalance.backoff.ms=10000
>
> My topic has 128 partitions
>
> Are there some other configuration settings I should be using?
>
>
> On Mon, Nov 18, 2013 at 2:37 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Hello Drew,
>>
>> Do you see any rebalance failure exceptions in the consumer log?
>>
>> Guozhang
>>
>>
>> On Mon, Nov 18, 2013 at 2:14 PM, Drew Goya <d...@gradientx.com> wrote:
>>
>> > So I've run into a problem where occasionally, some partitions within a
>> > topic end up in a "none" owner state for a long time.
>> >
>> > I'm using the high-level consumer on several machines, each consumer has 4
>> > threads.
>> >
>> > Normally when I run the ConsumerOffsetChecker, all partitions have owners
>> > and similar lag.
>> >
>> > Occasionally I end up in this state:
>> >
>> > trackingGroup  Events2  32  552506856  569853398  17346542  none
>> > trackingGroup  Events2  33  553649131  569775298  16126167  none
>> > trackingGroup  Events2  34  552380321  569572719  17192398  none
>> > trackingGroup  Events2  35  553206745  569448821  16242076  none
>> > trackingGroup  Events2  36  553673576  570084283  16410707  none
>> > trackingGroup  Events2  37  552669833  569765642  17095809  none
>> > trackingGroup  Events2  38  553147178  569766985  16619807  none
>> > trackingGroup  Events2  39  552495219  569837815  17342596  none
>> > trackingGroup  Events2  40  570108655  570111080  2425      trackingGroup_host6-1384385417822-23157ae8-0
>> > trackingGroup  Events2  41  570288505  570291068  2563      trackingGroup_host6-1384385417822-23157ae8-0
>> > trackingGroup  Events2  42  569929870  569932330  2460      trackingGroup_host6-1384385417822-23157ae8-0
>> >
>> > I'm at the point where I'm considering writing my own client but hopefully
>> > the user group has the answer!
>> >
>> > I am using this commit of 8.0 on both the brokers and clients:
>> > d4553da609ea9af6db8a79faf116d1623c8a856f
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>
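
As a rough sanity check on the consumer settings quoted above, the sketch below multiplies rebalance.max.retries by rebalance.backoff.ms to get the total time a consumer keeps retrying a rebalance. A commonly cited rule of thumb for the 0.8 high-level consumer is that this window should exceed zookeeper.session.timeout.ms, so that the ephemeral nodes of a killed consumer have expired before the retries run out. The thread does not state the session timeout, so the 6000 ms fallback here is an assumption (believed to be the 0.8 default), and the helper names are hypothetical.

```python
# Assumed fallback: the thread never gives zookeeper.session.timeout.ms.
ASSUMED_ZK_SESSION_TIMEOUT_MS = 6000


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def rebalance_window_ms(props):
    """Total time spent retrying a rebalance: retries * backoff."""
    retries = int(props["rebalance.max.retries"])
    backoff_ms = int(props["rebalance.backoff.ms"])
    return retries * backoff_ms


# The two lines quoted from the consumer properties file in the thread.
consumer_props = parse_properties("""
rebalance.max.retries=10
rebalance.backoff.ms=10000
""")

window = rebalance_window_ms(consumer_props)
zk_timeout = int(
    consumer_props.get("zookeeper.session.timeout.ms",
                       ASSUMED_ZK_SESSION_TIMEOUT_MS)
)

print("rebalance window (ms):", window)
print("exceeds ZK session timeout:", window > zk_timeout)
```

With the quoted values the window is 10 * 10000 = 100000 ms, well above the assumed session timeout, so this particular check would not by itself explain the failure.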