Could you find some entries in the log with the keyword "conflict"? If so, could you paste them here?
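Something like the following should surface them (a minimal sketch; the log path is an assumption, so point it at wherever your consumer / Storm worker logs actually land):

  grep -i "conflict" /var/log/storm/worker-*.log

The interesting entries are usually ownership conflicts on the consumer's ZooKeeper ephemeral nodes, logged during rebalance.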
Guozhang

On Mon, Nov 18, 2013 at 2:56 PM, Drew Goya <d...@gradientx.com> wrote:

> Also of note, this is all running from within a Storm topology; when I
> kill a JVM, another is started very quickly.
>
> Could this be a problem with a consumer leaving and rejoining within a
> small window?
>
>
> On Mon, Nov 18, 2013 at 2:52 PM, Drew Goya <d...@gradientx.com> wrote:
>
> > Hey Guozhang, I just forced the error by killing one of my consumer
> > JVMs, and I am getting a consumer rebalance failure:
> >
> > 2013-11-18 22:46:54 k.c.ZookeeperConsumerConnector [ERROR]
> > [bridgeTopology_host-1384493092466-7099d843], error during syncedRebalance
> > kafka.common.ConsumerRebalanceFailedException:
> > bridgeTopology_host-1384493092466-7099d843 can't rebalance after 10 retries
> >     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:428) ~[stormjar.jar:na]
> >     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:355) ~[stormjar.jar:na]
> >
> > These are the relevant lines in my consumer properties file:
> >
> > rebalance.max.retries=10
> > rebalance.backoff.ms=10000
> >
> > My topic has 128 partitions.
> >
> > Are there some other configuration settings I should be using?
> >
> >
> > On Mon, Nov 18, 2013 at 2:37 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> >> Hello Drew,
> >>
> >> Do you see any rebalance failure exceptions in the consumer log?
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Nov 18, 2013 at 2:14 PM, Drew Goya <d...@gradientx.com> wrote:
> >>
> >> > So I've run into a problem where occasionally some partitions within
> >> > a topic end up in a "none" owner state for a long time.
> >> >
> >> > I'm using the high-level consumer on several machines; each consumer
> >> > has 4 threads.
> >> >
> >> > Normally when I run the ConsumerOffsetChecker, all partitions have
> >> > owners and similar lag.
> >> >
> >> > Occasionally I end up in this state:
> >> >
> >> > trackingGroup  Events2  32  552506856  569853398  17346542  none
> >> > trackingGroup  Events2  33  553649131  569775298  16126167  none
> >> > trackingGroup  Events2  34  552380321  569572719  17192398  none
> >> > trackingGroup  Events2  35  553206745  569448821  16242076  none
> >> > trackingGroup  Events2  36  553673576  570084283  16410707  none
> >> > trackingGroup  Events2  37  552669833  569765642  17095809  none
> >> > trackingGroup  Events2  38  553147178  569766985  16619807  none
> >> > trackingGroup  Events2  39  552495219  569837815  17342596  none
> >> > trackingGroup  Events2  40  570108655  570111080  2425      trackingGroup_host6-1384385417822-23157ae8-0
> >> > trackingGroup  Events2  41  570288505  570291068  2563      trackingGroup_host6-1384385417822-23157ae8-0
> >> > trackingGroup  Events2  42  569929870  569932330  2460      trackingGroup_host6-1384385417822-23157ae8-0
> >> >
> >> > I'm at the point where I'm considering writing my own client, but
> >> > hopefully the user group has the answer!
> >> >
> >> > I am using this commit of 0.8 on both the brokers and clients:
> >> > d4553da609ea9af6db8a79faf116d1623c8a856f
> >>
> >> --
> >> -- Guozhang

--
-- Guozhang
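For readers hitting the same ConsumerRebalanceFailedException: the rule of thumb that usually comes up on this list (not confirmed as the fix in this thread) is to keep rebalance.max.retries * rebalance.backoff.ms comfortably larger than zookeeper.session.timeout.ms, so that a dead consumer's ephemeral owner nodes in ZooKeeper expire before the surviving consumers give up retrying. A sketch of the relevant consumer.properties lines under that assumption (values illustrative):

  # a dead consumer's ephemeral owner nodes disappear after this timeout
  zookeeper.session.timeout.ms=6000
  # 10 retries * 10000 ms = 100 s of retrying, well past the session timeout
  rebalance.max.retries=10
  rebalance.backoff.ms=10000

Drew's settings above already satisfy this, which is consistent with Guozhang probing for "conflict" entries instead: those would point at a stale owner node blocking the rebalance rather than an exhausted retry window.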