Could you find some entries in the log with the keyword "conflict"? If so,
could you paste them here?
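
A quick way to pull them, assuming the consumers log to a file (the path is
just a placeholder):

grep -i conflict /path/to/consumer.log

If I remember right, the partition-ownership code logs a line containing
"conflict" when it runs into a conflicting ephemeral node in ZooKeeper.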

Guozhang


On Mon, Nov 18, 2013 at 2:56 PM, Drew Goya <d...@gradientx.com> wrote:

> Also of note, this is all running from within a Storm topology, so when I
> kill a JVM, another is started very quickly.
>
> Could this be a problem with a consumer leaving and rejoining within a
> small window?
>
>
> On Mon, Nov 18, 2013 at 2:52 PM, Drew Goya <d...@gradientx.com> wrote:
>
> > Hey Guozhang, I just forced the error by killing one of my consumer JVMs
> > and I am getting a consumer rebalance failure:
> >
> > 2013-11-18 22:46:54 k.c.ZookeeperConsumerConnector [ERROR]
> > [bridgeTopology_host-1384493092466-7099d843], error during syncedRebalance
> > kafka.common.ConsumerRebalanceFailedException:
> > bridgeTopology_host-1384493092466-7099d843 can't rebalance after 10 retries
> > at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:428)
> > ~[stormjar.jar:na]
> > at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:355)
> > ~[stormjar.jar:na]
> >
> > These are the relevant lines in my consumer properties file:
> >
> > rebalance.max.retries=10
> > rebalance.backoff.ms=10000
> >
> > My topic has 128 partitions.
> >
> > Are there some other configuration settings I should be using?
> >
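> > For context, these are the other settings I believe interact with the
> > rebalance retries (defaults per the 0.8 consumer config docs; the values
> > below are illustrative, not a recommendation):
> >
> > zookeeper.session.timeout.ms=6000
> > zookeeper.connection.timeout.ms=6000
> > rebalance.max.retries=10
> > rebalance.backoff.ms=10000
> >
> > As I understand it, rebalance.max.retries * rebalance.backoff.ms should
> > comfortably exceed zookeeper.session.timeout.ms, so that a dead consumer's
> > ephemeral nodes can expire in ZooKeeper before the retries run out.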
> >
> > On Mon, Nov 18, 2013 at 2:37 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> >> Hello Drew,
> >>
> >> Do you see any rebalance failure exceptions in the consumer log?
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Nov 18, 2013 at 2:14 PM, Drew Goya <d...@gradientx.com> wrote:
> >>
> >> > So I've run into a problem where occasionally some partitions within a
> >> > topic end up in a "none" owner state for a long time.
> >> >
> >> > I'm using the high-level consumer on several machines; each consumer
> >> > has 4 threads.
> >> >
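> >> > For concreteness, each process sets up the consumer along the lines of
> >> > the standard high-level consumer pattern (a sketch; the ZooKeeper
> >> > address is a placeholder and error handling is omitted):
> >> >
> >> > import java.util.HashMap;
> >> > import java.util.List;
> >> > import java.util.Map;
> >> > import java.util.Properties;
> >> > import kafka.consumer.Consumer;
> >> > import kafka.consumer.ConsumerConfig;
> >> > import kafka.consumer.KafkaStream;
> >> > import kafka.javaapi.consumer.ConsumerConnector;
> >> >
> >> > Properties props = new Properties();
> >> > props.put("zookeeper.connect", "zkhost:2181");  // placeholder address
> >> > props.put("group.id", "trackingGroup");
> >> > ConsumerConnector consumer =
> >> >     Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
> >> >
> >> > // Ask for 4 streams on the topic, one per consuming thread.
> >> > Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
> >> > topicCountMap.put("Events2", 4);
> >> > Map<String, List<KafkaStream<byte[], byte[]>>> streams =
> >> >     consumer.createMessageStreams(topicCountMap);
> >> >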
> >> > Normally when I run the ConsumerOffsetChecker, all partitions have
> >> > owners and similar lag.
> >> >
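> >> > (I'm invoking it roughly as: bin/kafka-run-class.sh
> >> > kafka.tools.ConsumerOffsetChecker --group trackingGroup
> >> > --zkconnect zkhost:2181 --topic Events2, with a placeholder
> >> > ZooKeeper address.)
> >> >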
> >> > Occasionally I end up in this state:
> >> >
> >> > Group          Topic    Pid  Offset     logSize    Lag       Owner
> >> > trackingGroup  Events2  32   552506856  569853398  17346542  none
> >> > trackingGroup  Events2  33   553649131  569775298  16126167  none
> >> > trackingGroup  Events2  34   552380321  569572719  17192398  none
> >> > trackingGroup  Events2  35   553206745  569448821  16242076  none
> >> > trackingGroup  Events2  36   553673576  570084283  16410707  none
> >> > trackingGroup  Events2  37   552669833  569765642  17095809  none
> >> > trackingGroup  Events2  38   553147178  569766985  16619807  none
> >> > trackingGroup  Events2  39   552495219  569837815  17342596  none
> >> > trackingGroup  Events2  40   570108655  570111080  2425      trackingGroup_host6-1384385417822-23157ae8-0
> >> > trackingGroup  Events2  41   570288505  570291068  2563      trackingGroup_host6-1384385417822-23157ae8-0
> >> > trackingGroup  Events2  42   569929870  569932330  2460      trackingGroup_host6-1384385417822-23157ae8-0
> >> >
> >> > I'm at the point where I'm considering writing my own client, but
> >> > hopefully the user group has the answer!
> >> >
> >> > I am using this commit of 0.8.0 on both the brokers and clients:
> >> > d4553da609ea9af6db8a79faf116d1623c8a856f
> >> >
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
> >
>



-- 
-- Guozhang
