Re: Consumer Group Rebalance Issues

2014-02-03 Thread Jun Rao
Yes, it looks like a runaway consumer. In the pastebin for host 62, the consumer id is trackingGroup_host62-1391197017388-c3ce6986, which is different from what's stored as the owner in ZK (trackingGroup_prod-host62-1391197017407-f3e67af0-0). Thanks, Jun On Mon, Feb 3, 2014 at 11:08 AM, Drew G
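
The check Jun describes, comparing a live consumer id with the owner recorded in ZK, can be sketched as below. This is only an illustration: the helper name owner_matches_consumer is hypothetical, and it assumes the owner value has the form "<consumer id>-<thread index>", as the trailing "-0" in the owner string suggests.

```python
def owner_matches_consumer(owner: str, consumer_id: str) -> bool:
    """Return True if a ZK owner value belongs to the given consumer id.

    Assumption: the owner value is "<consumer id>-<thread index>".
    """
    prefix, sep, thread = owner.rpartition("-")
    return sep == "-" and thread.isdigit() and prefix == consumer_id


# Ids taken from the message above.
live_id = "trackingGroup_host62-1391197017388-c3ce6986"
zk_owner = "trackingGroup_prod-host62-1391197017407-f3e67af0-0"

# A mismatch means the partition is still owned by a different consumer instance.
print(owner_matches_consumer(zk_owner, live_id))
```

A False result here corresponds to the situation in the thread: the partition's recorded owner is not the consumer instance that is currently running.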

Re: Consumer Group Rebalance Issues

2014-02-03 Thread Drew Goya
Sorry, I misspoke, I did manage to find the logs with failed rebalances on host62: worker-6727.log:2014-02-02 01:46:14 k.c.ZookeeperConsumerConnector [INFO] [trackingGroup_prod-host62-1391197017388-c3ce6986], trackingGroup_prod-host62-1391197017388-c3ce6986-0 attempting to claim partition 108 work

Re: Consumer Group Rebalance Issues

2014-02-03 Thread Drew Goya
That is another strange thing, this was the only consumer doing rebalances, host62 was happily consuming from its partitions. On Sat, Feb 1, 2014 at 9:15 PM, Jun Rao wrote: > What is host062 doing when the conflict occurred? Is it doing rebalance > too? If so, does it see the same set of partit

Re: Consumer Group Rebalance Issues

2014-02-01 Thread Jun Rao
What is host062 doing when the conflict occurred? Is it doing rebalance too? If so, does it see the same set of partitions and consumers as host061 does? Thanks, Jun On Sat, Feb 1, 2014 at 5:53 PM, Drew Goya wrote: > Hey all, this issue has recently popped up again. I've got a member of a >

Re: Consumer Group Rebalance Issues

2014-02-01 Thread Drew Goya
Hey all, this issue has recently popped up again. I've got a member of a consumer group stuck in a rebalance loop. It attempts to claim partitions but those are always conflicted: 2014-02-02 01:46:22 k.c.ZookeeperConsumerConnector [INFO] [trackingGroup_host061.pe1i.gradientx.com-1391301920791-99

Re: Consumer Group Rebalance Issues

2013-12-23 Thread Jason Rosenberg
We recently upgraded to 3.4.5, so far without incident. But I'd be interested to know whether we can confirm that there are known problems with this! Jason On Mon, Dec 23, 2013 at 2:04 PM, Drew Goya wrote: > Thanks, I migrated our ZK cluster over to 3.3 this weekend. Hopefully that > does it! >

Re: Consumer Group Rebalance Issues

2013-12-23 Thread Drew Goya
Thanks, I migrated our ZK cluster over to 3.3 this weekend. Hopefully that does it! On Fri, Dec 20, 2013 at 9:09 AM, Jun Rao wrote: > Hmm, not sure how stable 3.4.4 is. We have been using 3.3.4 and haven't seen > issues with ZK as long as there aren't many ZK session expirations. > > Thanks, > >

Re: Consumer Group Rebalance Issues

2013-12-20 Thread Jun Rao
Hmm, not sure how stable 3.4.4 is. We have been using 3.3.4 and haven't seen issues with ZK as long as there aren't many ZK session expirations. Thanks, Jun On Thu, Dec 19, 2013 at 9:41 PM, Drew Goya wrote: > Our cluster is currently running 3.4.4. > > I see Kafka is currently using the 3.3.4 cl

Re: Consumer Group Rebalance Issues

2013-12-19 Thread Drew Goya
Our cluster is currently running 3.4.4. I see Kafka is currently using the 3.3.4 client, is there a potential conflict there? On Wed, Dec 18, 2013 at 9:12 PM, Jun Rao wrote: > The issue is that consumer 007 didn't see consumer 006 during rebalancing. > So, it made a decision in conflict with c

Re: Consumer Group Rebalance Issues

2013-12-18 Thread Jun Rao
The issue is that consumer 007 didn't see consumer 006 during rebalancing. So, it made a decision in conflict with consumer 006. Consumer 007 should have another ZK watcher fired to trigger another rebalance, at which point it will see consumer 006. Which version of ZK are you using? Thanks, Jun On Wed
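
To illustrate how a stale membership view produces conflicting claims, here is a minimal sketch of a range-style assignment. It is a simplification and not Kafka's actual code: the function name range_assign, the partition count of 8, and the consumer names are assumptions for illustration, and it assigns per consumer rather than per consumer thread.

```python
def range_assign(partitions, consumers, me):
    """Partitions `me` would claim under a simplified range-style assignment."""
    parts = sorted(partitions)
    members = sorted(consumers)
    i = members.index(me)
    # Split partitions evenly; the first `extra` members take one more each.
    per, extra = divmod(len(parts), len(members))
    start = per * i + min(i, extra)
    count = per + (1 if i < extra else 0)
    return parts[start:start + count]


partitions = range(8)                          # hypothetical partition count
full_view = ["consumer006", "consumer007"]     # what 006 sees
stale_view = ["consumer007"]                   # 007 did not see 006

claims_006 = range_assign(partitions, full_view, "consumer006")
claims_007 = range_assign(partitions, stale_view, "consumer007")

# Partitions both consumers try to own: these are the conflicts in ZK.
conflict = sorted(set(claims_006) & set(claims_007))
print(claims_006, claims_007, conflict)
```

Because each consumer computes its share independently from its own view of the group, the two results only fit together when both views agree. Once 007 rebalances again with 006 in its view, the overlap disappears.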

Re: Consumer Group Rebalance Issues

2013-12-18 Thread Drew Goya
Thanks for the help with this, Jun, really appreciate it! So I found this in the logs for consumer 007 from about an hour earlier. Besides that, no real activity. It looks like 007 rebalanced and successfully claimed partitions 24-27. Shortly after that its zookeeper client timed out and reconnected.

Re: Consumer Group Rebalance Issues

2013-12-17 Thread Jun Rao
What's consumer trackingGroup_prod-storm-sup-trk007 doing at the same time? It's the one that caused the conflict in ZK. Thanks, Jun On Tue, Dec 17, 2013 at 9:19 PM, Drew Goya wrote: > I explored that possibility but I'm not seeing any ZK session expirations > in the logs and it doesn't look

Re: Consumer Group Rebalance Issues

2013-12-17 Thread Drew Goya
I explored that possibility but I'm not seeing any ZK session expirations in the logs and it doesn't look like these rebalances complete. They fail due to conflicts in the zookeeper data. On Tue, Dec 17, 2013 at 8:53 PM, Jun Rao wrote: > Have you looked at > > https://cwiki.apache.org/confluenc

Re: Consumer Group Rebalance Issues

2013-12-17 Thread Jun Rao
Have you looked at https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog ? Thanks, Jun On Tue, Dec 17, 2013 at 9:24 AM, Drew Goya wrote: > Hey all, > > I've recently been having problems with consumer groups rebalancing. I'm > using several high l