Hi all,
I'm using Kafka (kafka_2.10-0.8.2.0), and recently I ran into a problem that has been bothering me for a while. I created a topic, say A, with 18 partitions, and I run 9 web services on 9 servers to consume it; each service is configured (in a config file) to consume 2 partitions. It ran well at first, but one day I found that consumption had slowed down. Using the command *kafka-run-class.sh kafka.tools.ConsumerOffsetChecker*, I found that topic A was being consumed by only 4 of the services! Rebalancing does not seem to work. The log says:

2015-08-18 16:08:46.156 WARN [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.RangeAssignor:83 - *No broker partitions consumed by consumer thread ops180021036.sh-1439282265455-8455a15b-1 for topic A*
2015-08-18 16:08:46.156 WARN [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.RangeAssignor:83 - *No broker partitions consumed by consumer thread ops180021036.sh-1439282265455-8455a15b-0 for topic A*
2015-08-18 16:08:46.156 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ZookeeperConsumerConnector:68 - [ops180021036.sh-1439282265455-8455a15b], *Consumer ops180021036.sh-1439282265455-8455a15b selected partitions :*
2015-08-18 16:08:46.156 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ZookeeperConsumerConnector:68 - [ops180021036.sh-1439282265455-8455a15b], *end rebalancing consumer ops180021036.sh-1439282265455-8455a15b try #0*
2015-08-18 16:08:46.157 INFO [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread] kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 - [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Starting
2015-08-18 16:08:47.038 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ZookeeperConsumerConnector:68 - [ops180021036.sh-1439282265455-8455a15b], begin rebalancing consumer ops180021036.sh-1439282265455-8455a15b try #0
2015-08-18 16:08:47.047 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ConsumerFetcherManager:68 - [ConsumerFetcherManager-1439282265512] Stopping leader finder thread
2015-08-18 16:08:47.047 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 - [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Shutting down
2015-08-18 16:08:47.047 INFO [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread] kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 - [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Stopped
2015-08-18 16:08:47.048 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 - [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Shutdown completed
2015-08-18 16:08:47.048 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ConsumerFetcherManager:68 - [ConsumerFetcherManager-1439282265512] Stopping all fetchers
2015-08-18 16:08:47.048 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ConsumerFetcherManager:68 - [ConsumerFetcherManager-1439282265512] All connections stopped
2015-08-18 16:08:47.048 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ZookeeperConsumerConnector:68 - [ops180021036.sh-1439282265455-8455a15b], Cleared all relevant queues for this fetcher
2015-08-18 16:08:47.048 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ZookeeperConsumerConnector:68 - [ops180021036.sh-1439282265455-8455a15b], Cleared the data chunks in all the consumer message iterators
2015-08-18 16:08:47.048 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ZookeeperConsumerConnector:68 - [ops180021036.sh-1439282265455-8455a15b], Committing all offsets after clearing the fetcher queues
2015-08-18 16:08:47.048 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.ZookeeperConsumerConnector:68 - [ops180021036.sh-1439282265455-8455a15b], Releasing partition ownership
2015-08-18 16:08:47.178 INFO [ops180021036.sh-1439282265455-8455a15b_watcher_executor] kafka.consumer.RangeAssignor:68 - Consumer ops180021036.sh-1439282265455-8455a15b *rebalancing the following partitions*: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17) for topic A *with consumers: List*(ops178090103.sh-1439282239893-c5966e01-0, ops178090103.sh-1439282239893-c5966e01-1, ops178090103.sh-1439282260982-7ef4f1c5-0, ops178090103.sh-1439282260982-7ef4f1c5-1, ops178090103.sh-1439282272017-8d0eae20-0, ops178090103.sh-1439282272017-8d0eae20-1, ops178093118.sh-1439282241888-810389d0-0, ops178093118.sh-1439282241888-810389d0-1, ops178093118.sh-1439282272949-812592a0-0, ops178093118.sh-1439282272949-812592a0-1, ops178096091.sh-1439282247256-dd4f01c3-0, ops178096091.sh-1439282247256-dd4f01c3-1, ops178096091.sh-1439282261099-1ca552ad-0, ops178096091.sh-1439282261099-1ca552ad-1, ops178096091.sh-1439282272076-3865c416-0, ops178096091.sh-1439282272076-3865c416-1, ops178096218.sh-1439282244077-e44933cb-0, ops178096218.sh-1439282244077-e44933cb-1, ops178096218.sh-1439282250962-6d91ea06-0, ops178096218.sh-1439282250962-6d91ea06-1, ops178096218.sh-1439282255978-44fae577-0, ops178096218.sh-1439282255978-44fae577-1, ops178103086.sh-1439282238431-38473eed-0, ops178103086.sh-1439282238431-38473eed-1, ops178103086.sh-1439282245101-dd2d2e8a-0, ops178103086.sh-1439282245101-dd2d2e8a-1, ops178103086.sh-1439282250200-9ec3e4f9-0, ops178103086.sh-1439282250200-9ec3e4f9-1, ops180019230.sh-1439282246060-6c17dbe0-0, ops180019230.sh-1439282246060-6c17dbe0-1, ops180019230.sh-1439282251861-46b2e7d0-0, ops180019230.sh-1439282251861-46b2e7d0-1, ops180019230.sh-1439282256080-8c4f4d28-0, ops180019230.sh-1439282256080-8c4f4d28-1, ops180021036.sh-1439282250060-57c09362-0, ops180021036.sh-1439282250060-57c09362-1, ops180021036.sh-1439282255415-b301daa2-0, ops180021036.sh-1439282255415-b301daa2-1, ops180021036.sh-1439282265455-8455a15b-0, ops180021036.sh-1439282265455-8455a15b-1, ops180021223.sh-1439282248773-578c62d0-0, ops180021223.sh-1439282248773-578c62d0-1, ops180021223.sh-1439282254389-a5d71a5a-0, ops180021223.sh-1439282254389-a5d71a5a-1, ops180021223.sh-1439282258421-16b051fb-0, ops180021223.sh-1439282258421-16b051fb-1, ops180022028.sh-1439282252296-d3b32c71-0, ops180022028.sh-1439282252296-d3b32c71-1, ops180022028.sh-1439282258091-38be130a-0, ops180022028.sh-1439282258091-38be130a-1, ops180022028.sh-1439282262207-e6a740b4-0, ops180022028.sh-1439282262207-e6a740b4-1)
kafka.consumer.ZookeeperConsumerConnector:76 - [ops180021036.sh-1437380766435-2bfff03d], *exception during rebalance*
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:659)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:608)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:598)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:551)

I formatted the consumer list as follows (ops178090103, for example, stands for host 10.178.90.103):

ops178090103.sh-1439282239893-c5966e01-0
ops178090103.sh-1439282239893-c5966e01-1
ops178090103.sh-1439282260982-7ef4f1c5-0
ops178090103.sh-1439282260982-7ef4f1c5-1
ops178090103.sh-1439282272017-8d0eae20-0
ops178090103.sh-1439282272017-8d0eae20-1
ops178093118.sh-1439282241888-810389d0-0
ops178093118.sh-1439282241888-810389d0-1
ops178093118.sh-1439282272949-812592a0-0
ops178093118.sh-1439282272949-812592a0-1
ops178096091.sh-1439282247256-dd4f01c3-0
ops178096091.sh-1439282247256-dd4f01c3-1
ops178096091.sh-1439282261099-1ca552ad-0
ops178096091.sh-1439282261099-1ca552ad-1
ops178096091.sh-1439282272076-3865c416-0
ops178096091.sh-1439282272076-3865c416-1
ops178096218.sh-1439282244077-e44933cb-0
ops178096218.sh-1439282244077-e44933cb-1
ops178096218.sh-1439282250962-6d91ea06-0
ops178096218.sh-1439282250962-6d91ea06-1
ops178096218.sh-1439282255978-44fae577-0
ops178096218.sh-1439282255978-44fae577-1
ops178103086.sh-1439282238431-38473eed-0
ops178103086.sh-1439282238431-38473eed-1
ops178103086.sh-1439282245101-dd2d2e8a-0
ops178103086.sh-1439282245101-dd2d2e8a-1
ops178103086.sh-1439282250200-9ec3e4f9-0
ops178103086.sh-1439282250200-9ec3e4f9-1
ops180019230.sh-1439282246060-6c17dbe0-0
ops180019230.sh-1439282246060-6c17dbe0-1
ops180019230.sh-1439282251861-46b2e7d0-0
ops180019230.sh-1439282251861-46b2e7d0-1
ops180019230.sh-1439282256080-8c4f4d28-0
ops180019230.sh-1439282256080-8c4f4d28-1
ops180021036.sh-1439282250060-57c09362-0
ops180021036.sh-1439282250060-57c09362-1
ops180021036.sh-1439282255415-b301daa2-0
ops180021036.sh-1439282255415-b301daa2-1
ops180021036.sh-1439282265455-8455a15b-0
ops180021036.sh-1439282265455-8455a15b-1
ops180021223.sh-1439282248773-578c62d0-0
ops180021223.sh-1439282248773-578c62d0-1
ops180021223.sh-1439282254389-a5d71a5a-0
ops180021223.sh-1439282254389-a5d71a5a-1
ops180021223.sh-1439282258421-16b051fb-0
ops180021223.sh-1439282258421-16b051fb-1
ops180022028.sh-1439282252296-d3b32c71-0
ops180022028.sh-1439282252296-d3b32c71-1
ops180022028.sh-1439282258091-38be130a-0
ops180022028.sh-1439282258091-38be130a-1
ops180022028.sh-1439282262207-e6a740b4-0
ops180022028.sh-1439282262207-e6a740b4-1

In the end, *only the ops178* hosts* can consume this topic, even though all hosts can ping each other successfully.

I know that a rebalance is triggered when a consumer dies or a new consumer joins. But what factors affect rebalancing, and what factors can cause a rebalance to fail?

Thanks!

Best regards,
aluen
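The RangeAssignor lines in the log (18 partitions, a consumer list of 52 thread ids) can be checked with a small Java sketch. It assumes that range assignment in 0.8.x works the way we read it: thread ids are sorted as strings, each thread gets numPartitions / numThreads partitions, and the first numPartitions % numThreads threads get one extra. The class RangeAssignmentSketch and its methods are hypothetical illustrations, not part of Kafka's API, and the "-0"/"-1" thread suffixes are taken from the log rather than from any Kafka call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of range-style assignment for ONE topic, modelled on our
// reading of kafka.consumer.RangeAssignor (0.8.x). Not Kafka code.
final class RangeAssignmentSketch {

    // Consumer instance ids copied from the "with consumers: List(...)" log line
    // (26 instances). Each instance is assumed to run threads "-0" and "-1".
    static final String[] INSTANCES = {
        "ops178090103.sh-1439282239893-c5966e01", "ops178090103.sh-1439282260982-7ef4f1c5",
        "ops178090103.sh-1439282272017-8d0eae20", "ops178093118.sh-1439282241888-810389d0",
        "ops178093118.sh-1439282272949-812592a0", "ops178096091.sh-1439282247256-dd4f01c3",
        "ops178096091.sh-1439282261099-1ca552ad", "ops178096091.sh-1439282272076-3865c416",
        "ops178096218.sh-1439282244077-e44933cb", "ops178096218.sh-1439282250962-6d91ea06",
        "ops178096218.sh-1439282255978-44fae577", "ops178103086.sh-1439282238431-38473eed",
        "ops178103086.sh-1439282245101-dd2d2e8a", "ops178103086.sh-1439282250200-9ec3e4f9",
        "ops180019230.sh-1439282246060-6c17dbe0", "ops180019230.sh-1439282251861-46b2e7d0",
        "ops180019230.sh-1439282256080-8c4f4d28", "ops180021036.sh-1439282250060-57c09362",
        "ops180021036.sh-1439282255415-b301daa2", "ops180021036.sh-1439282265455-8455a15b",
        "ops180021223.sh-1439282248773-578c62d0", "ops180021223.sh-1439282254389-a5d71a5a",
        "ops180021223.sh-1439282258421-16b051fb", "ops180022028.sh-1439282252296-d3b32c71",
        "ops180022028.sh-1439282258091-38be130a", "ops180022028.sh-1439282262207-e6a740b4",
    };

    // Expands each instance id into its two consumer thread ids.
    public static List<String> threadsFromLog() {
        List<String> threads = new ArrayList<>();
        for (String instance : INSTANCES) {
            threads.add(instance + "-0");
            threads.add(instance + "-1");
        }
        return threads;
    }

    // Maps each thread id to its partitions. Threads are sorted; each gets
    // numPartitions / numThreads partitions and the first numPartitions % numThreads
    // threads get one extra. An empty list means the thread owns nothing
    // (the "No broker partitions consumed" case in the WARN lines).
    public static Map<String, List<Integer>> assign(int numPartitions, List<String> threadIds) {
        List<String> sorted = new ArrayList<>(threadIds);
        Collections.sort(sorted);
        int perThread = numPartitions / sorted.size();
        int withExtra = numPartitions % sorted.size();
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int pos = 0; pos < sorted.size(); pos++) {
            int start = perThread * pos + Math.min(pos, withExtra);
            int count = perThread + (pos < withExtra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                partitions.add(p);
            }
            result.put(sorted.get(pos), partitions);
        }
        return result;
    }

    // Host prefixes (text before ".sh-") of threads that own at least one partition.
    public static TreeSet<String> hostsWithPartitions(Map<String, List<Integer>> assignment) {
        TreeSet<String> hosts = new TreeSet<>();
        assignment.forEach((thread, partitions) -> {
            if (!partitions.isEmpty()) {
                hosts.add(thread.substring(0, thread.indexOf(".sh-")));
            }
        });
        return hosts;
    }

    public static void main(String[] args) {
        List<String> threads = threadsFromLog();
        Map<String, List<Integer>> assignment = assign(18, threads);
        // prints: 52 threads, 18 partitions
        System.out.println(threads.size() + " threads, 18 partitions");
        // prints: [ops178090103, ops178093118, ops178096091, ops178096218]
        System.out.println(hostsWithPartitions(assignment));
    }
}
```

If those assumptions hold, the sketch reproduces what the log shows: 52 threads from 26 registered consumer instances share 18 partitions, 18 / 52 rounds down to 0, so only the first 18 threads in sorted order receive one partition each. Those threads all sit on the four hosts printed above, and thread ops180021036.sh-1439282265455-8455a15b-0 gets nothing, matching the WARN lines.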