[ https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359954#comment-14359954 ]
K Zakee commented on KAFKA-2011:
--------------------------------

Thanks Jiangjie. Though the ZK client session timeout fix has stopped the controller re-elections, on digging deeper I found that the "elected as leader" log line appears slightly *before* the ZK session timeout log line. Below is one example:

[2015-03-11 04:28:14,435] INFO Client session timed out, have not heard from server in 34105ms for sessionid 0x24bf1b6f5310075, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2015-03-11 04:27:48,007] INFO 1 successfully elected as leader (kafka.server.ZookeeperLeaderElector)

I am wondering: if the ZK session timeout caused the controller re-election, why do the logs depict it the other way around?

> Rebalance with auto.leader.rebalance.enable=false
> --------------------------------------------------
>
> Key: KAFKA-2011
> URL: https://issues.apache.org/jira/browse/KAFKA-2011
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.2.0
> Environment: 5 Hosts of below config:
> "x86_64" "32-bit, 64-bit" "Little Endian" "24 GenuineIntel CPUs Model 44 1600.000MHz" "RAM 189 GB" GNU/Linux
> Reporter: K Zakee
> Priority: Blocker
> Attachments: controller-logs-1.zip, controller-logs-2.zip
>
> Started with clean cluster 0.8.2 with 5 brokers.
> Setting the properties as below:
>
> auto.leader.rebalance.enable=false
> controlled.shutdown.enable=true
> controlled.shutdown.max.retries=1
> controlled.shutdown.retry.backoff.ms=5000
> default.replication.factor=3
> log.cleaner.enable=true
> log.cleaner.threads=5
> log.cleanup.policy=delete
> log.flush.scheduler.interval.ms=3000
> log.retention.minutes=1440
> log.segment.bytes=1073741824
> message.max.bytes=1000000
> num.io.threads=14
> num.network.threads=14
> num.partitions=10
> queued.max.requests=500
> num.replica.fetchers=4
> replica.fetch.max.bytes=1048576
> replica.fetch.min.bytes=51200
> replica.lag.max.messages=5000
> replica.lag.time.max.ms=30000
> replica.fetch.wait.max.ms=1000
> fetch.purgatory.purge.interval.requests=5000
> producer.purgatory.purge.interval.requests=5000
> delete.topic.enable=true
>
> Logs show rebalance happening well up to 24 hours after the start.
>
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
> …
> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> ...
> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> ...
> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
> ...
> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
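A quick way to sanity-check the ordering question in the comment above is to merge the two quoted log lines and sort them: the log4j `[YYYY-MM-DD HH:MM:SS,mmm]` prefix sorts lexicographically in timestamp order, so a plain `sort` interleaves lines from different log files by time. This is only an illustrative sketch using the two lines quoted in the comment, not a claim about how the broker emits them:

```shell
# Merge the two quoted log lines and order them by timestamp.
# log4j's "[YYYY-MM-DD HH:MM:SS,mmm]" prefix is lexicographically
# sortable, so `sort` alone orders the lines chronologically.
printf '%s\n' \
  '[2015-03-11 04:28:14,435] INFO Client session timed out, have not heard from server in 34105ms for sessionid 0x24bf1b6f5310075, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)' \
  '[2015-03-11 04:27:48,007] INFO 1 successfully elected as leader (kafka.server.ZookeeperLeaderElector)' \
  | sort
```

The election line (04:27:48,007) sorts first, 26 seconds ahead of the session-timeout line (04:28:14,435), which is exactly the ordering the comment asks about. The same trick applied to whole files (e.g. `sort controller.log zookeeper-client.log`) can help correlate events across brokers, provided their clocks are in sync.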