[ https://issues.apache.org/jira/browse/KAFKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sadek updated KAFKA-3004:
-------------------------
Description:
While doing load testing, we have noticed that the controller fails over almost every hour, with the following entry in its log:

INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)

I also see an increase in minor GCs around the same time:

2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000: 8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs] 603757K->324456K(1013632K), 5.7134120 secs] [Times: user=0.05 sys=0.00, real=5.71 secs]

Here's a snippet of the broker log around that time:

[2015-12-17 22:00:36,090] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
15754934 [main-SendThread(kfk02.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 12203ms for sessionid 0x151b10503e60002, closing socket connection and attempting reconnect
[2015-12-17 22:01:55,533] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
15755399 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kfk01.local/10.124.80.140:2182. Will not attempt to authenticate using SASL (unknown error)
15755400 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kfk01.local/10.124.80.140:2182, initiating session
15755401 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kfk01.local/10.124.80.140:2182, sessionid = 0x151b10503e60002, negotiated timeout = 12000
[2015-12-17 22:01:55,902] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)

Any idea what may be causing this?
Thanks!
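The two log excerpts in the description can be checked numerically. The following is a minimal Python sketch, assuming the exact log formats quoted in this report (HotSpot ParNew output and ZooKeeper's ClientCnxn timeout message); the regexes and the helper names parse_gc_event, silence_ms and summarize are hypothetical, not part of Kafka or ZooKeeper. It extracts the GC event's wall-clock and CPU times and compares the client's reported silence with the negotiated session timeout. The sample GC line (15:57) and the ZooKeeper line (22:00) come from different moments, so this only illustrates the arithmetic; it does not show that this particular GC caused this particular timeout.

```python
import re

# Sample lines copied from the report (note: they come from different timestamps).
GC_LINE = (
    "2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000: "
    "8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs] "
    "603757K->324456K(1013632K), 5.7134120 secs] "
    "[Times: user=0.05 sys=0.00, real=5.71 secs]"
)
ZK_LINE = (
    "Client session timed out, have not heard from server in 12203ms for "
    "sessionid 0x151b10503e60002, closing socket connection and attempting reconnect"
)

# Young-generation (ParNew) collection time, e.g. "[ParNew: ...(314560K), 0.0081650 secs]".
PARNEW_RE = re.compile(r"\[ParNew: [^,]+, ([\d.]+) secs\]")
# Time for the whole GC event: the figure just before "[Times:".
EVENT_RE = re.compile(r"\), ([\d.]+) secs\] \[Times")
# CPU (user, sys) and wall-clock (real) seconds reported by the JVM.
TIMES_RE = re.compile(r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]")
# Milliseconds the ZooKeeper client went without hearing from the server.
SILENCE_RE = re.compile(r"have not heard from server in (\d+)ms")


def parse_gc_event(line: str) -> dict:
    """Extract timing fields from one ParNew GC log line (hypothetical helper)."""
    times = TIMES_RE.search(line)
    return {
        "parnew_secs": float(PARNEW_RE.search(line).group(1)),
        "event_secs": float(EVENT_RE.search(line).group(1)),
        "user_secs": float(times.group(1)),
        "sys_secs": float(times.group(2)),
        "real_secs": float(times.group(3)),
    }


def silence_ms(line: str) -> int:
    """Extract the 'have not heard from server in Nms' figure (hypothetical helper)."""
    return int(SILENCE_RE.search(line).group(1))


def summarize(gc_line: str, zk_line: str, negotiated_timeout_ms: int) -> dict:
    """Put the GC stall and the ZooKeeper silence side by side."""
    gc = parse_gc_event(gc_line)
    cpu_secs = gc["user_secs"] + gc["sys_secs"]
    quiet_ms = silence_ms(zk_line)
    return {
        "real_secs": gc["real_secs"],  # wall-clock time of the GC event
        "cpu_secs": cpu_secs,          # user + sys CPU time
        "wall_minus_cpu_secs": round(gc["real_secs"] - cpu_secs, 2),
        "silence_ms": quiet_ms,
        "exceeded_timeout": quiet_ms > negotiated_timeout_ms,
    }


if __name__ == "__main__":
    # 12000 ms is the negotiated timeout shown in the broker log.
    print(summarize(GC_LINE, ZK_LINE, negotiated_timeout_ms=12000))
```

On these lines the ParNew phase itself reports about 0.008 s, while the whole event reports 5.71 s of wall-clock time against only 0.05 s of CPU time, and the client went 12203 ms without hearing from the server against a negotiated timeout of 12000 ms. A real time far above user+sys is often read as the JVM waiting on something other than GC work (for example, swapping); treat that as a lead to check, not a diagnosis.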
was:
While doing load testing, we have noticed that the controller fails over almost every hour, with the following entry in its log:

INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)

I also see an increase in minor GCs around the same time:

2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000: 8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs] 603757K->324456K(1013632K), 5.7134120 secs] [Times: user=0.05 sys=0.00, real=5.71 secs]

I've tried increasing zookeeper.connection.timeout.ms to 60000, but it doesn't seem to help, and I still see the default value (6000) in the ZK logs:

INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x351b0090ea80000 with negotiated timeout 6000 for client /10......

Any idea what may be causing this?
Thanks!

> Controller failing over repeatedly
> ----------------------------------
>
>                 Key: KAFKA-3004
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3004
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.0
>         Environment: CentOS 6.5
> OpenJDK 1.7.0_79
> 6 Kafka nodes
> 3 ZK nodes (cluster mode)
>            Reporter: Sadek
>            Assignee: Neha Narkhede
>
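The earlier version of the description raises zookeeper.connection.timeout.ms to 60000 yet still sees a negotiated timeout of 6000. In Kafka's broker configuration, zookeeper.connection.timeout.ms bounds how long the client waits to establish a connection, while zookeeper.session.timeout.ms is the session timeout requested from ZooKeeper (default 6000 in 0.8.2). The ZooKeeper server then clamps the requested value into [minSessionTimeout, maxSessionTimeout], which default to 2 * tickTime and 20 * tickTime. The Python sketch below mirrors that clamping. It assumes tickTime=2000 (the value in ZooKeeper's sample zoo.cfg) and assumes zookeeper.session.timeout.ms was left at its default; the broker_props dict and the negotiated_session_timeout function are hypothetical illustrations, not Kafka or ZooKeeper APIs.

```python
from typing import Optional

# Hypothetical broker settings mirroring the report. Only the connection
# timeout was raised; the session timeout is ASSUMED to be at its 0.8.2 default.
broker_props = {
    "zookeeper.connection.timeout.ms": 60000,  # max wait to establish the ZK connection
    "zookeeper.session.timeout.ms": 6000,      # session timeout requested from ZooKeeper
}


def negotiated_session_timeout(
    requested_ms: int,
    tick_time_ms: int = 2000,  # ASSUMED tickTime, as in ZooKeeper's sample zoo.cfg
    min_session_ms: Optional[int] = None,
    max_session_ms: Optional[int] = None,
) -> int:
    """Clamp a requested session timeout the way a ZooKeeper server does.

    When minSessionTimeout/maxSessionTimeout are not set on the server,
    they default to 2 * tickTime and 20 * tickTime respectively.
    """
    lower = min_session_ms if min_session_ms is not None else 2 * tick_time_ms
    upper = max_session_ms if max_session_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # The connection timeout plays no part in the negotiated value.
    print(negotiated_session_timeout(broker_props["zookeeper.session.timeout.ms"]))  # 6000
    print(negotiated_session_timeout(12000))  # 12000
    print(negotiated_session_timeout(60000))  # 40000 (capped at 20 * tickTime)
```

Under these assumptions, a requested 6000 ms stays 6000, the 12000 ms seen in the later broker log stays 12000, and even a requested 60000 ms would be capped at 40000 unless maxSessionTimeout is raised on the ZooKeeper servers.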
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)