[ https://issues.apache.org/jira/browse/KAFKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sadek updated KAFKA-3004:
-------------------------
    Description:
While doing load testing we have noticed that the controller will fail over almost every hour with the following entry in its log:

INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)

I also see an increase in minor-GC collection around the same time.

2015-12-17T22:00:40.961+0000: 15693.112: [GC2015-12-17T22:00:46.404+0000: 15698.554: [ParNew: 282865K->3922K(314560K), 0.0104700 secs] 576345K->297570K(1013632K), 5.4531250 secs] [Times: user=0.05 sys=0.00, real=5.46 secs]

Here's a snippet of the broker log around that time:

[2015-12-17 22:00:36,090] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
15754934 [main-SendThread(kfk02.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 12203ms for sessionid 0x151b10503e60002, closing socket connection and attempting reconnect
[2015-12-17 22:01:55,533] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
15755399 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kfk01.local/10.124.80.140:2182. Will not attempt to authenticate using SASL (unknown error)
15755400 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kfk01.local/10.124.80.140:2182, initiating session
15755401 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kfk01.local/10.124.80.140:2182, sessionid = 0x151b10503e60002, negotiated timeout = 12000
[2015-12-17 22:01:55,902] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)

Any idea what may be causing this?
Thanks!
> Controller failing over repeatedly
> ----------------------------------
>
>                 Key: KAFKA-3004
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3004
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.0
>         Environment: Centos 6.5
> OpenJDK 1.7.0_79
> 6 Kafka nodes
> 3 ZK nodes (cluster mode)
>            Reporter: Sadek
>            Assignee: Neha Narkhede
>
> While doing load testing we have noticed that the controller will fail over
> almost every hour with the following entry in its log:
> INFO [SessionExpirationListener on 4], ZK expired; shut down all controller
> components and try to re-elect
> (kafka.controller.KafkaController$SessionExpirationListener)
> I also see an increase in minor-GC collection around the same time.
> 2015-12-17T22:00:40.961+0000: 15693.112: [GC2015-12-17T22:00:46.404+0000:
> 15698.554: [ParNew: 282865K->3922K(314560K), 0.0104700 secs]
> 576345K->297570K(1013632K), 5.4531250 secs] [Times: user=0.05 sys=0.00,
> real=5.46 secs]
> Here's a snippet of the broker log around that time
> [2015-12-17 22:00:36,090] INFO zookeeper state changed (SyncConnected)
> (org.I0Itec.zkclient.ZkClient)
> 15754934 [main-SendThread(kfk02.local:2182)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 12203ms for sessionid 0x151b10503e60002, closing socket
> connection and attempting reconnect
> [2015-12-17 22:01:55,533] INFO zookeeper state changed (Disconnected)
> (org.I0Itec.zkclient.ZkClient)
> 15755399 [main-SendThread(kfk01.local:2182)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kfk01.local/10.124.80.140:2182. Will not attempt to authenticate using SASL
> (unknown error)
> 15755400 [main-SendThread(kfk01.local:2182)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kfk01.local/10.124.80.140:2182, initiating session
> 15755401 [main-SendThread(kfk01.local:2182)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kfk01.local/10.124.80.140:2182, sessionid = 0x151b10503e60002, negotiated
> timeout = 12000
> [2015-12-17 22:01:55,902] INFO zookeeper state changed (SyncConnected)
> (org.I0Itec.zkclient.ZkClient)
> Any idea what may be causing this?
> Thanks!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
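
The GC entry quoted in the report shows a large gap between the ParNew collection time (0.0104700 secs) and the wall-clock time for the whole event (real=5.46 secs), alongside a negotiated ZooKeeper session timeout of 12000 ms. As a minimal sketch for scanning a GC log for entries like that one, the Python below extracts the `real=` value from each line and reports those at or above a threshold. The function name `long_pauses`, the regular expression, and the 1-second threshold are illustrative assumptions, not part of Kafka or any JVM tooling, and the sketch does not by itself establish what caused the session expiration.

```python
import re

# Matches the wall-clock field of a "[Times: user=... sys=..., real=... secs]" entry.
GC_REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(gc_log_lines, threshold_secs):
    """Return (real_secs, line) pairs for GC entries whose wall-clock
    time is at least threshold_secs. Lines without a real= field are skipped."""
    hits = []
    for line in gc_log_lines:
        match = GC_REAL_RE.search(line)
        if match is None:
            continue
        real_secs = float(match.group(1))
        if real_secs >= threshold_secs:
            hits.append((real_secs, line))
    return hits


# The GC entry from the report, used here as sample input.
sample = (
    "2015-12-17T22:00:40.961+0000: 15693.112: "
    "[GC2015-12-17T22:00:46.404+0000: 15698.554: "
    "[ParNew: 282865K->3922K(314560K), 0.0104700 secs] "
    "576345K->297570K(1013632K), 5.4531250 secs] "
    "[Times: user=0.05 sys=0.00, real=5.46 secs]"
)

# threshold_secs=1.0 is an arbitrary illustrative value.
for real_secs, line in long_pauses([sample], threshold_secs=1.0):
    print(f"{real_secs:.2f}s wall-clock: {line[:28]}")
```

Running the same function over the full GC log and lining the flagged timestamps up against the `Client session timed out` entries in the broker log would show whether the two coincide each hour.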