Hi team,

This is an issue that has frustrated me for quite some time. One of our clusters has three hosts. In my startup script, three ZooKeeper processes are brought up first, followed by three Kafka processes. The problem is that after all three Kafka processes are up, only one broker is registered in ZooKeeper (in this case, the one on host three). If I manually kill the Kafka processes on host one and host two and restart them, they register with ZooKeeper successfully. I've attached logs from host one below. The log indicates that broker 1 was registered at /brokers/ids, but when I checked ZooKeeper, only broker 3 was registered. It seems there is a race condition.

[2014-02-11 15:20:55,266] INFO Session establishment complete on server cfgtps1q-phys/HostOne:9181, sessionid = 0x144229beb980000, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn)
[2014-02-11 15:20:55,268] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2014-02-11 15:20:55,378] INFO /brokers/ids/1 exists with value { "host":"cfgtps1q-phys.nam.nsroot.net", "jmx_port":9999, "port":11934, "version":1 } during connection loss; this is ok (kafka.utils.ZkUtils$)
[2014-02-11 15:20:55,379] INFO Registered broker 1 at path /brokers/ids/1 with address hostone.xxx.xxxxxx.net:11934. (kafka.utils.ZkUtils$)
[2014-02-11 15:20:55,380] INFO [Kafka Server 1], Connecting to ZK: HostOne:9181, HostTwo:9181, HostThree:9181 (kafka.server.KafkaServer)
[2014-02-11 15:20:55,511] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
[2014-02-11 15:20:55,520] INFO conflict in /controller data: 1 stored data: 3 (kafka.utils.ZkUtils$)
[2014-02-11 15:20:55,538] INFO [Kafka Server 1], Started (kafka.server.KafkaServer)
[2014-02-11 15:20:58,015] INFO 1 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2014-02-11 15:20:58,605] INFO Accepted socket connection from /HostThree:52420 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:20:58,609] INFO Client attempting to establish new session at /HostThree:52420 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:20:58,616] INFO Established session 0x144229beb980001 with negotiated timeout 10000 for client /HostThree:52420 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:21:01,064] INFO New leader is 1 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2014-02-11 15:21:36,375] INFO Accepted socket connection from /xx.xx.xxx.xx:54709 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:21:36,378] INFO Client attempting to establish new session at /xx.xx.xxx.xx:54709 (org.apache.zookeeper.server.NIOServerCnxn)
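
The mismatch described above (the log says broker 1 registered, ZooKeeper only lists broker 3) can be checked mechanically. Below is a minimal sketch, assuming the broker log is available as text with one entry per line and that the children of /brokers/ids were fetched separately, for example with the ZooKeeper CLI (`zkCli.sh`) command `ls /brokers/ids`. The helper names (`claimed_registrations`, `registered_after_connection_loss`, `parse_zk_ls`, `missing_brokers`) are hypothetical and only match the log wording shown above.

```python
import re

# Hypothetical pattern for the ZkUtils line logged once Kafka believes
# registration succeeded, e.g. "Registered broker 1 at path /brokers/ids/1 ..."
REGISTERED = re.compile(r"Registered broker (\d+) at path /brokers/ids/\1\b")

# Hypothetical pattern for the ZkUtils line logged when the node already
# existed after a connection loss and Kafka treated that as success.
EXISTS_ON_LOSS = re.compile(
    r"/brokers/ids/(\d+) exists with value .* during connection loss"
)


def claimed_registrations(log_text):
    """Broker ids that the log claims were registered."""
    return {int(m.group(1)) for m in REGISTERED.finditer(log_text)}


def registered_after_connection_loss(log_text):
    """Broker ids whose registration went through the 'exists ... during connection loss' path."""
    return {int(m.group(1)) for m in EXISTS_ON_LOSS.finditer(log_text)}


def parse_zk_ls(output):
    """Parse the bracketed child list printed by 'ls /brokers/ids', e.g. '[3]' or '[1, 3]'."""
    lists = re.findall(r"\[([\d,\s]*)\]", output)
    if not lists:
        return set()
    # Use the last bracketed list in case the CLI printed other lines first.
    return {int(x) for x in lists[-1].split(",") if x.strip()}


def missing_brokers(log_text, zk_ls_output):
    """Brokers the log says are registered but ZooKeeper does not list."""
    return claimed_registrations(log_text) - parse_zk_ls(zk_ls_output)
```

Fed the two ZkUtils lines from the host one log and an `ls /brokers/ids` result of `[3]`, `missing_brokers` returns `{1}`, and `registered_after_connection_loss` also reports broker 1.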
Regards, Libo