Hi team,
This is an issue that has frustrated me for quite some time. One of our
clusters has three hosts. My startup script brings up three ZooKeeper
processes first, followed by three Kafka processes. The problem is that after
all three Kafka processes are up, only one broker is registered in ZooKeeper
(in this case, the one on host three). If I manually kill the Kafka processes
on host one and host two and restart them, they register themselves with
ZooKeeper successfully. I've attached logs from host one. The log indicates
that broker 1 was registered under /brokers/ids, but when I checked ZooKeeper,
I found that only broker 3 was registered. It seems there is a race condition.
[2014-02-11 15:20:55,266] INFO Session establishment complete on server cfgtps1q-phys/HostOne:9181, sessionid = 0x144229beb980000, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn)
[2014-02-11 15:20:55,268] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2014-02-11 15:20:55,378] INFO /brokers/ids/1 exists with value { "host":"cfgtps1q-phys.nam.nsroot.net", "jmx_port":9999, "port":11934, "version":1 } during connection loss; this is ok (kafka.utils.ZkUtils$)
[2014-02-11 15:20:55,379] INFO Registered broker 1 at path /brokers/ids/1 with address hostone.xxx.xxxxxx.net:11934. (kafka.utils.ZkUtils$)
[2014-02-11 15:20:55,380] INFO [Kafka Server 1], Connecting to ZK: HostOne:9181, HostTwo:9181, HostThree:9181 (kafka.server.KafkaServer)
[2014-02-11 15:20:55,511] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
[2014-02-11 15:20:55,520] INFO conflict in /controller data: 1 stored data: 3 (kafka.utils.ZkUtils$)
[2014-02-11 15:20:55,538] INFO [Kafka Server 1], Started (kafka.server.KafkaServer)
[2014-02-11 15:20:58,015] INFO 1 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2014-02-11 15:20:58,605] INFO Accepted socket connection from /HostThree:52420 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:20:58,609] INFO Client attempting to establish new session at /HostThree:52420 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:20:58,616] INFO Established session 0x144229beb980001 with negotiated timeout 10000 for client /HostThree:52420 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:21:01,064] INFO New leader is 1 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2014-02-11 15:21:36,375] INFO Accepted socket connection from /xx.xx.xxx.xx:54709 (org.apache.zookeeper.server.NIOServerCnxn)
[2014-02-11 15:21:36,378] INFO Client attempting to establish new session at /xx.xx.xxx.xx:54709 (org.apache.zookeeper.server.NIOServerCnxn)
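In case it helps anyone reproduce the check, here is a rough sketch of how the mismatch could be spotted automatically. It scans a broker log (one entry per line) for the two ZkUtils messages shown above and compares the expected broker IDs with whatever is actually listed under /brokers/ids. The function names scan_broker_log and missing_brokers are made up for illustration, and the list of live IDs is assumed to have been fetched from ZooKeeper separately with whatever client is at hand; this is not part of Kafka.

```python
import re

# Patterns taken from the kafka.utils.ZkUtils$ messages in the log above.
REGISTERED = re.compile(r"Registered broker (\d+) at path /brokers/ids/\d+")
PRE_EXISTING = re.compile(
    r"/brokers/ids/(\d+) exists with value .* during connection loss"
)


def scan_broker_log(lines):
    """Return (registered_ids, preexisting_ids) found in the log lines.

    registered_ids:  brokers the log claims were registered.
    preexisting_ids: brokers whose znode already existed at registration
                     time, which may be worth a closer look.
    """
    registered, preexisting = set(), set()
    for line in lines:
        match = REGISTERED.search(line)
        if match:
            registered.add(int(match.group(1)))
        match = PRE_EXISTING.search(line)
        if match:
            preexisting.add(int(match.group(1)))
    return registered, preexisting


def missing_brokers(expected_ids, live_ids):
    """Broker IDs that should be under /brokers/ids but are not (sorted)."""
    return sorted(set(expected_ids) - set(live_ids))


# Two entries copied from the host one log above.
SAMPLE = [
    '[2014-02-11 15:20:55,378] INFO /brokers/ids/1 exists with value '
    '{ "host":"cfgtps1q-phys.nam.nsroot.net", "jmx_port":9999, '
    '"port":11934, "version":1 } during connection loss; this is ok '
    '(kafka.utils.ZkUtils$)',
    '[2014-02-11 15:20:55,379] INFO Registered broker 1 at path '
    '/brokers/ids/1 with address hostone.xxx.xxxxxx.net:11934. '
    '(kafka.utils.ZkUtils$)',
]

if __name__ == "__main__":
    registered, preexisting = scan_broker_log(SAMPLE)
    print("log claims registered:", sorted(registered))
    print("znode already existed:", sorted(preexisting))
    # live_ids=[3] mirrors what I saw when I checked ZooKeeper by hand.
    print("missing from ZooKeeper:", missing_brokers([1, 2, 3], [3]))
```

With the two sample entries, broker 1 shows up both as registered and as already existing, and brokers 1 and 2 are reported missing when only broker 3 is live, which matches what I observed.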
Regards,
Libo