Hello, I've recently been researching how to run our Kafka cluster outside of Mesos (for a couple of reasons). However, I'm noticing that about 10% of the time Kafka fails to start on boot (or, more accurately, starts and immediately exits). I find this odd, since all brokers use the exact same configuration on the same OS (Ubuntu 16.04).
There's nothing in my LOG4J directory; however, I did find a single log line in $KAFKA_DIR/logs/kafkaServer.out that sheds light on why it's failing:

```
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9999; nested exception is:
java.net.BindException: Address already in use (Bind failed)
```

However, I can verify that nothing is listening on this port right before invocation using `netstat -tulpn`, which shows:

```
upstart.sh[1127]: Active Internet connections (only servers)
upstart.sh[1127]: Proto Recv-Q Send-Q Local Address            Foreign Address  State   PID/Pr
upstart.sh[1127]: tcp        0      0 127.0.0.1:17123          0.0.0.0:*        LISTEN  1419/p
upstart.sh[1127]: tcp        0      0 127.0.0.1:8400           0.0.0.0:*        LISTEN  1125/c
upstart.sh[1127]: tcp        0      0 127.0.0.1:8500           0.0.0.0:*        LISTEN  1125/c
upstart.sh[1127]: tcp        0      0 0.0.0.0:53               0.0.0.0:*        LISTEN  1215/d
upstart.sh[1127]: tcp        0      0 0.0.0.0:22               0.0.0.0:*        LISTEN  1111/s
upstart.sh[1127]: tcp        0      0 127.0.0.1:8600           0.0.0.0:*        LISTEN  1125/c
upstart.sh[1127]: tcp        0      0 127.0.0.1:8126           0.0.0.0:*        LISTEN  1418/t
upstart.sh[1127]: tcp6       0      0 :::8301                  :::*             LISTEN  1125/c
upstart.sh[1127]: tcp6       0      0 :::53                    :::*             LISTEN  1215/d
upstart.sh[1127]: tcp6       0      0 :::22                    :::*             LISTEN  1111/s
upstart.sh[1127]: udp        0      0 0.0.0.0:53               0.0.0.0:*                1215/d
upstart.sh[1127]: udp        0      0 0.0.0.0:68               0.0.0.0:*                973/dh
upstart.sh[1127]: udp        0      0 10.32.104.144:123        0.0.0.0:*                1341/n
upstart.sh[1127]: udp        0      0 127.0.0.1:123            0.0.0.0:*                1341/n
upstart.sh[1127]: udp        0      0 0.0.0.0:123              0.0.0.0:*                1341/n
upstart.sh[1127]: udp        0      0 127.0.0.1:8600           0.0.0.0:*                1125/c
upstart.sh[1127]: udp6       0      0 :::54933                 :::*                     1441/j
upstart.sh[1127]: udp6       0      0 127.0.0.1:8125           :::*                     1420/p
upstart.sh[1127]: udp6       0      0 :::53                    :::*                     1215/d
upstart.sh[1127]: udp6       0      0 :::8301                  :::*                     1125/c
upstart.sh[1127]: udp6       0      0 fe80::898:21ff:fec0:123  :::*                     1341/n
upstart.sh[1127]: udp6       0      0 ::1:123                  :::*                     1341/n
upstart.sh[1127]: udp6       0      0 :::123                   :::*                     1341/n
```

I can also verify that the network on the box itself is up and working, as programs like the consul-agent do in fact spawn and connect to their clusters before Kafka even gets invoked.

For reference, I'm using the built-in `kafka-server-start.sh` script and invoking it like so (IPs cut out):

```
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka-i-0617a6aaa98f63c21.insops.net -Djava.net.preferIPv4Stack=true" \
JMX_PORT=9999 \
SCALA_VERSION=2.12.2 \
JAVA_HOME=/usr \
$KAFKA_INSTALL_PATH//bin/kafka-server-start.sh -daemon $KAFKA_INSTALL_PATH/config/server.properties \
  --override zookeeper.connect="XX.XX.XX.XX:XX" \
  --override broker.id="the-broker-test" \
  --override listeners="SSL://$LOCAL_IPV4:9092" \
  --override broker.rack="$AZ"
```

I'm not really sure where else to check for problems, as it's only happening on some boots and only logging the one line mentioned above.

Thanks,

--
*Eric Coan*
*E: ec...@instructure.com*
*O: 801.869.5000*
<http://instructure.com/>
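
A note on the port check described above: `netstat -tulpn` lists only listening sockets (the `-l` flag), so it would not show a socket in another state, such as an established outbound connection, that happens to hold local port 9999. The following is a minimal sketch of a broader check, not a confirmed diagnosis of this failure. The function name `local_port_in_use` is hypothetical and not part of Kafka; it assumes input in the column layout that `netstat -tan` prints, without the `upstart.sh[1127]:` log prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kafka): reads `netstat -tan`-style output
# on stdin and exits 0 if any socket, in any state, uses the given local port.
local_port_in_use() {
  local port="$1"
  # Column 4 is "Local Address" (addr:port). Splitting on ":" and taking the
  # last piece also handles IPv6 forms such as ":::8301".
  awk -v port="$port" '
    $1 ~ /^(tcp|udp)/ {
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit !found }
  '
}
```

It could be run just before the start script, for example `netstat -tan | local_port_in_use 9999 && echo "local port 9999 is already held"`. Here `-a` replaces `-l` so that sockets in every state are included.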