Hello,
I've recently been looking into running our Kafka cluster outside of
Mesos (for a couple of reasons). However, I'm noticing that about 10% of
the time Kafka fails to start on boot (or, more accurately, starts and
immediately exits). I find this odd, since all brokers use the exact same
configuration on the same OS (Ubuntu 16.04).
There's nothing in my LOG4J directory; however, I did find a single log
line in $KAFKA_DIR/logs/kafkaServer.out that sheds some light on why it's
failing:
```
Error: Exception thrown by the agent : java.rmi.server.ExportException:
Port already in use: 9999; nested exception is:
java.net.BindException: Address already in use (Bind failed)
```
However, I can verify that nothing is listening on this port right before
invocation. Running `netstat -tulpn` at that point shows:
```
upstart.sh[1127]: Active Internet connections (only servers)
upstart.sh[1127]: Proto Recv-Q Send-Q Local Address           Foreign Address State  PID/Pr
upstart.sh[1127]: tcp   0      0      127.0.0.1:17123         0.0.0.0:*       LISTEN 1419/p
upstart.sh[1127]: tcp   0      0      127.0.0.1:8400          0.0.0.0:*       LISTEN 1125/c
upstart.sh[1127]: tcp   0      0      127.0.0.1:8500          0.0.0.0:*       LISTEN 1125/c
upstart.sh[1127]: tcp   0      0      0.0.0.0:53              0.0.0.0:*       LISTEN 1215/d
upstart.sh[1127]: tcp   0      0      0.0.0.0:22              0.0.0.0:*       LISTEN 1111/s
upstart.sh[1127]: tcp   0      0      127.0.0.1:8600          0.0.0.0:*       LISTEN 1125/c
upstart.sh[1127]: tcp   0      0      127.0.0.1:8126          0.0.0.0:*       LISTEN 1418/t
upstart.sh[1127]: tcp6  0      0      :::8301                 :::*            LISTEN 1125/c
upstart.sh[1127]: tcp6  0      0      :::53                   :::*            LISTEN 1215/d
upstart.sh[1127]: tcp6  0      0      :::22                   :::*            LISTEN 1111/s
upstart.sh[1127]: udp   0      0      0.0.0.0:53              0.0.0.0:*              1215/d
upstart.sh[1127]: udp   0      0      0.0.0.0:68              0.0.0.0:*              973/dh
upstart.sh[1127]: udp   0      0      10.32.104.144:123       0.0.0.0:*              1341/n
upstart.sh[1127]: udp   0      0      127.0.0.1:123           0.0.0.0:*              1341/n
upstart.sh[1127]: udp   0      0      0.0.0.0:123             0.0.0.0:*              1341/n
upstart.sh[1127]: udp   0      0      127.0.0.1:8600          0.0.0.0:*              1125/c
upstart.sh[1127]: udp6  0      0      :::54933                :::*                   1441/j
upstart.sh[1127]: udp6  0      0      127.0.0.1:8125          :::*                   1420/p
upstart.sh[1127]: udp6  0      0      :::53                   :::*                   1215/d
upstart.sh[1127]: udp6  0      0      :::8301                 :::*                   1125/c
upstart.sh[1127]: udp6  0      0      fe80::898:21ff:fec0:123 :::*                   1341/n
upstart.sh[1127]: udp6  0      0      ::1:123                 :::*                   1341/n
upstart.sh[1127]: udp6  0      0      :::123                  :::*                   1341/n
```
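One caveat on this check: the `-l` in `netstat -tulpn` restricts the
listing to listening sockets, so a TCP socket in any other state that
holds local port 9999 would not show up here. A minimal sketch of a
broader check follows. The helper name `sockets_on_port` is hypothetical,
and it assumes the usual `netstat -tan` column layout with the local
address in the fourth column:

```shell
# Hypothetical helper: read `netstat -tan`-style lines on stdin and print
# every TCP (tcp/tcp6) line whose local address ends in the given port,
# regardless of socket state.
sockets_on_port() {
  awk -v port="$1" '
    $1 ~ /^tcp/ {
      # The port is whatever follows the last ":" in the local address,
      # which also handles IPv6 forms such as ":::9999".
      n = split($4, parts, ":")
      if (parts[n] == port) print
    }'
}

# Intended use right before starting the broker (9999 is the JMX_PORT):
#   netstat -tanp | sockets_on_port 9999
```

If this prints anything on a failing boot, the state and PID columns
should indicate what is holding the port at that moment.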
I can also verify that the box's network is up and working, since programs
like consul-agent do in fact spawn and connect to their clusters before
Kafka even gets invoked.
For reference, I'm using the built-in `kafka-server-start.sh` script and
invoking it like so (IPs redacted):
```
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=kafka-i-0617a6aaa98f63c21.insops.net \
-Djava.net.preferIPv4Stack=true" \
JMX_PORT=9999 SCALA_VERSION=2.12.2 JAVA_HOME=/usr \
$KAFKA_INSTALL_PATH/bin/kafka-server-start.sh -daemon \
  $KAFKA_INSTALL_PATH/config/server.properties \
  --override zookeeper.connect="XX.XX.XX.XX:XX" \
  --override broker.id="the-broker-test" \
  --override listeners="SSL://$LOCAL_IPV4:9092" \
  --override broker.rack="$AZ"
```
I'm not really sure where else to check for problems, as it only happens
on some boots and only logs the one line mentioned above.
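Since the failure is intermittent, one option is to capture the socket
state at the moment it happens rather than only before invocation. A
rough sketch, where the helper name `jmx_bind_failed` is hypothetical and
the matched string is taken from the log line quoted above:

```shell
# Hypothetical helper: exit 0 if the given log file contains the JMX bind
# failure message, non-zero otherwise (including when the file is missing).
jmx_bind_failed() {
  grep -q 'Port already in use' "$1" 2>/dev/null
}

# Intended use shortly after kafka-server-start.sh returns:
#   if jmx_bind_failed "$KAFKA_DIR/logs/kafkaServer.out"; then
#     netstat -tanp   # dump all TCP sockets, not just listeners
#   fi
```

This only detects the message in kafkaServer.out; it does not explain the
cause, but it narrows down which boots to inspect.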
Thanks,
--
*Eric Coan*
*O: 801.869.5000*
<http://instructure.com/>