Hello,

I've recently been looking into running our Kafka cluster outside of
Mesos (for a couple of reasons). However, I'm noticing that about 10% of
the time Kafka fails to start on boot (or, more accurately, starts and
immediately exits). I find this odd, since all brokers use the exact same
configuration on the same OS (Ubuntu 16.04).

There's nothing in my log4j directory; however, I did find a single log
line in $KAFKA_DIR/logs/kafkaServer.out that sheds light on why it's
failing:

```
Error: Exception thrown by the agent : java.rmi.server.ExportException:
Port already in use: 9999; nested exception is:
        java.net.BindException: Address already in use (Bind failed)
```

However, I can verify that nothing is listening on this port right before
invocation using `netstat -tulpn`, which shows:

```
 upstart.sh[1127]: Active Internet connections (only servers)
 upstart.sh[1127]: Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Pr
 upstart.sh[1127]: tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      1419/p
 upstart.sh[1127]: tcp        0      0 127.0.0.1:8400          0.0.0.0:*               LISTEN      1125/c
 upstart.sh[1127]: tcp        0      0 127.0.0.1:8500          0.0.0.0:*               LISTEN      1125/c
 upstart.sh[1127]: tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      1215/d
 upstart.sh[1127]: tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1111/s
 upstart.sh[1127]: tcp        0      0 127.0.0.1:8600          0.0.0.0:*               LISTEN      1125/c
 upstart.sh[1127]: tcp        0      0 127.0.0.1:8126          0.0.0.0:*               LISTEN      1418/t
 upstart.sh[1127]: tcp6       0      0 :::8301                 :::*                    LISTEN      1125/c
 upstart.sh[1127]: tcp6       0      0 :::53                   :::*                    LISTEN      1215/d
 upstart.sh[1127]: tcp6       0      0 :::22                   :::*                    LISTEN      1111/s
 upstart.sh[1127]: udp        0      0 0.0.0.0:53              0.0.0.0:*                           1215/d
 upstart.sh[1127]: udp        0      0 0.0.0.0:68              0.0.0.0:*                           973/dh
 upstart.sh[1127]: udp        0      0 10.32.104.144:123       0.0.0.0:*                           1341/n
 upstart.sh[1127]: udp        0      0 127.0.0.1:123           0.0.0.0:*                           1341/n
 upstart.sh[1127]: udp        0      0 0.0.0.0:123             0.0.0.0:*                           1341/n
 upstart.sh[1127]: udp        0      0 127.0.0.1:8600          0.0.0.0:*                           1125/c
 upstart.sh[1127]: udp6       0      0 :::54933                :::*                                1441/j
 upstart.sh[1127]: udp6       0      0 127.0.0.1:8125          :::*                                1420/p
 upstart.sh[1127]: udp6       0      0 :::53                   :::*                                1215/d
 upstart.sh[1127]: udp6       0      0 :::8301                 :::*                                1125/c
 upstart.sh[1127]: udp6       0      0 fe80::898:21ff:fec0:123 :::*                                1341/n
 upstart.sh[1127]: udp6       0      0 ::1:123                 :::*                                1341/n
 upstart.sh[1127]: udp6       0      0 :::123                  :::*                                1341/n
```
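
Note that `netstat -l` lists only listening sockets, so the output above
would not show a socket bound to port 9999 in some other state (for
example, an established connection). Below is a minimal sketch of a
broader check, assuming raw `netstat -an`-style output (without the
`upstart.sh[...]` prefix) is piped in. `port_listed` is a hypothetical
helper name, and whether a non-listening socket is actually involved here
is an open assumption, not a diagnosis:

```shell
# Hypothetical helper: succeeds (exit 0) if any tcp/udp line on stdin has a
# local address ending in :PORT. Assumes netstat-style columns, where the
# local address is the 4th whitespace-separated field.
port_listed() {
    awk -v p=":$1" '
        # Compare the tail of the local-address field against ":PORT".
        $1 ~ /^(tcp|udp)/ && substr($4, length($4) - length(p) + 1) == p {
            found = 1
        }
        END { exit !found }
    '
}
```

It could be run right before the broker is started, for example as
`netstat -an | port_listed 9999 && echo "9999 is in use"`.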

I can also verify that the network on the box itself is up and working, as
programs like the consul-agent do in fact spawn and connect to their
clusters before Kafka even gets invoked.

For reference, I'm using the built-in `kafka-server-start.sh` script and
invoking it like so (IPs cut out):

```
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=kafka-i-0617a6aaa98f63c21.insops.net \
-Djava.net.preferIPv4Stack=true" \
JMX_PORT=9999 SCALA_VERSION=2.12.2 JAVA_HOME=/usr \
$KAFKA_INSTALL_PATH/bin/kafka-server-start.sh -daemon \
$KAFKA_INSTALL_PATH/config/server.properties \
--override zookeeper.connect="XX.XX.XX.XX:XX" \
--override broker.id="the-broker-test" \
--override listeners="SSL://$LOCAL_IPV4:9092" \
--override broker.rack="$AZ"
```

I'm not really sure where else to check for problems, as it's only
happening on some boots and only logging the one line mentioned above.

Thanks,


-- 
*Eric Coan*
*E: ec...@instructure.com <ec...@instructure.com>*
*O:* *801.869.5000 <//801.869.5000>*
<http://instructure.com/>
