Have you tried changing the configured JMX port? After all, it's possible the conflict is between Kafka and some other software running on the same server.
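As a rough illustration of that idea, here is a minimal sketch, assuming a POSIX shell with `awk` and the `netstat -tulpn` output format shown in the quoted message below. `port_in_use` and `pick_jmx_port` are hypothetical helper names and the candidate ports are arbitrary; only `JMX_PORT` and the start script come from the thread. It can only see listeners that exist at the moment of the check, so it would not rule out a race with something that binds the port later.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Kafka.

# port_in_use PORT: reads `netstat -tulpn`-style lines on stdin and
# succeeds if a TCP socket in LISTEN state is bound to PORT.
port_in_use() {
  awk -v port="$1" '
    {
      # Locate the protocol column; it may follow a log prefix
      # such as "upstart.sh[1127]:".
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^tcp6?$/) {
          n = split($(i + 3), parts, ":")   # Local Address column
          if (parts[n] == port && $(i + 5) == "LISTEN") found = 1
          break
        }
      }
    }
    END { exit (found ? 0 : 1) }
  '
}

# pick_jmx_port LISTING PORT...: prints the first candidate PORT that is
# not listening according to LISTING (captured netstat output).
pick_jmx_port() {
  _listing="$1"
  shift
  for _port in "$@"; do
    if ! printf '%s\n' "$_listing" | port_in_use "$_port"; then
      printf '%s\n' "$_port"
      return 0
    fi
  done
  return 1
}

# Usage sketch (commented out so the block has no side effects):
# JMX_PORT="$(pick_jmx_port "$(netstat -tulpn 2>/dev/null)" 9999 9998 9997)" \
#   "$KAFKA_INSTALL_PATH/bin/kafka-server-start.sh" -daemon \
#   "$KAFKA_INSTALL_PATH/config/server.properties"
```

If the failure still shows up on a different port, that would at least suggest the conflict is not with a fixed service on 9999.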

On 28 June 2017 at 21:06, Eric Coan <ec...@instructure.com> wrote:

> Hello,
>
> Unfortunately Kafka does indeed startup and run for a little bit before
> crashing with the above exception, so doing one simple check wouldn't work.
> I could theoretically keep this script running forever, and constantly
> checking for it being up. However that's really a hacky solution, and I'd
> prefer to not do that if I don't have too.
>
> On Wed, Jun 28, 2017 at 1:43 PM, M. Manna <manme...@gmail.com> wrote:
>
> > Can you not put a service wrapper for startup? It will attempt a restart if
> > the executable isn't up and running successfully.
> >
> > I am not familiar with Unix side, but in Windows you can use a powershell
> > to utilise such thing. It's a better approach.
> >
> > Let me know what you think.
> >
> > On 28 Jun 2017 8:34 pm, "Eric Coan" <ec...@instructure.com> wrote:
> >
> > > I am using the same configuration for all brokers. However, each broker is
> > > running on a completely separate host (I'm not running all three brokers on
> > > the same host). I can get all three running if I manually start kafka
> > > again, however it's just occasionally on boot one fails to start with this
> > > error.
> > >
> > > On Wed, Jun 28, 2017 at 1:25 PM, M. Manna <manme...@gmail.com> wrote:
> > >
> > > > Aren't u using the same JMX port 9999 for all brokers? I dont think it will
> > > > work for more than 1 broker.
> > > >
> > > > On 28 Jun 2017 8:22 pm, "Eric Coan" <ec...@instructure.com> wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > No worries. I'm starting the brokers with a script yes (that ends up
> > > > > generating the command I pasted:
> > > > >
> > > > > ```
> > > > > KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true
> > > > > -Dcom.sun.management.jmxremote.authenticate=false
> > > > > -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=$FQDN
> > > > > -Djava.net.preferIPv4Stack=true" JMX_PORT=9999 SCALA_VERSION=2.12.2
> > > > > JAVA_HOME=/usr
> > > > > $KAFKA_INSTALL_PATH//bin/kafka-server-start.sh -daemon
> > > > > $KAFKA_INSTALL_PATH/config/server.properties --override
> > > > > zookeeper.connect="XX.XX.XX.XX:XX" --override broker.id="$broker_id"
> > > > > --override
> > > > > listeners="SSL://$LOCAL_IPV4:9092" --override broker.rack="$AZ"
> > > > > ```
> > > > >
> > > > > The script beforehand populates the variables such as the FQDN, the broker
> > > > > Id, Zookeeper IPs to connect to, Kafka Install Path, etc. The important
> > > > > part of the command really is:
> > > > >
> > > > > ```
> > > > > KAFKA_JMX_OPTS="..." JMX_PORT=9999 SCALA_VERSION=2.12.2 JAVA_HOME=/usr
> > > > > $KAFKA_INSTALL_PATH/bin/kafka-server-start.sh -daemon ..
> > > > > ```
> > > > >
> > > > > On Wed, Jun 28, 2017 at 1:08 PM, M. Manna <manme...@gmail.com> wrote:
> > > > >
> > > > > > Please forgive my autocorrect options :(
> > > > > >
> > > > > > On 28 Jun 2017 8:06 pm, "M. Manna" <manme...@gmail.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > OS is not an issue, I have a 3 broker setup and I have experienced this
> > > > > > too.
> > > > > >
> > > > > > How are toy atarting the brokers? Is this a concurrent start or have you
> > > > > > got some startup scriptto bring up all the brokers?
> > > > > >
> > > > > > KR,
> > > > > >
> > > > > > On 28 Jun 2017 6:47 pm, "Eric Coan" <ec...@instructure.com> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I've recently been doing research into getting our Kafka cluster running
> > > > > > > outside of Mesos (for a couple of reasons). However I'm noticing about 10%
> > > > > > > of the time Kafka fails to start on boot (or more accurately starts, and
> > > > > > > immediately exits). I find it weird since all brokers are using the exact
> > > > > > > same configuration, on the same OS (Ubuntu 16.04)
> > > > > > >
> > > > > > > There's nothing in my LOG4J directory, however I did find a singular log
> > > > > > > line within $KAFKA_DIR/logs/kafkaServer.out that shed the actual light as
> > > > > > > to why it's failing:
> > > > > > >
> > > > > > > ```
> > > > > > > Error: Exception thrown by the agent : java.rmi.server.ExportException:
> > > > > > > Port already in use: 9999; nested exception is:
> > > > > > >         java.net.BindException: Address already in use (Bind failed)
> > > > > > > ```
> > > > > > >
> > > > > > > However, I can verify nothing is running on this port right before
> > > > > > > invocation using netstat -tulpn which shows:
> > > > > > >
> > > > > > > ```
> > > > > > > upstart.sh[1127]: Active Internet connections (only servers)
> > > > > > > upstart.sh[1127]: Proto Recv-Q Send-Q Local Address           Foreign Address  State   PID/Pr
> > > > > > > upstart.sh[1127]: tcp        0      0 127.0.0.1:17123         0.0.0.0:*        LISTEN  1419/p
> > > > > > > upstart.sh[1127]: tcp        0      0 127.0.0.1:8400          0.0.0.0:*        LISTEN  1125/c
> > > > > > > upstart.sh[1127]: tcp        0      0 127.0.0.1:8500          0.0.0.0:*        LISTEN  1125/c
> > > > > > > upstart.sh[1127]: tcp        0      0 0.0.0.0:53              0.0.0.0:*        LISTEN  1215/d
> > > > > > > upstart.sh[1127]: tcp        0      0 0.0.0.0:22              0.0.0.0:*        LISTEN  1111/s
> > > > > > > upstart.sh[1127]: tcp        0      0 127.0.0.1:8600          0.0.0.0:*        LISTEN  1125/c
> > > > > > > upstart.sh[1127]: tcp        0      0 127.0.0.1:8126          0.0.0.0:*        LISTEN  1418/t
> > > > > > > upstart.sh[1127]: tcp6       0      0 :::8301                 :::*             LISTEN  1125/c
> > > > > > > upstart.sh[1127]: tcp6       0      0 :::53                   :::*             LISTEN  1215/d
> > > > > > > upstart.sh[1127]: tcp6       0      0 :::22                   :::*             LISTEN  1111/s
> > > > > > > upstart.sh[1127]: udp        0      0 0.0.0.0:53              0.0.0.0:*                1215/d
> > > > > > > upstart.sh[1127]: udp        0      0 0.0.0.0:68              0.0.0.0:*                973/dh
> > > > > > > upstart.sh[1127]: udp        0      0 10.32.104.144:123       0.0.0.0:*                1341/n
> > > > > > > upstart.sh[1127]: udp        0      0 127.0.0.1:123           0.0.0.0:*                1341/n
> > > > > > > upstart.sh[1127]: udp        0      0 0.0.0.0:123             0.0.0.0:*                1341/n
> > > > > > > upstart.sh[1127]: udp        0      0 127.0.0.1:8600          0.0.0.0:*                1125/c
> > > > > > > upstart.sh[1127]: udp6       0      0 :::54933                :::*                     1441/j
> > > > > > > upstart.sh[1127]: udp6       0      0 127.0.0.1:8125          :::*                     1420/p
> > > > > > > upstart.sh[1127]: udp6       0      0 :::53                   :::*                     1215/d
> > > > > > > upstart.sh[1127]: udp6       0      0 :::8301                 :::*                     1125/c
> > > > > > > upstart.sh[1127]: udp6       0      0 fe80::898:21ff:fec0:123 :::*                     1341/n
> > > > > > > upstart.sh[1127]: udp6       0      0 ::1:123                 :::*                     1341/n
> > > > > > > upstart.sh[1127]: udp6       0      0 :::123                  :::*                     1341/n
> > > > > > > ```
> > > > > > >
> > > > > > > I can also verify the network of the box itself is up, and working as
> > > > > > > programs like the consul-agent do in fact spawn, and connect to their
> > > > > > > clusters before kafka even gets invoked.
> > > > > > >
> > > > > > > For reference I'm using the built in `kafka-server-start.sh` script, and
> > > > > > > invoking it like so (IPs cut out):
> > > > > > >
> > > > > > > ```
> > > > > > > KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true
> > > > > > > -Dcom.sun.management.jmxremote.authenticate=false
> > > > > > > -Dcom.sun.management.jmxremote.ssl=false
> > > > > > > -Djava.rmi.server.hostname=kafka-i-0617a6aaa98f63c21.insops.net
> > > > > > > -Djava.net.preferIPv4Stack=true" JMX_PORT=9999 SCALA_VERSION=2.12.2
> > > > > > > JAVA_HOME=/usr
> > > > > > > $KAFKA_INSTALL_PATH//bin/kafka-server-start.sh -daemon
> > > > > > > $KAFKA_INSTALL_PATH/config/server.properties --override
> > > > > > > zookeeper.connect="XX.XX.XX.XX:XX" --override
> > > > > > > broker.id="the-broker-test" --override
> > > > > > > listeners="SSL://$LOCAL_IPV4:9092" --override broker.rack="$AZ"
> > > > > > > ```
> > > > > > >
> > > > > > > I'm not really sure where else to check for problems as it's only happening
> > > > > > > on some boots, and only logging the one line mentioned above.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > --
> > > > > > > *Eric Coan*
> > > > > > > *E: ec...@instructure.com <ec...@instructure.com>*
> > > > > > > *O:* *801.869.5000 <//801.869.5000>*
> > > > > > > <http://instructure.com/>
> > > > >
> > > > > --
> > > > > *Eric Coan*
> > > > > *E: ec...@instructure.com <ec...@instructure.com>*
> > > > > *O:* *801.869.5000 <//801.869.5000>*
> > > > > <http://instructure.com/>
> > >
> > > --
> > > *Eric Coan*
> > > *E: ec...@instructure.com <ec...@instructure.com>*
> > > *O:* *801.869.5000 <//801.869.5000>*
> > > <http://instructure.com/>
>
> --
> *Eric Coan*
> *E: ec...@instructure.com <ec...@instructure.com>*
> *O:* *801.869.5000 <//801.869.5000>*
> <http://instructure.com/>