Greetings. Apologies for the verbose email, but I'm trying to provide as
much relevant detail as possible. I have an Amazon AWS server running one
instance of ZooKeeper and one instance of Kafka 0.9.0. Like all AWS
servers, it has an internal, non-routable IP address (172.X.X.X) and an
external, NATed IP address (54.X.X.X). ZooKeeper binds to the default
interface. I'm setting the Java system property
java.net.preferIPv4Stack=true so that both ZooKeeper and Kafka bind to
the IPv4 interface. When I run netstat, it looks like this:
tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN
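For reference, one common way to pass that JVM flag to the Kafka startup
scripts is via the KAFKA_OPTS environment variable (a sketch only -- the
email doesn't show how the flag is actually set on this server, and
ZooKeeper's own scripts typically take JVM flags through a different
mechanism such as JVMFLAGS):

```shell
# Sketch: pass the JVM flag to Kafka's startup scripts via KAFKA_OPTS,
# which kafka-run-class.sh picks up. (How the flag is actually being set
# on this server is not shown in the email.)
export KAFKA_OPTS="-Djava.net.preferIPv4Stack=true"
bin/kafka-server-start.sh config/server.properties
```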
I can successfully run the Kafka consumer/producer scripts from the same
AWS machine, against localhost, 127.0.0.1, or the internal IP address
(partially masked here):
echo "test" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test   (works fine)
echo "test2" | bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic test   (works fine)
echo "test3" | bin/kafka-console-producer.sh --broker-list 172.X.X.X:9092 --topic test   (IP address obfuscated here, but works fine)
I can read the messages:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
--from-beginning
test
test2
test3
I have also configured Kafka to advertise the external hostname in the
server config properties file (hostname intentionally obfuscated for this
email):
advertised.host.name=ec2-54-....compute.amazonaws.com
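For what it's worth, 0.9.0 also supports the newer listener-style
configuration, which should be an equivalent way to express this (a
sketch, assuming the default PLAINTEXT protocol and port 9092; the actual
hostname is elided as above):

```properties
# Sketch: 0.9.0 listener-style equivalent of advertised.host.name.
# Bind on all interfaces, but advertise the external name to clients.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://ec2-54-....compute.amazonaws.com:9092
```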
So far, so good. It's when I try to access Kafka remotely that I'm
running into problems. I have granted access to connect to all TCP ports
on the AWS machine from my VPN server. From a remote machine on the VPN,
I can connect to the ports for Zookeeper and Kafka. However, I can't
seem to access the queue. I've tried both the consumer and producer
scripts, as well as the "kafkacat" program. I get slightly different
error messages. This is what I see when using kafkacat running on my
laptop over the VPN against the external IP address of the AWS
Zookeeper/Kafka machine:
kafkacat -b 54.X.X.X:9092 -o beginning -t test
% Auto-selecting Consumer mode (use -P or -C to override)
%3|1449600975.259|FAIL|rdkafka#consumer-0| ec2-54-....us-west-1.compute.amazonaws.com:9092/0: Failed to connect to broker at ip-172-....us-west-1.compute.internal:9092: Operation timed out
%3|1449600975.259|ERROR|rdkafka#consumer-0| ec2-54-....us-west-1.compute.amazonaws.com:9092/0: Failed to connect to broker at ip-172-....us-west-1.compute.internal:9092: Operation timed out
The timeout happens after several minutes. What I find interesting is
that it prints the broker address using the internal hostname of the AWS
machine. I guess that might make sense if the error message is coming
from the remote ZK instance? I can list the topics from the laptop over
the VPN:
kafka-topics.sh --zookeeper 54.X.X.X:2181 --list
test
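Two checks that might narrow this down (sketches only; both assume broker
id 0, and both need the live cluster): kafkacat's metadata-listing mode
shows which broker hostname clients are handed after bootstrapping, and
since port 2181 is reachable, the broker's registration can also be read
directly from ZooKeeper with the zookeeper-shell.sh script that ships
with Kafka:

```shell
# Ask the broker (via the external address) for cluster metadata; the
# broker hostname printed here is what remote clients are told to use
# for all subsequent connections.
kafkacat -L -b 54.X.X.X:9092

# Read the JSON blob broker 0 registered in ZooKeeper; its "host" field
# is the advertised hostname. (Broker id 0 is an assumption here.)
bin/zookeeper-shell.sh 54.X.X.X:2181 get /brokers/ids/0
```

If either of these shows the internal ip-172-... name, that would explain
why remote clients can reach 9092 for the bootstrap connection but then
time out on the follow-up connection.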
When I run the producer script I get this:
echo hello | bin/kafka-console-producer.sh --topic test --broker-list 54.X.X.X:9092
[2015-12-08 11:09:12,362] ERROR Error when sending message to topic test with key: null, value: 5 bytes with error: Failed to update metadata after 60000 ms. (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
And here is the consumer, trying to read messages that are already on the
queue. Same setup: consumer running on the laptop over the VPN:
bin/kafka-console-consumer.sh --topic test --zookeeper 54.X.X.X:2181 --from-beginning
[2015-12-08 11:13:21,205] WARN Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,ec2-54-....us-west-1.compute.amazonaws.com,9092)] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:110)
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:75)
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:74)
        at kafka.producer.SyncProducer.send(SyncProducer.scala:119)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:94)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2015-12-08 11:13:21,209] WARN [console-consumer-38323_ip-192-168-4-28.us-west-1.compute.internal-1449601968243-29980a1b-leader-finder-thread], Failed to find leader for Set([test,0]) (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
kafka.common.KafkaException: fetching topic metadata for topics [Set(test)] from broker [ArrayBuffer(BrokerEndPoint(0,ec2-54-....us-west-1.compute.amazonaws.com,9092))] failed
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:73)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:94)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.nio.channels.ClosedChannelException
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:110)
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:75)
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:74)
        at kafka.producer.SyncProducer.send(SyncProducer.scala:119)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59)
        ... 3 more
The versions I'm running are:
Zookeeper: 3.4.5--1, built on 06/10/2013 17:26 GMT
Kafka: kafka_2.11-0.9.0.0
Java: Oracle JDK, version 1.8.0_66
I'm sure it's just a configuration issue. Any help resolving this is
greatly appreciated. Thanks,
/Henrik