Some more detail on this issue:

On a hunch I tried restarting my docker-compose stack a few more times. Still the same problem: my application, using the Kafka Client APIs, claims it is talking to Kafka, but the Kafka logs disagree.

So I restarted the stack once more. With 'docker-compose up' this is a very clean start. Then I waited about 10 minutes until I saw

kafka-1_1    | [2017-09-13 15:00:47,214] INFO [Group Metadata Manager on Broker 1001]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
kafka-3_1    | [2017-09-13 15:00:47,595] INFO [Group Metadata Manager on Broker 1002]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
kafka-2_1    | [2017-09-13 15:00:47,771] INFO [Group Metadata Manager on Broker 1003]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

in the log. When I reran my application, the Kafka logs suddenly came alive with indications that topics were being created, and so on.

I have been using Kafka for a couple of months now, and this behavior is new; I had not seen it until about a week ago. I am used to being able to run my application immediately after the Kafka stack comes up in Docker. Operationally, it now seems I have to wait about 10 minutes after starting Kafka.
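If the delay really is just the cluster settling, one workaround I am considering is blocking until all three brokers are visible before starting my application. A minimal sketch using the 0.11 AdminClient; the bootstrap addresses (kafka-1:9092 and so on) and the broker count are placeholders for whatever the compose file actually exposes:

    import java.util.Collection;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.Node;

    public class WaitForBrokers {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder addresses -- substitute whatever the compose file exposes.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                      "kafka-1:9092,kafka-2:9092,kafka-3:9092");

            int expectedBrokers = 3;
            try (AdminClient admin = AdminClient.create(props)) {
                while (true) {
                    // Ask the cluster itself how many brokers it currently knows about.
                    Collection<Node> nodes = admin.describeCluster().nodes().get();
                    if (nodes.size() >= expectedBrokers) {
                        System.out.println("Cluster reports " + nodes.size() + " brokers, starting application");
                        break;
                    }
                    System.out.println("Only " + nodes.size() + " brokers visible, waiting...");
                    Thread.sleep(5000);
                }
            }
        }
    }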

Of course I am still dealing with the NotLeaderForPartitionException problem, which is also new and breaks my application, but at least I now seem to have a repeatable path to that problem.
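Since NotLeaderForPartitionException is a retriable error, my plan is to let the producer retry while partition leadership settles instead of failing immediately. This is only a sketch of the producer configuration I have in mind, not what my application currently does, and the addresses are again placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerFactory {
        public static Producer<String, String> create() {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                      "kafka-1:9092,kafka-2:9092,kafka-3:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // NotLeaderForPartitionException is retriable, so allow the producer
            // to retry sends while leadership settles rather than surfacing the
            // error to the application right away.
            props.put(ProducerConfig.RETRIES_CONFIG, "10");
            props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000");
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            return new KafkaProducer<>(props);
        }
    }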

Cheers, Eric


On 2017-09-12 2:43 PM, Eric Kolotyluk wrote:

The last few days I have been seeing a problem I do not know how to explain.

For months I have been successfully running Kafka/Zookeeper under Docker, and my application seems to work fine. Lately, when I run Kafka under either docker-compose on my developer system or 'docker stack deploy' on a Docker Swarm on AWS, here is what I am seeing:

According to the logs, Zookeeper/Kafka seem to start okay, and the 3 brokers I have configured seem to find each other. The logs look pretty normal. Then I start my application, and my application logs show that it has connected to the Kafka cluster okay and created the topics okay. However, there is nothing in the Kafka logs to show any kind of connection from my application, let alone topics being created. Sure enough, when I rerun my application, it cannot find the topics, so it tries to create them again and gets a successful response from the Kafka Admin Client. Nope, they were not created.
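For what it is worth, here is roughly how I would expect to be able to verify the create from the client side: after createTopics() returns, read the topic list back from the brokers rather than trusting the create response. A sketch with the 0.11 AdminClient; the topic name, partition/replication settings, and addresses are just placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateAndVerifyTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                      "kafka-1:9092,kafka-2:9092,kafka-3:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 3 -- placeholder settings.
                NewTopic topic = new NewTopic("example-topic", 3, (short) 3);
                admin.createTopics(Collections.singleton(topic)).all().get(30, TimeUnit.SECONDS);

                // Read the topic list back from the cluster instead of trusting the create call.
                Set<String> names = admin.listTopics().names().get(30, TimeUnit.SECONDS);
                if (names.contains("example-topic")) {
                    System.out.println("Brokers confirm the topic exists");
                } else {
                    System.out.println("createTopics returned, but the brokers do not list the topic");
                }
            }
        }
    }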

When I shut down Kafka, the logs show the shutdown sequence for all the brokers and Zookeeper. I cannot understand why the Kafka client library shows no errors when the Kafka logs show no connections or operations at all.

I tried both Kafka 0.11.0.0 and 0.10.2.1 -- same problem.

Been trying to figure out this problem all morning, bashing my head against the wall.

*Then I go to lunch*, and a couple of hours later I try one more time. Behold, suddenly the Kafka logs report that the topics my application requested have been created. But now I am stuck with the infamous org.apache.kafka.common.errors.NotLeaderForPartitionException problem again. This is another new problem that started recently. Unfortunately, having wasted hours and hours fighting the first problem, I have not been able to dig into this one.

What could possibly be the explanation for this not working, and then working again after a few hours?

It seems insanely difficult to operate a Kafka cluster in any kind of stable configuration that does not fail randomly.

Can anyone offer any kind of advice on what the problem might be?

Is it better to just give up trying to operate our own Kafka cluster and use Kinesis instead?

Cheers, Eric
