[ https://issues.apache.org/jira/browse/KAFKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma updated KAFKA-4460:
-------------------------------
Labels: reliability (was: )
> Consumer stops getting messages when partition leader dies
> ----------------------------------------------------------
>
> Key: KAFKA-4460
> URL: https://issues.apache.org/jira/browse/KAFKA-4460
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.10.0.1
> Reporter: Bernhard Bonigl
> Labels: reliability
>
> I have a setup consisting of 2 Kafka brokers (0 and 1) using ZooKeeper, a
> Spring Boot application with producers and a Spring Boot application with
> consumers.
> The topic has 5 partitions and a replication factor of 2. Both brokers are in
> sync, and the partitions have alternating leaders (although it doesn't matter).
> The Spring Boot Kafka configuration is set up as follows:
> {code}
> kafka.address: localhost:9092,localhost:9093
> kafka.numberOfConsumers: 20
> {code}
> Where Broker 0 uses port 9092 and Broker 1 uses port 9093.
> ----
> When sending events they are consumed just fine. When Broker 0 is killed, all
> partitions get Broker 1 as their leader; however, the consumers stop consuming
> events until Broker 0 is back. This happens nearly every time: it usually
> takes at most 3 attempts of alternately killing the leading broker to
> create the error state.
> The console log is spammed by the coordinators. It looks like the
> coordinator representing Broker 0 is marked as dead but is instantly
> rediscovered and used again, many times over, and only at the end is the
> other broker discovered. When the switch works, the log is only minimally
> spammed and the other broker is discovered very quickly.
> This gist contains the log of the application when the problem occurs. The
> first line is a log of ours indicating a successfully consumed message. After
> that, Broker 0 (localhost:9092) is killed; you can see the log spam I was
> talking about. At the end localhost:9093 is discovered, but no further
> messages are consumed. After that I killed the application.
> ----
> I also discovered this unresolved Stack Overflow question, which seems to be
> the same problem.
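
For readers trying to reproduce the report, the quoted settings (kafka.address and kafka.numberOfConsumers) presumably end up as ordinary Kafka consumer properties. The following is only a rough sketch of that mapping using plain java.util.Properties. The class name, the group id "example-group", and the String deserializers are assumptions made for illustration and do not come from the report. The helper maxActiveConsumers further assumes that all consumers share one group and subscribe only to the 5-partition topic, which the report does not state.

```java
import java.util.Properties;

public class ConsumerConfigSketch {

    // Maps the reported kafka.address value onto standard consumer property keys.
    // The group id and deserializer classes are illustrative assumptions.
    public static Properties consumerProperties(String kafkaAddress) {
        Properties props = new Properties();
        // Both brokers are listed so that either one can serve the initial connection.
        props.setProperty("bootstrap.servers", kafkaAddress);
        props.setProperty("group.id", "example-group");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    // Within a single consumer group, each partition is assigned to at most one
    // consumer, so consumers beyond the partition count receive no partitions.
    public static int maxActiveConsumers(int numberOfConsumers, int partitions) {
        return Math.min(numberOfConsumers, partitions);
    }

    public static void main(String[] args) {
        Properties props = consumerProperties("localhost:9092,localhost:9093");
        System.out.println("bootstrap.servers = " + props.getProperty("bootstrap.servers"));
        System.out.println("consumers with partitions = " + maxActiveConsumers(20, 5));
    }
}
```

Under those assumptions, only 5 of the 20 configured consumers would hold a partition at any time. That detail may be worth keeping in mind when reading the coordinator log output described in the report.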
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)