[ https://issues.apache.org/jira/browse/KAFKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945085#comment-15945085 ]
Ismael Juma commented on KAFKA-4460:
------------------------------------

Hi, would it be possible for you to test a newer version (0.10.2 or trunk) to see if the issue still occurs?

> Consumer stops getting messages when partition leader dies
> ----------------------------------------------------------
>
>                 Key: KAFKA-4460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4460
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.0.1
>            Reporter: Bernhard Bonigl
>              Labels: reliability
>
> I have a setup consisting of 2 Kafka brokers (0 and 1) using ZooKeeper, a Spring Boot application with producers and a Spring Boot application with consumers.
> The topic has 5 partitions and a replication factor of 2. Both brokers are in sync, and partition leadership alternates between them (although this doesn't matter).
> The Spring Boot Kafka configuration is set up as follows:
> {code}
> kafka.address: localhost:9092,localhost:9093
> kafka.numberOfConsumers: 20
> {code}
> Broker 0 uses port 9092 and Broker 1 uses port 9093.
> ----
> Events that are sent are consumed just fine. When Broker 0 is killed, all partitions get Broker 1 as their leader; however, the consumers stop consuming events until Broker 0 is back. This happens nearly every time: it usually takes at most 3 attempts of alternately killing the leading broker to reach the error state.
> The console log is flooded with coordinator messages. It looks like the coordinator on broker 0 is marked as dead but instantly rediscovered and used again, many times over, and only at the end is the other broker discovered. When the switch works, the log contains only a few such messages and the other broker is discovered very quickly.
> [This gist | https://gist.github.com/bonii-xx/2f1c122f643019a1525fbe120e9162d8] contains the log of the application when the problem occurs. The first line is one of our own log messages, indicating a successfully consumed message. After that, Broker 0 (localhost:9092) is killed; you can see the log spam I was talking about. At the end localhost:9093 is discovered, but no further messages are consumed. After that I killed the application.
> ----
> I also found [this | https://stackoverflow.com/questions/39650993/kafka-consumer-abstractcoordinator-discovered-coordinator-java-client] unresolved Stack Overflow question, which seems to describe the same problem.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
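The broker-side part of the report (5 partitions, replication factor 2, leadership alternating between brokers 0 and 1, and every partition failing over to Broker 1 when Broker 0 is killed) can be illustrated with a small Java model. This is a hypothetical sketch, not Kafka code: the class `LeaderFailoverModel`, the assumed replica assignment `[p % 2, 1 - p % 2]`, and the simplified rule "the first live replica in assignment order becomes leader" are illustrative assumptions, and it assumes all replicas are in sync, as the report states. It models only the leadership change, which the report describes as working; it does not model the consumer-side coordinator discovery, which is where the report says consumption stalls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical toy model (not Kafka code) of the reported cluster:
 * brokers 0 and 1, 5 partitions, replication factor 2.
 */
public class LeaderFailoverModel {

    static final int PARTITIONS = 5;

    /** Assumed assignment: partition p lists broker p % 2 first, so leaders alternate. */
    static List<List<Integer>> replicaAssignment(int partitions) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            int first = p % 2;
            assignment.add(List.of(first, 1 - first));
        }
        return assignment;
    }

    /**
     * Simplified election (assumes every replica is in sync): the leader is the
     * first replica in assignment order whose broker is alive, or -1 if none is.
     */
    static Map<Integer, Integer> leaders(Set<Integer> liveBrokers) {
        Map<Integer, Integer> result = new TreeMap<>();
        List<List<Integer>> assignment = replicaAssignment(PARTITIONS);
        for (int p = 0; p < assignment.size(); p++) {
            int leader = -1;
            for (int broker : assignment.get(p)) {
                if (liveBrokers.contains(broker)) {
                    leader = broker;
                    break;
                }
            }
            result.put(p, leader);
        }
        return result;
    }

    public static void main(String[] args) {
        // Both brokers up: leadership alternates between broker 0 and broker 1.
        System.out.println("both brokers up: " + leaders(Set.of(0, 1)));
        // Broker 0 killed: every partition fails over to broker 1.
        System.out.println("broker 0 killed: " + leaders(Set.of(1)));
    }
}
```

Running `main` prints `both brokers up: {0=0, 1=1, 2=0, 3=1, 4=0}` followed by `broker 0 killed: {0=1, 1=1, 2=1, 3=1, 4=1}`, matching the leadership state described in the report.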