Bernhard Bonigl created KAFKA-4460:
--------------------------------------

             Summary: Consumer stops getting messages when partition leader dies
                 Key: KAFKA-4460
                 URL: https://issues.apache.org/jira/browse/KAFKA-4460
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.10.0.1
            Reporter: Bernhard Bonigl


I have a setup consisting of 2 Kafka brokers (0 and 1) using ZooKeeper, a 
Spring Boot application with producers and a Spring Boot application with 
consumers.

The topic has 5 partitions and a replication factor of 2. Both brokers are in 
sync, and the partitions have alternating leaders (although this doesn't seem to matter).

The Spring Boot Kafka configuration is set up as follows:
{code}
kafka.address: localhost:9092,localhost:9093
kafka.numberOfConsumers: 20
{code}
Broker 0 listens on port 9092 and Broker 1 listens on port 9093.
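For context, the sketch below shows how a value like kafka.address would typically end up in a plain Kafka consumer configuration. It assumes (this is not stated in the report) that kafka.address is passed through as the standard bootstrap.servers setting and that all kafka.numberOfConsumers consumers share one consumer group. The group id "example-group" and the class and method names are hypothetical. It uses only java.util, so it builds the properties without connecting to a broker.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class ConsumerConfigSketch {

    // Builds consumer properties from the application's kafka.address value.
    // ASSUMPTION: kafka.address maps to bootstrap.servers; the group id is a
    // hypothetical placeholder, not taken from the report.
    static Properties consumerProps(String kafkaAddress, String groupId) {
        Properties props = new Properties();
        // bootstrap.servers is only used for initial cluster discovery;
        // listing both brokers lets the client bootstrap if one is down.
        props.put("bootstrap.servers", kafkaAddress);
        props.put("group.id", groupId);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    // Splits the comma-separated address list into host:port entries.
    static List<String> brokers(String kafkaAddress) {
        return Arrays.asList(kafkaAddress.split(","));
    }

    // Within one consumer group each partition is assigned to at most one
    // consumer, so at most min(partitions, consumers) consumers are active.
    static int activeConsumers(int partitions, int consumers) {
        return Math.min(partitions, consumers);
    }

    public static void main(String[] args) {
        String kafkaAddress = "localhost:9092,localhost:9093";
        Properties props = consumerProps(kafkaAddress, "example-group");
        System.out.println(props.getProperty("bootstrap.servers"));
        System.out.println(brokers(kafkaAddress));
        // 5 partitions, 20 consumers (if they share a single group)
        System.out.println(activeConsumers(5, 20));
    }
}
```

Under the single-group assumption, only 5 of the 20 consumers would be assigned a partition at any time. The rest stay idle until a rebalance.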

----

When events are sent, they are consumed just fine. When Broker 0 is killed, all 
partitions get Broker 1 as their leader; however, the consumers stop consuming 
events until Broker 0 is back. This happens very often: it usually takes at 
most 3 attempts of alternately killing the leading broker to reproduce the 
error state.

The console log gets spammed by coordinator messages. It looks like the 
coordinator on broker 0 is marked as dead but instantly rediscovered and used 
again, many times over, and only at the end is the other broker discovered. 
When the failover works, the log is only minimally spammed and the other broker 
is discovered very quickly.

This gist contains the log of the application when the problem occurs. The 
first line is one of our own log messages indicating a successfully consumed 
message. After that, Broker 0 (localhost:9092) is killed, and you can see the 
log spam I was talking about. At the end, localhost:9093 is discovered; 
however, no further messages are consumed. After that I killed the application.

----

I also found this unresolved Stack Overflow question, which seems to describe 
the same problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
