Andrej Urvantsev created KAFKA-7913:
---------------------------------------

             Summary: Kafka broker halts and messes up the whole cluster
                 Key: KAFKA-7913
                 URL: https://issues.apache.org/jira/browse/KAFKA-7913
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.1.0
         Environment: kafka_2.12-2.1.0, 
openjdk version "11.0.1" 2018-10-16 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.1+13-LTS),
CentOS Linux release 7.3.1611 (Core),
linux 3.10.0-514.26.2.el7.x86_64
            Reporter: Andrej Urvantsev


We recently upgraded our cluster and are now running Kafka 2.1.0 on Java 11.

For a while everything worked fine, but then random brokers started to halt 
from time to time.

When this happens, the broker still looks alive to the other brokers, but it 
stops receiving network traffic. The other brokers then throw an IOException:
{noformat}
java.io.IOException: Connection to 36155 was disconnected before the response was read
        at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
        at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
        at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:241)
        at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
        at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
        at scala.Option.foreach(Option.scala:257)
        at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
{noformat}
On the problematic broker all logging stops. No errors, no exceptions - nothing.

This also "breaks" the whole cluster: since clients and other brokers "think" 
that the broker is still alive, they keep trying to connect to it, and it seems 
that leader election leaves the problematic broker as a leader.

I would be glad to provide further details if somebody could advise 
what to investigate the next time it happens.
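
One thing that may be worth capturing the next time a broker hangs is a thread dump of the affected JVM, since the broker's own logs stay silent. Below is a minimal sketch using only the standard java.lang.management and javax.management APIs. It assumes the broker was started with remote JMX enabled (for example through the JMX_PORT environment variable read by the Kafka start scripts) and without JMX authentication. The class name BrokerThreadDump and its arguments are hypothetical. If the JVM itself is frozen, the JMX connection may hang as well, in which case running jstack against the broker's pid on the host is the fallback.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of Kafka: prints a thread dump of a JVM over JMX.
public class BrokerThreadDump {

    // Render every thread's name, state and full stack, plus the number of
    // deadlocked threads the JVM reports.
    static String render(ThreadMXBean threads) {
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        long[] deadlocked = threads.findDeadlockedThreads();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // No host/port given: dump this JVM, which only verifies the tool works.
            System.out.print(render(ManagementFactory.getThreadMXBean()));
            return;
        }
        // args[0] = broker host, args[1] = JMX port (both supplied by the operator).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + args[0] + ":" + args[1] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            System.out.print(render(threads));
        }
    }
}
```

Run it as `java BrokerThreadDump <broker-host> <jmx-port>` while the broker is in the halted state. The states and stacks of the network and request handler threads, together with the deadlock count, would be the most useful details to attach to this issue.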



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
