Rong Tang created KAFKA-6375:
--------------------------------

             Summary: Follower replicas can never rejoin the ISR because 
creating the ReplicaFetcherThread failed.
                 Key: KAFKA-6375
                 URL: https://issues.apache.org/jira/browse/KAFKA-6375
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.10.2.0
         Environment: Windows, 23-broker Kafka cluster
            Reporter: Rong Tang


Hi, I ran into a case where the out-of-sync replicas on one broker never 
catch up.
When the broker starts up, it receives LeaderAndIsr requests from the 
controller, which trigger a call to createFetcherThread. The thread creation 
fails with the exceptions below.

As a result, there is no fetcher for these follower replicas, and they stay 
out of sync forever, unless the broker later receives a LeaderAndIsr request 
with a higher leader epoch.

Restarting the broker mitigates the issue.

I have two questions.
First, why did creating the new ReplicaFetcherThread fail?
*Second, shouldn't Kafka do something to fail over, instead of leaving the 
broker in an abnormal state?*
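
To make the second question concrete, here is a minimal sketch of one possible 
mitigation: retrying a failed creation step with exponential backoff instead 
of giving up after the first exception. This is not how Kafka behaves today 
and not a proposed patch. The class RetryingCreator and the method 
createWithRetry are hypothetical names used only for illustration, and the 
factory argument stands in for whatever call constructs the fetcher thread.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Kafka: retries a creation step with backoff.
final class RetryingCreator {
    private RetryingCreator() {}

    /**
     * Calls factory up to maxAttempts times, doubling the sleep between
     * attempts. Rethrows the last failure if every attempt fails.
     */
    static <T> T createWithRetry(Callable<T> factory, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait before the next attempt; an interrupt propagates to the caller.
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

A caller would wrap the failing constructor call in the factory, so that a 
transient failure such as the loopback connection timeout gets a few more 
chances before the partitions are left without a fetcher.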

This is a 23-broker Kafka cluster running on Windows. Each broker has 330 
replicas.

[2017-12-13 16:29:21,317] ERROR Error on broker 1000 while processing LeaderAndIsr request with correlationId 1 received from controller 427703487 epoch 22 (state.change.logger)
org.apache.kafka.common.KafkaException: java.io.IOException: Unable to establish loopback connection
        at org.apache.kafka.common.network.Selector.<init>(Selector.java:124)
        at kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:87)
        at kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
        at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:869)
        at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:689)
        at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:149)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:83)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to establish loopback connection
        at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:94)
        at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:171)
        at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:50)
        at java.nio.channels.Pipe.open(Pipe.java:155)
        at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:127)
        at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:44)
        at java.nio.channels.Selector.open(Selector.java:227)
        at org.apache.kafka.common.network.Selector.<init>(Selector.java:122)
        ... 16 more
Caused by: java.net.ConnectException: Connection timed out: connect
        at sun.nio.ch.Net.connect0(Native Method)
        at sun.nio.ch.Net.connect(Net.java:454)
        at sun.nio.ch.Net.connect(Net.java:446)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
        at sun.nio.ch.PipeImpl$Initializer$LoopbackConnector.run(PipeImpl.java:127)
        at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:76)
        ... 25 more
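
The trace shows the failure starting in java.nio.channels.Selector.open(), 
which on Windows goes through WindowsSelectorImpl and Pipe.open() and ends in 
a loopback connect that timed out. A small standalone probe like the one 
below might help check whether the same call fails outside Kafka on the 
affected host. It is only a diagnostic sketch: the class name SelectorProbe 
and the default count of 10 are arbitrary choices for illustration, and a 
successful run does not rule out an intermittent failure.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical diagnostic, not part of Kafka: opens and closes NIO selectors.
final class SelectorProbe {
    private SelectorProbe() {}

    /** Opens and closes `count` selectors in turn; returns how many opened. */
    static int openAndClose(int count) throws IOException {
        int opened = 0;
        for (int i = 0; i < count; i++) {
            // Selector.open() is the JDK call that appears in the stack trace.
            try (Selector selector = Selector.open()) {
                if (selector.isOpen()) {
                    opened++;
                }
            }
        }
        return opened;
    }

    public static void main(String[] args) throws IOException {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println("opened " + openAndClose(count) + " selectors");
    }
}
```

If the probe throws the same "Unable to establish loopback connection" 
IOException, the problem is likely in the host's loopback networking or 
firewall setup and not in the broker itself.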




