Anthony Lazam created KAFKA-7876:
------------------------------------
Summary: Broker suddenly got disconnected
Key: KAFKA-7876
URL: https://issues.apache.org/jira/browse/KAFKA-7876
Project: Kafka
Issue Type: Bug
Components: controller, network
Affects Versions: 2.1.0
Reporter: Anthony Lazam
Attachments: kafka-issue.png
We have a 3-node cluster. Occasionally one of the brokers suddenly gets
disconnected from the cluster, although no underlying system issue is found.
The disconnected node was not able to release leadership of the partitions it
held, so clients (spring-boot) were unable to send or receive data through the
affected broker.
We noticed that it always happens to the broker that is the active controller.
Environment details:
Provider: AWS
Kernel: 3.10.0-693.21.1.el7.x86_64
OS: CentOS Linux release 7.5.1804 (Core)
Scala version: 2.11
Kafka version: 2.1.0
Kafka config:
{code:java}
############################# Socket Server Settings #############################
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
############################# Log Basics #############################
num.partitions=1
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
############################# Log Retention Policy #############################
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
############################# Group Coordinator Settings #############################
group.initial.rebalance.delay.ms=0
############################# Zookeeper #############################
zookeeper.connection.timeout.ms=6000
broker.id=1
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
log.dirs=/data/kafka-node
advertised.listeners=PLAINTEXT://node1:9092
{code}
controller.log of the disconnected broker:
{code:java}
[2019-01-26 05:03:52,512] TRACE [Controller id=2] Checking need to trigger auto
leader balancing (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Preferred replicas by broker
Map(TOPICS->MAP) (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred
replica for broker 2 Map() (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for
broker 2 is 0.0 (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred
replica for broker 1 Map() (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for
broker 1 is 0.0 (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred
replica for broker 3 Map() (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for
broker 3 is 0.0 (kafka.controller.KafkaController)
[2019-01-26 05:08:52,513] TRACE [Controller id=2] Checking need to trigger auto
leader balancing (kafka.controller.KafkaController)
{code}
server.log of a working broker:
{code:java}
[2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2,
fetcherId=0] Error sending fetch request (sessionId=1637095899, epoch=21379644)
to node 2: java.io.IOException: Connection to 2 was disconnected before the
response was read. (org.apache.kafka.clients.FetchSessionHandler)
[2019-01-26 05:02:05,573] WARN [ReplicaFetcher replicaId=3, leaderId=2,
fetcherId=0] Error in response for fetch request (type=FetchRequest,
replicaId=3, maxWait=500, minBytes=1, maxBytes=10485760,
fetchData={PlayerGameRounds-8=(offset=2171960, logStartOffset=1483356,
maxBytes=1048576, currentLeaderEpoch=Optional[2])},
isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=1637095899,
epoch=21379644)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 2 was disconnected before the response was read
    at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
    at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
    at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
    at kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
    at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
    at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
    at scala.Option.foreach(Option.scala:257)
    at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-01-26 05:02:35,723] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:59988-1550
(kafka.network.Processor)
[2019-01-26 05:02:35,731] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:59986-1550
(kafka.network.Processor)
[2019-01-26 05:02:35,797] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:59494-1549
(kafka.network.Processor)
[2019-01-26 05:02:35,816] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:53268-1530
(kafka.network.Processor)
[2019-01-26 05:02:37,603] INFO [ReplicaFetcher replicaId=3, leaderId=2,
fetcherId=0] Error sending fetch request (sessionId=1637095899, epoch=INITIAL)
to node 2: java.io.IOException: Connection to 2 was disconnected before the
response was read. (org.apache.kafka.clients.FetchSessionHandler)
{code}
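For anyone triaging a similar server.log, the following is a minimal sketch (not part of the original report) that counts the "no open connection" warnings per remote host. It assumes the connection id has the form localHost:localPort-remoteHost:remotePort-index, as the log lines above suggest; the function name count_dropped_peers and the sample variable are hypothetical.

```python
import re
from collections import Counter

# Assumed layout of the connection id in the Processor warning:
# <localHost>:<localPort>-<remoteHost>:<remotePort>-<index>
CONN_ID = re.compile(
    r"connection id (?P<local>[^\s:]+):(?P<lport>\d+)"
    r"-(?P<remote>[^\s:]+):(?P<rport>\d+)-(?P<index>\d+)"
)


def count_dropped_peers(log_text):
    """Count 'no open connection' warnings per remote host (hypothetical helper)."""
    # Long log lines may be wrapped (as in this mail), so collapse whitespace first.
    flat = " ".join(log_text.split())
    return Counter(m.group("remote") for m in CONN_ID.finditer(flat))


# Sample taken from the server.log excerpt in this report.
sample = """
[2019-01-26 05:02:35,723] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:59988-1550
(kafka.network.Processor)
[2019-01-26 05:02:35,731] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:59986-1550
(kafka.network.Processor)
[2019-01-26 05:02:35,797] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:59494-1549
(kafka.network.Processor)
[2019-01-26 05:02:35,816] WARN Attempting to send response via channel for
which there is no open connection, connection id node3:9092-node2:53268-1530
(kafka.network.Processor)
"""

if __name__ == "__main__":
    for host, hits in count_dropped_peers(sample).most_common():
        print(f"{host}: {hits} dropped connection(s)")
```

On the excerpt above, every dropped connection points at node2, which matches the broker that the replica fetcher on node 3 could no longer reach.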