Hi,

Our cluster setup is as follows, running version 0.9.0.1 (Confluent package): we have 5 source clusters and 1 destination cluster. 5 instances of MirrorMaker run on the destination cluster (one instance per source cluster, and each MirrorMaker runs with num.streams=8).
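For reference, each instance is launched roughly like the sketch below; the config file names and topic whitelist are placeholders, not our exact settings:

    bin/kafka-mirror-maker.sh \
      --consumer.config source-clusterN-consumer.properties \
      --producer.config destination-producer.properties \
      --whitelist 'test.*' \
      --num.streams 8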
For the last few days we have been seeing an issue where the ISRs for partitions on one node repeatedly shrink and expand, which causes MirrorMaker to shut down (the known "Memory record not writable" issue or the NOT_ENOUGH_REPLICAS_AFTER_APPEND error). Some of the logs are as follows.

*Node A is the controller*

*I. The controller log has:*

DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrChangeNotificationListener)
DEBUG Sending MetadataRequest to Brokers:ArrayBuffer(2, 1, 0) for TopicAndPartitions:Set([test,115], [test,16], [test,52]) (kafka.controller.IsrChangeNotificationListener)
WARN [Controller-0-to-broker-2-send-thread], Controller 0 epoch 84 fails to send request {controller_id=0,controller_epoch=84,partition_states=[...]
java.io.IOException: Connection to 2 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
WARN [Controller-0-to-broker-1-send-thread], Controller 0 epoch 84 fails to send request [...]
java.io.IOException: Connection to 1 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
WARN [Controller-0-to-broker-0-send-thread], Controller 0 epoch 84 fails to send request [...]
java.io.IOException: Connection to 0 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrChangeNotificationListener)
INFO [Controller-0-to-broker-2-send-thread], Controller 0 connected to Node(2, x.x.x.x, 9092) for sending state change requests (kafka.controller.RequestSendThread)
INFO [Controller-0-to-broker-1-send-thread], Controller 0 connected to Node(1, x.x.x.x, 9092) for sending state change requests (kafka.controller.RequestSendThread)
INFO [Controller-0-to-broker-0-send-thread], Controller 0 connected to Node(0, x.x.x.x, 9092) for sending state change requests (kafka.controller.RequestSendThread)
DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrChangeNotificationListener)
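Broker 0 is the controller in the logs above. For anyone trying to follow along, the active controller id can be read from the /controller znode in ZooKeeper, e.g. (the ZooKeeper address here is a placeholder):

    bin/zookeeper-shell.sh <zookeeper-host>:2181
    get /controller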
*II. At the same time, the state-change log has the following (it is pretty much always the same set of topics and partitions):*

TRACE Controller 0 epoch 84 sending UpdateMetadata request (Leader:-2,ISR:0,1,2,LeaderEpoch:0,ControllerEpoch:84) to broker 2 for partition ...
TRACE Controller 0 epoch 84 sending UpdateMetadata request (Leader:-2,ISR:0,1,2,LeaderEpoch:0,ControllerEpoch:84) to broker 2 for partition ...
TRACE Controller 0 epoch 84 sending UpdateMetadata request (Leader:-2,ISR:2,0,1,LeaderEpoch:0,ControllerEpoch:84) to broker 2 for partition ...
TRACE Controller 0 epoch 84 sending UpdateMetadata request (Leader:1,ISR:1,LeaderEpoch:44,ControllerEpoch:84) to broker 2 for partition ...
TRACE Controller 0 epoch 84 sending UpdateMetadata request (Leader:1,ISR:1,LeaderEpoch:44,ControllerEpoch:84) to broker 2 for partition ...

*III. On Node B, the server log contains a number of NotEnoughReplicasException errors and ISR shrinks and expands for topic partitions:*

INFO Partition [test,16] on broker 1: Shrinking ISR for partition [test,16] from 1,2,0 to 1 (kafka.cluster.Partition)
INFO Partition [test,52] on broker 1: Shrinking ISR for partition [test,52] from 1,2,0 to 1 (kafka.cluster.Partition)
...
ERROR [Replica Manager on Broker 1]: Error processing append operation on partition [test,67] (kafka.server.ReplicaManager)
kafka.common.NotEnoughReplicasException: Number of insync replicas for partition [test,67] is [1], below required minimum [2]
...
INFO Partition [test,37] on broker 1: Expanding ISR for partition [test,37] from 1 to 1,2 (kafka.cluster.Partition)
INFO Partition [test,109] on broker 1: Expanding ISR for partition [test,109] from 1 to 1,2 (kafka.cluster.Partition)
INFO Partition [test,37] on broker 1: Expanding ISR for partition [test,37] from 1,2 to 1,2,0 (kafka.cluster.Partition)
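As far as we understand, the append failures themselves are expected once the ISR shrinks: the topic requires min.insync.replicas=2 and the MirrorMaker producer waits for acks from all in-sync replicas (hence NOT_ENOUGH_REPLICAS_AFTER_APPEND), so writes fail as soon as the ISR drops to just the leader. The part we cannot explain is why the ISR keeps shrinking and expanding in the first place. For reference, the under-replicated partitions and current ISRs can be listed with the stock tools, e.g. (ZooKeeper address and topic name below are placeholders):

    bin/kafka-topics.sh --zookeeper <zk-host>:2181 --describe --under-replicated-partitions
    bin/kafka-topics.sh --zookeeper <zk-host>:2181 --describe --topic test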
As far as we have checked, we don't see any network or storage related issues. Has anyone seen similar issues? Any inputs would be of great help.

Thanks,
Meghana