[ https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Powell updated KAFKA-3916:
-------------------------------
Description:

We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per day, the controllers in our clusters have their connection to all brokers disconnected, and then successfully reconnected a few hundred ms later. Each time this occurs we see a brief spike in our 99th percentile produce and consume times, reaching several hundred ms.

Here is an example of what we're seeing in the controller.log:

{{[2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], Controller 151 epoch 106 fails to send request {…} to broker Node(160, broker.160.hostname, 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 160 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:236)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
...
one each for all brokers (including the controller) ...
[2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending state change requests (kafka.controller.RequestSendThread)
… one each for all brokers (including the controller) ...}}


> Connection from controller to broker disconnects
> ------------------------------------------------
>
>                 Key: KAFKA-3916
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3916
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.1
>            Reporter: Dave Powell
>
> We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per day, the controllers in our clusters have their connection to all brokers disconnected, and then successfully reconnected a few hundred ms later. Each time this occurs we see a brief spike in our 99th percentile produce and consume times, reaching several hundred ms.
> Here is an example of what we're seeing in the controller.log:
> {{[2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], Controller 151 epoch 106 fails to send request {…} to broker Node(160, broker.160.hostname, 9092). Reconnecting to broker.
> (kafka.controller.RequestSendThread)
> java.io.IOException: Connection to 160 was disconnected before the response was read
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
> at scala.Option.foreach(Option.scala:236)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
> at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
> at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
> at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
> at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
> at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> ... one each for all brokers (including the controller) ...
> [2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending state change requests (kafka.controller.RequestSendThread)
> … one each for all brokers (including the controller) ...}}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
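The two log lines quoted in the report bound the reconnect window: the WARN at 14:15:35,416 marks the disconnect and the INFO at 14:15:35,721 the reconnect, a gap of roughly 305 ms, consistent with the reported "a few hundred ms". A quick sketch (plain Python, timestamps copied from the excerpt; not part of Kafka) computes that gap from the log4j-style timestamps:

```python
from datetime import datetime

# Timestamps copied verbatim from the controller.log excerpt above.
disconnect = "2016-06-28 14:15:35,416"  # WARN: connection to broker disconnected
reconnect = "2016-06-28 14:15:35,721"   # INFO: controller reconnected

# Log4j default pattern uses a comma before the milliseconds; %f accepts it
# as a fractional-seconds field.
fmt = "%Y-%m-%d %H:%M:%S,%f"
gap_ms = (datetime.strptime(reconnect, fmt)
          - datetime.strptime(disconnect, fmt)).total_seconds() * 1000
print(f"controller reconnect gap: {gap_ms:.0f} ms")  # prints 305 ms
```

The same parsing could be run over a full controller.log to count how many such WARN/INFO pairs occur per day and how long each outage lasts.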