[ https://issues.apache.org/jira/browse/KAFKA-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mickael Maison resolved KAFKA-7349.
-----------------------------------
    Resolution: Won't Fix

We are now removing ZooKeeper support, so closing this issue.

> Long Disk Writes cause Zookeeper Disconnects
> --------------------------------------------
>
>                 Key: KAFKA-7349
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7349
>             Project: Kafka
>          Issue Type: Bug
>          Components: zkclient
>    Affects Versions: 0.11.0.1
>            Reporter: Adam Kafka
>            Priority: Minor
>         Attachments: SpikeInWriteTime.png
>
>
> We run our Kafka cluster on a cloud service provider. As a consequence, we see large tail write latencies that are out of our control; some writes take on the order of seconds. We have noticed that these long write times are often correlated with subsequent Zookeeper disconnects from the brokers. It appears that during the long write, the Zookeeper heartbeat thread does not get scheduled CPU time, resulting in a long gap between heartbeats. After the write, the ZK thread does get scheduled again, but it detects that it has not heard from Zookeeper in a while, so it drops its connection and then rejoins the cluster.
> Note that the timeout reported is inconsistent with the timeout set by the configuration ({{zookeeper.session.timeout.ms}} = default of 6 seconds). We have seen a range of values reported here, including 5950ms (less than the threshold), 12032ms (double the threshold), and 25999ms (much larger than the threshold).
> We noticed that during a service degradation of our cloud provider's storage service, these Zookeeper disconnects increased drastically in frequency.
> We are hoping there is a way to decouple these components. Do you agree with our diagnosis that the ZK disconnects are occurring due to thread contention caused by long disk writes? Perhaps the ZK thread could be scheduled at a higher priority? Do you have any suggestions for how to avoid the ZK disconnects?
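For anyone trying to reproduce or monitor the behaviour described above, here is a minimal, hypothetical sketch, not part of Kafka or of this ticket; the class name, the connect string, and the 6000 ms timeout are assumptions for illustration. It opens a plain ZooKeeper client session and logs every KeeperState transition together with the time elapsed since the previous event, so Disconnected events can be lined up against disk write latency graphs:

{code:java}
import java.time.Instant;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical session-state logger: opens a plain ZooKeeper session with the
// same 6 s timeout the brokers use and prints every KeeperState change
// (SyncConnected, Disconnected, Expired, ...) with the gap since the previous
// event, so the timestamps can be lined up against disk write latency spikes.
public class ZkSessionWatch {
    public static void main(String[] args) throws Exception {
        String connect = args.length > 0 ? args[0] : "localhost:2181"; // assumed ensemble address
        final long[] last = { System.currentTimeMillis() };

        Watcher watcher = (WatchedEvent event) -> {
            long now = System.currentTimeMillis();
            System.out.printf("%s state=%s type=%s (+%d ms since previous event)%n",
                    Instant.ofEpochMilli(now), event.getState(), event.getType(), now - last[0]);
            last[0] = now;
        };

        // 6000 ms mirrors the default zookeeper.session.timeout.ms cited above.
        ZooKeeper zk = new ZooKeeper(connect, 6000, watcher);
        try {
            Thread.sleep(Long.MAX_VALUE); // keep the session open; interrupt to exit
        } finally {
            zk.close();
        }
    }
}
{code}

If the diagnosis in the report is right, a client like this running on an affected broker host should log Disconnected events whose preceding gap exceeds the session timeout at the same moments the data volume's write time spikes.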
> Here is an example of one of these events:
> Logs on the Broker:
> {code}
> [2018-08-25 04:10:19,695] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:21,697] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:23,700] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:25,701] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:27,702] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:29,704] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:31,707] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:33,709] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:35,712] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:37,714] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:39,716] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:41,719] DEBUG Got ping response for sessionid: 0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
> ...
> [2018-08-25 04:10:53,752] WARN Client session timed out, have not heard from server in 12032ms for sessionid 0x36202ab4337002c (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:53,754] INFO Client session timed out, have not heard from server in 12032ms for sessionid 0x36202ab4337002c, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2018-08-25 04:10:53,920] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
> [2018-08-25 04:10:53,920] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
> ...
> {code}
> GC logs during the same time (demonstrating this is not just a long GC):
> {code}
> 2018-08-25T04:10:36.434+0000: 35150.779: [GC (Allocation Failure) 3074119K->2529089K(6223360K), 0.0137342 secs]
> 2018-08-25T04:10:37.367+0000: 35151.713: [GC (Allocation Failure) 3074433K->2528524K(6223360K), 0.0127938 secs]
> 2018-08-25T04:10:38.274+0000: 35152.620: [GC (Allocation Failure) 3073868K->2528357K(6223360K), 0.0131040 secs]
> 2018-08-25T04:10:39.220+0000: 35153.566: [GC (Allocation Failure) 3073701K->2528885K(6223360K), 0.0133247 secs]
> 2018-08-25T04:10:40.175+0000: 35154.520: [GC (Allocation Failure) 3074229K->2528639K(6223360K), 0.0127870 secs]
> 2018-08-25T04:10:41.084+0000: 35155.429: [GC (Allocation Failure) 3073983K->2530769K(6223360K), 0.0135058 secs]
> 2018-08-25T04:10:42.012+0000: 35156.358: [GC (Allocation Failure) 3076113K->2531772K(6223360K), 0.0165919 secs]
> 2018-08-25T04:10:53.737+0000: 35168.083: [GC (Allocation Failure) 3077116K->2529630K(6223360K), 0.0135440 secs]
> 2018-08-25T04:10:58.515+0000: 35172.860: [GC (Allocation Failure) 3074974K->2531144K(6223360K), 0.0164601 secs]
> 2018-08-25T04:11:03.149+0000: 35177.495: [GC (Allocation Failure) 3076488K->2531751K(6223360K), 0.0146978 secs]
> {code}
> The attached screenshot shows a spike in write time to our data volume at the same second. Note that this is an average value, so it does not represent the max write time, only a sample of write times.
> !SpikeInWriteTime.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
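A further hint that this is not a collector pause is visible in the GC record quoted above: the individual collections stay around 13-16 ms, but the events themselves jump from 04:10:42.012 to 04:10:53.737, an 11.7 s gap of the same order as the 12032 ms the ZooKeeper client reports. A small, hypothetical helper along the following lines can scan a GC log for such gaps; the file name and the 6-second threshold are assumptions, and it only understands the "<timestamp>: <uptime secs>: [GC ...]" format shown above:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical GC-log gap scanner: flags places where consecutive GC events
// are further apart than the ZooKeeper session timeout, i.e. windows in which
// the whole JVM (not just the collector) appears to have stalled.
public class GcGapScanner {
    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "gc.log"; // assumed log path
        double thresholdSecs = 6.0;                         // default zookeeper.session.timeout.ms

        List<String> lines = Files.readAllLines(Paths.get(file));
        String prevLine = null;
        double prevUptime = -1;
        for (String line : lines) {
            String[] parts = line.split(": ", 3);
            if (parts.length < 3) continue;                 // not a GC event line
            double uptime;
            try {
                uptime = Double.parseDouble(parts[1]);      // JVM uptime in seconds
            } catch (NumberFormatException e) {
                continue;
            }
            if (prevUptime >= 0 && uptime - prevUptime > thresholdSecs) {
                System.out.printf("Gap of %.3f s between:%n  %s%n  %s%n",
                        uptime - prevUptime, prevLine, line);
            }
            prevUptime = uptime;
            prevLine = line;
        }
    }
}
{code}

Gaps reported this way mark windows in which the whole JVM stopped making progress, which is consistent with the heartbeat thread not being scheduled during a long disk write.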