Us too. Read up on this one: the fix was released in 1.0.0 and is also pinned against a 0.11.0.2 release.
https://issues.apache.org/jira/browse/KAFKA-6042

There are some other tickets linked in that also.

On Fri, Nov 3, 2017 at 5:40 PM, Madhukar Bharti <bhartimadhu...@gmail.com>
wrote:

> Hi,
>
> We are also facing the same issue after we upgraded our cluster to
> 0.11.0.1. Can anyone please help us out? It happened in our production
> cluster.
>
> Thanks
> Madhukar
>
> On Fri, 15 Sep 2017 at 9:07 PM, Lukas Lalinsky <lukas.lalin...@exponea.com>
> wrote:
>
> > Hello,
> >
> > I'm dealing with a strange issue in production and I'm running out of
> > options for what to do about it.
> >
> > It's a 3 node cluster running Kafka 0.11.0.1 with most topics having a
> > replication factor of 2. At some point, the broker that is about to
> > die shrinks the ISR for a few partitions to just itself:
> >
> > [2017-09-15 11:25:29,104] INFO Partition [...,12] on broker 3:
> > Shrinking ISR from 3,2 to 3 (kafka.cluster.Partition)
> > [2017-09-15 11:25:29,107] INFO Partition [...,8] on broker 3:
> > Shrinking ISR from 3,1 to 3 (kafka.cluster.Partition)
> > [2017-09-15 11:25:29,108] INFO Partition [...,38] on broker 3:
> > Shrinking ISR from 3,2 to 3 (kafka.cluster.Partition)
> >
> > Then slightly after that, another broker writes errors like this to
> > the log file:
> >
> > [2017-09-15 11:25:45,536] WARN [ReplicaFetcherThread-0-3]: Error in
> > fetch to broker 3, request (type=FetchRequest, replicaId=2,
> > maxWait=500, minBytes=1, maxBytes=10485760, fetchData={...})
> > (kafka.server.ReplicaFetcherThread)
> > java.io.IOException: Connection to 3 was disconnected before the
> > response was read
> >         at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> >         at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> >         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> >         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> >         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:151)
> >         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
> >         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> >
> > There are many such messages. At that point, I see the number of
> > open file descriptors on the other broker growing, and eventually it
> > crashes with thousands of messages like this:
> >
> > [2017-09-15 11:31:23,273] ERROR Error while accepting connection
> > (kafka.network.Acceptor)
> > java.io.IOException: Too many open files
> >         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> >         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> >         at kafka.network.Acceptor.accept(SocketServer.scala:337)
> >         at kafka.network.Acceptor.run(SocketServer.scala:280)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> > The file descriptor limit is set to 128k and the number of open file
> > descriptors during normal operation is about 8k, so there is a lot of
> > headroom.
> >
> > I'm not sure if it's the other brokers trying to replicate that kill
> > it, or whether it's clients trying to publish messages.
> >
> > Has anyone seen behavior like this? I'd appreciate any pointers.
> >
> > Thanks,
> >
> > Lukas
> >
>
> --
> Thanks and Regards,
> Madhukar Bharti
> Mob: 7845755539
>

--
Brett Rann
Senior DevOps Engineer

Zendesk International Ltd
395 Collins Street, Melbourne VIC 3000 Australia
Mobile: +61 (0) 418 826 017
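P.S. For anyone who wants to catch this pattern before a broker falls over, here is a minimal watchdog sketch. It is only an illustration, not something from the thread: it assumes a Linux host with /proc, a single broker process that pgrep can find by its main class kafka.Kafka, and an arbitrary warning threshold chosen to sit well below the 128k ulimit mentioned above. Adjust all of that for your own setup.

#!/usr/bin/env python3
# Sketch: warn when a Kafka broker's open file descriptor count climbs
# far above its normal baseline (~8k in the report above).
import os
import subprocess
import time

FD_WARN_THRESHOLD = 20000   # placeholder threshold, not from the thread
POLL_SECONDS = 30

def broker_pid():
    """Find the broker's PID by matching its main class on the java command line."""
    out = subprocess.run(["pgrep", "-f", "kafka.Kafka"],
                         capture_output=True, text=True)
    pids = out.stdout.split()
    return int(pids[0]) if pids else None

def open_fds(pid):
    """Count entries in /proc/<pid>/fd, i.e. currently open descriptors."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except (FileNotFoundError, PermissionError):
        return None

if __name__ == "__main__":
    while True:
        pid = broker_pid()
        count = open_fds(pid) if pid else None
        if count is None:
            print("broker process not found or /proc/<pid>/fd not readable")
        elif count > FD_WARN_THRESHOLD:
            print(f"WARNING: broker pid {pid} has {count} open fds "
                  f"(threshold {FD_WARN_THRESHOLD})")
        else:
            print(f"broker pid {pid}: {count} open fds")
        time.sleep(POLL_SECONDS)

Running something like this from cron or a systemd timer instead of a loop works just as well; the only point is to alert long before the 128k limit is hit, since by the time the Acceptor logs "Too many open files" the broker is already gone.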