Us too. Read up on this one; the fix was released in 1.0.0 and is also
pinned against 0.11.0.2:

https://issues.apache.org/jira/browse/KAFKA-6042

There are some other related tickets linked from that one as well.
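
If you want a signal for this before you can upgrade: the symptom in that
ticket is the request handler threads deadlocking, so watching the handler
idle metric over JMX is a reasonable check. A rough, untested sketch in Java;
the broker host/port are placeholders and it assumes remote JMX is enabled on
the broker:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HandlerIdleCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder address: point this at the broker's JMX port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName idle = new ObjectName(
                    "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent");
            // A rate stuck near zero means the handler threads are all blocked.
            Object rate = conn.getAttribute(idle, "OneMinuteRate");
            System.out.println("RequestHandlerAvgIdlePercent (1m rate): " + rate);
        } finally {
            connector.close();
        }
    }
}

Alerting on that dropping towards zero, together with file descriptor growth,
should catch it before the broker falls over.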

On Fri, Nov 3, 2017 at 5:40 PM, Madhukar Bharti <bhartimadhu...@gmail.com>
wrote:

> Hi,
>
> We are also facing the same issue since we upgraded our cluster to 0.11.0.1.
> Can anyone please help us out? It happened in our production cluster.
>
> Thanks
> Madhukar
>
> On Fri, 15 Sep 2017 at 9:07 PM, Lukas Lalinsky <lukas.lalin...@exponea.com
> >
> wrote:
>
> > Hello,
> >
> > I'm dealing with a strange issue in production and I'm running out of
> > options for dealing with it.
> >
> > It's a 3-node cluster running Kafka 0.11.0.1, with most topics having a
> > replication factor of 2. At some point, the broker that is about to die
> > shrinks the ISR for a few partitions down to just itself:
> >
> > [2017-09-15 11:25:29,104] INFO Partition [...,12] on broker 3:
> > Shrinking ISR from 3,2 to 3 (kafka.cluster.Partition)
> > [2017-09-15 11:25:29,107] INFO Partition [...,8] on broker 3:
> > Shrinking ISR from 3,1 to 3 (kafka.cluster.Partition)
> > [2017-09-15 11:25:29,108] INFO Partition [...,38] on broker 3:
> > Shrinking ISR from 3,2 to 3 (kafka.cluster.Partition)
> >
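> > For reference, the ISR membership can be checked from the client side
> > with the 0.11 AdminClient, roughly like the sketch below (the topic name
> > and bootstrap address are placeholders):
> >
> > import java.util.Collections;
> > import java.util.Properties;
> > import org.apache.kafka.clients.admin.AdminClient;
> > import org.apache.kafka.clients.admin.AdminClientConfig;
> > import org.apache.kafka.clients.admin.TopicDescription;
> > import org.apache.kafka.common.TopicPartitionInfo;
> >
> > public class IsrCheck {
> >     public static void main(String[] args) throws Exception {
> >         Properties props = new Properties();
> >         // Placeholder bootstrap address.
> >         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092");
> >         try (AdminClient admin = AdminClient.create(props)) {
> >             TopicDescription desc = admin
> >                     .describeTopics(Collections.singletonList("my-topic"))
> >                     .all().get().get("my-topic");
> >             for (TopicPartitionInfo p : desc.partitions()) {
> >                 // A partition whose ISR has shrunk to a single broker has
> >                 // no replica left to fail over to.
> >                 System.out.printf("partition %d leader=%s isr=%s%n",
> >                         p.partition(), p.leader(), p.isr());
> >             }
> >         }
> >     }
> > }
> >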
> > Then slightly after that, another broker writes errors like this to
> > the log file:
> >
> > [2017-09-15 11:25:45,536] WARN [ReplicaFetcherThread-0-3]: Error in fetch to broker 3, request (type=FetchRequest, replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={...}) (kafka.server.ReplicaFetcherThread)
> > java.io.IOException: Connection to 3 was disconnected before the response was read
> >         at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> >         at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> >         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> >         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> >         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:151)
> >         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
> >         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> >
> > There are many such messages. At that point, I see the number of
> > open file descriptors on the other broker growing. And eventually it
> > crashes with thousands of messages like this:
> >
> > [2017-09-15 11:31:23,273] ERROR Error while accepting connection
> > (kafka.network.Acceptor)
> > java.io.IOException: Too many open files
> >         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> >         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> >         at kafka.network.Acceptor.accept(SocketServer.scala:337)
> >         at kafka.network.Acceptor.run(SocketServer.scala:280)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> > The file descriptor limit is set to 128k, the number of open file
> > descriptors during normal operation is about 8k, so there is a lot of
> > headroom.
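> >
> > The descriptor growth itself is easy to watch from /proc on the broker
> > host; a minimal, Linux-only sketch, with the broker PID passed as an
> > argument:
> >
> > import java.io.File;
> >
> > public class FdCount {
> >     public static void main(String[] args) {
> >         // Pass the broker's PID as the first argument, e.g. "java FdCount 12345".
> >         String pid = args[0];
> >         // Each entry under /proc/<pid>/fd is one open file descriptor.
> >         String[] fds = new File("/proc/" + pid + "/fd").list();
> >         System.out.println("open fds: " + (fds == null ? "unreadable" : fds.length));
> >     }
> > }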
> >
> > I'm not sure whether it's the other brokers trying to replicate that
> > kill it, or the clients trying to publish messages.
> >
> > Has anyone seen a behavior like this? I'd appreciate any pointers.
> >
> > Thanks,
> >
> > Lukas
> >
> --
> Thanks and Regards,
> Madhukar Bharti
> Mob: 7845755539
>



-- 
Brett Rann
Senior DevOps Engineer

Zendesk International Ltd
395 Collins Street, Melbourne VIC 3000 Australia
Mobile: +61 (0) 418 826 017