I have the same problem. I didn't dig deeper, but I saw this happen when I
launch Kafka in daemon mode. As far as I can tell, daemon mode just launches
Kafka with nohup, so I'm not sure why that would make a difference.


On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul <pl...@ebay.com> wrote:

> Yup. In fact, I just ran the test program again while the Kafka broker is
> still running, using the same user of course. I was able to get up to 10K
> connections with the test program. The test program uses the same Java NIO
> library that the broker does, so the machine is capable of handling that
> many connections. The only issue I saw was that the NIO
> ServerSocketChannel is a bit slow at accepting connections once the total
> connection count gets to around 4K, but this could be because I put
> the ServerSocketChannel in the same Selector as the 4K SocketChannels. So
> sometimes on the client side, I see:
>
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:93)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
>         at FdTest$ClientThread.run(FdTest.java:108)
>
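For what it's worth, the single-Selector setup described above boils down to
something like the sketch below. This is only an illustration of that setup,
not the actual FdTest code; the class name and port number are made up.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    // One Selector shared by the listening ServerSocketChannel and every
    // accepted SocketChannel, which is the setup that appeared to slow
    // accepts down once the connection count reached roughly 4K.
    public class FdTestServerSketch {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.configureBlocking(false);
            server.socket().bind(new InetSocketAddress(9999)); // port is arbitrary
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            // Accepted channels register with the same Selector.
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    }
                    // Reads and writes on accepted channels are omitted here.
                }
            }
        }
    }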
>
> But all I have to do is sleep for a bit on the client, and then retry
> again. However, 4K does seem like a magic number, since that seems to be
> the number that the Kafka broker machine can handle before it gives me the
> "Too Many Open Files" error and eventually crashes.
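
The client side of that kind of test can be as simple as the sketch below,
which sleeps and retries whenever the server resets a connection. The host
name, port, and target count are placeholders, not values from the real test
program.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.util.ArrayList;
    import java.util.List;

    // Opens connections until the target count is reached, backing off and
    // retrying on "Connection reset by peer" as described above.
    public class FdTestClientSketch {
        public static void main(String[] args) throws Exception {
            int target = 10000; // illustrative target
            List<SocketChannel> channels = new ArrayList<SocketChannel>();
            while (channels.size() < target) {
                try {
                    SocketChannel ch =
                        SocketChannel.open(new InetSocketAddress("broker-host", 9999));
                    ch.write(ByteBuffer.wrap("ping".getBytes())); // a write surfaces the reset
                    channels.add(ch);
                } catch (IOException e) {
                    System.out.println(e + " at " + channels.size() + " connections, backing off");
                    Thread.sleep(1000); // sleep for a bit, then retry
                }
            }
            System.out.println("Opened " + channels.size() + " connections");
        }
    }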
>
> Paul Lung
>
> On 7/8/14, 9:29 PM, "Jun Rao" <jun...@gmail.com> wrote:
>
> >Does your test program run as the same user as the Kafka broker?
> >
> >Thanks,
> >
> >Jun
> >
> >
> >On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul <pl...@ebay.com> wrote:
> >
> >> Hi Guys,
> >>
> >> I'm seeing the following errors from the 0.8.1.1 broker. This occurs
> >> most often on the Controller machine. The controller process then
> >> crashes, and the controller bounces to other machines, which causes
> >> those machines to crash as well. Looking at the file descriptors held
> >> by the process, it's only around 4000 or so. There aren't a whole lot
> >> of connections in TIME_WAIT states, and I've increased the ephemeral
> >> port range to "16000 - 64000" via /proc/sys/net/ipv4/ip_local_port_range.
> >> I've written a Java test program to see how many sockets and files I
> >> can open. The sockets are definitely limited by the ephemeral port
> >> range, which was around 22K at the time, but I can open tons of files,
> >> since the open file limit for the user is set to 100K.
> >>
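Side note: on a HotSpot JVM you can cross-check the descriptor count and the
limit the broker process actually sees from inside the JVM, without an
external tool. A minimal sketch, assuming the Unix variant of the
OperatingSystemMXBean is available:

    import java.lang.management.ManagementFactory;
    import com.sun.management.UnixOperatingSystemMXBean;

    // Prints the current and maximum file descriptor counts for this JVM
    // process, handy for comparing the ~4K figure against the 100K ulimit.
    public class FdCount {
        public static void main(String[] args) {
            Object os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
                System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
            } else {
                System.out.println("Non-Unix MXBean; check ulimit -n or /proc instead");
            }
        }
    }
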
> >> So given that I can theoretically open 48K sockets and probably 90K
> >> files, and I only see around 4K total for the Kafka broker, I'm really
> >> confused as to why I'm seeing this error. Is there some internal Kafka
> >> limit that I don't know about?
> >>
> >> Paul Lung
> >>
> >>
> >>
> >> java.io.IOException: Too many open files
> >>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
> >>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
> >>         at kafka.network.Acceptor.run(SocketServer.scala:154)
> >>         at java.lang.Thread.run(Thread.java:679)
> >>
> >> [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor)
> >> java.io.IOException: Too many open files
> >>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
> >>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
> >>         at kafka.network.Acceptor.run(SocketServer.scala:154)
> >>         at java.lang.Thread.run(Thread.java:679)
> >>
> >> [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for partition [bom__021____active_80__32__mini____activeitem_lvs_qn,0] to broker 2124488:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>
> >> [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error writing to highwatermark file:  (kafka.server.ReplicaManager)
> >> java.io.FileNotFoundException: /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-checkpoint.tmp (Too many open files)
> >>         at java.io.FileOutputStream.open(Native Method)
> >>         at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
> >>         at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
> >>         at java.io.FileWriter.<init>(FileWriter.java:90)
> >>         at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
> >>         at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:447)
> >>         at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:444)
> >>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> >>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> >>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> >>         at kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:444)
> >>         at kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:94)
> >>         at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>         at java.lang.Thread.run(Thread.java:679)
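
For context on the FileNotFoundException above: the checkpoint write uses the
common write-to-a-.tmp-file-then-rename pattern, so every write has to open a
brand new file and therefore needs a spare file descriptor; once the process
is at its limit, even this small local write fails. A rough sketch of that
pattern, not the broker's actual OffsetCheckpoint code:

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Map;

    // Write the offsets to a temporary file and swap it in; the
    // FileWriter/FileOutputStream open is where "Too many open files" hits.
    public class CheckpointWriteSketch {
        public static void write(File target, Map<String, Long> offsets) throws IOException {
            File tmp = new File(target.getAbsolutePath() + ".tmp");
            FileWriter writer = new FileWriter(tmp); // needs a fresh fd
            try {
                for (Map.Entry<String, Long> entry : offsets.entrySet()) {
                    writer.write(entry.getKey() + " " + entry.getValue() + "\n");
                }
                writer.flush();
            } finally {
                writer.close();
            }
            if (!tmp.renameTo(target)) {
                throw new IOException("Failed to rename " + tmp + " to " + target);
            }
        }
    }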
> >>
> >>
> >>
> >>
>
>
