Hi Jun,

That was the problem. It was actually the Ubuntu upstart job overwriting
the limit. Thank you very much for your help.
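For the archives: processes started by upstart do not go through PAM, so
settings in /etc/security/limits.conf don't apply to them, and the limit has
to be set in the job definition itself. A minimal sketch of how to check this
(the job path and limit values are illustrative, not taken from this thread):

```shell
# Illustrative upstart stanza, e.g. in /etc/init/kafka.conf. Without a
# "limit nofile" line, the job runs with upstart's inherited default
# (typically a 1024 soft limit), regardless of limits.conf:
#
#   limit nofile 100000 100000
#
# Confirm what a running process actually received by reading its limits
# from /proc (shown here for the current shell; substitute the broker PID):
grep 'Max open files' /proc/self/limits
```

Checking /proc/&lt;pid&gt;/limits of the live broker, rather than running `ulimit -n`
in a login shell, is what exposes this kind of mismatch.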

Paul Lung

On 7/9/14, 1:58 PM, "Jun Rao" <jun...@gmail.com> wrote:

>Is it possible your container wrapper somehow overrides the file handle
>limit?
>
>Thanks,
>
>Jun
>
>
>On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul <pl...@ebay.com> wrote:
>
>> Yup. In fact, I just ran the test program again while the Kafka broker
>> is still running, using the same user of course. I was able to get up
>> to 10K connections with the test program. The test program uses the
>> same Java NIO library that the broker does, so the machine is capable
>> of handling that many connections. The only issue I saw was that the
>> NIO ServerSocketChannel is a bit slow at accepting connections when the
>> total connection count gets to around 4K, but this could be due to the
>> fact that I put the ServerSocketChannel in the same Selector as the 4K
>> SocketChannels. So sometimes on the client side, I see:
>>
>> java.io.IOException: Connection reset by peer
>>         at sun.nio.ch.FileDispatcher.write0(Native Method)
>>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
>>         at sun.nio.ch.IOUtil.write(IOUtil.java:93)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
>>         at FdTest$ClientThread.run(FdTest.java:108)
>>
>>
>> But all I have to do is sleep for a bit on the client and then retry.
>> However, 4K does seem like a magic number, since that seems to be the
>> number of connections the Kafka broker machine can handle before it
>> gives me the "Too Many Open Files" error and eventually crashes.
>>
>> Paul Lung
>>
>> On 7/8/14, 9:29 PM, "Jun Rao" <jun...@gmail.com> wrote:
>>
>> >Does your test program run as the same user as Kafka broker?
>> >
>> >Thanks,
>> >
>> >Jun
>> >
>> >
>> >On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul <pl...@ebay.com> wrote:
>> >
>> >> Hi Guys,
>> >>
>> >> I'm seeing the following errors from the 0.8.1.1 broker. This occurs
>> >> most often on the Controller machine. Then the controller process
>> >> crashes, and the controller bounces to other machines, which causes
>> >> those machines to crash. Looking at the file descriptors being held
>> >> by the process, it's only around 4000 or so. There aren't a whole lot
>> >> of connections in TIME_WAIT states, and I've increased the ephemeral
>> >> port range to "16000 - 64000" via
>> >> "/proc/sys/net/ipv4/ip_local_port_range". I've written a Java test
>> >> program to see how many sockets and files I can open. The sockets are
>> >> definitely limited by the ephemeral port range, which was around 22K
>> >> at the time. But I can open tons of files, since the open file limit
>> >> of the user is set to 100K.
>> >>
>> >> So given that I can theoretically open 48K sockets and probably 90K
>> >> files, and I only see around 4K total for the Kafka broker, I'm
>> >> really confused as to why I'm seeing this error. Is there some
>> >> internal Kafka limit that I don't know about?
>> >>
>> >> Paul Lung
>> >>
>> >>
>> >>
>> >> java.io.IOException: Too many open files
>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
>> >>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
>> >>         at kafka.network.Acceptor.run(SocketServer.scala:154)
>> >>         at java.lang.Thread.run(Thread.java:679)
>> >>
>> >> [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor)
>> >> java.io.IOException: Too many open files
>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
>> >>         at kafka.network.Acceptor.accept(SocketServer.scala:200)
>> >>         at kafka.network.Acceptor.run(SocketServer.scala:154)
>> >>         at java.lang.Thread.run(Thread.java:679)
>> >>
>> >> [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error
>> >> for partition [bom__021____active_80__32__mini____activeitem_lvs_qn,0]
>> >> to broker 2124488: class kafka.common.NotLeaderForPartitionException
>> >> (kafka.server.ReplicaFetcherThread)
>> >>
>> >> [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]:
>> >> Error writing to highwatermark file:  (kafka.server.ReplicaManager)
>> >> java.io.FileNotFoundException:
>> >> /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-checkpoint.tmp
>> >> (Too many open files)
>> >>         at java.io.FileOutputStream.open(Native Method)
>> >>         at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
>> >>         at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
>> >>         at java.io.FileWriter.<init>(FileWriter.java:90)
>> >>         at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
>> >>         at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:447)
>> >>         at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:444)
>> >>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>> >>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>> >>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>> >>         at kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:444)
>> >>         at kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:94)
>> >>         at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
>> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>> >>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>         at java.lang.Thread.run(Thread.java:679)
>> >>
>> >>
>> >>
>> >>
>>
>>