I’ve seen cases where tuning network settings at the OS level can also help 
mitigate the “Too many open files” issue.



Try changing the following OS settings so that used network connections are 
closed as quickly as possible, which keeps file handle usage down:





sysctl -w net.ipv4.tcp_fin_timeout=10


By default, this value is 60 seconds.  Reducing it to 10 seconds allows 
socket-related file handles to be released sooner.



sysctl -w net.ipv4.tcp_synack_retries=3



By default, this value is 5.  Setting it to 3 shortens the time it takes for a 
failed passive (incoming) TCP connection to time out, so its resources are 
released sooner.
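

These sysctl -w commands take effect immediately but do not persist across a 
reboot.  A minimal sketch of making them persistent, assuming a standard Linux 
sysctl setup (the file name below is just an example):

# /etc/sysctl.d/90-kafka-network.conf  (example file name)
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_synack_retries = 3

# apply the file without rebooting
sysctl -p /etc/sysctl.d/90-kafka-network.conf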




Additionally, make sure the ulimit NOFILE value for your zookeeper account is 
set high enough that it can service network connection requests at a level 
comparable to the Kafka broker.  The network parameters above help zookeeper 
too, so consider applying them on your zookeeper nodes as well.
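

For example, per-user limits can be set in /etc/security/limits.conf; the 
account name and limit below are placeholders, so substitute whatever fits 
your environment:

# /etc/security/limits.conf
# placeholder account name and limit; adjust for your environment
zookeeper  soft  nofile  100000
zookeeper  hard  nofile  100000

Note that pam_limits applies these at login, so the process has to be 
restarted under a fresh session for the new limit to take effect.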



Finally, make sure the accounts that run your producer and consumer processes 
also have an appropriate NOFILE ulimit, and that the nodes where they run use 
the network configurations above.
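

To verify what a running process actually received (rather than what 
limits.conf says), you can check /proc directly.  The pgrep pattern below is 
only an illustration; adjust it to match your own process, and run these as 
the process owner or root:

# effective open-file limit of a running broker/producer/consumer JVM
cat /proc/$(pgrep -f 'kafka.Kafka' | head -1)/limits | grep 'open files'

# number of file descriptors it currently has open
ls /proc/$(pgrep -f 'kafka.Kafka' | head -1)/fd | wc -l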







Thank you,



Jeff Groves





On 5/17/17, 1:11 AM, "Yang Cui" <y...@freewheel.tv> wrote:



    Hi Caleb,



      We had already set the maximum number of open files to 100,000 before 
this error happened.



      Normally, the file descriptor count is about 20,000, but sometimes it 
suddenly jumps to a much higher number.



      This is our monitoring data for Kafka FD usage:



      2017-05-17-05:04:19 FD_total_num:19261 FD_pair_num:15256 FD_ads_num:3191 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 149 REG 18877

      2017-05-17-05:04:31 FD_total_num:19267 FD_pair_num:15259 FD_ads_num:3192 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18883

      2017-05-17-05:04:44 FD_total_num:19272 FD_pair_num:15263 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 148 REG 18892

      2017-05-17-05:04:57 FD_total_num:19280 FD_pair_num:15268 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 150 REG 18895

      2017-05-17-05:05:09 FD_total_num:19277 FD_pair_num:15271 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18898

      2017-05-17-05:05:21 FD_total_num:19223 FD_pair_num:15217 FD_ads_num:3189 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18836

      2017-05-17-05:05:34 FD_total_num:19235 FD_pair_num:15223 FD_ads_num:3189 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18842
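
      (For context, per-type FD counts like the above can be gathered with 
lsof; this is only an illustrative sketch, not the exact monitoring script, 
and the pgrep pattern is an assumption:)

      PID=$(pgrep -f 'kafka.Kafka' | head -1)
      # column 5 of lsof output is the FD TYPE (REG, IPv4, FIFO, ...)
      lsof -p "$PID" | awk 'NR>1 {print $5}' | sort | uniq -c | sort -rn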







    On 13/05/2017, 9:57 AM, "Caleb Welton" <ca...@autonomic.ai> wrote:



        You need to up your OS open file limits, something like this should 
work:



        # /etc/security/limits.conf

        * - nofile 65536
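
        (One caveat: limits.conf is applied by PAM at login, so if the broker 
runs as a systemd service, the unit's LimitNOFILE setting is what actually 
applies.  A sketch, assuming a unit named kafka.service:)

        # /etc/systemd/system/kafka.service.d/limits.conf
        [Service]
        LimitNOFILE=65536

        # then reload and restart:
        # systemctl daemon-reload && systemctl restart kafka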









        On Fri, May 12, 2017 at 6:34 PM, Yang Cui <y...@freewheel.tv> wrote:



        > Our Kafka cluster has gone down due to the problem
        > “java.io.IOException: Too many open files” three times in 3 weeks.

        >

        > We encountered this problem on both version 0.9.0.1 and 0.10.2.1.

        >

        > The error is like:

        >

        > java.io.IOException: Too many open files

        >         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

        >         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)

        >         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)

        >         at kafka.network.Acceptor.accept(SocketServer.scala:340)

        >         at kafka.network.Acceptor.run(SocketServer.scala:283)

        >         at java.lang.Thread.run(Thread.java:745)

        >

        > Has anyone encountered a similar problem?

        >

        >

        >





