I’ve seen cases where setting network configurations within the OS can help mitigate some of the “Too many open files” issues as well.
Try changing the following OS settings so that used network connections are closed as quickly as possible, which keeps file handle usage down:

sysctl -w net.ipv4.tcp_fin_timeout=10
By default, this value is 60 seconds. Reducing it to 10 seconds allows socket-related file handles to be released sooner.

sysctl -w net.ipv4.tcp_synack_retries=3
By default, this value is 5. Setting it to 3 decreases the time it takes for a failed passive TCP connection to time out, which releases resources sooner.

Additionally, be sure the NOFILE ulimit for your zookeeper account is set high enough that ZooKeeper can service a number of network connections comparable to the Kafka broker. The network parameters above help ZooKeeper too, so look into applying them on your zookeeper nodes as well.

Finally, make sure the accounts that run your producer and consumer processes also have an appropriate NOFILE ulimit, and that the nodes where they run use the network configuration above.

Thank you,
Jeff Groves

On 5/17/17, 1:11 AM, "Yang Cui" <y...@freewheel.tv> wrote:

Hi Caleb,

We had already set the maximum number of open files to 100,000 before this error happened. Normally the file descriptor count is about 20,000, but at times it suddenly jumps to a much higher number.

This is our monitoring output for the Kafka broker's FD usage:

2017-05-17-05:04:19 FD_total_num:19261 FD_pair_num:15256 FD_ads_num:3191
    FD_Type: TYPE 1, DIR 2, unix 2, sock 4, CHR 7, a_inode 73, FIFO 146, IPv4 149, REG 18877
2017-05-17-05:04:31 FD_total_num:19267 FD_pair_num:15259 FD_ads_num:3192
    FD_Type: TYPE 1, DIR 2, unix 2, sock 4, CHR 7, a_inode 73, FIFO 146, IPv4 152, REG 18883
2017-05-17-05:04:44 FD_total_num:19272 FD_pair_num:15263 FD_ads_num:3197
    FD_Type: TYPE 1, DIR 2, unix 2, sock 4, CHR 7, a_inode 73, FIFO 146, IPv4 148, REG 18892
2017-05-17-05:04:57 FD_total_num:19280 FD_pair_num:15268 FD_ads_num:3197
    FD_Type: TYPE 1, DIR 2, unix 2, sock 4, CHR 7, a_inode 73, FIFO 146, IPv4 150, REG 18895
2017-05-17-05:05:09 FD_total_num:19277 FD_pair_num:15271 FD_ads_num:3197
    FD_Type: TYPE 1, DIR 2, unix 2, sock 4, CHR 7, a_inode 73, FIFO 146, IPv4 152, REG 18898
2017-05-17-05:05:21 FD_total_num:19223 FD_pair_num:15217 FD_ads_num:3189
    FD_Type: TYPE 1, DIR 2, unix 2, sock 4, CHR 7, a_inode 73, FIFO 146, IPv4 158, REG 18836
2017-05-17-05:05:34 FD_total_num:19235 FD_pair_num:15223 FD_ads_num:3189
    FD_Type: TYPE 1, DIR 2, unix 2, sock 4, CHR 7, a_inode 73, FIFO 146, IPv4 158, REG 18842

On 13/05/2017, 9:57 AM, "Caleb Welton" <ca...@autonomic.ai> wrote:

You need to up your OS open file limits; something like this should work:

# /etc/security/limits.conf
* - nofile 65536

On Fri, May 12, 2017 at 6:34 PM, Yang Cui <y...@freewheel.tv> wrote:

> Our Kafka cluster has been brought down by the problem “java.io.IOException: Too
> many open files” three times in 3 weeks.
>
> We encountered this problem on both the 0.9.0.1 and 0.10.2.1 versions.
>
> The error looks like:
>
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at kafka.network.Acceptor.accept(SocketServer.scala:340)
>         at kafka.network.Acceptor.run(SocketServer.scala:283)
>         at java.lang.Thread.run(Thread.java:745)
>
> Has anyone encountered a similar problem?
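
A note on applying the sysctl advice further up the thread: sysctl -w only changes the running kernel, so the values are lost on reboot. Below is a minimal sketch of making them persistent, assuming a distribution whose sysctl reads /etc/sysctl.d/; the file name 99-kafka-network.conf is only illustrative.

    # /etc/sysctl.d/99-kafka-network.conf   (illustrative file name)
    # Release FIN_WAIT socket resources sooner (kernel default: 60 seconds)
    net.ipv4.tcp_fin_timeout = 10
    # Retry SYN-ACKs fewer times so failed passive connections time out sooner (default: 5)
    net.ipv4.tcp_synack_retries = 3

    # Apply without rebooting (procps-ng sysctl):
    #   sysctl --system
    # or, for just this file:
    #   sysctl -p /etc/sysctl.d/99-kafka-network.conf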
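
On the NOFILE side, the limit that matters is the one the running broker process actually has, which can differ from /etc/security/limits.conf when the broker is started by systemd or a wrapper script. A quick check, assuming a single broker process per host (kafka.Kafka is the broker's main class, so pgrep -f can find it):

    # PID of the broker (assumes exactly one broker per host)
    BROKER_PID=$(pgrep -f kafka.Kafka | head -n 1)

    # Effective open-file limit of that process
    grep "open files" /proc/$BROKER_PID/limits

    # Number of file descriptors it currently holds
    ls /proc/$BROKER_PID/fd | wc -l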
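
Finally, the script behind Yang's FD_Type breakdown is not shown; it appears to list each lsof TYPE followed by its count, with the stray "TYPE 1" entry being lsof's own header line. A hypothetical, roughly equivalent one-off check would be the following (the site-specific FD_pair_num / FD_ads_num fields are not reproduced):

    # Hypothetical reconstruction of the FD monitor output above
    BROKER_PID=$(pgrep -f kafka.Kafka | head -n 1)
    echo "$(date +%Y-%m-%d-%H:%M:%S) FD_total_num:$(lsof -p $BROKER_PID | wc -l)"
    # Breakdown by lsof TYPE column 5 (REG = log segments/indexes, IPv4 = network sockets, ...)
    lsof -p $BROKER_PID | awk '{print $5}' | sort | uniq -c | sort -n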