In general, my understanding is that memory-mapped files hold a lot of open file handles. We raise the open-files limit to unlimited on all our DB nodes.
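For reference, a minimal sketch of how the limit can be raised and then checked against the running process. The limits.d path and the 1048576 value are only examples (the Cassandra packages ship their own defaults), and <pid> is a placeholder for the Cassandra process ID:

  # example pam_limits drop-in, e.g. /etc/security/limits.d/cassandra.conf (value is illustrative)
  cassandra  -  nofile  1048576

  # verify what the running JVM actually got, and how many descriptors it currently holds
  cat /proc/<pid>/limits | grep 'Max open files'
  lsof -n -p <pid> | wc -l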
On Oct 29, 2013, at 8:30 AM, Pieter Callewaert <pieter.callewa...@be-mobile.be> wrote:

> Investigated a bit more:
>
> - I can reproduce it; it has already happened on several nodes when I do some stress testing (50,000 selects spread over multiple threads).
> - The "Unexpected exception in the selector loop" seems unrelated to the "Too many open files"; it just happens.
> - It's not socket related.
> - Using Oracle Java(TM) SE Runtime Environment (build 1.7.0_40-b43).
> - Using multiple data directories (maybe related?).
>
> I'm stuck at the moment. I don't know if I should try DEBUG logging, because it might be too much information.
>
> Kind regards,
> Pieter Callewaert
>
> Pieter Callewaert
> Web & IT engineer
>
> Web: www.be-mobile.be
> Email: pieter.callewa...@be-mobile.be
> Tel: + 32 9 330 51 80
>
>
> From: Pieter Callewaert [mailto:pieter.callewa...@be-mobile.be]
> Sent: Tuesday, 29 October 2013 13:40
> To: user@cassandra.apache.org
> Subject: Too many open files (Cassandra 2.0.1)
>
> Hi,
>
> I've noticed some nodes in our cluster are dying after some period of time.
>
> WARN [New I/O server boss #17] 2013-10-29 12:22:20,725 Slf4JLogger.java (line 76) Failed to accept a connection.
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
>         at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>         at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
> And other exceptions related to the same cause.
> As we use the Cassandra package, the nofile limit is raised to 100000.
> To double-check that this is correct:
>
> root@de-cass09 ~ # cat /proc/18332/limits
> Limit                Soft Limit    Hard Limit    Units
> …
> Max open files       100000        100000        files
> …
>
> Now I check how many files are open:
> root@de-cass09 ~ # lsof -n -p 18332 | wc -l
> 100038
>
> This seems like an awful lot for size-tiered compaction…?
> When I checked the list, I noticed a (deleted) file showed up a lot:
>
> …
> java 18332 cassandra 4704r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
> java 18332 cassandra 4705r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
> …
>
> Actually, if I count specifically for this file:
> root@de-cass09 ~ # lsof -n -p 18332 | grep mapdata040-hos-jb-7648-Data.db | wc -l
> 52707
>
> Other nodes are around a total of 350 files open… Any idea why this number of open files is so high?
>
> The first exception I see is this:
> WARN [New I/O worker #8] 2013-10-29 12:09:34,440 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
> java.lang.NullPointerException
>         at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:178)
>         at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:227)
>         at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:164)
>         at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
>         at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:209)
>         at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:151)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
> Several minutes later I get "Too many open files".
>
> Specs:
> 12-node cluster running Ubuntu 12.04 LTS, Cassandra 2.0.1 (DataStax packages), using JBOD with 2 disks.
> JNA enabled.
>
> Any suggestions?
>
> Kind regards,
> Pieter Callewaert
>
> Pieter Callewaert
> Web & IT engineer
>
> Web: www.be-mobile.be
> Email: pieter.callewa...@be-mobile.be
> Tel: + 32 9 330 51 80
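If it helps the diagnosis, a quick sketch building on the lsof checks above (<pid> again stands in for the Cassandra PID): count the descriptors held on files that have already been unlinked, and group them by file to see which SSTables they belong to:

  # how many descriptors point at already-deleted files
  lsof -n -p <pid> | grep -c '(deleted)'

  # group the deleted-but-still-open entries by file, largest offender first
  lsof -n -p <pid> | awk '$NF == "(deleted)" {print $(NF-1)}' | sort | uniq -c | sort -rn | head

In the lsof output quoted above, a single deleted SSTable accounted for 52,707 of the roughly 100,000 open descriptors.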