Hi, I've noticed some nodes in our cluster are dying after some period of time.
WARN [New I/O server boss #17] 2013-10-29 12:22:20,725 Slf4JLogger.java (line 76) Failed to accept a connection.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
And other exceptions related to the same cause.
Since we use the Cassandra package, the nofile limit is already raised to 100000. To double-check that this is indeed in effect:
root@de-cass09 ~ # cat /proc/18332/limits
Limit                     Soft Limit           Hard Limit           Units
...
Max open files            100000               100000               files
...
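For reference, the package sets this limit through a file under /etc/security/limits.d/. The path and contents below are what I believe the Debian/DataStax packaging ships, so treat them as an assumption and verify on your own machines:

root@de-cass09 ~ # cat /etc/security/limits.d/cassandra.conf
# Raises the fd and memlock limits for the cassandra user (assumed packaged contents)
cassandra - memlock unlimited
cassandra - nofile 100000

Note that the process has to be restarted after a change here before the new limits apply.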
Now I check how many files are open:
root@de-cass09 ~ # lsof -n -p 18332 | wc -l
100038
This seems like an awful lot for size-tiered compaction...?
When checking the list, I noticed that one (deleted) file appeared many times:
...
java 18332 cassandra 4704r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
java 18332 cassandra 4705r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
...
In fact, if I count the entries for this specific file:
root@de-cass09 ~ # lsof -n -p 18332 | grep mapdata040-hos-jb-7648-Data.db | wc -l
52707
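In case it is useful to anyone else digging into this, here is the one-liner I use to see which files dominate the count instead of grepping one name at a time (field 9 is the NAME column in lsof's default output; this is just my ad-hoc check, not an official tool):

root@de-cass09 ~ # lsof -n -p 18332 | awk '{print $9}' | sort | uniq -c | sort -rn | head

Each output line is the number of lsof entries followed by the file name, so a leaked SSTable like the one above jumps straight to the top.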
Other nodes are at around 350 open files in total... Any idea why the number of open files is so high on this node?
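To compare nodes I used a rough loop like the one below (the hostnames and the pgrep pattern are placeholders for our setup; counting /proc/<pid>/fd gives the real descriptor count, whereas the lsof line count also includes things like memory-mapped files):

for h in de-cass01 de-cass02 de-cass09; do
  # assumes exactly one Cassandra JVM per host; CassandraDaemon is its main class
  ssh "$h" 'echo -n "$(hostname): "; ls /proc/$(pgrep -f CassandraDaemon)/fd | wc -l'
done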
The first exception I see is this:
WARN [New I/O worker #8] 2013-10-29 12:09:34,440 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.NullPointerException
        at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:178)
        at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:227)
        at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:164)
        at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:209)
        at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:151)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Several minutes later I get the "Too many open files" errors.
Specs:
12-node cluster running Ubuntu 12.04 LTS and Cassandra 2.0.1 (DataStax packages), using a 2-disk JBOD configuration.
JNA enabled.
Any suggestions?
Kind regards,
Pieter Callewaert
Pieter Callewaert
Web & IT engineer
Web: www.be-mobile.be
Email: [email protected]
Tel: +32 9 330 51 80
