Hi - has anyone made any progress with this issue? We are having the same problem with our Cassandra nodes in production. At some point a node (and sometimes all 3) will jump to 100% CPU usage and stay there for hours until restarted. Stack traces reveal several threads in a seemingly endless loop doing this:

"Thread-21770" - Thread t...@25278
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileChannelImpl.size0(Native Method)
        at sun.nio.ch.FileChannelImpl.size(Unknown Source)
        - locked java.lang.obj...@7a2c843d
        at sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

My understanding from reading the code is that this trace shows a thread belonging to the StreamingService that is writing an incoming stream to disk. Some bizarre problem seems to be causing the FileChannel.size() call to spin with high CPU.

This problem is also not easy to reproduce, so I would appreciate any information on how the StreamingService works and what triggers it to transfer these file streams.

Thanks,
Joseph Mermelstein
LivePerson
http://solutions.liveperson.com

> Hi all,
>
> We setup two nodes and simply set replication factor=2 for test run.
>
> After both nodes, say, node A and node B, serve several hours, we found that
> "node A" always keep 300% cpu usage.
> (the other node is under 100% cpu, which is normal)
>
> thread dump on "node A" shows that there are 3 busy threads related to
> IncomingStreamReader:
>
> ==========================
>
> "Thread-66" prio=10 tid=0x00002aade4018800 nid=0x69e7 runnable [0x000000004030a000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.misc.Unsafe.setMemory(Native Method)
>         at sun.nio.ch.Util.erase(Util.java:202)
>         at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>         at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>         at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> "Thread-65" prio=10 tid=0x00002aade4017000 nid=0x69e6 runnable [0x000000004d44b000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.misc.Unsafe.setMemory(Native Method)
>         at sun.nio.ch.Util.erase(Util.java:202)
>         at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>         at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>         at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> "Thread-62" prio=10 tid=0x00002aade4014800 nid=0x4150 runnable [0x000000004d34a000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileChannelImpl.size0(Native Method)
>         at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:309)
>         - locked <0x00002aaac450dcd0> (a java.lang.Object)
>         at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:597)
>         at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> ===========================
>
> Is there anyone experience similar issue ?
>
> environments:
>
> OS --- CentOS 5.4, Linux 2.6.18-164.15.1.el5 SMP x86_64 GNU/Linux
> Java --- build 1.6.0_16-b01, Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> Cassandra --- 0.6.0
> Node configuration --- node A and node B. both nodes use node A as Seed
> client --- Java thrift clients pick one node randomly to do read and write.
>
> --
> Ingram Chen
> online share order: http://dinbendon.net
> blog: http://www.javaworld.com.tw/roller/page/ingramchen
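
To make the suspected failure mode concrete, here is a minimal sketch of how a transferFrom loop like the one in the traces above could spin. It is not Cassandra's code. It assumes (unconfirmed) that the reader calls FileChannel.transferFrom repeatedly until an expected byte count is reached. transferFrom is documented to return the number of bytes transferred, possibly zero, so if the source runs dry before the expected count arrives, an unguarded loop of that shape never exits. The class name, CHUNK_SIZE, and the stall guard are all hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferLoopSketch {
    // Hypothetical chunk size; not taken from Cassandra.
    static final long CHUNK_SIZE = 4096;

    /**
     * Copies up to expectedBytes from src into dst using transferFrom.
     * An unguarded version of this loop would spin forever once src is
     * exhausted, because transferFrom then keeps returning 0. Here we
     * give up after maxStalls consecutive zero-byte transfers.
     */
    static long copyWithStallGuard(ReadableByteChannel src, FileChannel dst,
                                   long expectedBytes, int maxStalls) throws IOException {
        long bytesRead = 0;
        int stalls = 0;
        while (bytesRead < expectedBytes) {
            long want = Math.min(CHUNK_SIZE, expectedBytes - bytesRead);
            long n = dst.transferFrom(src, bytesRead, want);
            if (n == 0) {
                // No progress: count the stall instead of retrying forever.
                if (++stalls >= maxStalls) {
                    throw new IOException("stalled at " + bytesRead
                            + " of " + expectedBytes + " bytes");
                }
            } else {
                stalls = 0;
                bytesRead += n;
            }
        }
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("transfer-sketch", ".bin");
        try (FileChannel dst = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            // The source holds only 1000 bytes, but we claim 5000 are expected,
            // simulating a sender that stops early.
            ReadableByteChannel src =
                    Channels.newChannel(new ByteArrayInputStream(new byte[1000]));
            try {
                copyWithStallGuard(src, dst, 5000, 3);
            } catch (IOException e) {
                System.out.println("gave up: " + e.getMessage());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If the real loop has this shape, a thread dump would catch the thread at different points inside transferFrom (size0, Util.erase, and so on) on each sample, which would be consistent with the traces above. That remains a guess until checked against the actual IncomingStreamReader source.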