OK, I found that this is likely due to GC. I'm seeing a full GC that runs for 20 seconds without freeing anything in the old generation:
35240.526: [Full GC [PSYoungGen: 1760704K->1668729K(1848128K)] [PSOldGen: 4095999K->4095999K(4096000K)] 5856703K->5764729K(5944128K) [PSPermGen: 24885K->24871K(25152K)], 20.6780790 secs] [Times: user=20.60 sys=0.00, real=20.68 secs]

I'm trying to do a MAT analysis now.

On Wed, Sep 7, 2011 at 11:01 AM, yang yang <yangyang...@yahoo.com.cn> wrote:

> My Cassandra server (from github source head, plus I added some code that
> just calls thrift.CassandraServer.batch_mutate() in the same JVM) runs fine
> under heavy load for about 5 hours, then it froze.
>
> When I checked the jstack output, all the mutation stages are blocked on
> Table.switchlock.ReadLock().lock(), and the corresponding writeLock is being
> held by Table.maybeSwitchMemlock(), which is blocked on
> CommitLog.instance.getContext(), which is blocked on get()'ting from a
> future.
>
> "COMMIT-LOG-WRITER" prio=10 tid=0x00002aaab4a6c000 nid=0x66df waiting on condition [0x00000000428d6000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.$$YJP$$park(Native Method)
>         - parking to wait for <0x00000007f72bb5f0> (a java.util.concurrent.FutureTask$Sync)
>         at sun.misc.Unsafe.park(Unsafe.java)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at org.apache.cassandra.db.commitlog.CommitLog.getContext(CommitLog.java:388)
>         at org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:661)
>         at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:733)
>         at org.apache.cassandra.db.commitlog.CommitLog.createNewSegment(CommitLog.java:575)
>         at org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:82)
>         at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:598)
>         at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.lang.Thread.run(Thread.java:662)
>
> I tried to search for the thread that does the getting from the future in
> the commit log, but it's not present. Does anyone have an idea why
> CommitLog.instance.getContext() is not returning? Is the getContext()
> Callable not yet scheduled by the CommitLog executor?
>
> Thanks a lot
> Yang
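
As a quick sanity check on the Full GC line at the top of this message, here is a minimal sketch that pulls the before/after sizes out of each bracketed region and reports how much was freed. The class name GcLineCheck, the method freedKb, and the regex are my own illustration (not part of any JVM tooling), and the pattern only assumes the "Name: beforeK->afterK(capacityK)" shape seen in that one line:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLineCheck {
    // Assumed shape of one region entry, e.g. "PSOldGen: 4095999K->4095999K(4096000K)".
    private static final Pattern REGION =
        Pattern.compile("(\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns the KB freed per named region, in the order they appear in the line. */
    static Map<String, Long> freedKb(String gcLine) {
        Map<String, Long> freed = new LinkedHashMap<String, Long>();
        Matcher m = REGION.matcher(gcLine);
        while (m.find()) {
            long before = Long.parseLong(m.group(2));
            long after = Long.parseLong(m.group(3));
            freed.put(m.group(1), before - after);
        }
        return freed;
    }

    public static void main(String[] args) {
        // The Full GC line from this message, without the trailing [Times: ...] part.
        String line = "35240.526: [Full GC [PSYoungGen: 1760704K->1668729K(1848128K)] "
            + "[PSOldGen: 4095999K->4095999K(4096000K)] 5856703K->5764729K(5944128K) "
            + "[PSPermGen: 24885K->24871K(25152K)], 20.6780790 secs]";
        for (Map.Entry<String, Long> e : freedKb(line).entrySet()) {
            System.out.println(e.getKey() + " freed " + e.getValue() + "K");
        }
    }
}
```

On that line it reports 91975K freed from PSYoungGen, 0K from PSOldGen, and 14K from PSPermGen, so the old generation stays at 4095999K of its 4096000K capacity after the 20-second collection.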