Hi all, Anyone experiencing this..?
I noticed one of my 7.6-2 nodes had inexplicable and consistently high cpu usage. Checking the log I found that there was a some kind of SSTable corruption that was stopping a bunch of files from compacting (first trace copied below). I then tried scrub (before anyone asks it had been run when upgrading, but I was just trying to find out if this fixed it) but that failed with "Too many open files" even though node should be using mem map (second trace copied below). Current plan is to snapshot, decommission & bootstrap but not what should be happening. Only factor I can think of that might have caused problem is we are currently running huge GCGraceSeconds because we can't run repair (we are hanging on for 8.1 as CASSANDRA-2280 causes repair to consume all disk space and fail) INFO [NonPeriodicTasks:1] 2011-06-16 22:43:45,232 SSTable.java (line 147) Deleted /var/opt/cassandra/data/FightMyMonster/Users_CisIndex-f-5922 INFO [MutationStage:7989] 2011-06-16 22:49:57,852 ColumnFamilyStore.java (line 1065) Enqueuing flush of Memtable-UserMonsters@423965881(34471 bytes, 322112 operations) INFO [FlushWriter:1176] 2011-06-16 22:49:57,917 Memtable.java (line 157) Writing Memtable-UserMonsters@423965881(34471 bytes, 322112 operations) INFO [FlushWriter:1176] 2011-06-16 22:50:01,087 Memtable.java (line 172) Completed flushing /var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7614-Data.db (7250239 bytes) INFO [CompactionExecutor:1] 2011-06-16 22:50:01,090 CompactionManager.java (line 395) Compacting [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),SSTableReader( path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),SSTableReader(path='/var/opt/cassandra /data/FightMyMonster/UserMonsters-f-6996-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Use rMonsters-f-7000-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db '),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),SSTableReader(path='/ var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),SSTableReader(path='/var/opt/cassandra/data/F ightMyMonster/UserMonsters-f-7014-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonste rs-f-7018-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),SSTa bleReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7024-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),SSTableReader(path='/var/opt /cassandra/data/FightMyMonster/UserMonsters-f-7028-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7030-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyM onster/UserMonsters-f-7032-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-70 36-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-Data.db'),SSTableRead er(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7042-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),SSTableReader(path='/var/opt/cassan dra/data/FightMyMonster/UserMonsters-f-7046-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7048-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/ UserMonsters-f-7050-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db')] INFO [NonPeriodicTasks:1] 2011-06-16 22:50:02,399 ColumnFamilyStore.java (line 1065) Enqueuing flush of Memtable-Monsters@562380513(497561 bytes, 13860 operations) INFO [FlushWriter:1176] 2011-06-16 22:50:02,399 Memtable.java (line 157) Writing Memtable-Monsters@562380513(497561 bytes, 13860 operations) INFO [FlushWriter:1176] 2011-06-16 22:50:02,580 Memtable.java (line 172) Completed flushing /var/opt/cassandra/data/FightMyMonster/Monsters-f-7160-Data.db (471100 bytes) ERROR [CompactionExecutor:1] 2011-06-16 22:50:37,776 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493) at java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201) at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78) at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:154) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:110) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:45) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:448) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: Corrupt (negative) value length encountered at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99) at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248) ... 26 more INFO [CompactionExecutor:1] 2011-06-16 22:50:37,794 CompactionManager.java (line 395) Compacting [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7157-Data.db'),SSTableReader(path ='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7158-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7159-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightM yMonster/Monsters-f-7160-Data.db')] INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7554-Data.db') complete: 244 rows in new sstable and 0 empty (tombstoned) rows dropped INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db') INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db') complete: 493 rows in new sstable and 0 empty (tombstoned) rows dropped INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db') INFO [CompactionExecutor:1] 2011-06-17 00:46:26,225 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db') complete: 275 rows in new sstable and 0 empty (tombstoned) rows dropped INFO [CompactionExecutor:1] 2011-06-17 00:46:26,226 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db') INFO [CompactionExecutor:1] 2011-06-17 00:46:28,383 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db') complete: 297 rows in new sstable and 0 empty (tombstoned) rows dropped INFO [CompactionExecutor:1] 2011-06-17 00:46:28,384 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db') INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db') complete: 347 rows in new sstable and 0 empty (tombstoned) rows dropped INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db') ERROR [CompactionExecutor:1] 2011-06-17 00:46:29,374 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.FileNotFoundException: /var/opt/cassandra/data/FightMyMonster/UserMonsters-tmp-f-7823-Data.db (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212) at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:113) at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:78) at org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:2243) at org.apache.cassandra.db.CompactionManager.maybeCreateWriter(CompactionManager.java:794) at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:534) at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56) at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Dominic