I have run into a strange problem (on Cassandra 0.7.0) and was hoping for suggestions on how to fix it. When compaction occurs on one node, for what appears to be one specific column family, the error below shows up in the Cassandra log. Compaction apparently fails and the temp files don't get cleaned up. After a while, and what seems to be multiple failed compactions on the CF, the node runs out of disk space and crashes. I am not sure whether it is a related problem or just a function of this being a heavily used column family, but after failing to compact, compaction restarts on the same CF, which exacerbates the issue.
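To gauge how much disk space the failed compactions leave behind, something like the following minimal sketch could total the leftover temporary files for one column family. It assumes that temporary SSTable files carry a "-tmp-" marker directly after the column family name (for example "MyCF-tmp-e-12-Data.db"); that naming, the function names, and the example path are assumptions for illustration and should be checked against the actual data directory.

```python
import os


def leftover_tmp_sstables(data_dir, column_family):
    """Return (path, size_in_bytes) pairs for files that look like temporary SSTables.

    Assumption: temporary files are named "<column_family>-tmp-...".
    """
    prefix = column_family + "-tmp-"
    found = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        # Only regular files whose name starts with the assumed temp prefix.
        if name.startswith(prefix) and os.path.isfile(path):
            found.append((path, os.path.getsize(path)))
    return found


def total_bytes(files):
    """Sum the sizes of the (path, size) pairs returned above."""
    return sum(size for _, size in files)
```

For example, total_bytes(leftover_tmp_sstables("/var/lib/cassandra/data/MyKeyspace", "MyCF")) would give the space held by such files, where the directory and names are hypothetical. The sketch only reports files; it does not delete anything.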
Problems with this specific node started earlier this weekend when it crashed with an OOM error. This is quite surprising, since my memtable thresholds and GC settings have been tuned to run with quite a bit of overhead during normal operation (max heap usage usually <= 10 GB on a 12 GB heap, average usage of 6-8 GB). I could not find anything abnormal in the logs that would explain an OOM.

I will look things over tomorrow and try to provide a bit more information on the problem, but as a solution I was going to wipe out all SSTables for this CF on this node and then run a repair. It is far from ideal, but is this a reasonable solution?

ERROR [CompactionExecutor:1] 2011-01-23 14:10:29,855 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,RMI Runtime]
java.io.IOError: java.io.EOFException: attempted to skip -1983579368 bytes but only skipped 0
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:78)
        at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:178)
        at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:143)
        at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:135)
        at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: attempted to skip -1983579368 bytes but only skipped 0
        at org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:52)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
        ... 20 more

Dan Hendry
(403) 660-2297