Version 0.8.6. After putting an extreme load on 4 (embedded) Cassandra servers with replication factor 3 (Ubuntu 10.4, dual six-core, 64-bit, no swap, one 15000 rpm commitlog disk, one 15000 rpm datafile disk), I get the fatal exception listed below.
No more messages were found after that. Other things are probably going wrong as well, such as the message "Could not complete hinted handoff to /xxx.yyy.zzz.60" or the dead/UP messages (which occur long before this exception). But maybe this exception can point me in the right direction, or even point to a bug in Cassandra.

Thanks,
Ignace

2011-09-27 06:59:46,383 Compacting large row KsFullIdx/ForwardStringValues:3237343034 (178032211 bytes) incrementally
2011-09-27 07:00:09,738 GC for ParNew: 311 ms for 1 collections, 8139389704 used; max is 33344716800
2011-09-27 07:00:12,818 Compacting large row KsFullIdx/ForwardStringValues:31363437 (1281862723 bytes) incrementally
2011-09-27 07:02:16,025 Compacting large row KsFullIdx/ForwardStringValues:31363438 (1623095072 bytes) incrementally
2011-09-27 07:04:38,332 GC for ParNew: 534 ms for 1 collections, 7811259472 used; max is 33344716800
2011-09-27 07:04:52,803 Compacting large row KsFullIdx/ForwardStringValues:3238313433 (1435774436 bytes) incrementally
2011-09-27 07:06:57,160 Compacted to /media/datadrive1/capd.cassandra.capd/dbdatafile/KsFullIdx/ForwardStringValues-tmp-g-542-Data.db. 43,244,902,670 to 42,780,624,408 (~98% of original) bytes for 1,260 keys. Time: 4,321,960ms.
2011-09-27 08:01:42,090 Saved KsFullIdx-ForwardStringValues-KeyCache (572 items) in 16 ms
2011-09-27 08:01:42,182 Saved KsFullIdx-ReverseStringValues-KeyCache (25688 items) in 63 ms
2011-09-27 08:18:13,078 InetAddress /xxx.yyy.zzz.62 is now dead.
2011-09-27 08:18:16,467 InetAddress /xxx.yyy.zzz.62 is now UP
2011-09-27 08:48:56,410 Could not complete hinted handoff to /xxx.yyy.zzz.60
2011-09-27 08:48:56,410 Enqueuing flush of Memtable-HintsColumnFamily@2083796703(12097/196566 serialized/live bytes, 254 ops)
2011-09-27 08:48:56,411 Writing Memtable-HintsColumnFamily@2083796703(12097/196566 serialized/live bytes, 254 ops)
2011-09-27 08:48:56,411 Nothing to compact in HintsColumnFamily; use forceUserDefinedCompaction if you wish to force compaction of single sstables (e.g. for tombstone collection)
2011-09-27 08:48:56,411 Finished hinted handoff of 254 rows to endpoint /xxx.yyy.zzz.60
2011-09-27 08:48:56,490 Completed flushing /media/datadrive1/capd.cassandra.capd/dbdatafile/system/HintsColumnFamily-g-10-Data.db (25079 bytes)
2011-09-27 08:49:42,858 Started hinted handoff for endpoint /xxx.yyy.zzz.62
2011-09-27 12:01:42,100 Saved KsFullIdx-ForwardStringValues-KeyCache (712 items) in 27 ms
2011-09-27 12:01:42,182 Saved KsFullIdx-ReverseStringValues-KeyCache (30742 items) in 55 ms
2011-09-27 12:10:01,016 InetAddress /xxx.yyy.zzz.59 is now dead.
2011-09-27 12:10:02,272 InetAddress /xxx.yyy.zzz.59 is now UP
2011-09-27 12:17:34,596 Fatal exception in thread Thread[HintedHandoff:1,5,RMI Runtime]
java.io.IOError: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
    at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265)
    at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281)
    at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236)
    at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
    at java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:445)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:428)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:418)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:380)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:179)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:121)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:49)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
    at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
    at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1427)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1304)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1261)
    at org.apache.cassandra.db.HintedHandOffManager.sendRow(HintedHandOffManager.java:155)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:350)
    at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)
    at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:89)
    at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:261)
    ... 33 more
2011-09-27 12:17:53,291 Started hinted handoff for endpoint /xxx.yyy.zzz.59