I am having yet another issue on one of my Cassandra nodes. Last night, one of my nodes ran out of memory and crashed after flooding the logs with the same type of errors shown below. After a restart, the errors are appearing again. My workaround has been to drop the consistency level from ALL to ONE for the query that seems to be causing the problem, so that my service using Cassandra works again, but it's a terrible solution at best. Does anyone have thoughts on what the root cause is, or on how to fix it?
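To illustrate why lowering the consistency level hides the problem rather than fixing it, here is a minimal sketch. It assumes a replication factor of 3 (not stated above) and uses hypothetical helper names; QUORUM is included only for comparison. It is a simplified model that ignores replica selection, read repair, and timeouts, not a description of Cassandra's actual read path.

```python
def required_responses(consistency_level: str, replication_factor: int) -> int:
    """Number of replica responses a read must collect before it can succeed."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def read_succeeds(consistency_level: str, replication_factor: int,
                  healthy_replicas: int) -> bool:
    """True if enough replicas can answer to satisfy the consistency level."""
    return healthy_replicas >= required_responses(consistency_level,
                                                  replication_factor)


if __name__ == "__main__":
    rf = 3       # assumed replication factor, for illustration only
    healthy = 2  # one replica fails every read of the affected rows
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, read_succeeds(level, rf, healthy))
```

In this model a read at ALL fails as soon as a single replica cannot answer, while a read at ONE still succeeds, which matches what I am seeing: the service works again, but the bad data on that node is still there.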
These errors seem to appear when reading from the same column family that I have been having other problems with. Recently, I drained the node and shut it down, deleted all on-disk files for this column family, then ran a repair (which caused the node to run out of disk space, as I detailed in a previous email). Could the repair somehow have corrupted the node's new data? Why is this error appearing on only one node, given that the data is now nearly guaranteed to have come from another replica? I have triple-checked, and all nodes are running the same release version of 0.7.

Are there any suggestions for tools to check a system's hardware (Ubuntu 10.04)? SMART info for the disk shows nothing alarming, and there is nothing in /var/log/messages.

ERROR [ReadStage:10] 2011-01-26 12:15:59,607 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.EOFException
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
        at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
        ... 22 more
ERROR [ReadStage:10] 2011-01-26 12:15:59,608 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:10,5,main]
java.lang.RuntimeException: java.io.EOFException
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
        at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
        ... 22 more
ERROR [ReadStage:2] 2011-01-26 12:16:09,732 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.EOFException
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
        at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
        ... 22 more