Hello,

In a cluster running 0.6.6, one node lost part of a data file due to an operator error. An older file was moved into place to bring Cassandra up again.
Now we get lots of these in the log:

2011-08-27_10:30:55.26219 'ERROR [ROW-READ-STAGE:4327] 10:30:55,258 CassandraDaemon.java:87 Uncaught exception in thread Thread[ROW-READ-STAGE:4327,5,main]
2011-08-27_10:30:55.26219 'java.lang.ArrayIndexOutOfBoundsException
2011-08-27_10:30:55.26220 at org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-08-27_10:30:55.26220 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-08-27_10:30:55.26221 at java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-08-27_10:30:55.26221 at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-08-27_10:30:55.26222 at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-08-27_10:30:55.26222 at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-08-27_10:30:55.26223 at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-08-27_10:30:55.26223 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-08-27_10:30:55.26224 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-08-27_10:30:55.26224 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-08-27_10:30:55.26224 at org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-08-27_10:30:55.26225 at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-08-27_10:30:55.26225 at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-08-27_10:30:55.26226 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-08-27_10:30:55.26226 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-08-27_10:30:55.26227 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-08-27_10:30:55.26227 at java.lang.Thread.run(Thread.java:662)

Is it possible to use nodetool repair to fix this with the current data set? I issued a repair command and the other nodes seem to be doing the correct things, but I am concerned by this: "Uncaught exception in thread Thread[ROW-READ-STAGE:4327,5,main]". Will the affected node ever be able to do anything?

Also, only the Data file was affected; the Index and Filter files are still the originals. Should I keep these or do anything else with them?

My alternative is to delete all the data and run repair again, which I have done in the past. It works, but it takes a while with a large data set.

I am open to ideas, and any suggestions are welcome.

-- 
Jake Maizel
Head of Network Operations
Soundcloud
Mail & GTalk: j...@soundcloud.com
Skype: jakecloud
Rosenthaler strasse 13, 101 19, Berlin, DE