I'm pretty sure that was a bug fixed in a later 0.6.x release, so you might be able to upgrade and the exceptions might go away. We run 0.6.13 with a minor mod to support data expiration and will probably do so indefinitely, since there is no way to upgrade without shutting our site down :(
-Anthony

On Aug 27, 2011, at 3:40 AM, Jake Maizel <j...@soundcloud.com> wrote:
> Hello,
>
> In a cluster running 0.6.6 one node lost part of a data file due to an
> operator error. An older file was moved in place to bring cassandra
> up again.
>
> Now we get lots of these in the log:
>
> 2011-08-27_10:30:55.26219 'ERROR [ROW-READ-STAGE:4327] 10:30:55,258
> CassandraDaemon.java:87 Uncaught exception in thread
> Thread[ROW-READ-STAGE:4327,5,main]
> 2011-08-27_10:30:55.26219 'java.lang.ArrayIndexOutOfBoundsException
> 2011-08-27_10:30:55.26220 at
> org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
> 2011-08-27_10:30:55.26220 at
> java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
> 2011-08-27_10:30:55.26221 at
> java.io.DataInputStream.readUTF(DataInputStream.java:592)
> 2011-08-27_10:30:55.26221 at
> java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
> 2011-08-27_10:30:55.26222 at
> org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
> 2011-08-27_10:30:55.26222 at
> org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
> 2011-08-27_10:30:55.26223 at
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
> 2011-08-27_10:30:55.26223 at
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
> 2011-08-27_10:30:55.26224 at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
> 2011-08-27_10:30:55.26224 at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
> 2011-08-27_10:30:55.26224 at
> org.apache.cassandra.db.Table.getRow(Table.java:382)
> 2011-08-27_10:30:55.26225 at
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
> 2011-08-27_10:30:55.26225 at
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
> 2011-08-27_10:30:55.26226 at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
> 2011-08-27_10:30:55.26226 at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 2011-08-27_10:30:55.26227 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 2011-08-27_10:30:55.26227 at java.lang.Thread.run(Thread.java:662)
>
> Is it possible to use nodetool repair to fix this with the current data set?
>
> I issued a repair command and the other nodes seem to be doing the
> correct things but I concerned by this: "Uncaught exception in thread
> Thread[ROW-READ-STAGE:4327,5,main]"
>
> Will the affect node ever be able to do anything?
>
> Also, only Data file was affected, the index and Filter files are
> still the originals. Should I keep these or do anything else with
> them?
>
> My alternative is to delete all the data and run repair again which I
> have done in the past and it works but takes a while with a large data
> set.
>
> I am open to ideas and any suggestions are welcome.
>
> --
> Jake Maizel
> Head of Network Operations
> Soundcloud
>
> Mail & GTalk: j...@soundcloud.com
> Skype: jakecloud
>
> Rosenthaler strasse 13, 101 19, Berlin, DE