Repairing lost data

Jake Maizel Sat, 27 Aug 2011 03:40:53 -0700

Hello,

In a cluster running Cassandra 0.6.6, one node lost part of a data file
due to an operator error.  An older file was moved into place to bring
Cassandra up again.


Now we get lots of these in the log:

2011-08-27_10:30:55.26219 'ERROR [ROW-READ-STAGE:4327] 10:30:55,258 CassandraDaemon.java:87 Uncaught exception in thread Thread[ROW-READ-STAGE:4327,5,main]
2011-08-27_10:30:55.26219 'java.lang.ArrayIndexOutOfBoundsException
2011-08-27_10:30:55.26220       at org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-08-27_10:30:55.26220       at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-08-27_10:30:55.26221       at java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-08-27_10:30:55.26221       at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-08-27_10:30:55.26222       at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-08-27_10:30:55.26222       at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-08-27_10:30:55.26223       at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-08-27_10:30:55.26223       at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-08-27_10:30:55.26224       at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-08-27_10:30:55.26224       at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-08-27_10:30:55.26224       at org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-08-27_10:30:55.26225       at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-08-27_10:30:55.26225       at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-08-27_10:30:55.26226       at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-08-27_10:30:55.26226       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-08-27_10:30:55.26227       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-08-27_10:30:55.26227       at java.lang.Thread.run(Thread.java:662)
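
To gauge how many reads are failing, the log could be tallied per
thread stage with something like the following. This is only a rough
sketch and not a tool that ships with Cassandra: the function name
count_uncaught_by_stage and the regular expression are assumptions based
solely on the log format shown above.

```python
import re
import sys
from collections import Counter

# Assumed pattern, derived from the log excerpt above:
#   "... Uncaught exception in thread Thread[ROW-READ-STAGE:4327,5,main]"
# \s+ also tolerates a line break between "thread" and "Thread[".
UNCAUGHT_RE = re.compile(
    r"Uncaught exception in thread\s+Thread\[([^:\],]+):\d+"
)


def count_uncaught_by_stage(log_text):
    """Return a Counter mapping stage name to number of uncaught exceptions."""
    return Counter(UNCAUGHT_RE.findall(log_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of the Cassandra log file.
    with open(sys.argv[1], errors="replace") as handle:
        for stage, count in count_uncaught_by_stage(handle.read()).most_common():
            print(stage, count)
```

A count that keeps growing while repair runs would suggest that reads
are still hitting the damaged file.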

Is it possible to use nodetool repair to fix this with the current data set?

I issued a repair command and the other nodes seem to be doing the
correct things, but I am concerned by this: "Uncaught exception in thread
Thread[ROW-READ-STAGE:4327,5,main]"

Will the affected node ever be able to do anything?

Also, only the Data file was affected; the Index and Filter files are
still the originals.  Should I keep these or do anything else with
them?

My alternative is to delete all the data on the node and run repair
again. I have done this in the past and it works, but it takes a while
with a large data set.

I am open to ideas and any suggestions are welcome.

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
