Hi all,
Recently we tried to repair one of our biggest tables, and we keep
getting errors related to hard links. Here's a stack trace:
ERROR [RepairJobTask:4] 2016-03-31 05:47:27,268 RepairJob.java:145 - Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.51.0.7
	at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:48) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
ERROR [AntiEntropyStage:39] 2016-03-31 05:47:27,268 CassandraDaemon.java:223 - Exception in thread Thread[AntiEntropyStage:39,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Tried to hard link to file that does not exist /data/db/ks/table-a24af0002ed511e5b983ade99871dd76/ks-table-ka-50582-Statistics.db
	at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:141) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]
Caused by: java.lang.RuntimeException: Tried to hard link to file that does not exist /data/db/ks/table-a24af0002ed511e5b983ade99871dd76/ks-table-ka-50582-Statistics.db
	at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:90) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1799) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:2237) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2319) ~[apache-cassandra-2.1.5.jar:2.1.5]
	at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:82) ~[apache-cassandra-2.1.5.jar:2.1.5]
	... 4 common frames omitted
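Reading the "Caused by" frames (ColumnFamilyStore.snapshot, then SSTableReader.createLinks, then FileUtils.createHardLink), my understanding is that the repair snapshot hard-links each component file of every live SSTable into the snapshot directory, and fails as soon as one component the node still expects (here ks-table-ka-50582-Statistics.db) is missing on disk. Here's a rough Java sketch of that check-then-link pattern, assuming it works roughly this way; HardLinkCheck and its createHardLink method are hypothetical stand-ins I wrote for illustration, not Cassandra's actual FileUtils code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: approximates the check suggested by the
// "Tried to hard link to file that does not exist" message; not Cassandra code.
final class HardLinkCheck {

    // Link `to` to the existing file `from`, failing loudly if `from` is gone.
    static void createHardLink(Path from, Path to) throws IOException {
        if (!Files.exists(from)) {
            // The branch our nodes appear to hit: a component such as
            // Statistics.db is expected but no longer on disk.
            throw new RuntimeException("Tried to hard link to file that does not exist " + from);
        }
        // java.nio takes the new link first, then the existing file.
        Files.createLink(to, from);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapshot-demo");
        // Never created, so this simulates a component that was deleted.
        Path missing = dir.resolve("ks-table-ka-50582-Statistics.db");
        try {
            createHardLink(missing, dir.resolve("snapshot-Statistics.db"));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main just prints the same style of message for a file that was never created. If this reading is right, the real question is what removed the Statistics.db component while the SSTable was still considered live.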
I googled that particular error and did not find a definitive answer;
the usual recommendation seems to be to simply restart the node.
However, we're getting this error at least once a day, sometimes on
multiple nodes (we currently have 7 nodes), so restarting Cassandra
every time is getting tedious.
I saw the issue https://issues.apache.org/jira/browse/CASSANDRA-6433,
which suggests the error is caused by dropping a keyspace, but we
haven't dropped anything. So I'm not sure that issue really applies,
although the error is the same.
The issue https://issues.apache.org/jira/browse/CASSANDRA-6716 reports
the same exception, but we didn't run any scrub, so I'm not sure it
applies either.
We're running Cassandra 2.1.5, by the way. I don't know whether
upgrading will fix the problem, because I didn't see anything related
to this in the changelogs.
I'm also wondering whether these exceptions somehow "block" the
repair, because the repair seems extremely slow right now (we're
talking days of repairing).