Hi,

you should check the "snapshots" directories on your nodes - it is very
likely that some old snapshots from failed operations are still there,
taking up space. Snapshots are hard links to SSTables, so they count
toward the Load reported by nodetool even after the original SSTables
have been compacted away.
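A quick way to find them is a sketch like the one below - this assumes the default data directory /var/lib/cassandra/data; adjust DATA_DIR to whatever data_file_directories is set to in your cassandra.yaml:

```shell
# Default Cassandra data directory; change this to match the
# data_file_directories setting in cassandra.yaml if yours differs.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"

# Report the size of every "snapshots" directory under the data path.
# ("|| true" just tolerates a missing directory when run elsewhere.)
find "$DATA_DIR" -type d -name snapshots -exec du -sh {} \; 2>/dev/null || true
```

If the sizes confirm stale snapshots, running "nodetool clearsnapshot" on each node removes them (optionally pass a keyspace to limit the scope).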
On 15.04.2016 at 01:28, kavya wrote:
Hi,

We are running a 6-node Cassandra 2.2.4 cluster and are seeing a spike
in the disk Load reported by the 'nodetool status' command that does
not correspond to actual disk usage. The Load reported by nodetool was
as high as 3 times the actual disk usage on certain nodes.

We noticed that the periodic repair failed with the error below when
running the command 'nodetool repair -pr':
ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 - Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range (-3814318684016904396,-3810689996127667017] failed with error [….] Validation failed in /<ip>
org.apache.cassandra.exceptions.RepairException: [….] Validation failed in <ip>
	at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
We restarted all nodes in the cluster and ran a full repair, which
completed successfully without any validation errors; however, we still
see the Load spike on the same nodes after a while. Please advise.
Thanks!