Hi,

We are running a 6-node Cassandra 2.2.4 cluster and are seeing a spike in
disk load, as reported by the ‘nodetool status’ command, that does not
correspond to the actual disk usage. On certain nodes, the Load reported by
nodetool was as high as 3 times the actual disk usage.
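One way to quantify that gap per node is to compare the Load column from ‘nodetool status’ with the bytes actually present under the node's data directory. The sketch below makes a few assumptions. The Load string is taken to be in the 2.x style (a number followed by bytes, KB, MB, GB or TB, 1024-based). The helper names parse_load, disk_usage and load_ratio are hypothetical. The directory walk counts everything under the path you give it, including any snapshots.

```python
import os

# Units as printed in the Load column of `nodetool status` (assumed 1024-based,
# as in Cassandra 2.x); adjust if your version prints KiB/MiB/GiB instead.
_UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_load(load: str) -> float:
    """Convert a Load string such as '256.47 GB' into bytes."""
    value, unit = load.split()
    return float(value) * _UNITS[unit]


def disk_usage(path: str) -> int:
    """Sum the apparent sizes of all regular files under path (roughly `du -sb`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def load_ratio(load: str, data_dir: str) -> float:
    """Ratio of nodetool's reported Load to the bytes found on disk."""
    actual = disk_usage(data_dir)
    return parse_load(load) / actual if actual else float("inf")
```

Running load_ratio on each node, using that node's Load value and its data directory, gives the "up to 3 times" figure directly and makes it easy to track whether the gap grows between repairs.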
We also noticed that the periodic repair, run with the command
‘nodetool repair -pr’, failed with the error below:

ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 - Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range (-3814318684016904396,-3810689996127667017] failed with error [….] Validation failed in /<ip>
org.apache.cassandra.exceptions.RepairException: [….] Validation failed in <ip>
    at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

We restarted all nodes in the cluster and ran a full repair, which completed
successfully without any validation errors. However, after a while we still
see the Load spike on the same nodes. Please advise.

Thanks!
