Hi,

You should check the "snapshots" directories on your nodes - it is very likely there are some old ones left over from failed operations that are taking up space.
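As a quick way to see whether leftover snapshots explain the discrepancy, something like the following sketch can be run on each node. It assumes the default data directory `/var/lib/cassandra/data` (adjust to whatever `data_file_directories` is set to in your cassandra.yaml):

```shell
#!/bin/sh
# Find all "snapshots" directories under the Cassandra data directory
# and report how much disk space each one occupies.
# DATA_DIR is an assumption -- point it at your actual data_file_directories.
DATA_DIR="${CASSANDRA_DATA_DIR:-/var/lib/cassandra/data}"
find "$DATA_DIR" -type d -name snapshots -exec du -sh {} + 2>/dev/null || true
```

If the totals are large and the snapshots are no longer needed, the safe way to remove them is `nodetool clearsnapshot` (optionally with `-t <tag>` to target a specific snapshot) rather than deleting the directories by hand.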

On 15.04.2016 at 01:28, kavya wrote:
Hi,

We are running a 6-node Cassandra 2.2.4 cluster, and we are seeing a spike in the disk Load reported by 'nodetool status' that does not correspond to actual disk usage. The Load reported by nodetool was as high as 3 times the actual disk usage on certain nodes. We also noticed that the periodic repair ('nodetool repair -pr') failed with the error below:

ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 - Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range (-3814318684016904396,-3810689996127667017] failed with error [….] Validation failed in /<ip>
org.apache.cassandra.exceptions.RepairException: [….] Validation failed in <ip>
    at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

We restarted all nodes in the cluster and ran a full repair, which completed successfully without any validation errors; however, we still see the Load spike on the same nodes after a while. Please advise.

Thanks!
