Thanks for the reply. The snapshots total only 400MB. Also, the snapshots sit on the same filesystem as the data, so the disk usage reported on the node (df -h) should already include their size.
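In case it is useful to anyone else hitting this, something like the following should show snapshot usage. This assumes the default /var/lib/cassandra/data layout on a packaged install; adjust the paths for your setup:

    # Disk usage of each snapshot, across all keyspaces/tables
    # (default data directory assumed)
    du -sh /var/lib/cassandra/data/*/*/snapshots/* 2>/dev/null

    # Cassandra's own listing of snapshots and their sizes
    nodetool listsnapshots

    # Stale snapshots can then be removed; with no arguments this drops
    # ALL snapshots, so use with care:
    # nodetool clearsnapshot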
I did notice that the spike in nodetool status Load seems to coincide with the hourly "IndexSummaryManager.java:256 - Redistributing index summaries" run. Any correlation here? (The cassandra.yaml settings that control this redistribution are noted below the quoted thread for reference.) Also, last night's run of the periodic "nodetool repair -pr" succeeded on only 2 of the 6 nodes.

On Fri, Apr 15, 2016 at 12:28 AM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi,
>
> you should check the "snapshot" directories on your nodes - it is very
> likely there are some old ones left over from failed operations taking up
> some space.
>
>
> On 15.04.2016 at 01:28, kavya wrote:
>
>> Hi,
>>
>> We are running a 6 node Cassandra 2.2.4 cluster and we are seeing a spike
>> in the disk Load as per the ‘nodetool status’ command that does not
>> correspond with the actual disk usage. The Load reported by nodetool was
>> as high as 3 times the actual disk usage on certain nodes.
>> We also noticed that the periodic repair failed with the error below when
>> running the command ‘nodetool repair -pr’:
>>
>> ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 - Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range (-3814318684016904396,-3810689996127667017] failed with error [….] Validation failed in /<ip>
>> org.apache.cassandra.exceptions.RepairException: [….] Validation failed in <ip>
>>     at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4]
>>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4]
>>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4]
>>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
>>
>> We restarted all nodes in the cluster and ran a full repair, which
>> completed successfully without any validation errors; however, we still
>> see the Load spike on the same nodes after a while. Please advise.
>>
>> Thanks!
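For reference (as mentioned above), the hourly index summary redistribution is driven by two cassandra.yaml settings. The values shown here are the defaults shipped with 2.2, not necessarily what this cluster is running, and the config path is the usual packaged-install location, which may differ:

    # Check the index summary settings on each node (config path assumed):
    grep -n 'index_summary' /etc/cassandra/cassandra.yaml

    # Shipped 2.2 defaults:
    # index_summary_capacity_in_mb:                  (blank = 5% of the heap)
    # index_summary_resize_interval_in_minutes: 60   (matches the hourly log line)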