Hi,

I have a 3-node SSD-based cluster with around 1 TB of data, RF:3, C* v1.2.0, vnodes. One large CF, using LCS. Everything was running smoothly until one of the nodes crashed and was restarted.
During normal operation there was 800 GB of free space on each node. After the crash, C* started using a lot more, resulting in an out-of-disk-space situation on 2 nodes, i.e. C* used up the 800 GB in just 2 days, giving us very little time to do anything about it, since repairs/joins take a considerable amount of time.

What can make C* suddenly use this amount of disk space? We did see a lot of pending compactions on one node (7k).

Any tips on recovering from an out-of-disk-space situation on multiple nodes? I've tried moving some SSTables away, but C* seems to use whatever space I free up in no time. I'm not sure whether any of the nodes is fully up to date, as 'nodetool status' reports 3 different loads:

--  Address         Load     Tokens  Owns (effective)  Host ID                               Rack
UN  10.146.145.26   1.4 TB   256     100.0%            1261717d-ddc1-457e-9c93-431b3d3b5c5b  rack1
UN  10.148.149.141  1.03 TB  256     100.0%            f80bfa31-e19d-4346-9a14-86ae87f06356  rack1
DN  10.146.146.4    1.11 TB  256     100.0%            85d4cd28-93f4-4b96-8140-3605302e90a9  rack1

--
Sincerely,
*Nicolai Gylling*