> On Wed, May 8, 2013 at 10:43 PM, Nicolai Gylling <n...@issuu.com> wrote:
>> During normal operation there was 800 GB of free space on each node.
>> After the crash, C* started using a lot more, resulting in an
>> out-of-disk-space situation on 2 nodes, i.e. C* used up the 800 GB in just 2
>> days, giving us very little time to do anything about it, since
>> repairs/joins take a considerable amount of time.
> 
> Did someone do a repair? Repair very frequently results in (usually
> temporary) >2x disk consumption.
> 
Repairs run regularly once a week and normally don't take up much
space, as we're using Leveled Compaction Strategy.


>> What can make C* suddenly use this amount of disk-space? We did see a lot of
>> pending compactions on one node (7k).
> 
> Mostly repair.
> 
>> Any tips on recovering from an out-of-disk-space situation on multiple
>> nodes? I've tried moving some SSTables away, but C* seems to use
>> whatever space I free up in no time. I'm not sure if any of the nodes is
>> fully updated, as 'nodetool status' reports 3 different loads.
> 
> A relevant note here is that moving sstables out of the full partition
> while Cassandra is running will not result in any space recovery,
> because Cassandra still has an open filehandle to that sstable. In
> order to deal with an out-of-disk-space condition you need to stop
> Cassandra. Unfortunately the JVM stops responding to clean shutdown
> requests when the disk is full, so you will have to kill -KILL the
> process.
> 
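To illustrate the open-filehandle point above, here is a minimal Python sketch (POSIX only). It is not Cassandra code: the helper name `unlink_while_open` and the temporary file are purely illustrative assumptions. It removes a file's directory entry while a descriptor is still open and shows that the data stays allocated and readable until that descriptor is closed, which is why the space does not come back while the owning process keeps running.

```python
import os
import tempfile


def unlink_while_open(payload: bytes = b"x" * 4096) -> dict:
    """Remove a file's directory entry while a descriptor to it stays open.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Moving a file to another partition ends with an unlink on the
        # source filesystem, so unlink stands in for the "mv" here.
        os.unlink(path)
        info = os.fstat(fd)
        os.lseek(fd, 0, os.SEEK_SET)
        still_readable = os.read(fd, len(payload)) == payload
        return {
            "path_exists": os.path.exists(path),  # name is gone
            "link_count": info.st_nlink,          # no directory entries left
            "size_bytes": info.st_size,           # data is still allocated
            "still_readable": still_readable,     # and still readable via fd
        }
    finally:
        # Only now can the filesystem reclaim the blocks.
        os.close(fd)


if __name__ == "__main__":
    print(unlink_while_open())
```

On Linux, something like `lsof +L1` can list such deleted-but-still-open files for a running process.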
> If you have a lot of overwrites/fragmentation, you could attempt to
> clear enough space to do a major compaction of remaining data, do that
> major compaction, split your One Huge sstable with the (experimental)
> sstable_split tool and then copy temporarily moved sstables back onto
> the node. You could also attempt to use user defined compaction (via
> JMX endpoint) to strategically compact such data. If you grep for
> compaction in your logs, do you see compactions resulting in smaller
> output file sizes? (compacted to X% of original messages)
> 
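The log check suggested above could be scripted roughly as follows. This is a sketch under assumptions: it only assumes that compaction log lines contain a phrase of the form "~N% of original", as paraphrased in the quoted message. The function name `compaction_ratios` and the sample lines are made up for illustration and are not real Cassandra output.

```python
import re
from typing import Iterable, List

# Assumed phrase inside compaction log lines, e.g. "(~98% of original)".
RATIO = re.compile(r"~\s*(\d+)% of original")


def compaction_ratios(lines: Iterable[str]) -> List[int]:
    """Return the "N% of original" figure from every line that has one."""
    ratios = []
    for line in lines:
        match = RATIO.search(line)
        if match:
            ratios.append(int(match.group(1)))
    return ratios


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from a real system.log.
    sample = [
        "INFO Compacted to [example-Data.db]. 1,000 to 980 (~98% of original) bytes",
        "INFO Flushing memtable",
        "INFO Compacted to [other-Data.db]. 2,000 to 1,000 (~50% of original) bytes",
    ]
    ratios = compaction_ratios(sample)
    print(ratios, "average:", sum(ratios) / len(ratios))
```

Values close to 100 would suggest compaction is reclaiming very little, which is what one would expect for append-only data.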
> I agree with Alexis Rodriguez that Cassandra 1.2.0 is not a version
> anyone should run, it contains significant bugs.
> 
> =Rob

We're storing time series, so we don't have any overwrites and hardly any
reduction in size during compaction. I'll try to upgrade and see if that can
help get some disk space back.

Let's say we're seeing some bug in C*, and SSTables don't get deleted during
compaction (which I guess is the only reason for this consumption of
disk space). Will C* 1.2.4 be able to fix this? Or would it be a better solution
to replace one node at a time, so we're sure to only have the data that C*
knows about?

