Re: Compaction doubles disk space

aaron morton Wed, 30 Mar 2011 03:40:26 -0700

I checked the code again and had it a bit wrong. When getting a path to flush a 
memtable (or to write an incoming stream) via cfs.getFlushPath(), the code does 
not invoke GC if there is not enough space.


One reason for not doing this could be that, when we do it during compaction, 
we wait for 20 seconds before checking disk space again. However, the flush 
write happens on a separate flusher pool.

I created https://issues.apache.org/jira/browse/CASSANDRA-2404 to ask whether 
we can/should reclaim space during flush.
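
To make the idea concrete, here is a minimal sketch of "check space, ask for a 
GC, wait, check again". It is not the Cassandra code; the class name, method 
name and parameters are all made up for illustration, and System.gc() is only 
a hint to the JVM, so nothing is guaranteed to be freed.

```java
import java.io.File;
import java.util.function.LongSupplier;

// Hypothetical sketch, not Cassandra's implementation.
public class FlushSpaceSketch {

    /**
     * Returns true if at least requiredBytes are reported as usable.
     * If the first check fails, requests a GC once, pauses for waitMillis,
     * and checks one more time.
     */
    static boolean hasSpaceAfterReclaim(LongSupplier usableBytes,
                                        long requiredBytes,
                                        long waitMillis) {
        if (usableBytes.getAsLong() >= requiredBytes) {
            return true; // enough room already, no GC needed
        }
        System.gc(); // a request only; the JVM may ignore it
        try {
            Thread.sleep(waitMillis); // give cleanup a chance to run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        return usableBytes.getAsLong() >= requiredBytes;
    }

    public static void main(String[] args) {
        // Directory to check; defaults to the current directory.
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        // Assumed numbers: need 1 MiB free, pause 1 second before re-checking.
        boolean ok = hasSpaceAfterReclaim(dataDir::getUsableSpace, 1L << 20, 1000L);
        System.out.println("enough space: " + ok);
    }
}
```

The space check is passed in as a LongSupplier so the retry logic can be 
exercised without a real disk; the pause length is a parameter because a long 
wait may be acceptable in one thread pool and not in another.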

Karl, what version are you using, and have you altered the compaction 
thresholds?

Aaron

On 30 Mar 2011, at 19:46, Karl Hiramoto wrote:

> On 30/03/2011 09:08, aaron morton wrote:
>> Also as far as I understand we cannot immediately delete files because other 
>> operations (including repair) may be using them. The data in the pre 
>> compacted files is just as correct as the data in the compacted file, it's 
>> just more compact. So the easiest thing to do is let the JVM sort out if 
>> anything else is using them.
>> 
>> Perhaps it could be improved by actively tracking which files are in use so 
>> they may be deleted quicker. But right now, so long as unused space is freed 
>> when needed, it's working as designed AFAIK.
>> 
>> 
> 
> I've run out of space on multiple occasions, and we have Nagios alarms going 
> off frequently when disk usage is over 90%. When I check Cassandra, the data/ 
> directory is 2X to 4X bigger than it needs to be, and no compaction or 
> repair is currently running. If I restart the Cassandra process or force a 
> GC, it deletes a lot of old SSTables and the data/ directory goes down to 
> 1/2 to 1/4 of the size it was a few minutes ago.
> 
> Under lots of disk pressure here.
> 
> --
> Karl
>
