Re: Cassandra disk space utilization WAY higher than I would expect

Rob Coli Fri, 06 Aug 2010 14:50:15 -0700

On 8/5/10 11:51 AM, Peter Schuller wrote:

> Also, the variation in disk space in your most recent post looks
> entirely as expected to me and nothing really extreme. The temporary
> disk space occupied during the compact/cleanup would easily be as high
> as your original disk space usage to begin with, and the fact that
> you're reaching the 5-7 GB per node level after a cleanup has
> completed fully and all obsolete sstables have been removed

Your post refers to "obsolete" sstables, but the only thing that makes them "obsolete" in this case is that they have been compacted?


As I understand Julie's case, she is :

a) initializing her cluster
b) inserting some number of unique keys with CL.ALL
c) noticing that more disk space (6x?) than is expected is used
d) but that she gets expected usage if she does a major compaction

In other words, the problem isn't "temporary disk space occupied during the compact", it's permanent disk space occupied unless she compacts.

There is clearly overhead from there being multiple SSTables with multiple bloom filters and multiple indexes. But from my understanding, that does not fully account for the difference in disk usage she is seeing. If it is 6x across the whole cluster, it seems unlikely that the meta information is 5x the size of the actual information.
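A back-of-the-envelope version of that argument, as a Python sketch. If a major compaction brings usage back down to the expected size, then the extra space seen before compaction would all have to be per-SSTable overhead: 5x the data for a 6x blowup. For comparison, the helper below uses the textbook optimal Bloom filter size of -ln(p)/(ln 2)^2 bits per key. That formula, the 1% false-positive rate and the function names are assumptions for illustration, not Cassandra's actual sizing.

```python
import math


def implied_overhead_ratio(observed_bytes: float, expected_bytes: float) -> float:
    """Extra (pre-compaction) space as a multiple of the compacted data size."""
    return (observed_bytes - expected_bytes) / expected_bytes


def bloom_bytes_per_key(false_positive_rate: float) -> float:
    """Textbook optimal Bloom filter cost per key, in bytes.

    Uses m/n = -ln(p) / (ln 2)^2 bits; an assumption for illustration,
    not necessarily how Cassandra sizes its filters.
    """
    bits_per_key = -math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits_per_key / 8


# A 6x blowup means the overhead would be 5x the real data.
print(implied_overhead_ratio(6.0, 1.0))  # 5.0

# Assumed 1% false-positive rate: roughly 1.2 bytes of filter per key.
print(f"{bloom_bytes_per_key(0.01):.2f} bytes/key")
```

Because each unique key is written only once, its index and bloom filter entries should live in a single SSTable rather than being duplicated. Having several SSTables then mostly adds fixed per-file overhead, which makes a 5x ratio hard to explain.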

I haven't been following this thread very closely, but I don't think "obsolete" SSTables should be relevant, because she's not doing UPDATE or DELETE and she hasn't changed cluster topology (the "cleanup" case).

Julie : when compaction occurs, it logs the number of bytes that it started with and the number it ended with, as well as the number of keys involved in the compaction. What do these messages say?


example line :

INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. 999999999/888888888 bytes for 12345678 keys. Time: 123456ms.
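To collect those numbers from many nodes, a Python sketch like the following could pull them out of the logs. It assumes the message text matches the example line above (other Cassandra versions may word it differently). The pattern, function and field names are hypothetical. A ratio well below 1 on early compactions would show how much of the pre-compaction footprint was reclaimed.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the "Compacted to ..." INFO line shown above.
COMPACTED_RE = re.compile(
    r"Compacted to (?P<path>\S+?)\.\s*"
    r"(?P<before>\d+)/(?P<after>\d+) bytes for (?P<keys>\d+) keys\.\s*"
    r"Time: (?P<ms>\d+)ms"
)


class Compaction(NamedTuple):
    path: str
    bytes_before: int  # bytes the compaction started with
    bytes_after: int   # bytes it ended with
    keys: int
    millis: int

    @property
    def ratio(self) -> float:
        """Fraction of the input size that survived compaction."""
        return self.bytes_after / self.bytes_before


def parse_compaction(line: str) -> Optional[Compaction]:
    """Return the compaction stats from one log line, or None if it doesn't match."""
    m = COMPACTED_RE.search(line)
    if m is None:
        return None
    return Compaction(
        path=m["path"],
        bytes_before=int(m["before"]),
        bytes_after=int(m["after"]),
        keys=int(m["keys"]),
        millis=int(m["ms"]),
    )


line = (
    "INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java "
    "(line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. "
    "999999999/888888888 bytes for 12345678 keys. Time: 123456ms."
)
c = parse_compaction(line)
if c is not None:
    print(c.bytes_before, c.bytes_after, c.keys)  # 999999999 888888888 12345678
    print(f"{c.ratio:.1%} of the input size retained")  # 88.9% of the input size retained
```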


=Rob
