On 8/5/10 11:51 AM, Peter Schuller wrote:
> Also, the variation in disk space in your most recent post looks
> entirely as expected to me and nothing really extreme. The temporary
> disk space occupied during the compact/cleanup would easily be as high
> as your original disk space usage to begin with, and the fact that
> you're reaching the 5-7 GB per node level after a cleanup has
> completed fully and all obsolete sstables have been removed

Your post refers to "obsolete" sstables, but isn't the only thing that makes them "obsolete" in this case the fact that they have been compacted?

As I understand Julie's case, she is:

a) initializing her cluster
b) inserting some number of unique keys with CL.ALL
c) noticing that more disk space than expected (6x?) is used
d) but seeing the expected usage if she does a major compaction

In other words, the problem isn't "temporary disk space occupied during the compact", it's permanent disk space occupied unless she compacts.

There is clearly overhead from there being multiple SSTables with multiple bloom filters and multiple indexes. But from my understanding, that does not fully account for the difference in disk usage she is seeing. If it is 6x across the whole cluster, it seems unlikely that the meta information is 5x the size of the actual information.

I haven't been following this thread very closely, but I don't think "obsolete" SSTables should be relevant, because she's not doing UPDATE or DELETE and she hasn't changed cluster topology (the "cleanup" case).

Julie: when compaction occurs, it logs the number of bytes it started with and the number it ended with, as well as the number of keys involved in the compaction. What do these messages say?

Example line:

INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. 999999999/888888888 bytes for 12345678 keys. Time: 123456ms.
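
If it helps to pull those numbers out of a large log, here is a rough Python sketch. The regular expression is modeled only on the single example line above, so other Cassandra versions may format the message differently. The names parse_compaction and shrink_ratio are made up for illustration, and reading the pair as "bytes before/bytes after" follows the description above rather than anything checked against the source.

```python
import re

# Pattern modeled on the one example line above (an assumption, not a
# documented format): "Compacted to <path>. <before>/<after> bytes for <keys> keys. Time: <ms>ms."
COMPACTED = re.compile(
    r"Compacted to (?P<path>\S+)\.\s+"
    r"(?P<before>\d+)/(?P<after>\d+) bytes for (?P<keys>\d+) keys\.\s+"
    r"Time: (?P<ms>\d+)ms"
)


def parse_compaction(line):
    """Return the fields of a 'Compacted to' log line, or None if it does not match."""
    m = COMPACTED.search(line)
    if m is None:
        return None
    before = int(m.group("before"))
    after = int(m.group("after"))
    return {
        "path": m.group("path"),
        "bytes_before": before,
        "bytes_after": after,
        "keys": int(m.group("keys")),
        "time_ms": int(m.group("ms")),
        # Hypothetical convenience field: how much smaller the output is than the input.
        "shrink_ratio": before / after if after else None,
    }


EXAMPLE = (
    "INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java "
    "(line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. "
    "999999999/888888888 bytes for 12345678 keys. Time: 123456ms."
)

if __name__ == "__main__":
    print(parse_compaction(EXAMPLE))
```

A shrink_ratio near 1 on unique-key inserts would suggest compaction is not discarding much data, while a ratio near 6 would match the disk usage difference described above.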

=Rob
