Rob Coli <rcoli <at> digg.com> writes:

> As I understand Julie's case, she is :
> 
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction

Yes, this is the scenario in my initial post.

> There is clearly overhead from there being multiple SSTables with 
> multiple bloom filters and multiple indexes. But from my understanding, 
> that does not fully account for the difference in disk usage she is 
> seeing. If it is 6x across the whole cluster, it seems unlikely that the 
> meta information is 5x the size of the actual information.
> 
> Julie : when compaction occurs, it logs the number of bytes that it 
> started with and the number it ended with, as well as the number of keys 
> involved in the compaction. What do these messages say?
> 
> example line :
> 
> INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java 
> (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. 
> 999999999/888888888 bytes for 12345678 keys.  Time: 123456ms.
> 
> =Rob

I will need to retry this scenario today.  My subsequent posts were from
writing unique keys and then updating them, which is a totally different test
and *should* result in excess SSTable sizes.  I will retry writing unique keys,
then wait for the nodes to settle, and get back to you.  THANK YOU!!!
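
For pulling those numbers out of the logs, here is a rough sketch in Python. It assumes each compaction entry sits on a single line in the log file (the example above is only wrapped by the mail client) and, going by Rob's description, that the first byte count is the size before compaction and the second is the size after. The function name and the returned field names are made up for illustration; they are not anything Cassandra provides.

```python
import re

# Pattern modeled on the example line quoted above. Assumption: the real
# log keeps each entry on one line, with "before/after" byte counts.
COMPACTION_RE = re.compile(
    r"Compacted to (?P<path>\S+)\.\s+"
    r"(?P<before>\d+)/(?P<after>\d+) bytes for (?P<keys>\d+) keys\.\s+"
    r"Time: (?P<ms>\d+)ms"
)


def parse_compaction_line(line):
    """Return a dict of compaction stats, or None if the line does not match."""
    match = COMPACTION_RE.search(line)
    if match is None:
        return None
    before = int(match.group("before"))
    after = int(match.group("after"))
    return {
        "path": match.group("path"),
        "bytes_before": before,
        "bytes_after": after,
        "keys": int(match.group("keys")),
        "time_ms": int(match.group("ms")),
        # Fraction of the input size that remains after compaction.
        "ratio": after / before if before else None,
    }


# The example line from Rob's message, joined back onto one line.
SAMPLE = (
    "INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java "
    "(line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. "
    "999999999/888888888 bytes for 12345678 keys.  Time: 123456ms."
)

if __name__ == "__main__":
    stats = parse_compaction_line(SAMPLE)
    print(stats)
```

Running this over every "Compacted to" line on each node and comparing bytes_before with bytes_after should show how much of the extra disk usage compaction actually reclaims.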


