Rob Coli <rcoli <at> digg.com> writes:

> As I understand Julie's case, she is :
>
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction

Yes, this is the scenario in my initial post.

> There is clearly overhead from there being multiple SSTables with
> multiple bloom filters and multiple indexes. But from my understanding,
> that does not fully account for the difference in disk usage she is
> seeing. If it is 6x across the whole cluster, it seems unlikely that the
> meta information is 5x the size of the actual information.
>
> Julie : when compaction occurs, it logs the number of bytes that it
> started with and the number it ended with, as well as the number of keys
> involved in the compaction. What do these messages say?
>
> example line :
>
> INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java
> (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db.
> 999999999/888888888 bytes for 12345678 keys. Time: 123456ms.
>
> =Rob

I will need to retry this scenario today. My subsequent posts were from writing unique keys and then updating them, which is a totally different test and *should* result in excess SSTable sizes. I will retry writing unique keys, then wait for the nodes to settle, and get back to you.

THANK YOU!!!
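
For anyone who wants to pull the before/after byte counts out of a log, here is a minimal Python sketch. It assumes the compaction message looks exactly like the example line Rob quoted, with the first byte count being the size before compaction and the second the size after; real log lines may differ between Cassandra versions. The names `COMPACTION_RE`, `parse_compaction_line` and `EXAMPLE` are made up for illustration.

```python
import re

# Pattern modelled on the example line quoted above. Assumption: real
# Cassandra logs use the same wording; adjust the pattern if they do not.
COMPACTION_RE = re.compile(
    r"Compacted to (?P<path>\S+?)\.\s+"
    r"(?P<before>\d+)/(?P<after>\d+) bytes for (?P<keys>\d+) keys\.\s+"
    r"Time: (?P<ms>\d+)ms"
)


def parse_compaction_line(line):
    """Return the compaction figures as a dict, or None if the line does not match."""
    m = COMPACTION_RE.search(line)
    if m is None:
        return None
    before = int(m.group("before"))
    after = int(m.group("after"))
    return {
        "path": m.group("path"),
        "bytes_before": before,
        "bytes_after": after,
        "keys": int(m.group("keys")),
        "time_ms": int(m.group("ms")),
        # Fraction of the input size that remains after compaction.
        "ratio": after / before if before else None,
    }


# The example line from the quoted message, joined back onto one line.
EXAMPLE = (
    "INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java "
    "(line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. "
    "999999999/888888888 bytes for 12345678 keys. Time: 123456ms."
)

if __name__ == "__main__":
    print(parse_compaction_line(EXAMPLE))
```

A ratio close to 1.0 for unique keys would suggest that compaction itself is not reclaiming much, so the extra disk usage would have to come from somewhere other than duplicate or overwritten rows.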