> a) cleanup is a superset of compaction, so if you've been doing
> overwrites at all then it will reduce space used for that reason

I had failed to consider overwrites as a possible culprit (since removals were stated not to be done). However, thinking about it, I believe the effect should be limited to roughly a doubling of disk space in the absolute worst case: overwriting all data in the worst possible order (such as writing everything twice in the same order).

Or, more accurately, it should be limited to wasting as much space as the size of the overwritten values. If you're overwriting with larger values, it will no longer be a "doubling" relative to the actual live data set.

Julie, did you do overwrites, or were your disk space measurements based on the state of the cluster after an initial set of writes of unique values?

--
/ Peter Schuller
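
To make the bound above concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not Cassandra's actual accounting. It assumes every write is appended to disk and that no compaction or cleanup has run yet, so every superseded value is still taking up space. It also ignores per-row overhead, indexes and replication. The function name `disk_usage` and the key and value sizes are made up for illustration.

```python
def disk_usage(writes):
    """Toy model of an append-only store before any compaction.

    writes: iterable of (key, size_in_bytes) pairs, in write order.
    Returns (on_disk, live, wasted) in bytes, where 'wasted' is the
    space still held by values that have since been overwritten.
    """
    on_disk = 0
    latest = {}  # key -> size of the most recent value written
    for key, size in writes:
        on_disk += size      # every write lands on disk (assumption)
        latest[key] = size   # only the newest value counts as live
    live = sum(latest.values())
    return on_disk, live, on_disk - live


# Hypothetical workload: 1000 keys, 100 bytes per value.
initial = [(k, 100) for k in range(1000)]

# Worst case for same-sized values: everything written twice.
same_size = disk_usage(initial + initial)

# Overwriting with larger values (300 bytes each).
larger = disk_usage(initial + [(k, 300) for k in range(1000)])

print(same_size)  # (200000, 100000, 100000)
print(larger)     # (400000, 300000, 100000)
```

In the first case the on-disk size is twice the live data. In the second the wasted space is still just the size of the overwritten values, so the on-disk size is about 1.33 times the live data rather than double.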