> a) cleanup is a superset of compaction, so if you've been doing
> overwrites at all then it will reduce space used for that reason

I had failed to consider over-writes as a possible culprit (since
removals were stated not to be done). However, thinking about it, I
believe the effect of this should be limited to roughly a doubling of
disk space in the absolute worst case: over-writing all data in the
worst possible order (such as writing everything twice in the same
order).

Or more accurately, it should be limited to wasting as much space as
the size of the overwritten values. If you're overwriting with larger
values, it will no longer be a "doubling" relative to the actual live
data set.
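To make the bound concrete, here is a toy model. It assumes every
written value stays on disk until compaction merges it away, and it
ignores per-row, index and tombstone overhead. Those are simplifying
assumptions, not how Cassandra actually accounts for space. The
function name space_usage is hypothetical. The model shows that the
waste always equals the total size of the superseded values. It only
comes out to exactly 2x live data when each key is overwritten once
with a value of the same size.

```python
from typing import Dict, Iterable, Tuple


def space_usage(writes: Iterable[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (on_disk_bytes, live_bytes) for (key, value_size) writes.

    Simplified model: every version occupies disk until compacted,
    and only the newest version of each key counts as live data.
    """
    on_disk = 0
    latest: Dict[str, int] = {}
    for key, size in writes:
        on_disk += size      # each written version is still on disk
        latest[key] = size   # a later write supersedes the earlier one
    return on_disk, sum(latest.values())


# 1000 keys of 100 bytes, then all overwritten once, in the same order,
# with same-size values: waste equals live data, i.e. a doubling.
same = [(f"k{i}", 100) for i in range(1000)] * 2
print(space_usage(same))      # (200000, 100000)

# Same keys overwritten with 1000-byte values: waste is still the size
# of the overwritten values (100000), but no longer 2x the live data.
initial = [(f"k{i}", 100) for i in range(1000)]
larger = initial + [(f"k{i}", 1000) for i in range(1000)]
print(space_usage(larger))    # (1100000, 1000000)
```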

Julie, did you do over-writes, or were your disk space measurements
based on the state of the cluster after an initial set of writes of
unique values?

-- 
/ Peter Schuller
