On Thu, May 27, 2010 at 9:23 PM, Sean Bridges <sean.brid...@gmail.com> wrote:
> But doesn't having multiple similarly sized column families mean in-node
> compaction does not require 50% of disk? Looking at compaction manager,
> only 1 thread is doing a compaction, so we only need enough free disk space
> to compact the largest column family.
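The rule of thumb in the quoted question can be written as a quick check. This is a rough sketch under stated assumptions, not anything from Cassandra itself: it assumes a single compaction thread (so only one CF is rewritten at a time) and a worst case where compaction drops nothing, so the output is as large as the input. The function name, the CF names and the `safety_factor` parameter are hypothetical.

```python
def compaction_headroom_ok(cf_sizes, free_bytes, safety_factor=1.0):
    """Return True if free disk space can hold a full rewrite of the largest CF.

    Hypothetical helper. Assumes one compaction at a time and a worst case
    where the compacted output is as large as its input (nothing is purged).
    safety_factor > 1.0 adds extra margin (e.g. for snapshots sharing the disk).
    """
    if not cf_sizes:
        return True
    largest = max(cf_sizes.values())
    return free_bytes >= largest * safety_factor


GB = 1024 ** 3
# Illustrative numbers: 105 GB used on a 200 GB disk (over 50% full),
# spread over three similarly sized CFs.
sizes = {"Users": 40 * GB, "Events": 35 * GB, "Index": 30 * GB}
print(compaction_headroom_ok(sizes, free_bytes=95 * GB))  # largest CF is 40 GB
```

With similarly sized CFs the disk can be more than half full and still pass this check. The catch, as the reply below explains, is anything else that pins disk space on the same machine.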
Yes, AFAIK compaction only happens in one CF at a time. Also, the total amount may not reach twice the size (for example, if some data is dropped during the compaction).

Still, you have to be careful about other factors. For example, you may have a snapshot of the data on the same machine (by default it goes into $data/$cf/snapshot/123456/). In 0.6 the snapshot is made with a hard link, which means it does not take up extra space at first. However, when the old data is later deleted from $data/$cf, the space will not be freed, and you will end up with roughly twice the amount of data.

I am not completely confident that there are no other "edge" cases I have not considered, so the "try to stay under 50% disk usage" principle is OK for me :)
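The hard-link effect described above can be reproduced on any POSIX filesystem. The sketch below is a rough illustration with hypothetical names (the SSTable file name and the snapshot directory layout are made up for the demo, and it does not call Cassandra). It links a "data" file into a snapshot directory, deletes the original, and shows that the bytes are still on disk under the snapshot path.

```python
import os
import tempfile


def hard_link_keeps_data_alive():
    """Show that deleting the original file does not free space held by a hard link.

    Hypothetical demo: file and directory names only mimic a data dir + snapshot.
    Returns (link count before delete, snapshot size after delete, link count after).
    """
    with tempfile.TemporaryDirectory() as data_dir:
        sstable = os.path.join(data_dir, "Standard1-1-Data.db")  # illustrative name
        snap_dir = os.path.join(data_dir, "snapshot", "123456")
        os.makedirs(snap_dir)

        with open(sstable, "wb") as f:
            f.write(b"x" * 1024 * 1024)  # 1 MiB of fake data

        # "Snapshot": a second directory entry pointing to the same inode, no copy made.
        snap_file = os.path.join(snap_dir, os.path.basename(sstable))
        os.link(sstable, snap_file)
        nlink_before = os.stat(sstable).st_nlink

        # Simulate the old SSTable being removed after compaction.
        os.remove(sstable)
        snap_stat = os.stat(snap_file)
        return nlink_before, snap_stat.st_size, snap_stat.st_nlink


print(hard_link_keeps_data_alive())
```

The data blocks are released only when the last link is gone. Until the snapshot is cleared, every compacted-away SSTable it references keeps occupying disk alongside the newly written one.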