On Thu, May 27, 2010 at 9:23 PM, Sean Bridges <sean.brid...@gmail.com> wrote:
> But doesn't having multiple similarly sized column families mean in-node
> compaction does not require 50% of disk?  Looking at compaction manager,
> only 1 thread is doing a compaction, so we only need enough free disk space
> to compact the largest column family.

Yes, AFAIK compaction only happens in one CF at a time.
Also, the total may not reach twice the size (for example, if some
data is discarded during compaction).
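A rough sketch of the arithmetic, assuming only one CF is compacted at a time and, in the worst case, nothing is discarded. Under that assumption, the compacted output of a CF is as large as its input while the old SSTables still exist. The CF names and sizes are made up for illustration. This is a back-of-the-envelope model, not how Cassandra itself estimates space.

```python
def worst_case_compaction_headroom(cf_sizes_gb):
    # One CF is compacted at a time, so the worst case is rewriting the
    # largest CF while its old SSTables are still on disk.
    return max(cf_sizes_gb.values(), default=0.0)


def fits_on_disk(cf_sizes_gb, disk_gb):
    # Current usage plus the temporary copy of the largest CF must fit.
    used = sum(cf_sizes_gb.values())
    return used + worst_case_compaction_headroom(cf_sizes_gb) <= disk_gb


# Hypothetical node: four similarly sized CFs, 70 GB used on a 100 GB disk.
several = {"Users": 17.0, "Events": 19.0, "Index": 16.0, "Log": 18.0}
print(fits_on_disk(several, 100.0))  # True: 70 + 19 = 89 <= 100

# The same 70 GB in a single CF needs room for a full second copy.
single = {"Everything": 70.0}
print(fits_on_disk(single, 100.0))   # False: 70 + 70 = 140 > 100
```

This is why several similarly sized CFs can safely go above 50% usage in this simple model, while a single large CF cannot.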

Still, you have to be careful about other factors; for example, if
you have a snapshot of the data on the same machine (by default it
goes into $data/$cf/snapshot/123456/).

In 0.6 a snapshot is made with hard links, which means it won't count
as used space at first; but when the old data is later deleted from
$data/$cf, the space will not be freed, and you will end up with
roughly twice the amount of data on disk.
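A small, generic demonstration of the hard-link effect on a filesystem that supports hard links. The file and directory names below are hypothetical stand-ins for an SSTable and a snapshot directory; this is plain `os.link`/`os.remove`, not Cassandra code.

```python
import os
import tempfile


def snapshot_keeps_space(tmpdir):
    # Hypothetical stand-ins for an SSTable and its snapshot copy.
    data = os.path.join(tmpdir, "sstable-1-Data.db")
    snap_dir = os.path.join(tmpdir, "snapshot")
    os.mkdir(snap_dir)
    snap = os.path.join(snap_dir, "sstable-1-Data.db")

    with open(data, "wb") as f:
        f.write(b"x" * 1024 * 1024)  # 1 MiB of fake data

    # "Snapshot" via hard link: a second name for the same inode,
    # so no extra disk blocks are allocated yet.
    os.link(data, snap)
    links_before = os.stat(data).st_nlink  # 2 names -> 1 inode

    # Compaction later deletes the old file from the data directory...
    os.remove(data)

    # ...but the inode and its blocks survive through the snapshot link,
    # so the space is not returned to the filesystem.
    return links_before, os.stat(snap).st_nlink, os.path.getsize(snap)


with tempfile.TemporaryDirectory() as d:
    print(snapshot_keeps_space(d))  # (2, 1, 1048576)
```

Until the snapshot is removed, every SSTable replaced by compaction keeps occupying disk space through its link.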

I am not completely confident that there are no other edge cases I
have not considered, so the "try to stay under 50% disk usage" rule
of thumb is fine with me :)
