On Mon, Nov 8, 2010 at 5:07 PM, Wayne <wav...@gmail.com> wrote:
> Can anyone speak to best practices for running manual compaction in
> production? Our assumption is that without it the sstables will become too
> fragmented...is this an accepted "fact"? Obviously it depends on the volume
> of writes, but I am looking for current production practices.
>
> Since it takes a lot of resources and 4-5 hours for our current node size of
> 500 GB, weekly seems like a sensible option for us. Is this a normal practice?
>
> Is it best to run on all nodes at the same time or staggered across nodes to
> reduce total cluster slow-down? Given that full compaction has a major
> effect on a node and its ability to function under heavy load, our assumption
> is that staggered over the weekend for example (our low usage time) would be
> best.
>
> Any recommendations?
>
> Thanks
>
> Wayne
>
>

Yes. Stagger major compaction times. If you run them all at once you
slow down the whole cluster, and there is no need to. I wrote about a
Puppet trick for staggering compaction on my blog:
http://www.edwardcapriolo.com/roller/
Running at night is a good idea as well: less traffic leaves more
resources, so compaction finishes faster.
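
As a rough illustration of staggering (this is not the Puppet trick from
the blog, just a minimal sketch), each node can be given its own start
offset inside the low-traffic window. The function name, the 48-hour
weekend window, and the 5-hour run time are assumptions taken from the
numbers in the question, not anything Cassandra provides:

```python
def stagger_schedule(nodes, window_hours=48, run_hours=5):
    """Map each node to a start offset (hours after the window opens).

    Hypothetical helper: window_hours and run_hours are assumed values.
    Nodes are sorted so every machine computes the same schedule.
    """
    if run_hours <= 0 or window_hours < run_hours:
        raise ValueError("window must fit at least one compaction run")
    slots = window_hours // run_hours  # non-overlapping runs that fit
    schedule = {}
    for index, node in enumerate(sorted(nodes)):
        # Wrap around if there are more nodes than slots; those nodes
        # share a slot, which is still better than all running at once.
        schedule[node] = (index % slots) * run_hours
    return schedule


if __name__ == "__main__":
    for node, offset in stagger_schedule(["node3", "node1", "node2"]).items():
        print(f"{node}: start major compaction {offset}h into the window")
```

Each node would then kick off its own major compaction at its offset,
for example from cron.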

Compaction does two major things:
1) Defragments rows, which helps your read path. (Normal compaction does this too, however.)
2) Clears deleted data. (* now non minor compaction can also remove data
that the bloom filters show is not found in other SSTables)

How often you should run major compaction depends on how often you are
removing data. The default GCGraceSeconds is 864000 (10 days), and
deleted data cannot be purged before that period has passed. You have
to feel this out. If you run major compaction once a week you can study
how much your column family shrinks. If it did not shrink much, go up to
two weeks, and so on.
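
The "feel it out" loop above could be sketched like this. The function
name and the 10% threshold for "did not shrink much" are assumptions
for illustration; pick whatever threshold matches your delete volume:

```python
def next_interval_weeks(size_before_gb, size_after_gb, current_weeks,
                        min_shrink=0.10):
    """Suggest the next major-compaction interval, in weeks.

    Hypothetical helper: min_shrink (10%) is an assumed threshold.
    If compaction reclaimed less than that fraction of the column
    family's on-disk size, wait one week longer next time.
    """
    if size_before_gb <= 0:
        raise ValueError("size_before_gb must be positive")
    shrink = (size_before_gb - size_after_gb) / size_before_gb
    if shrink < min_shrink:
        return current_weeks + 1  # little was reclaimed: compact less often
    return current_weeks          # worthwhile: keep the current interval


if __name__ == "__main__":
    # 500 GB before, 490 GB after a weekly run: only 2% reclaimed.
    print(next_interval_weeks(500, 490, current_weeks=1))
```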
