DataStax no longer recommends running major compaction: http://www.datastax.com/docs/1.0/operations/tuning
But if you can afford it, major compaction will improve read latency, as you have seen.
Major compaction is expensive, so you will not want to run it during high-traffic hours. You should also not run it on more than one node of a replica set at the same time. And you should not run repair and major compaction at the same time on the same (affected) node, because both tasks require massive I/O. Within these constraints, the more often you run major compaction, the better read latency you will get.

2012/3/1 Eran Chinthaka Withana <eran.chinth...@gmail.com>:
> Hi,
>
> I have two questions on major compactions (the ones user initiate using
> nodetool) and I really appreciate if someone can help.
>
> 1. I've noticed that when I run compactions the read latency improves even
> more than I expected (which is good :) ) The improvement is so tempting that
> I'd like to run this almost every week :). I understand after a compaction
> Cassandra will create one giant SSTable and if something happens to it
> things can go little bit crazy. So from your experience how often should we
> be running compactions? What parameters will influence this frequency?
>
> 2. I'm thinking scheduling compactions using a cron job. But the issue is I
> scheduled repairs also using a cronjob to run once in GC Period (of default
> 10 days). Now the obvious question is what will happen if a node is running
> both the compactions AND the repair at the same time? Is this something we
> should avoid at all costs? What will be the implications?
>
> Thanks,
> Eran Chinthaka Withana
>

--
w3m
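
For the cron question, one possible way to keep repair and major compaction from overlapping on the same node is to send both cron jobs through a small wrapper that takes an exclusive lock first. This is only a sketch, not a tested production script: the lock path and the names run_exclusive, major_compaction and repair are made up for illustration, and it assumes nodetool is on the PATH. It uses only `nodetool compact`, `nodetool repair`, and Python's fcntl.flock (Unix only).

```python
import fcntl
import subprocess

# Hypothetical lock file shared by every maintenance job on this node.
LOCK_PATH = "/var/tmp/cassandra-maintenance.lock"


def run_exclusive(command, lock_path=LOCK_PATH):
    """Run `command` unless another maintenance job holds the lock.

    Returns the command's exit code, or None if the lock was busy
    and the command was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail at once instead of queueing.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock stays held until the file is closed at the end of `with`.
        return subprocess.call(command)


def major_compaction():
    # Assumes `nodetool` is on the PATH of the cron user.
    return run_exclusive(["nodetool", "compact"])


def repair():
    return run_exclusive(["nodetool", "repair"])
```

Each cron entry would then call major_compaction() or repair(). A job that finds the lock taken returns None and can simply wait for its next slot. This guards a single node only. Keeping major compaction off more than one replica at a time still has to be handled by staggering the cron schedules across nodes.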