On Wed, Mar 30, 2011 at 12:54 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
>> Note this script doesn't work if your repair takes hours, and in the
>> middle of the repair cassandra was restarted, nodetool will exit and the
>> flagfile will be updated. Another case, if repair hangs, and day later
>> cassandra is restarted.
>
> This is why "set -e" is at the to and commented as "important" :) But
> it relies on 'nodetool repair' reliably exiting with non-zero exit
> status on failures.
>
>> if nodetool returns an error this might work:
>>
>> nodetool -h localhost repair && touch /path/to/flagfile.tmp
>
> That's the equivalent, due to 'set -e'.
>
> --
> / Peter Schuller

I just wanted to chime in here and say that some people NEVER run repair. In our particular case, we remove inactive data older than a specific date. If we lost a tombstone and that data were to reappear, it would not be the end of the world for us. Repair is really intensive since it involves a compaction, and in 0.6.X it was not optimal, as it significantly increased on-disk data. I have followed some threads, and there are some conditions that I read repair can't handle. The question you have to ask yourself is how likely they are to occur and what they might mean in your use case. These are not easy questions to answer.
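
For anyone who does want the flag-file approach quoted above, here is a minimal sketch of the idea, not the original script. The helper name repair_and_flag is made up for illustration, and /path/to/flagfile is the placeholder path from the thread. As Peter points out, it only works if 'nodetool repair' reliably exits non-zero on failure, and it does nothing about the case where repair hangs.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from the original script):
# run a command and update a flag file only if the command exits with status 0.
# Usage: repair_and_flag FLAGFILE COMMAND [ARGS...]
repair_and_flag() {
    flagfile=$1
    shift
    # "$@" is the repair command, e.g.: nodetool -h localhost repair
    if "$@"; then
        # Write a temporary file first, then rename it, so the flag file
        # only ever appears after a successful run.
        touch "$flagfile.tmp" && mv "$flagfile.tmp" "$flagfile"
    else
        status=$?
        echo "repair failed with exit status $status; flag file not updated" >&2
        return "$status"
    fi
}

# Example invocation (placeholder path, as in the thread):
# repair_and_flag /path/to/flagfile nodetool -h localhost repair
```

A monitoring check can then alert when the flag file's modification time is older than your repair interval. This makes the success check explicit rather than relying on 'set -e' at the top of the script.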