I'll skip the prelude, but I worked myself into a bit of a jam.  I'm
recovering now, but I want to double-check that I'm thinking about things
correctly.

Basically, I was in a state where a majority of my servers wanted to do
compactions, and rather large ones.  This was impacting my site
performance.  I tried "nodetool stop COMPACTION".  I tried
"nodetool setcompactionthroughput 1".  I tried restarting servers, but
they'd restart the compactions pretty much immediately on boot.

Then I realized that:
nodetool stop COMPACTION
only stopped running compactions, and then the compactions would re-enqueue
themselves rather quickly.

So, right now I have:
1.) scripts running on N-1 servers calling "nodetool stop COMPACTION" in
a tight loop
2.) On the "Nth" server I've disabled gossip/thrift and turned up
setcompactionthroughput to 999
3.) When the Nth server completes, I pick from the remaining N-1 (well, I'm
still running the first compaction, which is going to take 12 more hours,
but that is the plan at least).
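
For anyone trying to follow the plan above, here is a rough sketch of it in Python. It only builds the nodetool command lines and the order in which nodes take their turn; it does not decide when a compaction has finished (I watch "nodetool compactionstats" by hand for that). The helper names, the host names, and the 16 MB/s value used when rejoining (Cassandra's stock compaction throughput) are assumptions for illustration, not something from my actual scripts.

```python
import subprocess


def nodetool(host, *args):
    """Build a nodetool command line for one node (-h selects the host)."""
    return ["nodetool", "-h", host, *args]


def isolate_and_compact_cmds(host, throughput_mb=999):
    """Step 2: take the node out of client traffic and let it compact fast."""
    return [
        nodetool(host, "disablegossip"),
        nodetool(host, "disablethrift"),
        nodetool(host, "setcompactionthroughput", str(throughput_mb)),
    ]


def rejoin_cmds(host, throughput_mb=16):
    """Undo step 2 once the node is done (16 MB/s is an assumed default)."""
    return [
        nodetool(host, "setcompactionthroughput", str(throughput_mb)),
        nodetool(host, "enablethrift"),
        nodetool(host, "enablegossip"),
    ]


def hold_cmds(hosts):
    """Step 1: one 'stop COMPACTION' per waiting node; call this in a loop."""
    return [nodetool(h, "stop", "COMPACTION") for h in hosts]


def plan(hosts):
    """Step 3: yield (active_node, still_waiting_nodes), one node at a time."""
    for i, active in enumerate(hosts):
        yield active, hosts[i + 1:]


def run(cmds):
    """Execute the command lines; raises if nodetool exits non-zero."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Usage would be something like: for each (active, waiting) pair from plan(hosts), run(isolate_and_compact_cmds(active)), keep calling run(hold_cmds(waiting)) until the active node's compaction finishes, then run(rejoin_cmds(active)) and move on.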

Does this make sense?  Other than the fact there were probably warning signs
that would have prevented me from getting into this state in the first
place? :-)

will
