Edward, you make a good point, and I do think I am getting closer to having to increase my cluster size (I'm at around 300GB/node now).

In my case, I think it was neither. I had one node OOM after working on a large compaction, but it continued to run in a zombie-like state (constantly GC'ing), which I didn't have an alert on. Then I had the bad luck of a "close token" also starting a large compaction. I have RF=3 with some of my R/W patterns at quorum, so that segment of my cluster got slow (i.e. a percentage of my traffic started to slow).

I was running 1.1.2 (I haven't had to poke anything for quite some time, obviously), so I upgraded before moving on (as I saw a lot of bug fixes for compaction issues in the release notes). But the upgrade caused even more nodes to start compactions, which led to my original email... I had a cluster where 80% of my nodes were compacting, I really needed to boost production traffic, and I couldn't seem to "tamp cassandra down" temporarily.

Thanks for the advice everyone!

will

On Tue, Apr 2, 2013 at 10:20 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Settings do not make compactions go away. If your compactions are "out of
> control" it usually means one of these things,
> 1) you have a corrupt table that the compaction never finishes on,
> sstables count keep growing
> 2) you do not have enough hardware to handle your write load
>
>
> On Tue, Apr 2, 2013 at 7:50 AM, William Oberman
> <ober...@civicscience.com> wrote:
>
>> Thanks Gregg & Aaron. Missed that setting!
>>
>> On Tuesday, April 2, 2013, aaron morton wrote:
>>
>>> Set the min and max
>>> compaction thresholds for a given column family
>>>
>>> +1 for setting the max_compaction_threshold (as well as the min) on
>>> a CF when you are getting behind. It can limit the size of the compactions
>>> and give things a chance to complete in a reasonable time.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 2/04/2013, at 3:42 AM, Gregg Ulrich <gulr...@netflix.com> wrote:
>>>
>>> You may want to set compaction threshold and not throughput. If you set
>>> the min threshold to something very large (100000), compactions will not
>>> start until cassandra finds this many files to compact (which it should
>>> not).
>>>
>>> In the past I have used this to stop compactions on a node, and then run
>>> an offline major compaction to get through the compaction, then set the min
>>> threshold back. Not everyone likes major compactions though.
>>>
>>> setcompactionthreshold <keyspace> <cfname> <minthreshold>
>>> <maxthreshold> - Set the min and max
>>> compaction thresholds for a given column family
>>>
>>> On Mon, Apr 1, 2013 at 12:38 PM, William Oberman <
>>> ober...@civicscience.com> wrote:
>>>
>>>> I'll skip the prelude, but I worked myself into a bit of a jam. I'm
>>>> recovering now, but I want to double check if I'm thinking about things
>>>> correctly.
>>>>
>>>> Basically, I was in a state where a majority of my servers wanted to do
>>>> compactions, and rather large ones. This was impacting my site
>>>> performance. I tried nodetool stop COMPACTION. I tried
>>>> setcompactionthroughput=1. I tried restarting servers, but they'd restart
>>>> the compactions pretty much immediately on boot.
>>>>
>>>> Then I realized that:
>>>> nodetool stop COMPACTION
>>>> only stopped running compactions, and then the compactions would
>>>> re-enqueue themselves rather quickly.
>>>>
>>>> So, right now I have:
>>>> 1.) scripts running on N-1 servers looping on "nodetool stop
>>>> COMPACTION" in a tight loop
>>>> 2.) On the "Nth" server I've disabled gossip/thrift and turned up
>>>> setcompactionthroughput to 999
>>>> 3.) When the Nth server completes, I pick from the remaining N-1 (well,
>>>> I'm still running the first compaction, which is going to take 12 more
>>>> hours, but that is the plan at least).
>>>>
>>>> Does this make sense? Other than the fact there were probably warning
>>>> signs that would have prevented me from getting into this state in the
>>>> first place? :-)
>>>>
>>>> will
>>>>
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 6101 Penn Avenue, Fifth Floor
>> Pittsburgh, PA 15206
>> (M) 412-480-7835
>> (E) ober...@civicscience.com
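
For anyone finding this thread later, the threshold approach Gregg describes above could be scripted roughly as follows. This is only a sketch under a few assumptions: the function names, the DRY_RUN switch, and the my_keyspace/my_cf names are made up for illustration; 4 and 32 are assumed to be the thresholds you want to restore (check yours with nodetool getcompactionthreshold first); and min and max are both set to 100000 in case your nodetool version rejects a min above the max. It will not stop compactions that are already running.

```shell
#!/bin/sh
# Sketch only: pause and later restore minor compactions for one column family
# by moving the compaction thresholds, as described in the thread.
# DRY_RUN and the function names are illustrative, not part of nodetool.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the nodetool command (default, safe)

run_nodetool() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "nodetool $*"
    else
        nodetool "$@"
    fi
}

# Raise the thresholds so high that no new minor compaction should be scheduled.
# Assumption: min and max are set to the same value, in case min > max is rejected.
pause_compactions() {
    keyspace="$1"; cfname="$2"
    run_nodetool setcompactionthreshold "$keyspace" "$cfname" 100000 100000
}

# Put the thresholds back. 4 and 32 are assumed values; pass your own as $3 and $4.
restore_compactions() {
    keyspace="$1"; cfname="$2"
    run_nodetool setcompactionthreshold "$keyspace" "$cfname" "${3:-4}" "${4:-32}"
}
```

With DRY_RUN left at its default, calling pause_compactions my_keyspace my_cf only prints the nodetool command it would run; set DRY_RUN=0 once the printed command looks right for your cluster.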