I just tried to use this setting (I'm on 1.1.9), and it appears I can't set min > 32, since 32 is now the maximum allowed value for max (via nodetool, at least). Not sure if JMX would allow more access, but I don't like bypassing things I don't fully understand. I think I'll just leave my compaction-killer scripts running instead (not that constantly killing compactions isn't messing with things as well...).
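For anyone finding this thread later, the nodetool invocations under discussion look roughly like this. The keyspace/CF names are placeholders, and the "raise max first" workaround is untested here -- it's just a sketch of how the min <= max validation might be worked around, since nodetool takes both values in one call:

```shell
# Hypothetical keyspace/CF names -- substitute your own schema.
# Check the current thresholds (defaults are min=4, max=32):
nodetool -h localhost getcompactionthreshold MyKeyspace MyCF

# setcompactionthreshold takes min and max together, so both can be
# raised in a single call; min may not exceed the max given:
nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 100000 100000

# ...and back to the defaults once caught up:
nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 4 32
```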
will

On Tue, Apr 2, 2013 at 10:43 AM, William Oberman <ober...@civicscience.com> wrote:

> Edward, you make a good point, and I do think I am getting closer to having
> to increase my cluster size (I'm around ~300GB/node now).
>
> In my case, I think it was neither. I had one node OOM after working on a
> large compaction, but it continued to run in a zombie-like state (constantly
> GC'ing), which I didn't have an alert on. Then I had the bad luck of a
> "close token" also starting a large compaction. I have RF=3 with some of
> my R/W patterns at quorum, causing that segment of my cluster to get slow
> (i.e. a % of my traffic started to slow). I was running 1.1.2 (I hadn't
> had to poke anything for quite some time, obviously), so I upgraded
> before moving on (as I saw a lot of bug fixes for compaction issues in the
> release notes). But the upgrade caused even more nodes to start
> compactions. Which led to my original email... I had a cluster where 80%
> of my nodes were compacting, and I really needed to boost production
> traffic and couldn't seem to "tamp Cassandra down" temporarily.
>
> Thanks for the advice everyone!
>
> will
>
>
> On Tue, Apr 2, 2013 at 10:20 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> Settings do not make compactions go away. If your compactions are "out of
>> control" it usually means one of these things:
>> 1) you have a corrupt table that the compaction never finishes on;
>> the sstable count keeps growing
>> 2) you do not have enough hardware to handle your write load
>>
>>
>> On Tue, Apr 2, 2013 at 7:50 AM, William Oberman <ober...@civicscience.com> wrote:
>>
>>> Thanks Gregg & Aaron. Missed that setting!
>>>
>>> On Tuesday, April 2, 2013, aaron morton wrote:
>>>
>>>> Set the min and max compaction thresholds for a given column family
>>>>
>>>> +1 for setting the max_compaction_threshold (as well as the min) on the
>>>> CF when you are getting behind. It can limit the size of the compactions
>>>> and give things a chance to complete in a reasonable time.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 2/04/2013, at 3:42 AM, Gregg Ulrich <gulr...@netflix.com> wrote:
>>>>
>>>> You may want to set the compaction threshold and not the throughput. If
>>>> you set the min threshold to something very large (100000), compactions
>>>> will not start until cassandra finds that many files to compact (which
>>>> it should not).
>>>>
>>>> In the past I have used this to stop compactions on a node, then run an
>>>> offline major compaction to get through the compaction, then set the
>>>> min threshold back. Not everyone likes major compactions though.
>>>>
>>>>
>>>> setcompactionthreshold <keyspace> <cfname> <minthreshold> <maxthreshold>
>>>>   - Set the min and max compaction thresholds for a given column family
>>>>
>>>>
>>>> On Mon, Apr 1, 2013 at 12:38 PM, William Oberman <ober...@civicscience.com> wrote:
>>>>
>>>>> I'll skip the prelude, but I worked myself into a bit of a jam. I'm
>>>>> recovering now, but I want to double check that I'm thinking about
>>>>> things correctly.
>>>>>
>>>>> Basically, I was in a state where a majority of my servers wanted to
>>>>> do compactions, and rather large ones. This was impacting my site
>>>>> performance. I tried nodetool stop COMPACTION. I tried
>>>>> setcompactionthroughput=1. I tried restarting servers, but they'd
>>>>> restart the compactions pretty much immediately on boot.
>>>>>
>>>>> Then I realized that:
>>>>>   nodetool stop COMPACTION
>>>>> only stops running compactions, and then the compactions re-enqueue
>>>>> themselves rather quickly.
>>>>>
>>>>> So, right now I have:
>>>>> 1.) scripts running on N-1 servers looping on "nodetool stop
>>>>> COMPACTION" in a tight loop
>>>>> 2.) On the "Nth" server I've disabled gossip/thrift and turned up
>>>>> setcompactionthroughput to 999
>>>>> 3.) When the Nth server completes, I pick from the remaining N-1
>>>>> (well, I'm still running the first compaction, which is going to take
>>>>> 12 more hours, but that is the plan at least).
>>>>>
>>>>> Does this make sense? Other than the fact there were probably warning
>>>>> signs that would have prevented me from getting into this state in the
>>>>> first place? :-)
>>>>>
>>>>> will
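For reference, the round-robin procedure in steps 1-3 above can be sketched as below. This is a rough outline, not a tested script; hostnames and the sleep interval are made up, and as noted earlier in the thread, repeatedly cancelling compactions throws away their partial work:

```shell
# Step 1: on each of the N-1 "held back" nodes, keep cancelling
# compactions as they re-enqueue themselves.
while true; do
    nodetool -h localhost stop COMPACTION
    sleep 5   # interval is arbitrary; a tight loop also works
done

# Step 2: on the one node allowed to catch up, take it out of the
# request path and remove the compaction throughput cap.
nodetool -h localhost disablegossip
nodetool -h localhost disablethrift
nodetool -h localhost setcompactionthroughput 999

# Step 3: once that node's compactions finish, re-enable gossip/thrift
# there, pick the next node from the N-1, and repeat.
nodetool -h localhost enablegossip
nodetool -h localhost enablethrift
```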