> compaction needs some disk I/O. Slowing down our compaction will improve
> overall system performance. Of course, you don't want to go too slow and fall
> behind too much.
In this case I was thinking of the memory use. Compaction tasks are a bit like
a storm of reads. If you are having problems with memory management all those
reads can result in increased GC.
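
For reference, the compaction throttles in question live in cassandra.yaml. A
rough sketch using the values suggested elsewhere in this thread (illustrative
only, not universal recommendations; check the option names and defaults
against your version's config):

    # cassandra.yaml - throttle compaction so it causes less read and GC pressure
    compaction_throughput_mb_per_sec: 8      # caps total compaction I/O for the node, in MB/s
    concurrent_compactors: 2                 # limits how many compactions run in parallel
    in_memory_compaction_limit_in_mb: 32     # larger rows take the slower on-disk compaction path

If your version supports it, nodetool setcompactionthroughput lets you change
the first knob at runtime, which makes it easier to experiment without a
restart.
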
> It looks like we hit OOM when repair starts streaming
> multiple cfs simultaneously.
Odd. It's not very memory intensive.

> I'm wondering if I should throttle streaming, and/or repair only one
> CF at a time.
Decreasing stream_throughput_outbound_megabits_per_sec may help if the goal is
just to get repair working.

You may also want to increase phi_convict_threshold to 12; this makes it
harder for a node to get marked as down, which can be handy when GC is causing
problems and you have underpowered nodes. If the node is marked as down, the
repair session will fail instantly.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 9:12 AM, Dane Miller <d...@optimalsocial.com> wrote:

> On Fri, Mar 22, 2013 at 5:58 PM, Wei Zhu <wz1...@yahoo.com> wrote:
>> compaction needs some disk I/O. Slowing down our compaction will improve
>> overall system performance. Of course, you don't want to go too slow and
>> fall behind too much.
>
> Hmm. Even after making the suggested configuration changes, repair
> still fails with OOM (but only one node died this time, which is an
> improvement). It looks like we hit OOM when repair starts streaming
> multiple cfs simultaneously. Just prior to OOM, the node loses
> contact with another node in the cluster and starts storing hints.
>
> I'm wondering if I should throttle streaming, and/or repair only one
> CF at a time.
>
>> From: "Dane Miller"
>> Subject: Re: Stream fails during repair, two nodes out-of-memory
>>
>> On Thu, Mar 21, 2013 at 10:28 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>> heap of 1867M is kind of small. According to the discussion on this list,
>>> it's advisable to have m1.xlarge.
>>>
>>> +1
>>>
>>> In cassandra-env.sh set the MAX_HEAP_SIZE to 4GB, and the NEW_HEAP_SIZE to
>>> 400M
>>>
>>> In the yaml file set
>>>
>>> in_memory_compaction_limit_in_mb to 32
>>> compaction_throughput_mb_per_sec to 8
>>> concurrent_compactors to 2
>>>
>>> This will slow down compaction a lot. You may want to restore some of
>>> these settings once you have things stable.
>>>
>>> You have an underpowered box for what you are trying to do.
>>
>> Thanks very much for the info. Have made the changes and am retrying.
>> I'd like to understand, why does it help to slow compaction?
>>
>> It does seem like the cluster is underpowered to handle our
>> application's full write load plus repairs, but it operates fine
>> otherwise.
>>
>> On Wed, Mar 20, 2013 at 8:47 PM, Wei Zhu <wz1...@yahoo.com> wrote:
>>> It's clear you are out of memory. How big is your data size?
>>
>> 120 GB per node, of which 50% is actively written/updated, and 50% is
>> read-mostly.
>>
>> Dane
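
To gather the settings from this thread into one place, a rough sketch (the
values are the ones suggested above, shown for illustration only; confirm the
exact names and defaults against your version's cassandra.yaml and
cassandra-env.sh):

    # cassandra.yaml - throttle repair streaming and tolerate longer GC pauses
    stream_throughput_outbound_megabits_per_sec: 50   # illustrative lowered value; default is higher
    phi_convict_threshold: 12                         # default is 8; higher = slower to mark a node down

    # cassandra-env.sh - heap sizing suggested in the quoted message
    # (note: the variable is normally spelled HEAP_NEWSIZE rather than NEW_HEAP_SIZE)
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"

Repair can also be run against a single column family, e.g.
"nodetool repair <keyspace> <column_family>", so that only one CF streams at a
time.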