Our young gen size = 800 MB, SurvivorRatio = 8, eden size = 640 MB. All objects/bytes generated during compaction are garbage, right?
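If I understand the HotSpot sizing rules correctly, that eden figure follows directly from our flags (SurvivorRatio is the ratio of eden to one survivor space, and the young gen holds eden plus two survivor spaces), roughly:

    young gen (-Xmn)    = 800 MB
    eden                = 800 MB * 8 / (8 + 2) = 640 MB
    each survivor space = 800 MB * 1 / (8 + 2) =  80 MB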
During compaction, with in_memory_compaction_limit=64MB and concurrent_compactors=8, there is a lot of pressure on ParNew sweeps. I was thinking of decreasing concurrent_compactors and in_memory_compaction_limit to go easy on GC (a rough sketch of the settings I plan to try is at the bottom of this mail, below the quoted thread). I am not familiar with the inner workings of Cassandra, but I hope I have diagnosed the problem to some extent.

On Fri, Jul 6, 2012 at 11:27 AM, rohit bhatia <rohit2...@gmail.com> wrote:
> @ravi, you can increase the young gen size, keep a high tenuring rate or
> increase the survivor ratio..
>
>
> On Fri, Jul 6, 2012 at 4:03 AM, aaron morton <aa...@thelastpickle.com> wrote:
> > Ideally we would like to collect maximum garbage from ParNew itself, during
> > compactions. What are the steps to take towards achieving this?
> >
> > I'm not sure what you are asking.
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 5/07/2012, at 6:56 PM, Ravikumar Govindarajan wrote:
> >
> > We have modified maxTenuringThreshold from 1 to 5. Maybe it is causing
> > problems. Will change it back to 1 and see how the system behaves.
> >
> > concurrent_compactors=8. We will reduce this, as our system won't be able
> > to handle this many compactions at the same time anyway. Think it will
> > ease GC to some extent as well.
> >
> > Ideally we would like to collect maximum garbage from ParNew itself, during
> > compactions. What are the steps to take towards achieving this?
> >
> > On Wed, Jul 4, 2012 at 4:07 PM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >> It *may* have been compaction from the repair, but it's not a big CF.
> >>
> >> I would look at the logs to see how much data was transferred to the node.
> >> Was there a compaction going on while the GC storm was happening? Do you
> >> have a lot of secondary indexes?
> >>
> >> If you think it is correlated to compaction, you can try reducing
> >> concurrent_compactors.
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 3/07/2012, at 6:33 PM, Ravikumar Govindarajan wrote:
> >>
> >> Recently, we faced a severe freeze [around 30-40 mins] on one of our
> >> servers. There were many mutations/reads dropped. The issue happened just
> >> after a routine nodetool repair for the below CF completed [1.0.7, NTS,
> >> DC1:3, DC2:2]
> >>
> >> Column Family: MsgIrtConv
> >> SSTable count: 12
> >> Space used (live): 17426379140
> >> Space used (total): 17426379140
> >> Number of Keys (estimate): 122624
> >> Memtable Columns Count: 31180
> >> Memtable Data Size: 81950175
> >> Memtable Switch Count: 31
> >> Read Count: 8074156
> >> Read Latency: 15.743 ms.
> >> Write Count: 2172404
> >> Write Latency: 0.037 ms.
> >> Pending Tasks: 0
> >> Bloom Filter False Positives: 1258
> >> Bloom Filter False Ratio: 0.03598
> >> Bloom Filter Space Used: 498672
> >> Key cache capacity: 200000
> >> Key cache size: 200000
> >> Key cache hit rate: 0.9965579513062582
> >> Row cache: disabled
> >> Compacted row minimum size: 51
> >> Compacted row maximum size: 89970660
> >> Compacted row mean size: 226626
> >>
> >> Our heap config is as follows:
> >>
> >> -Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC
> >> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> >> -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75
> >> -XX:+UseCMSInitiatingOccupancyOnly
> >>
> >> from yaml:
> >> in_memory_compaction_limit=64
> >> compaction_throughput_mb_sec=8
> >> multi_threaded_compaction=false
> >>
> >> INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085 AntiEntropyService.java (line 762) [repair #2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] MsgIrtConv is fully synced
> >> INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085 AntiEntropyService.java (line 698) [repair #2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] session completed successfully
> >> INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219 CompactionTask.java (line 221) Compacted to [/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,]. 47,907,012 to 40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s. Time: 6,186ms.
> >>
> >> After this, the logs were completely filled with GC activity [ParNew/CMS]. ParNew ran
> >> roughly every 3 seconds and CMS roughly every 30 seconds, continuously for 40 minutes.
> >>
> >> INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line 122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is 8506048512
> >> INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line 122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is 8506048512
> >>
> >> .........................................
> >>
> >> INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line 122) GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is 8506048512
> >> INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line 122) GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is 8506048512
> >> INFO [ScheduledTasks:1] 2012-06-29 10:07:57,773 GCInspector.java (line 122) GC for ParNew: 1444 ms for 3 collections, 4234372296 used; max is 8506048512
> >> INFO [ScheduledTasks:1] 2012-06-29 10:07:59,387 GCInspector.java (line 122) GC for ParNew: 1153 ms for 2 collections, 4910279080 used; max is 8506048512
> >> INFO [ScheduledTasks:1] 2012-06-29 10:08:00,389 GCInspector.java (line 122) GC for ParNew: 697 ms for 2 collections, 4873857072 used; max is 8506048512
> >> INFO [ScheduledTasks:1] 2012-06-29 10:08:01,443 GCInspector.java (line 122) GC for ParNew: 726 ms for 2 collections, 4941511184 used; max is 8506048512
> >>
> >> After this, the node became stable and was back up and running. Any pointers
> >> would be greatly appreciated.
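As mentioned above, here is a rough sketch of what I am planning to try, to take allocation pressure off ParNew during compactions. The keys below are the full cassandra.yaml names in 1.0.x (the shorthand earlier in the thread abbreviates them); the values are just my guesses, not tested recommendations:

    # cassandra.yaml
    concurrent_compactors: 2               # down from 8: fewer compactions running in parallel, less short-lived garbage at once
    in_memory_compaction_limit_in_mb: 32   # down from 64: more wide rows take the slower incremental path instead of being compacted entirely on heap
    compaction_throughput_mb_per_sec: 8    # unchanged, already throttled

    # cassandra-env.sh / JVM flags
    -XX:MaxTenuringThreshold=1             # revert from 5: less copying between survivor spaces per ParNew, at the cost of earlier promotion
    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution   # GC logging, to confirm what actually survives ParNew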