I have a theory as to what's happening here: there is a correlation between writing massive amounts of content all at once and our outages.
Our scheme uses large buckets of content: we write to a bucket/partition for 5 minutes, then move to a new one. This way we can page through buckets. I think what's happening is that C* is reading the entire partition into memory, then slicing through it, which would explain why it's running out of memory.

system.log:

WARN [CompactionExecutor:294] 2016-08-03 02:01:55,659 BigTableWriter.java:184 - Writing large partition blogindex/content_legacy_2016_08_02:1470154500099 (106107128 bytes)

On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> We have a 60 node C* cluster running 2.2.7, with about 20GB of RAM allocated
> to each C* node. We're aware of the recommended 8GB limit to keep GCs short,
> but our memory use has been creeping up, probably related to this bug.
>
> Here's what we're seeing: if we do a low level of writes, everything
> generally looks good.
>
> What happens is that we then need to catch up, and we do a TON of writes
> all in a small time window. Then C* nodes start dropping like flies. Some
> of them just GC frequently and are able to recover. When they GC like this
> we see GC pauses in the 30-second range, which cause them to stop gossiping
> for a while, and they drop out of the cluster.
>
> This happens as a flurry around the cluster, so we're not always able to
> catch which nodes are doing it, as they recover. However, if we have 3 down,
> we mostly have a locked-up cluster. Writes don't complete and our app
> essentially locks up.
>
> SOME of the boxes never recover. I'm in this state now. We have 3-5
> nodes stuck in GC storms that they won't recover from.
>
> I reconfigured the GC settings to enable jstat.
>
> I was able to catch it while it was happening:
>
> ^Croot@util0067 ~ # sudo -u cassandra jstat -gcutil 4235 2500
>   S0     S1      E      O      M     CCS    YGC     YGCT   FGC     FGCT      GCT
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>
> As you can see, the box is legitimately out of memory: S0, S1, E and O are
> all completely full.
>
> I'm not sure where to go from here. I think 20GB for our workload is more
> than reasonable.
>
> 90% of the time they're well below 10GB of RAM used. While I was watching
> this box I was seeing 30% RAM used, until it decided to climb to 100%.
>
> Any advice on what to do next? I don't see anything obvious in the logs
> that signals a problem.
>
> I attached all the command line arguments we use. Note that I think the
> cassandra-env.sh script puts them in there twice.
>
> -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms20000M
> -Xmx20000M
> -Xmn4096M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=1000003
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> -XX:CompileCommandFile=/hotspot_compiler
> -XX:CMSWaitDuration=10000
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSWaitDuration=10000
> -XX:+UseCondCardMark
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:PrintFLSStatistics=1
> -Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
> -Djava.net.preferIPv4Stack=true
> -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.rmi.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
> -XX:+UnlockCommercialFeatures
> -XX:+FlightRecorder
>
> -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms20000M
> -Xmx20000M
> -Xmn4096M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=1000003
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler
> -XX:CMSWaitDuration=10000
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSWaitDuration=10000
> -XX:+UseCondCardMark
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:PrintFLSStatistics=1
> -Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
> -Djava.net.preferIPv4Stack=true
> -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.rmi.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
> -XX:+UnlockCommercialFeatures
> -XX:+FlightRecorder
> -Dlogback.configurationFile=logback.xml
> -Dcassandra.logdir=/var/log/cassandra
> -Dcassandra.storagedir=
> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
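For reference, the 5-minute bucket/partition scheme described at the top can be sketched in Python. This is a minimal illustration under the assumption that the bucket key is the write timestamp floored to its 5-minute window; `bucket_key` and `BUCKET_MS` are hypothetical names, since the thread only shows the resulting partition name:

```python
BUCKET_MS = 5 * 60 * 1000  # one bucket/partition per 5-minute window

def bucket_key(ts_ms: int) -> int:
    # Hypothetical helper: floor a millisecond timestamp to the start of
    # its 5-minute window, so every write in that window shares a partition.
    return ts_ms - (ts_ms % BUCKET_MS)

# A catch-up burst lands entirely in the current window's partition,
# which is consistent with the "Writing large partition" warning above:
assert bucket_key(1470154500099) == bucket_key(1470154799999)  # same bucket
assert bucket_key(1470154800000) != bucket_key(1470154500099)  # next bucket

# The flagged partition (106107128 bytes) is roughly 101 MiB:
assert round(106107128 / 2**20) == 101
```

Paging then just means iterating bucket keys window by window; the tradeoff is that a burst of writes inside one window all concentrates into a single, potentially very large, partition.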