I have a theory about what's happening here.

There is a correlation between writing massive amounts of content all at
once and our outages.

Our scheme uses large buckets of content where we write to a
bucket/partition for 5 minutes, then move to a new one.  This way we can
page through buckets.
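
A minimal sketch of that bucketing scheme (names are hypothetical; this
assumes epoch-millisecond timestamps and 5-minute windows):

```python
BUCKET_MS = 5 * 60 * 1000  # each bucket/partition spans 5 minutes

def bucket_key(ts_ms: int) -> int:
    """Round an epoch-millisecond timestamp down to its 5-minute bucket."""
    return ts_ms - (ts_ms % BUCKET_MS)

# The partition name in the warning below ends in 1470154500099,
# which rounds down to an exact 5-minute boundary:
print(bucket_key(1470154500099))  # 1470154500000
```

Every write in the same 5-minute window lands in the same partition, which is what lets us page through buckets, but also what concentrates a burst of writes onto one partition.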

I think what's happening is that C* is reading the entire partition into
memory, then slicing through it... which would explain why it's running out
of memory.

system.log:WARN  [CompactionExecutor:294] 2016-08-03 02:01:55,659
BigTableWriter.java:184 - Writing large partition
blogindex/content_legacy_2016_08_02:1470154500099 (106107128 bytes)
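
For scale, that partition is about 101 MiB, which (if I read the default
right) puts it just over Cassandra's
`compaction_large_partition_warning_threshold_mb` of 100, and spread over a
5-minute bucket it implies a sustained write rate of roughly a third of a
MiB/s into a single partition:

```python
size_bytes = 106107128            # from the BigTableWriter warning above
size_mib = size_bytes / 2**20     # bytes -> MiB
rate = size_mib / (5 * 60)        # one bucket spans 5 minutes

print(f"{size_mib:.1f} MiB, ~{rate:.2f} MiB/s sustained into one partition")
# 101.2 MiB, ~0.34 MiB/s sustained into one partition
```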

On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> We have a 60-node C* cluster running 2.2.7 with about 20GB of RAM allocated
> to each C* node.  We're aware of the recommended 8GB limit to keep GC pauses
> low, but our memory usage has been creeping up, probably related to this bug.
>
> Here's what we're seeing: if we do a low level of writes, everything
> generally looks good.
>
> What happens is that we then need to catch up, so we do a TON of writes
> all in a small time window.  Then C* nodes start dropping like flies.  Some
> of them just GC frequently and are able to recover. When they GC like this
> we see GC pauses in the 30-second range, which cause them to stop gossiping
> for a while, and they drop out of the cluster.
>
> This happens as a flurry around the cluster, so we're not always able to
> catch which ones are doing it as they recover. However, if we have 3 down,
> we mostly have a locked-up cluster.  Writes don't complete and our app
> essentially locks up.
>
> SOME of the boxes never recover. I'm in this state now.  We have 3-5
> nodes that are in GC storms which they won't recover from.
>
> I reconfigured the GC settings to enable jstat.
>
> I was able to catch it while it was happening:
>
> root@util0067 ~ # sudo -u cassandra jstat -gcutil 4235 2500
>   S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT      GCT
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>
> ... as you can see the box is legitimately out of memory: S1 and Eden are
> completely full, and old gen is at ~95%, with no reclamation between samples.
>
> I'm not sure where to go from here.  I think 20GB for our workload is more
> than reasonable.
>
> 90% of the time they're well below 10GB of RAM used.  While I was watching
> this box I saw 30% RAM used until it decided to climb to 100%.
>
> Any advice on what to do next?  I don't see anything obvious in the logs
> to signal a problem.
>
> I attached all the command line arguments we use.  Note that I think that
> the cassandra-env.sh script puts them in there twice.
>
> -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms20000M
> -Xmx20000M
> -Xmn4096M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=1000003
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> -XX:CompileCommandFile=/hotspot_compiler
> -XX:CMSWaitDuration=10000
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSWaitDuration=10000
> -XX:+UseCondCardMark
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:PrintFLSStatistics=1
> -Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
> -Djava.net.preferIPv4Stack=true
> -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.rmi.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
> -XX:+UnlockCommercialFeatures
> -XX:+FlightRecorder
> -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms20000M
> -Xmx20000M
> -Xmn4096M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=1000003
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler
> -XX:CMSWaitDuration=10000
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSWaitDuration=10000
> -XX:+UseCondCardMark
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:PrintFLSStatistics=1
> -Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
> -Djava.net.preferIPv4Stack=true
> -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.rmi.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
> -XX:+UnlockCommercialFeatures
> -XX:+FlightRecorder
> -Dlogback.configurationFile=logback.xml
> -Dcassandra.logdir=/var/log/cassandra
> -Dcassandra.storagedir=
> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
>
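
The jstat dump quoted above can also be sanity-checked programmatically.
A minimal sketch (the `parse_gcutil` and `heap_saturated` helpers are
hypothetical, and the thresholds are judgment calls) that parses the
`-gcutil` columns and flags a heap the collector can no longer reclaim:

```python
def parse_gcutil(text):
    """Parse `jstat -gcutil` output into one dict per sample row."""
    lines = [l for l in text.strip().splitlines() if l.strip()]
    header = lines[0].split()
    return [dict(zip(header, map(float, l.split()))) for l in lines[1:]]

def heap_saturated(row, eden_pct=99.0, old_pct=90.0):
    # Eden completely full plus a nearly-full old gen means young
    # collections have nowhere to promote -- the GC-storm signature.
    return row["E"] >= eden_pct and row["O"] >= old_pct

sample = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT      GCT
  0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
"""
rows = parse_gcutil(sample)
print(heap_saturated(rows[0]))  # True
```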

