Well, I managed to run 50 days before an OOM, so any changes I make will take a while to test ;-) I've seen the GCInspector log lines appear periodically in my logs, but I didn't see a correlation with the crash.
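For the record, this is roughly what I was grepping while doing the postmortem. The paths assume the stock riptano/datastax rpm layout, so adjust if yours differs:

  # long/expensive collections reported by Cassandra itself
  grep GCInspector /var/log/cassandra/system.log | tail -20

  # the kernel OOM-killer entries that showed java getting killed
  grep -i -E 'out of memory|oom-killer|killed process' /var/log/kern*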
I'll read the instructions on how to do a rolling upgrade properly today, practice on test first, and then try it on production. (I've put a rough sketch of the steps I have in mind, plus a possible heap tweak, in a P.S. at the very bottom.)

will

On Wed, Jun 22, 2011 at 8:41 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
> We had a similar problem last month and found that the OS eventually
> killed the Cassandra process on each of our nodes ... I've upgraded to
> 0.8.0 from 0.7.6-2 and have not had the problem since, but I do see
> consumption levels rising consistently from one day to the next on
> each node ..
>
> On Wed, Jun 1, 2011 at 2:30 PM, Sasha Dolgy <sdo...@gmail.com> wrote:
> > is there a specific string I should be looking for in the logs that
> > isn't super obvious to me at the moment...
> >
> > On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > > The place to start is with the statistics Cassandra logs after each GC.
> > look for GCInspector
>
> I found this in the logs on all my servers but never did much after
> that....
>
> On Wed, Jun 22, 2011 at 2:33 PM, William Oberman <ober...@civicscience.com> wrote:
> > I woke up this morning to all 4 of 4 of my cassandra instances reporting
> > they were down in my cluster. I quickly started them all, and everything
> > seems fine. I'm doing a postmortem now, but it appears they all OOM'd at
> > roughly the same time, which was not reported in any cassandra log, but I
> > discovered something in /var/log/kern that showed java died of oom(*). In
> > amazon, I'm using large instances for cassandra, and they have no swap (as
> > recommended), so I have ~8GB of ram. Should I use a different max mem
> > setting? I'm using a stock rpm from riptano/datastax. If I run "ps -aux" I
> > get:
> > /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
> > -Xms3843M -Xmx3843M -Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> > -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
> > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> > -Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=X.X.X.X
> > -Dcom.sun.management.jmxremote.port=8080
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false -Dmx4jaddress=0.0.0.0
> > -Dmx4jport=8081 -Dlog4j.configuration=log4j-server.properties
> > -Dlog4j.defaultInitOverride=true
> > -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp
> > :/etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.1.3.jar:/usr/share/cassandra/lib/apache-cassandra-0.7.4.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-collections-3.2.1.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/cassandra/lib/guava-r05.jar:/usr/share/cassandra/lib/high-scale-lib.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jetty-6.1.21.jar:/usr/share/cassandra/lib/jetty-util-6.1.21.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jug-2.0.0.jar:/usr/share/cassandra/lib/libthrift-0.5.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar
> > org.apache.cassandra.thrift.CassandraDaemon
> >
> > (*) Also, why would they all OOM so close to each other? Bad luck? Or once
> > the first node went down, is there an increased chance of the rest?
> >
> > I'm still on 0.7.4; when I released cassandra to production that was the
> > latest release. In addition to (or instead of?) fixing memory settings, I'm
> > guessing I should upgrade.
> >
> > will

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue, First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com
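P.S. Here's a rough sketch of the per-node procedure I have in mind once I've read the upgrade notes and rehearsed it on test. File, package and service names are my best guess at the stock rpm layout, so please correct me if any of them are off:

  # 1. give the JVM more headroom on the ~8GB instance: instead of letting
  #    cassandra-env.sh auto-size the heap to ~3.8G, set it explicitly
  #    (I believe the rpm reads /etc/cassandra/conf/cassandra-env.sh), e.g.:
  #      MAX_HEAP_SIZE="4G"
  #      HEAP_NEWSIZE="400M"

  # 2. rolling restart/upgrade, one node at a time
  nodetool -h localhost drain   # flush memtables and stop accepting writes
  service cassandra stop
  yum upgrade cassandra         # or whatever the riptano/datastax repo calls the package
  service cassandra start
  # wait for the node to show Up/Normal in "nodetool -h localhost ring"
  # before moving on to the next one

My understanding is that the drain keeps the upgraded node from having to replay the old commitlog, but correct me if I'm cargo-culting that.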