Re: Survey: Cassandra/JVM Resident Set Size increase

Zhu Han Wed, 13 Jul 2011 21:32:43 -0700

On Wed, Jul 13, 2011 at 9:45 PM, Konstantin Naryshkin
<konstant...@a-bb.net> wrote:


> Do you mean that it is using all of the available heap? That is the
> expected behavior of most long running Java applications. The JVM will not
> GC until it needs memory (or you explicitly ask it to) and will only free up
> a bit of memory at a time. That is very good behavior from a performance
> stand point since frequent, large GCs would make your application very
> unresponsive. It also makes Java applications take up all the memory you
> give them.
>
> ----- Original Message -----
> From: "Sasha Dolgy" <sdo...@gmail.com>
> To: user@cassandra.apache.org
> Sent: Tuesday, July 12, 2011 10:23:02 PM
> Subject: Re: Survey: Cassandra/JVM Resident Set Size increase
>
> I'll post more tomorrow ... However, we set up one node in a single node
> cluster and have left it with no data....reviewing memory consumption
> graphs...it increased daily until it gobbled (highly technical term) all
> memory...the system is now running just below 100% memory usage....which i
> find peculiar seeings that it is doing nothing............with no data and
> no peers.
> On Jul 12, 2011 3:29 PM, "Chris Burroughs" <chris.burrou...@gmail.com>
> wrote:
> > ### Preamble
> >
> > There have been several reports on the mailing list of the JVM running
> > Cassandra using "too much" memory. That is, the resident set size is much
> > greater than (max java heap size + mmapped segments) and continues to grow until the
> > process swaps, kernel oom killer comes along, or performance just
> > degrades too far due to the lack of space for the page cache. It has
> > been unclear from these reports if there is a pattern. My hope here is
> > that by comparing JVM versions, OS versions, JVM configuration etc., we
> > will find something. Thank you everyone for your time.
> >
> >
> > Some example reports:
> > - http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
> > - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
> > - https://issues.apache.org/jira/browse/CASSANDRA-2868
> > - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
> > - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html
> >
> > For reference theories include (in no particular order):
> > - memory fragmentation
> > - JVM bug
> > - OS/glibc bug
> > - direct memory
> > - swap induced fragmentation
> > - some other bad interaction of cassandra/jdk/jvm/os/nio-insanity.
> >
> > ### Survey
> >
> > 1. Do you think you are experiencing this problem?
>

Yes.


> >
> > 2. Why? (This is a good time to share a graph like
> > http://www.twitpic.com/5fdabn or
> > http://img24.imageshack.us/img24/1754/cassandrarss.png)
>

I observe that the RSS of the Cassandra process keeps growing to dozens of
gigabytes, even though the dataset (SSTables) is only hundreds of megabytes.

> >
> > 2. Are you using mmap? (If yes be sure to have read
> > http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have
> > used pmap [or another tool] to rule out mmap and top deceiving you.)
>

Yes. pmap shows that many anonymous regions are created and expanded over
the lifetime of the Cassandra process. They are the primary contributor to
RSS. I'm pretty sure these anonymous regions are not the Java heap used by
the JVM, as they are not contiguous.
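
For anyone who wants to repeat the check, here is a minimal sketch of how the
anonymous regions could be totalled. It assumes a Linux /proc/<pid>/smaps file
is available; the function name anonymous_rss and the sample text are
hypothetical and only illustrate the idea. Note that the JVM maps the Java
heap anonymously as well, so the configured heap size has to be kept in mind
when reading the total.

```python
import re

# Header line of one mapping in /proc/<pid>/smaps, for example:
# "7f1c2c000000-7f1c30000000 rw-p 00000000 00:00 0"
# The last group is the backing path; it is empty for anonymous mappings.
MAPPING_HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)


def anonymous_rss(smaps_text):
    """Return (region_count, total_rss_kb) for mappings with no backing path."""
    regions = 0
    total_kb = 0
    in_anonymous = False
    for line in smaps_text.splitlines():
        header = MAPPING_HEADER.match(line)
        if header:
            # Named mappings (files, [heap], [stack]) are skipped.
            in_anonymous = header.group(4).strip() == ""
            if in_anonymous:
                regions += 1
        elif in_anonymous and line.startswith("Rss:"):
            total_kb += int(line.split()[1])
    return regions, total_kb


# Made-up sample in smaps layout, not real output from the node.
SAMPLE = "\n".join([
    "00400000-00401000 r-xp 00000000 08:01 1234 /usr/bin/java",
    "Size:                  4 kB",
    "Rss:                   4 kB",
    "7f1c2c000000-7f1c30000000 rw-p 00000000 00:00 0",
    "Size:              65536 kB",
    "Rss:               61440 kB",
    "7f1c30000000-7f1c34000000 rw-p 00000000 00:00 0",
    "Size:              65536 kB",
    "Rss:               20480 kB",
])

region_count, rss_kb = anonymous_rss(SAMPLE)
print(region_count, rss_kb)
```

On a live node the input would be the contents of /proc/<pid>/smaps for the
Cassandra process, read as text and passed to anonymous_rss.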

>
> > 3. Are you using JNA? Was mlockall successful (it's in the logs on
> > startup)?
>

Yes. mlockall was successful as well. I have not tried other settings.


> >
> > 4. Is swap enabled? Are you swapping?
>

No. Swap is disabled.


> >
> > 5. What version of Apache Cassandra are you using?
>

0.6.13


> >
> > 6. What is the earliest version of Apache Cassandra you recall seeing
> > this problem with?
>

Earlier versions of the 0.6.x branch.


> >
> > 7. Have you tried the patch from CASSANDRA-2654 ?
>

Not yet, as I do not query large datasets.


> >
> > 8. What jvm and version are you using?
>

"java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)"

I also tried OpenJDK.


>
> > 9. What OS and version are you using?
>

The kernel version is "2.6.18-194.26.1.el5.028stab079.2", which is from
CentOS 5.4.

The user-level environment is Ubuntu 10.04 (Lucid) server edition. This
unusual combination arises because Cassandra runs inside an OpenVZ container
(Ubuntu 10.04) on top of a CentOS host.

I suspect the old kernel causes memory fragmentation in the Cassandra
process, but I cannot prove it, as I have not tried a newer kernel.

>
> > 10. What are your jvm flags?
>

The problem can be observed with both CMS and the parallel old GC. These are
the flags used in each case:

"-ea -Xms3G -Xmx3G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError"

"-ea -Xms3G -Xmx3G -XX:+UseParallelOldGC -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:+HeapDumpOnOutOfMemoryError"



> >
> > 11. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)
> >
>

Not yet. Would it help?


> > 12. Can you characterise how much GC your cluster is doing?
>

This is one node of the test cluster. It has been idle most of the time
since it was restarted 12 days ago.

$ sudo jstat -gcutil 26166
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
 11.92   0.00   4.60  78.92  49.76    282    5.756     1    0.639    6.395
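
To put those numbers in perspective, here is a small sketch that parses the
jstat -gcutil output above and derives the average young-collection pause and
the share of uptime spent in GC. The helper name parse_gcutil is hypothetical,
and the 12-day uptime is taken from the restart date mentioned above, so the
result is only approximate.

```python
def parse_gcutil(output):
    """Map each jstat -gcutil column name to its value as a float."""
    header, values = [line.split() for line in output.strip().splitlines()[:2]]
    return dict(zip(header, (float(v) for v in values)))


# The jstat output quoted in this message.
JSTAT_OUTPUT = """
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
 11.92   0.00   4.60  78.92  49.76    282    5.756     1    0.639    6.395
"""

stats = parse_gcutil(JSTAT_OUTPUT)

# Average pause per young collection, in milliseconds.
avg_young_ms = stats["YGCT"] / stats["YGC"] * 1000

# Fraction of wall-clock time spent in GC, assuming roughly 12 days of uptime.
uptime_seconds = 12 * 24 * 3600
gc_fraction = stats["GCT"] / uptime_seconds

print("average young GC pause: %.1f ms" % avg_young_ms)
print("share of uptime in GC: %.6f%%" % (gc_fraction * 100))
```

With only a few seconds of total GC time over that period, the figures
suggest that collector activity is negligible on this node, which is
consistent with it being idle.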



> >
> > 13. Approximately how many read/writes per unit time is your cluster
> > doing (per node or the whole cluster)?
>

The load is very light as it is a test cluster.

>
> > 14. How are your column families configured (key cache size, row cache
> > size, etc.)?
> >
>

Row cache is disabled entirely.

Key cache is disabled for the two largest CFs and enabled for the others.
However, the total size of the SSTables with key cache enabled is only
30 MB.
