Hello everybody, I actually have the exact same problem. I have very little data (a few hundred KB) and the memory consumption goes up without any end in sight. On my node I have limited RAM (2 GB) to run Cassandra, but since I have very little data, I thought it would not be a problem. Here is the result of du:
vic...@****:~$ du /opt/cassandra/data/ -h
40K     /opt/cassandra/data/system
1,7M    /opt/cassandra/data/FallingDown
1,7M    /opt/cassandra/data/

Now, if I look at:

vic...@****:~$ sudo ps aux | grep "cassandra"
cassandra 11034 0.2 22.9 *1107772 462764* ? Sl Dec17 6:13 /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar org.apache.cassandra.thrift.CassandraDaemon

Cassandra uses 462764 KB, roughly 460 MB, for 2 MB of data... and it keeps getting bigger. It is important to know that I do just a few inserts, though quite a lot of reads. Also, Cassandra seems to completely ignore JVM limits such as -Xmx. If I don't stop and relaunch Cassandra every 15 or 20 days, it simply crashes due to OOM errors.

Is there an explanation for this?

Thank you all,
Victor

2010/12/18 Zhu Han <schumi....@gmail.com>

> Here is a typo, sorry...
>
> best regards,
> hanzhu
>
> On Sun, Dec 19, 2010 at 10:29 AM, Zhu Han <schumi....@gmail.com> wrote:
>
>> The problem still seems to be the C-heap of the JVM, which leaks about
>> 70 MB every day. Here is the summary:
>>
>> on 12/19: 00000000010c3000 178548K rw--- [ anon ]
>> on 12/18: 00000000010c3000 110320K rw--- [ anon ]
>> on 12/17: 00000000010c3000 39256K rw--- [ anon ]
>>
>> This should not be the JVM object heap, because the object heap size is
>> fixed by the JVM settings below. Here is the map of the JVM object heap,
>> which remains constant:
>>
>> 00000000010c3000 39256K rw--- [ anon ]
>
> It should be:
>
> 00002b58433c0000 1069824K rw--- [ anon ]
>
>> I'll post it to the OpenJDK mailing list to seek help.
>>
>>> Zhu,
>>> Couple of quick questions:
>>> How many threads are in your JVM?
>>
>> There are hundreds of threads. Here are the settings of Cassandra:
>>
>> *<ConcurrentReads>8</ConcurrentReads>
>> <ConcurrentWrites>128</ConcurrentWrites>*
>>
>> The thread stack size on this server is 1 MB, so I observe hundreds of
>> individual 1 MB mmap segments.
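>>
>> Those stacks are already a sizable chunk of native memory by themselves:
>> a few hundred threads at 1 MB of stack each is a few hundred MB that will
>> never show up inside the object heap. Here is a minimal sketch of pulling
>> the live thread count over JMX to put a number on it (assuming the remote
>> JMX port 8080 from the command line below, with authentication and SSL
>> disabled as configured; the 1 MB per stack is this server's default, not
>> something the JVM reports):
>>
>> import javax.management.MBeanServerConnection;
>> import javax.management.ObjectName;
>> import javax.management.remote.JMXConnector;
>> import javax.management.remote.JMXConnectorFactory;
>> import javax.management.remote.JMXServiceURL;
>>
>> public class StackEstimate {
>>     public static void main(String[] args) throws Exception {
>>         // standard JMX-over-RMI URL; 8080 is the jmxremote.port setting
>>         JMXServiceURL url = new JMXServiceURL(
>>                 "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
>>         JMXConnector c = JMXConnectorFactory.connect(url);
>>         try {
>>             MBeanServerConnection mbs = c.getMBeanServerConnection();
>>             // ThreadCount is a standard attribute of the Threading MXBean
>>             int n = (Integer) mbs.getAttribute(
>>                     new ObjectName("java.lang:type=Threading"), "ThreadCount");
>>             // assume 1 MB of stack per thread (the platform default here)
>>             System.out.printf("%d threads x 1 MB stack ~= %d MB off-heap%n", n, n);
>>         } finally {
>>             c.close();
>>         }
>>     }
>> }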
>>
>>> Can you also post the full command line as well?
>>
>> Sure. All of them are default settings.
>>
>> /usr/bin/java -ea -Xms1G -Xmx1G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.8.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar org.apache.cassandra.thrift.CassandraDaemon
>>
>>> Also, output of cat /proc/meminfo
>>
>> This is an OpenVZ-based testing environment, so /proc/meminfo is not very
>> helpful. Anyway, I paste it here:
>>
>> MemTotal:      9838380 kB
>> MemFree:       4005900 kB
>> Buffers:             0 kB
>> Cached:              0 kB
>> SwapCached:          0 kB
>> Active:              0 kB
>> Inactive:            0 kB
>> HighTotal:           0 kB
>> HighFree:            0 kB
>> LowTotal:      9838380 kB
>> LowFree:       4005900 kB
>> SwapTotal:           0 kB
>> SwapFree:            0 kB
>> Dirty:               0 kB
>> Writeback:           0 kB
>> AnonPages:           0 kB
>> Mapped:              0 kB
>> Slab:                0 kB
>> PageTables:          0 kB
>> NFS_Unstable:        0 kB
>> Bounce:              0 kB
>> CommitLimit:         0 kB
>> Committed_AS:        0 kB
>> VmallocTotal:        0 kB
>> VmallocUsed:         0 kB
>> VmallocChunk:        0 kB
>> HugePages_Total:     0
>> HugePages_Free:      0
>> HugePages_Rsvd:      0
>> Hugepagesize:     2048 kB
>>
>>> thanks,
>>> Sri
>>>
>>> On Fri, Dec 17, 2010 at 7:15 PM, Zhu Han <schumi....@gmail.com> wrote:
>>>
>>> > Seems like the problem is still there after I upgraded to "OpenJDK
>>> > Runtime Environment (IcedTea6 1.9.2)". So it is not related to the
>>> > bug I reported two days ago.
>>> >
>>> > Can somebody else share some info with us? What's the Java environment
>>> > you use? Is it stable for long-lived Cassandra instances?
>>> >
>>> > best regards,
>>> > hanzhu
>>> >
>>> > On Thu, Dec 16, 2010 at 9:28 PM, Zhu Han <schumi....@gmail.com> wrote:
>>> >
>>> > > I've tried it, but it did not work for me this afternoon.
>>> > >
>>> > > Thank you!
>>> > >
>>> > > best regards,
>>> > > hanzhu
>>> > >
>>> > > On Thu, Dec 16, 2010 at 8:59 PM, Matthew Conway <m...@backupify.com> wrote:
>>> > >
>>> > >> Thanks for debugging this, I'm running into the same problem.
>>> > >> BTW, if you can ssh into your nodes, you can use jconsole over ssh:
>>> > >> http://simplygenius.com/2010/08/jconsole-via-socks-ssh-tunnel.html
>>> > >>
>>> > >> Matt
>>> > >>
>>> > >> On Dec 16, 2010, at 2:39 AM, Zhu Han wrote:
>>> > >>
>>> > >> > Sorry for the spam again. :-)
>>> > >> >
>>> > >> > I think I have found the root cause. Here is a bug report [1] on a
>>> > >> > memory leak in ParNewGC. It is solved in OpenJDK 1.6.0_20 (IcedTea6
>>> > >> > 1.9.2) [2].
>>> > >> >
>>> > >> > So the suggestion is: whoever runs Cassandra on Ubuntu 10.04,
>>> > >> > please upgrade OpenJDK to the latest version.
>>> > >> >
>>> > >> > [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6824570
>>> > >> > [2] http://blog.fuseyism.com/index.php/2010/09/10/icedtea6-19-released/
>>> > >> >
>>> > >> > best regards,
>>> > >> > hanzhu
>>> > >> >
>>> > >> > On Thu, Dec 16, 2010 at 3:10 PM, Zhu Han <schumi....@gmail.com> wrote:
>>> > >> >
>>> > >> >> The test node is behind a firewall, so I took some time to find a
>>> > >> >> way to get JMX diagnostic information from it.
>>> > >> >>
>>> > >> >> What's interesting is that both the HeapMemoryUsage and
>>> > >> >> NonHeapMemoryUsage reported by the JVM are quite reasonable. So it
>>> > >> >> is a mystery why the JVM process maps such a big anonymous memory
>>> > >> >> region...
>>> > >> >>
>>> > >> >> $ java -Xmx128m -jar /tmp/cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory HeapMemoryUsage
>>> > >> >> 12/16/2010 15:07:45 +0800 org.archive.jmx.Client HeapMemoryUsage:
>>> > >> >> committed: 1065025536
>>> > >> >> init: 1073741824
>>> > >> >> max: 1065025536
>>> > >> >> used: 18295328
>>> > >> >>
>>> > >> >> $ java -Xmx128m -jar /tmp/cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory NonHeapMemoryUsage
>>> > >> >> 12/16/2010 15:01:51 +0800 org.archive.jmx.Client NonHeapMemoryUsage:
>>> > >> >> committed: 34308096
>>> > >> >> init: 24313856
>>> > >> >> max: 226492416
>>> > >> >> used: 21475376
>>> > >> >>
>>> > >> >> If anybody is interested in it, I can provide more diagnostic
>>> > >> >> information before I restart the instance.
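>>> > >> >>
>>> > >> >> In case it helps anyone compare the heap as the JVM sees it against
>>> > >> >> the process RSS, the same two numbers can be pulled programmatically
>>> > >> >> instead of via cmdline-jmxclient. A small sketch under the same
>>> > >> >> assumptions (JMX on localhost:8080, auth and SSL off):
>>> > >> >>
>>> > >> >> import java.lang.management.MemoryUsage;
>>> > >> >> import javax.management.MBeanServerConnection;
>>> > >> >> import javax.management.ObjectName;
>>> > >> >> import javax.management.openmbean.CompositeData;
>>> > >> >> import javax.management.remote.JMXConnector;
>>> > >> >> import javax.management.remote.JMXConnectorFactory;
>>> > >> >> import javax.management.remote.JMXServiceURL;
>>> > >> >>
>>> > >> >> public class MemCheck {
>>> > >> >>     public static void main(String[] args) throws Exception {
>>> > >> >>         JMXServiceURL url = new JMXServiceURL(
>>> > >> >>                 "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
>>> > >> >>         JMXConnector c = JMXConnectorFactory.connect(url);
>>> > >> >>         try {
>>> > >> >>             MBeanServerConnection mbs = c.getMBeanServerConnection();
>>> > >> >>             ObjectName mem = new ObjectName("java.lang:type=Memory");
>>> > >> >>             for (String attr : new String[] { "HeapMemoryUsage", "NonHeapMemoryUsage" }) {
>>> > >> >>                 // remote MXBean attributes arrive as CompositeData
>>> > >> >>                 MemoryUsage u = MemoryUsage.from(
>>> > >> >>                         (CompositeData) mbs.getAttribute(mem, attr));
>>> > >> >>                 System.out.printf("%s: committed=%d init=%d max=%d used=%d%n",
>>> > >> >>                         attr, u.getCommitted(), u.getInit(), u.getMax(), u.getUsed());
>>> > >> >>             }
>>> > >> >>         } finally {
>>> > >> >>             c.close();
>>> > >> >>         }
>>> > >> >>     }
>>> > >> >> }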
>>> > >> >>
>>> > >> >> best regards,
>>> > >> >> hanzhu
>>> > >> >>
>>> > >> >> On Thu, Dec 16, 2010 at 1:00 PM, Zhu Han <schumi....@gmail.com> wrote:
>>> > >> >>
>>> > >> >>> After investigating it more deeply, I suspect it is a native
>>> > >> >>> memory leak in the JVM. The large anonymous map in the lower
>>> > >> >>> address space should be the native heap of the JVM, not the Java
>>> > >> >>> object heap. Has anybody met this before?
>>> > >> >>>
>>> > >> >>> I'll try to upgrade the JVM tonight.
>>> > >> >>>
>>> > >> >>> best regards,
>>> > >> >>> hanzhu
>>> > >> >>>
>>> > >> >>> On Thu, Dec 16, 2010 at 10:50 AM, Zhu Han <schumi....@gmail.com> wrote:
>>> > >> >>>
>>> > >> >>>> Hi,
>>> > >> >>>>
>>> > >> >>>> I have a test node with apache-cassandra-0.6.8 on Ubuntu 10.04.
>>> > >> >>>> The hardware environment is an OpenVZ container. The JVM is:
>>> > >> >>>>
>>> > >> >>>> # java -Xmx128m -version
>>> > >> >>>> java version "1.6.0_18"
>>> > >> >>>> OpenJDK Runtime Environment (IcedTea6 1.8.2) (6b18-1.8.2-4ubuntu2)
>>> > >> >>>> OpenJDK 64-Bit Server VM (build 16.0-b13, mixed mode)
>>> > >> >>>>
>>> > >> >>>> These are the memory settings:
>>> > >> >>>>
>>> > >> >>>> "/usr/bin/java -ea -Xms1G -Xmx1G ..."
>>> > >> >>>>
>>> > >> >>>> And the on-disk footprint of the sstables is very small:
>>> > >> >>>>
>>> > >> >>>> # du -sh data/
>>> > >> >>>> 9.8M    data/
>>> > >> >>>>
>>> > >> >>>> The node was infrequently accessed in the last three weeks.
>>> > >> >>>> After that, I observed abnormal memory utilization in top:
>>> > >> >>>>
>>> > >> >>>> PID   USER  PR  NI  *VIRT*   *RES*   SHR  S  %CPU  %MEM  TIME+    COMMAND
>>> > >> >>>> 7836  root  15   0  *3300m*  *2.4g*  13m  S     0  26.0  2:58.51  java
>>> > >> >>>>
>>> > >> >>>> The JVM heap utilization is quite normal:
>>> > >> >>>>
>>> > >> >>>> # sudo jstat -gc -J"-Xmx128m" 7836
>>> > >> >>>> S0C     S1C     S0U    S1U  *EC*       *EU*      *OC*        *OU*       *PC       PU*      YGC  YGCT   FGC  FGCT   GCT
>>> > >> >>>> 8512.0  8512.0  372.8  0.0  *68160.0*  *5225.7*  *963392.0  508200.7  30604.0  18373.4*  480  3.979  2    0.005  3.984
>>> > >> >>>>
>>> > >> >>>> Then I tried pmap to see the native memory mapping. *There are
>>> > >> >>>> two large anonymous mmap regions:*
>>> > >> >>>>
>>> > >> >>>> 00000000080dc000 1573568K rw--- [ anon ]
>>> > >> >>>> 00002b2afc900000 1079180K rw--- [ anon ]
>>> > >> >>>>
>>> > >> >>>> The second one should be the JVM heap. What is the first one?
>>> > >> >>>> An mmap of an sstable should never be an anonymous mmap, but a
>>> > >> >>>> file-backed one. *Is it a native memory leak?* Does Cassandra
>>> > >> >>>> allocate any DirectByteBuffer?
>>> > >> >>>>
>>> > >> >>>> best regards,
>>> > >> >>>> hanzhu
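>>> > >> >>>>
>>> > >> >>>> P.S. For anyone who wants to see what direct buffers look like from
>>> > >> >>>> the OS side, here is a toy program (not Cassandra code; the sizes
>>> > >> >>>> are arbitrary, and the JVM may need -XX:MaxDirectMemorySize raised
>>> > >> >>>> to allow this much). The object heap stays nearly empty while the
>>> > >> >>>> process RSS grows by roughly 256 MB, and the allocations appear in
>>> > >> >>>> pmap as anonymous regions, just like the ones above:
>>> > >> >>>>
>>> > >> >>>> import java.nio.ByteBuffer;
>>> > >> >>>> import java.util.ArrayList;
>>> > >> >>>> import java.util.List;
>>> > >> >>>>
>>> > >> >>>> public class DirectDemo {
>>> > >> >>>>     public static void main(String[] args) throws Exception {
>>> > >> >>>>         // hold references so the buffers are not garbage collected
>>> > >> >>>>         List<ByteBuffer> pinned = new ArrayList<ByteBuffer>();
>>> > >> >>>>         for (int i = 0; i < 16; i++) {
>>> > >> >>>>             // each direct buffer lives outside the -Xmx object heap
>>> > >> >>>>             pinned.add(ByteBuffer.allocateDirect(16 * 1024 * 1024)); // 16 MB
>>> > >> >>>>         }
>>> > >> >>>>         Runtime rt = Runtime.getRuntime();
>>> > >> >>>>         System.out.printf("object heap used: %d MB%n",
>>> > >> >>>>                 (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024));
>>> > >> >>>>         Thread.sleep(60000); // park so top/pmap can inspect the process
>>> > >> >>>>     }
>>> > >> >>>> }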