> storms and very slow responses. Also, I'd like to be able to load more data
> to the cluster and I'm hitting the memory wall, which I didn't expect.
> In the cassandra.in.sh you'd notice that I do provide Xmx=12G but given that
> there's so little data I wouldn't expect the process to be using all of
> that. As a matter of fact I wanted to insert more data to the cluster but I

So it seems to be heap space rather than mmap(), given:

> concurrent mark-sweep generation:
>    capacity = 12841320448 (12246.4375MB)
>    used     = 10867324872 (10363.888618469238MB)
>    free     = 1973995576 (1882.5488815307617MB)
>    84.62778353679785% used

The JVM will tend to gobble up the memory you allow it to gobble up
with -Xmx, depending on circumstances. Yes, there are heuristics in
the VM that are intended to prevent it from immediately going to -Xmx
if not needed, but in my experience there are many situations where
these heuristics mostly fail. This is particularly an issue with
CMS/G1, which have to try to stay incremental and avoid pauses while
at the same time trying to do something decent about memory use. The
default throughput collector should be better here (I presume, I
haven't bothered experimenting with it much since it's uninteresting
;)).

The situation is complicated because in general garbage collection
becomes more efficient the more memory you give it, so there is a
direct trade-off between memory use and performance. For this reason
one of the heuristics (which I believe is in place with CMS too;
definitely with G1) is how much time is spent on GC. Certain patterns
can trip this heuristic and make the heap size get bumped
(instantly to -Xmx in the case of G1 anyway...).

You can try tweaking CMS settings (there are several, such as forcing
concurrent mark to start at a constant occupancy rate, tweaking
minfree/minused etc.), but it is difficult to get right. By far the
easiest thing to do in your situation is probably to determine roughly
how large the live set is (looking at how much memory is used after a
concurrent mark/sweep has just finished is a good way of doing this)
and then set -Xmx accordingly instead of at 12 GB.
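
As a rough illustration of that sizing step, here is a minimal Python
sketch. It assumes you have already collected a few "used" figures
for the heap right after concurrent mark/sweep cycles finished. The
function name, the sample values and the 1.5x headroom factor are all
hypothetical; the headroom has to cover the young generation and the
garbage that builds up between cycles, so treat it as a starting
point to tune rather than a recommendation.

```python
import math

MB = 1024 * 1024


def suggest_xmx_mb(post_cms_used_bytes, headroom=1.5):
    """Suggest an -Xmx value (in MB) from heap usage samples taken
    right after concurrent mark/sweep cycles have finished.

    headroom is an assumed multiplier on top of the live set; it is
    not a value taken from the JVM or from Cassandra.
    """
    if not post_cms_used_bytes:
        raise ValueError("need at least one post-collection sample")
    if headroom < 1.0:
        raise ValueError("headroom must be >= 1.0")
    # Take the largest post-collection figure as a conservative
    # estimate of the live set.
    live_set = max(post_cms_used_bytes)
    # Round up to whole megabytes so the result can go straight into -Xmx.
    return math.ceil(live_set * headroom / MB)


# Hypothetical samples in bytes, NOT measured on the cluster in this thread.
samples = [2048 * MB, 3072 * MB, 2560 * MB]
print("-Xmx%dM" % suggest_xmx_mb(samples))
```

With the made-up samples above, the largest post-collection figure is
3072 MB, so the sketch suggests 4608 MB instead of 12 GB. If GC
activity becomes too frequent at that size, raise the headroom.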

-- 
/ Peter Schuller
