> Typically nothing is ever logged other than the GC failures
In addition to the heap dumps,
it would be useful to see some GC logs
(turn on GC logging via cassandra.in.sh,
or add
-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
)
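For reference, something along these lines (just a sketch; the exact
file and variable names depend on your install, but the stock
scripts append to JVM_OPTS):

  # sketch, assuming the stock cassandra.in.sh / cassandra-env.sh layout
  JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"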
thanks, Sri
On May 7, 2011, at 6:37 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
The live:serialized size ratio depends on what your data looks like
(small columns will be less efficient than large blobs) but using the
rule of thumb of 10x, around 1G * (1 + memtable_flush_writers +
memtable_flush_queue_size).
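To make that concrete (assuming the 0.7 defaults of
memtable_flush_writers: 1 and memtable_flush_queue_size: 4 -- check
your cassandra.yaml): a 128MB memtable at 10x is roughly 1.3GB live,
so the worst case is about 1.3GB * (1 + 1 + 4), i.e. close to 8GB of
heap for that one CF alone.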
So first thing I would do is drop writers and queue to 1 and 1.
Then I would drop the max heap to 1G, memtable size to 8MB so the heap
dump is easier to analyze. Then let it OOM and look at the dump with
http://www.eclipse.org/mat/
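Concretely, that might look something like the following (a sketch
only; option names assume a stock 0.7 install, and the cli syntax
may differ slightly across versions):

  # conf/cassandra.yaml
  memtable_flush_writers: 1
  memtable_flush_queue_size: 1

  # conf/cassandra-env.sh
  MAX_HEAP_SIZE="1G"
  HEAP_NEWSIZE="200M"
  JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
  JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/log/cassandra"

  # per-CF memtable size, e.g. via cassandra-cli
  # (the CF name is just a placeholder):
  #   update column family YourCF with memtable_throughput=8;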
On Sat, May 7, 2011 at 3:54 PM, Serediuk, Adam
<adam.sered...@serialssolutions.com> wrote:
How much memory should a single hot cf with a 128mb memtable take
with row and key caching disabled during read?
Because I'm seeing the heap go from 3.5GB skyrocketing straight to max
(regardless of heap size; 8GB and 24GB both do the same), at which
point the JVM does nothing but full GC and is unable to reclaim
any meaningful amount of memory. Cassandra then becomes unusable.
I see the same behavior with smaller memtables, eg 64mb.
This happens well into the read operation and only on a small number
of nodes in the cluster (1-4 out of a total of 60 nodes).
On May 6, 2011, at 22:45, "Jonathan Ellis" <jbel...@gmail.com> wrote:
You don't GC storm without legitimately having a too-full heap.
It's
normal to see occasional full GCs from fragmentation, but that will
actually compact the heap and everything goes back to normal IF you
had space actually freed up.
You say you've played w/ memtable size but that would still be my
bet.
Most people severely underestimate how much space this takes (10x in
memory over serialized size), which will bite you when you have lots
of CFs defined.
Otherwise, force a heap dump after a full GC and take a look to see
what's referencing all the memory.
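One way to do that (assuming a Sun JDK with jmap on the PATH; the
"live" option forces a full GC before dumping, so only reachable
objects end up in the file):

  jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof <cassandra pid>

The resulting .hprof can then be opened directly in MAT.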
On Fri, May 6, 2011 at 12:25 PM, Serediuk, Adam
<adam.sered...@serialssolutions.com> wrote:
We're troubleshooting a memory usage problem during batch reads.
We've spent the last few days profiling and trying different GC
settings. The symptoms are that after a certain amount of time
during reads one or more nodes in the cluster will exhibit
extreme memory pressure followed by a gc storm. We've tried every
possible JVM setting and different GC methods and the issue
persists. This is pointing towards something instantiating a lot
of objects and keeping references so that they can't be cleaned up.
Typically nothing is ever logged other than the GC failures;
however, just now one of the nodes emitted a log message we've never
seen before:
INFO [ScheduledTasks:1] 2011-05-06 15:04:55,085
StorageService.java (line 2218) Unable to reduce heap usage since
there are no dirty column families
We have tried increasing the heap on these nodes to large values,
e.g. 24GB, and still run into the same issue. We're running 8GB of
heap normally and only one or two nodes will ever exhibit this
issue, randomly. We don't use key/row caching and our memtable
sizing is 64mb/0.3. Larger or smaller memtables make no
difference in avoiding the issue. We're on 0.7.5 with mmap, JNA, and
JDK 1.6.0_24.
We've somewhat hit the wall in troubleshooting and any advice is
greatly appreciated.
--
Adam
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com