No, the cache does not use soft references; they pretty much suck for caching (the javadoc is not always right :).
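For illustration only, here is a minimal sketch of why a SoftReference-backed cache behaves badly under heap pressure (generic java.lang.ref code, not Cassandra's cache implementation): the collector is free to clear every soft reference in one sweep when memory gets tight, so the hit rate collapses exactly when the node is already struggling.

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative only: a cache whose values live behind SoftReferences.
    // Near an OOM the JVM may clear all of them at once, so get() suddenly
    // misses on everything and the backing store takes the full read load.
    public class SoftRefCache<K, V> {
        private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

        public void put(K key, V value) {
            map.put(key, new SoftReference<>(value));
        }

        public V get(K key) {
            SoftReference<V> ref = map.get(key);
            if (ref == null) return null;
            V value = ref.get();               // null if the GC already cleared it
            if (value == null) map.remove(key);
            return value;
        }
    }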
You're OOMing because you're making requests faster than they can be satisfied. Increasing the amount of memory available will just make it take longer before it OOMs; it won't fix the problem. So, use as much memory as you can for the cache, and if that's not enough, you need to add capacity or do some kind of client rate limiting.

On Sun, May 23, 2010 at 12:50 PM, Ran Tavory <ran...@gmail.com> wrote:
> I am disk bound, certainly. I'll try adding more key and row caching, but I suspect it's a short blanket: if I add more caching I'll have less free memory, so more chance to OOM again. (Is the cache using soft refs so it won't take memory from real objects?)
>
> On Sun, May 23, 2010 at 8:15 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> On Sun, May 23, 2010 at 10:59 AM, Ran Tavory <ran...@gmail.com> wrote:
>>> Is there another solution except adding capacity?
>>
>> Either you need to get more performance/node or increase node count. :)
>>
>>> How does ConcurrentReads (default 8) affect that? If I expect to have a similar number of reads and writes, should I set ConcurrentReads equal to ConcurrentWrites (default 32)?
>>
>> You should figure out where the bottleneck is before tweaking things:
>> http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>>
>> Increasing CR will only help if you are (a) CPU bound and (b) have so many cores that 8 threads isn't saturating them.
>>
>> Sight unseen, my guess is you are disk bound. iostat can confirm this.
>>
>> If that's the case then you can try to reduce the disk load with the row cache or key cache.
>>
>>> On Sun, May 23, 2010 at 5:43 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> Looks like reads are backing up, which in turn is making deserialize back up.
>>>>
>>>> On Sun, May 23, 2010 at 4:25 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>> Here's tpstats on a server with traffic that I think will get an OOM shortly. We have 4k pending reads and 123k pending at MESSAGE-DESERIALIZER-POOL. Is there something I can do to prevent that? (Other than adding RAM...)
>>>>>
>>>>> Pool Name                     Active   Pending   Completed
>>>>> FILEUTILS-DELETE-POOL              0         0          55
>>>>> STREAM-STAGE                       0         0           6
>>>>> RESPONSE-STAGE                     0         0           0
>>>>> ROW-READ-STAGE                     8      4088     7537229
>>>>> LB-OPERATIONS                      0         0           0
>>>>> MESSAGE-DESERIALIZER-POOL          1    123799    22198459
>>>>> GMFD                               0         0      471827
>>>>> LB-TARGET                          0         0           0
>>>>> CONSISTENCY-MANAGER                0         0           0
>>>>> ROW-MUTATION-STAGE                 0         0    14142351
>>>>> MESSAGE-STREAMING-POOL             0         0          16
>>>>> LOAD-BALANCER-STAGE                0         0           0
>>>>> FLUSH-SORTER-POOL                  0         0           0
>>>>> MEMTABLE-POST-FLUSHER              0         0         128
>>>>> FLUSH-WRITER-POOL                  0         0         128
>>>>> AE-SERVICE-STAGE                   1         1           8
>>>>> HINTED-HANDOFF-POOL                0         0          10
>>>>>
>>>>> On Sat, May 22, 2010 at 11:05 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>> The message deserializer has 10M pending tasks before the OOM. What do you think makes the message deserializer blow up? I suspect that when it gets to 10M pending tasks they may consume a lot of memory, though I don't know how much memory a single task actually takes up. Is there a setting I need to tweak? (Or am I barking up the wrong tree?)
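For anyone who would rather alert on these pending counts than eyeball tpstats, a minimal sketch of polling them over JMX follows. The MBean names (domain org.apache.cassandra.concurrent, attribute PendingTasks) and the JMX port 8080 are assumptions based on the 0.6-era layout; verify them with jconsole against your own node before relying on this.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Polls the pending-task counts of the two stages that are backing up in
    // the tpstats output above. nodetool tpstats reads the same MBeans over
    // JMX, so a threshold chosen from that table translates directly.
    public class PendingTasksMonitor {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                String[] stages = { "ROW-READ-STAGE", "MESSAGE-DESERIALIZER-POOL" };
                for (String stage : stages) {
                    // Assumed MBean name; check the exact form for your version.
                    ObjectName name = new ObjectName(
                            "org.apache.cassandra.concurrent:type=" + stage);
                    Object pending = mbs.getAttribute(name, "PendingTasks");
                    System.out.println(stage + " pending: " + pending);
                    // An alert policy could fire here, e.g. pending above some
                    // threshold for N consecutive polls.
                }
            }
        }
    }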
>>>>>> I'll add the counters from http://github.com/jbellis/cassandra-munin-plugins, but I already have most of them monitored, so I attached the graphs of the ones that seemed most suspicious in the previous email.
>>>>>>
>>>>>> The system keyspace and the HH CF don't look too bad, I think; here they are:
>>>>>>
>>>>>> Keyspace: system
>>>>>>   Read Count: 154
>>>>>>   Read Latency: 0.875012987012987 ms.
>>>>>>   Write Count: 9
>>>>>>   Write Latency: 0.20055555555555554 ms.
>>>>>>   Pending Tasks: 0
>>>>>>     Column Family: LocationInfo
>>>>>>     SSTable count: 1
>>>>>>     Space used (live): 2714
>>>>>>     Space used (total): 2714
>>>>>>     Memtable Columns Count: 0
>>>>>>     Memtable Data Size: 0
>>>>>>     Memtable Switch Count: 3
>>>>>>     Read Count: 2
>>>>>>     Read Latency: NaN ms.
>>>>>>     Write Count: 9
>>>>>>     Write Latency: 0.011 ms.
>>>>>>     Pending Tasks: 0
>>>>>>     Key cache capacity: 1
>>>>>>     Key cache size: 1
>>>>>>     Key cache hit rate: NaN
>>>>>>     Row cache: disabled
>>>>>>     Compacted row minimum size: 203
>>>>>>     Compacted row maximum size: 397
>>>>>>     Compacted row mean size: 300
>>>>>>
>>>>>>     Column Family: HintsColumnFamily
>>>>>>     SSTable count: 1
>>>>>>     Space used (live): 1457
>>>>>>     Space used (total): 4371
>>>>>>     Memtable Columns Count: 0
>>>>>>     Memtable Data Size: 0
>>>>>>     Memtable Switch Count: 0
>>>>>>     Read Count: 152
>>>>>>     Read Latency: 0.369 ms.
>>>>>>     Write Count: 0
>>>>>>     Write Latency: NaN ms.
>>>>>>     Pending Tasks: 0
>>>>>>     Key cache capacity: 1
>>>>>>     Key cache size: 1
>>>>>>     Key cache hit rate: 0.07142857142857142
>>>>>>     Row cache: disabled
>>>>>>     Compacted row minimum size: 829
>>>>>>     Compacted row maximum size: 829
>>>>>>     Compacted row mean size: 829
>>>>>>
>>>>>> On Sat, May 22, 2010 at 4:14 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>>>> Can you monitor cassandra-level metrics like the ones in
>>>>>>> http://github.com/jbellis/cassandra-munin-plugins ?
>>>>>>>
>>>>>>> The usual culprit is compaction, but your compacted row size is small. Nothing else really comes to mind.
>>>>>>>
>>>>>>> (You should check the system keyspace too, though; HH rows can get large.)
>>>>>>>
>>>>>>> On Fri, May 21, 2010 at 2:36 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>>>> I see some OOMs on one of the hosts in the cluster, and I wonder if there's a formula that'll help me calculate the required memory setting given parameters x, y, z...
>>>>>>>>
>>>>>>>> In short, I need advice on:
>>>>>>>> 1. How to set up a proper heap size, and which parameters I should look at when doing so.
>>>>>>>> 2. Setting up an alert policy and defining some countermeasures or SOS steps an admin can take to prevent further degradation of service when alerts fire.
>>>>>>>>
>>>>>>>> The OOM is at the row mutation stage, and it happens after extensive GC activity (log tail below).
>>>>>>>>
>>>>>>>> The server has 16G of physical RAM and a 4G Java heap. No other significant processes run on the same server. I actually upped the Java heap to 8G, but it OOMed again...
>>>>>>>> Most of my settings are the defaults, with a few keyspaces and a few CFs in each KS. Here's the output of cfstats for the largest and most heavily used CF. (Currently reads/writes are stopped, but the data is there.)
>>>>>>>>
>>>>>>>> Keyspace: outbrain_kvdb
>>>>>>>>   Read Count: 3392
>>>>>>>>   Read Latency: 160.33135908018866 ms.
>>>>>>>>   Write Count: 2005839
>>>>>>>>   Write Latency: 0.029233923061621595 ms.
>>>>>>>>   Pending Tasks: 0
>>>>>>>>     Column Family: KvImpressions
>>>>>>>>     SSTable count: 8
>>>>>>>>     Space used (live): 21923629878
>>>>>>>>     Space used (total): 21923629878
>>>>>>>>     Memtable Columns Count: 69440
>>>>>>>>     Memtable Data Size: 9719364
>>>>>>>>     Memtable Switch Count: 26
>>>>>>>>     Read Count: 3392
>>>>>>>>     Read Latency: NaN ms.
>>>>>>>>     Write Count: 1998821
>>>>>>>>     Write Latency: 0.018 ms.
>>>>>>>>     Pending Tasks: 0
>>>>>>>>     Key cache capacity: 200000
>>>>>>>>     Key cache size: 11661
>>>>>>>>     Key cache hit rate: NaN
>>>>>>>>     Row cache: disabled
>>>>>>>>     Compacted row minimum size: 302
>>>>>>>>     Compacted row maximum size: 22387
>>>>>>>>     Compacted row mean size: 641
>>>>>>>>
>>>>>>>> I'm also attaching a few graphs of "the incident"; I hope they help. From the graphs it looks like:
>>>>>>>> 1. The message deserializer pool is behind, so it may be taking too much memory. If the graphs are correct, it gets as high as 10M pending before the crash.
>>>>>>>> 2. row-read-stage has a high number of pending tasks (4k), so first of all this isn't good for performance whether or not it caused the OOM, and second, it may also have taken up heap space and caused the crash.
>>>>>>>>
>>>>>>>> Thanks!
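On the "is there a formula" question from the first mail: there is no exact formula, but a back-of-envelope estimate that was commonly quoted in this era adds up roughly 3x the per-CF memtable throughput for the hot CFs, the key/row cache footprints, and about 1 GB of overhead. The sketch below just does that arithmetic; every number in it is an example or assumption, not a value measured from this cluster.

    // Rough heap floor, not an official formula. The factor of 3 reflects the
    // rule of thumb that a memtable can exist in several copies at once
    // (active, flushing, being written out).
    public class HeapEstimate {
        public static void main(String[] args) {
            int memtableThroughputMb = 64;   // assumed per-CF MemtableThroughputInMB
            int hotColumnFamilies    = 4;    // example: CFs taking writes concurrently
            double keyCacheMb        = 50;   // example key cache footprint
            double rowCacheMb        = 0;    // row cache is disabled in the cfstats above
            double overheadMb        = 1024; // ~1 GB for internal structures, compaction, etc.

            double estimateMb = memtableThroughputMb * 3.0 * hotColumnFamilies
                              + keyCacheMb + rowCacheMb + overheadMb;
            System.out.printf("rough heap floor: ~%.0f MB%n", estimateMb); // ~1842 MB here
        }
    }

By that estimate a 4 GB heap has plenty of steady-state headroom, which again points at the read/deserializer backlog, not baseline sizing, as what filled the heap.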
>>>>>>>> INFO [GC inspection] 2010-05-21 00:53:25,885 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 10819 ms, 939992 reclaimed leaving 4312064504 used; max is 4431216640
>>>>>>>> INFO [GC inspection] 2010-05-21 00:53:44,605 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 9672 ms, 673400 reclaimed leaving 4312337208 used; max is 4431216640
>>>>>>>> INFO [GC inspection] 2010-05-21 00:54:23,110 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 9150 ms, 402072 reclaimed leaving 4312609776 used; max is 4431216640
>>>>>>>> ERROR [ROW-MUTATION-STAGE:19] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[ROW-MUTATION-STAGE:19,5,main]
>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>> ERROR [Thread-10] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Thread-10,5,main]
>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>> ERROR [CACHETABLE-TIMER-2] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[CACHETABLE-TIMER-2,5,main]
>>>>>>>> java.lang.OutOfMemoryError: Java heap space

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com