Here's tpstats from a server under traffic that I think will OOM shortly. We have ~4k pending reads and ~123k pending in MESSAGE-DESERIALIZER-POOL.
Is there something I can do to prevent that? (other than adding RAM...)

Pool Name                     Active   Pending   Completed
FILEUTILS-DELETE-POOL              0         0          55
STREAM-STAGE                       0         0           6
RESPONSE-STAGE                     0         0           0
ROW-READ-STAGE                     8      4088     7537229
LB-OPERATIONS                      0         0           0
MESSAGE-DESERIALIZER-POOL          1    123799    22198459
GMFD                               0         0      471827
LB-TARGET                          0         0           0
CONSISTENCY-MANAGER                0         0           0
ROW-MUTATION-STAGE                 0         0    14142351
MESSAGE-STREAMING-POOL             0         0          16
LOAD-BALANCER-STAGE                0         0           0
FLUSH-SORTER-POOL                  0         0           0
MEMTABLE-POST-FLUSHER              0         0         128
FLUSH-WRITER-POOL                  0         0         128
AE-SERVICE-STAGE                   1         1           8
HINTED-HANDOFF-POOL                0         0          10
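A minimal sketch of the kind of check that could fire before the backlog gets this deep: poll nodetool tpstats and flag any pool whose Pending count crosses a threshold. The parsing assumes the 0.6-style column layout shown above, and the 10000 threshold is only illustrative:

#!/usr/bin/env python
# Rough sketch: warn when any thread pool's pending queue grows past a
# threshold, so traffic can be throttled before the node OOMs. Parsing
# assumes the tpstats columns shown above (Pool Name, Active, Pending,
# Completed); adjust for your Cassandra version.
import subprocess
import sys

HOST = "localhost"          # node to check
PENDING_THRESHOLD = 10000   # illustrative, not a recommendation

def pending_by_pool(host):
    out = subprocess.check_output(["nodetool", "-h", host, "tpstats"])
    pools = {}
    for line in out.decode().splitlines():
        parts = line.split()
        # data rows look like: NAME ACTIVE PENDING COMPLETED
        if len(parts) == 4 and parts[1].isdigit():
            pools[parts[0]] = int(parts[2])
    return pools

if __name__ == "__main__":
    backlog = dict((p, n) for p, n in pending_by_pool(HOST).items()
                   if n > PENDING_THRESHOLD)
    for pool, pending in sorted(backlog.items()):
        print("WARNING: %s has %d pending tasks" % (pool, pending))
    sys.exit(1 if backlog else 0)

Wired into cron or munin, a check like this would have flagged both ROW-READ-STAGE and MESSAGE-DESERIALIZER-POOL well before the numbers above.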
On Sat, May 22, 2010 at 11:05 PM, Ran Tavory <ran...@gmail.com> wrote:
> The message deserializer has 10m pending tasks before the oom. What do you
> think makes the message deserializer blow up? I suspect that when it gets
> up to 10m pending tasks they may consume a lot of memory, though I don't
> know how much memory a single task actually takes up. Is there a setting I
> need to tweak? (or am I barking up the wrong tree?)
>
> I'll add the counters from
> http://github.com/jbellis/cassandra-munin-plugins, but I already monitor
> most of them, so I attached the graphs of the ones that seemed most
> suspicious to the previous email.
>
> The system keyspace and HH CF don't look too bad, I think; here they are:
>
> Keyspace: system
>   Read Count: 154
>   Read Latency: 0.875012987012987 ms.
>   Write Count: 9
>   Write Latency: 0.20055555555555554 ms.
>   Pending Tasks: 0
>     Column Family: LocationInfo
>     SSTable count: 1
>     Space used (live): 2714
>     Space used (total): 2714
>     Memtable Columns Count: 0
>     Memtable Data Size: 0
>     Memtable Switch Count: 3
>     Read Count: 2
>     Read Latency: NaN ms.
>     Write Count: 9
>     Write Latency: 0.011 ms.
>     Pending Tasks: 0
>     Key cache capacity: 1
>     Key cache size: 1
>     Key cache hit rate: NaN
>     Row cache: disabled
>     Compacted row minimum size: 203
>     Compacted row maximum size: 397
>     Compacted row mean size: 300
>
>     Column Family: HintsColumnFamily
>     SSTable count: 1
>     Space used (live): 1457
>     Space used (total): 4371
>     Memtable Columns Count: 0
>     Memtable Data Size: 0
>     Memtable Switch Count: 0
>     Read Count: 152
>     Read Latency: 0.369 ms.
>     Write Count: 0
>     Write Latency: NaN ms.
>     Pending Tasks: 0
>     Key cache capacity: 1
>     Key cache size: 1
>     Key cache hit rate: 0.07142857142857142
>     Row cache: disabled
>     Compacted row minimum size: 829
>     Compacted row maximum size: 829
>     Compacted row mean size: 829
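On the question above of how much memory a pending MESSAGE-DESERIALIZER-POOL task holds: a back-of-envelope sketch only, with the per-message sizes below being assumptions rather than measurements, but it shows why a 10m backlog can eat a 4G heap on its own:

# Back-of-envelope only; per-message figures are assumptions for
# illustration, the real cost depends on your mutation/row sizes.
pending_tasks = 10 * 1000 * 1000   # ~10m pending seen before the OOM
payload_bytes = 300                # assumed serialized message body per task
overhead_bytes = 100               # assumed object/queue overhead per task

total_bytes = pending_tasks * (payload_bytes + overhead_bytes)
print("rough backlog footprint: %.1f GB" % (total_bytes / 1e9))  # ~4.0 GB

Even at a few hundred bytes per queued message, the backlog alone is on the order of the whole 4G heap, so bounding the queue (alerting on Pending early, throttling or shedding reads) matters at least as much as the heap setting itself.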
> On Sat, May 22, 2010 at 4:14 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> Can you monitor cassandra-level metrics like the ones in
>> http://github.com/jbellis/cassandra-munin-plugins ?
>>
>> The usual culprit is compaction, but your compacted row size is small.
>> Nothing else really comes to mind.
>>
>> (You should check the system keyspace too, though; HH rows can get large.)
>>
>> On Fri, May 21, 2010 at 2:36 PM, Ran Tavory <ran...@gmail.com> wrote:
>> > I see some OOM on one of the hosts in the cluster and I wonder if
>> > there's a formula that will help me calculate the required memory
>> > setting given the parameters x, y, z...
>> > In short, I need advice on:
>> > 1. How to set a proper heap size, and which parameters to look at when
>> > doing so.
>> > 2. Setting up an alert policy and defining counter measures or SOS
>> > steps an admin can take to prevent further degradation of service when
>> > alerts fire.
>> > The OOM is at the row mutation stage and it happens after extensive GC
>> > activity (log tail below).
>> > The server has 16G of physical RAM and a 4G Java heap. No other
>> > significant processes run on the same server. I actually upped the Java
>> > heap to 8G but it OOMed again...
>> > Most of my settings are the defaults, with a few keyspaces and a few
>> > CFs in each KS. Here's the output of cfstats for the largest and most
>> > heavily used CF. (Currently reads/writes are stopped but the data is
>> > there.)
>> > Keyspace: outbrain_kvdb
>> >   Read Count: 3392
>> >   Read Latency: 160.33135908018866 ms.
>> >   Write Count: 2005839
>> >   Write Latency: 0.029233923061621595 ms.
>> >   Pending Tasks: 0
>> >     Column Family: KvImpressions
>> >     SSTable count: 8
>> >     Space used (live): 21923629878
>> >     Space used (total): 21923629878
>> >     Memtable Columns Count: 69440
>> >     Memtable Data Size: 9719364
>> >     Memtable Switch Count: 26
>> >     Read Count: 3392
>> >     Read Latency: NaN ms.
>> >     Write Count: 1998821
>> >     Write Latency: 0.018 ms.
>> >     Pending Tasks: 0
>> >     Key cache capacity: 200000
>> >     Key cache size: 11661
>> >     Key cache hit rate: NaN
>> >     Row cache: disabled
>> >     Compacted row minimum size: 302
>> >     Compacted row maximum size: 22387
>> >     Compacted row mean size: 641
>> > I'm also attaching a few graphs of "the incident"; I hope they help.
>> > From the graphs it looks like:
>> > 1. The message deserializer pool is behind, so it may be taking too
>> > much memory. If the graphs are correct, it gets as high as 10m pending
>> > before the crash.
>> > 2. row-read-stage has a high number of pending tasks (4k), so first of
>> > all this isn't good for performance whether it caused the OOM or not,
>> > and second, it may also have taken up heap space and caused the crash.
>> > Thanks!
>> >  INFO [GC inspection] 2010-05-21 00:53:25,885 GCInspector.java (line 110)
>> > GC for ConcurrentMarkSweep: 10819 ms, 939992 reclaimed leaving
>> > 4312064504 used; max is 4431216640
>> >  INFO [GC inspection] 2010-05-21 00:53:44,605 GCInspector.java (line 110)
>> > GC for ConcurrentMarkSweep: 9672 ms, 673400 reclaimed leaving
>> > 4312337208 used; max is 4431216640
>> >  INFO [GC inspection] 2010-05-21 00:54:23,110 GCInspector.java (line 110)
>> > GC for ConcurrentMarkSweep: 9150 ms, 402072 reclaimed leaving
>> > 4312609776 used; max is 4431216640
>> > ERROR [ROW-MUTATION-STAGE:19] 2010-05-21 01:55:37,951 CassandraDaemon.java
>> > (line 88) Fatal exception in thread Thread[ROW-MUTATION-STAGE:19,5,main]
>> > java.lang.OutOfMemoryError: Java heap space
>> > ERROR [Thread-10] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88)
>> > Fatal exception in thread Thread[Thread-10,5,main]
>> > java.lang.OutOfMemoryError: Java heap space
>> > ERROR [CACHETABLE-TIMER-2] 2010-05-21 01:55:37,951 CassandraDaemon.java
>> > (line 88) Fatal exception in thread Thread[CACHETABLE-TIMER-2,5,main]
>> > java.lang.OutOfMemoryError: Java heap space
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
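One more alert falls straight out of the log tail quoted above: when ConcurrentMarkSweep runs for many seconds and "reclaimed" is tiny while "used" stays within a whisker of "max", the node is already close to OOM. Below is a minimal sketch that scans for exactly those GCInspector lines; the log path and the 0.95 threshold are assumptions to adapt:

#!/usr/bin/env python
# Sketch of a log-based pre-OOM alert: find GCInspector lines like the ones
# quoted above and warn when a full GC leaves the heap nearly full (the
# collector is working hard but reclaiming almost nothing).
import re
import sys

LOG = "/var/log/cassandra/system.log"   # assumed log location
FULL_FRACTION = 0.95                    # alert when used/max exceeds this

GC_LINE = re.compile(
    r"GC for ConcurrentMarkSweep: (\d+) ms, (\d+) reclaimed "
    r"leaving (\d+) used; max is (\d+)")

def scan(path):
    alerts = []
    with open(path) as f:
        for line in f:
            m = GC_LINE.search(line)
            if not m:
                continue
            ms, reclaimed, used, heap_max = map(int, m.groups())
            if used > FULL_FRACTION * heap_max:
                alerts.append((ms, used, heap_max))
    return alerts

if __name__ == "__main__":
    hits = scan(LOG)
    for ms, used, heap_max in hits:
        print("WARNING: CMS ran %d ms and still left %d of %d bytes used"
              % (ms, used, heap_max))
    sys.exit(1 if hits else 0)

For the three GC lines quoted above this would report every one of them, since roughly 4.31 GB of the 4.43 GB heap stays used after each collection.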