On Sun, May 23, 2010 at 10:59 AM, Ran Tavory <ran...@gmail.com> wrote:
> Is there another solution except adding capacity?
Either you need to get more performance/node or increase node count. :)

> How does the ConcurrentReads (default 8) affect that? If I expect to have
> a similar number of reads and writes, should I set ConcurrentReads equal
> to ConcurrentWrites (default 32)?

You should figure out where the bottleneck is before tweaking things:
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

Increasing CR will only help if you are (a) cpu bound and (b) have so many
cores that 8 threads isn't saturating them.

Sight unseen, my guess is you are disk bound. iostat can confirm this. If
that's the case then you can try to reduce the disk load w/ row cache or
key cache.
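If it helps to script that kind of check rather than eyeballing nodetool tpstats by hand, a minimal sketch along these lines could poll the pending-task counts over JMX. It assumes the 0.6-era MBean layout (each stage under org.apache.cassandra.concurrent exposing a PendingTasks attribute) and the default JMX port of 8080; both may differ on your install, and the 1000-task threshold is only a placeholder.

import java.util.Arrays;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PendingTasksCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumes the default JMX port of the 0.6 line (8080); adjust as needed.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // The two stages that are backing up in the tpstats output below.
            for (String stage : Arrays.asList("ROW-READ-STAGE", "MESSAGE-DESERIALIZER-POOL")) {
                ObjectName name = new ObjectName("org.apache.cassandra.concurrent:type=" + stage);
                long pending = ((Number) mbs.getAttribute(name, "PendingTasks")).longValue();
                // Alert well before the backlog grows into the millions.
                System.out.println(stage + " pending=" + pending
                        + (pending > 1000 ? "  <-- ALERT" : ""));
            }
        } finally {
            connector.close();
        }
    }
}

Wired into cron or munin, a check like that fires while there is still time to throttle clients, rather than after the heap is already full.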
> On Sun, May 23, 2010 at 5:43 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> looks like reads are backing up, which in turn is making the deserializer back up
>>
>> On Sun, May 23, 2010 at 4:25 AM, Ran Tavory <ran...@gmail.com> wrote:
>> > Here's tpstats on a server with traffic that I think will OOM shortly.
>> > We have 4k pending reads and 123k pending at MESSAGE-DESERIALIZER-POOL.
>> > Is there something I can do to prevent that? (other than adding RAM...)
>> >
>> > Pool Name                    Active   Pending    Completed
>> > FILEUTILS-DELETE-POOL             0         0           55
>> > STREAM-STAGE                      0         0            6
>> > RESPONSE-STAGE                    0         0            0
>> > ROW-READ-STAGE                    8      4088      7537229
>> > LB-OPERATIONS                     0         0            0
>> > MESSAGE-DESERIALIZER-POOL         1    123799     22198459
>> > GMFD                              0         0       471827
>> > LB-TARGET                         0         0            0
>> > CONSISTENCY-MANAGER               0         0            0
>> > ROW-MUTATION-STAGE                0         0     14142351
>> > MESSAGE-STREAMING-POOL            0         0           16
>> > LOAD-BALANCER-STAGE               0         0            0
>> > FLUSH-SORTER-POOL                 0         0            0
>> > MEMTABLE-POST-FLUSHER             0         0          128
>> > FLUSH-WRITER-POOL                 0         0          128
>> > AE-SERVICE-STAGE                  1         1            8
>> > HINTED-HANDOFF-POOL               0         0           10
>> >
>> > On Sat, May 22, 2010 at 11:05 PM, Ran Tavory <ran...@gmail.com> wrote:
>> >>
>> >> The message deserializer has 10m pending tasks before the OOM. What do you
>> >> think makes the message deserializer blow up? I suspect that when it goes up
>> >> to 10m pending tasks (I don't know how much memory a task actually takes up)
>> >> they may consume a lot of memory. Is there a setting I need to tweak?
>> >> (Or am I barking up the wrong tree?)
>> >> I'll add the counters from http://github.com/jbellis/cassandra-munin-plugins,
>> >> but I already have most of them monitored, so I attached the graphs of the
>> >> ones that seemed the most suspicious in the previous email.
>> >> The system keyspace and HH CF don't look too bad, I think; here they are:
>> >>
>> >> Keyspace: system
>> >>         Read Count: 154
>> >>         Read Latency: 0.875012987012987 ms.
>> >>         Write Count: 9
>> >>         Write Latency: 0.20055555555555554 ms.
>> >>         Pending Tasks: 0
>> >>                 Column Family: LocationInfo
>> >>                 SSTable count: 1
>> >>                 Space used (live): 2714
>> >>                 Space used (total): 2714
>> >>                 Memtable Columns Count: 0
>> >>                 Memtable Data Size: 0
>> >>                 Memtable Switch Count: 3
>> >>                 Read Count: 2
>> >>                 Read Latency: NaN ms.
>> >>                 Write Count: 9
>> >>                 Write Latency: 0.011 ms.
>> >>                 Pending Tasks: 0
>> >>                 Key cache capacity: 1
>> >>                 Key cache size: 1
>> >>                 Key cache hit rate: NaN
>> >>                 Row cache: disabled
>> >>                 Compacted row minimum size: 203
>> >>                 Compacted row maximum size: 397
>> >>                 Compacted row mean size: 300
>> >>
>> >>                 Column Family: HintsColumnFamily
>> >>                 SSTable count: 1
>> >>                 Space used (live): 1457
>> >>                 Space used (total): 4371
>> >>                 Memtable Columns Count: 0
>> >>                 Memtable Data Size: 0
>> >>                 Memtable Switch Count: 0
>> >>                 Read Count: 152
>> >>                 Read Latency: 0.369 ms.
>> >>                 Write Count: 0
>> >>                 Write Latency: NaN ms.
>> >>                 Pending Tasks: 0
>> >>                 Key cache capacity: 1
>> >>                 Key cache size: 1
>> >>                 Key cache hit rate: 0.07142857142857142
>> >>                 Row cache: disabled
>> >>                 Compacted row minimum size: 829
>> >>                 Compacted row maximum size: 829
>> >>                 Compacted row mean size: 829
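On the "how much memory a task actually takes up" question above: the bytes-per-message figure below is purely an assumed number for illustration (nothing in this thread measures it), but it shows why a 10m-deep deserializer backlog could plausibly fill the 4G heap mentioned in the original post further down all by itself.

public class BacklogEstimate {
    public static void main(String[] args) {
        long pendingTasks = 10000000L;              // "10m pending tasks before the OOM"
        long bytesPerTask = 400L;                   // ASSUMED average size of one queued message
        long heapBytes = 4L * 1024 * 1024 * 1024;   // the 4G heap from the original post below

        long backlogBytes = pendingTasks * bytesPerTask;
        System.out.printf("estimated backlog footprint: %.1f GB%n",
                backlogBytes / (1024.0 * 1024 * 1024));
        System.out.printf("fraction of a 4 GB heap: %.0f%%%n",
                100.0 * backlogBytes / heapBytes);
    }
}

Even if the real per-message cost is much smaller than 400 bytes, an unbounded queue in front of a stage that can't keep up will eventually eat whatever heap is left.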
>> >>
>> >> On Sat, May 22, 2010 at 4:14 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>>
>> >>> Can you monitor cassandra-level metrics like the ones in
>> >>> http://github.com/jbellis/cassandra-munin-plugins ?
>> >>>
>> >>> The usual culprit is compaction, but your compacted row size is small.
>> >>> Nothing else really comes to mind.
>> >>>
>> >>> (You should check the system keyspace too, though; HH rows can get large.)
>> >>>
>> >>> On Fri, May 21, 2010 at 2:36 PM, Ran Tavory <ran...@gmail.com> wrote:
>> >>> > I see some OOMs on one of the hosts in the cluster and I wonder if there's
>> >>> > a formula that'll help me calculate the required memory setting given
>> >>> > parameters x, y, z...
>> >>> > In short, I need advice on:
>> >>> > 1. How to set up proper heap space, and which parameters I should look at
>> >>> > when doing so.
>> >>> > 2. Help setting up an alert policy, and defining some countermeasures or
>> >>> > SOS steps an admin can take to prevent further degradation of service when
>> >>> > alerts fire.
>> >>> > The OOM is at the row mutation stage and it happens after extensive GC
>> >>> > activity (log tail below).
>> >>> > The server has 16G physical RAM and a 4G Java heap. No other significant
>> >>> > processes run on the same server. I actually upped the Java heap to 8G but
>> >>> > it OOMed again...
>> >>> > Most of my settings are the defaults, with a few keyspaces and a few CFs
>> >>> > in each KS. Here's the output of cfstats for the largest and most heavily
>> >>> > used CF (currently reads/writes are stopped but the data is there).
>> >>> >
>> >>> > Keyspace: outbrain_kvdb
>> >>> >         Read Count: 3392
>> >>> >         Read Latency: 160.33135908018866 ms.
>> >>> >         Write Count: 2005839
>> >>> >         Write Latency: 0.029233923061621595 ms.
>> >>> >         Pending Tasks: 0
>> >>> >                 Column Family: KvImpressions
>> >>> >                 SSTable count: 8
>> >>> >                 Space used (live): 21923629878
>> >>> >                 Space used (total): 21923629878
>> >>> >                 Memtable Columns Count: 69440
>> >>> >                 Memtable Data Size: 9719364
>> >>> >                 Memtable Switch Count: 26
>> >>> >                 Read Count: 3392
>> >>> >                 Read Latency: NaN ms.
>> >>> >                 Write Count: 1998821
>> >>> >                 Write Latency: 0.018 ms.
>> >>> >                 Pending Tasks: 0
>> >>> >                 Key cache capacity: 200000
>> >>> >                 Key cache size: 11661
>> >>> >                 Key cache hit rate: NaN
>> >>> >                 Row cache: disabled
>> >>> >                 Compacted row minimum size: 302
>> >>> >                 Compacted row maximum size: 22387
>> >>> >                 Compacted row mean size: 641
>> >>> >
>> >>> > I'm also attaching a few graphs of "the incident"; I hope they help. From
>> >>> > the graphs it looks like:
>> >>> > 1. The message deserializer pool is behind, so it may be taking too much
>> >>> > memory. If the graphs are correct, it gets as high as 10m pending before
>> >>> > the crash.
>> >>> > 2. row-read-stage has a high number of pending tasks (4k), so first of
>> >>> > all this isn't good for performance whether it caused the OOM or not, and
>> >>> > second, this may also have taken up heap space and caused the crash.
>> >>> > Thanks!
>> >>> >
>> >>> > INFO [GC inspection] 2010-05-21 00:53:25,885 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 10819 ms, 939992 reclaimed leaving 4312064504 used; max is 4431216640
>> >>> > INFO [GC inspection] 2010-05-21 00:53:44,605 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 9672 ms, 673400 reclaimed leaving 4312337208 used; max is 4431216640
>> >>> > INFO [GC inspection] 2010-05-21 00:54:23,110 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 9150 ms, 402072 reclaimed leaving 4312609776 used; max is 4431216640
>> >>> > ERROR [ROW-MUTATION-STAGE:19] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[ROW-MUTATION-STAGE:19,5,main]
>> >>> > java.lang.OutOfMemoryError: Java heap space
>> >>> > ERROR [Thread-10] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Thread-10,5,main]
>> >>> > java.lang.OutOfMemoryError: Java heap space
>> >>> > ERROR [CACHETABLE-TIMER-2] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[CACHETABLE-TIMER-2,5,main]
>> >>> > java.lang.OutOfMemoryError: Java heap space
>> >>>
>> >>> --
>> >>> Jonathan Ellis
>> >>> Project Chair, Apache Cassandra
>> >>> co-founder of Riptano, the source for professional Cassandra support
>> >>> http://riptano.com
>> >>
>> >
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
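Reading the GCInspector lines quoted above, with the numbers taken straight from the log: each ConcurrentMarkSweep pass takes about 10 seconds, reclaims well under a megabyte, and still leaves the heap roughly 97% full, so the OutOfMemoryError an hour later is not surprising. A small sketch of the arithmetic:

public class GcHeadroom {
    public static void main(String[] args) {
        // Values from the first GCInspector line in the log above.
        long maxHeap   = 4431216640L;   // "max is 4431216640"
        long usedAfter = 4312064504L;   // "leaving 4312064504 used"
        long reclaimed = 939992L;       // "939992 reclaimed"

        System.out.printf("heap used after CMS:  %.1f%% of max%n", 100.0 * usedAfter / maxHeap);
        System.out.printf("reclaimed per cycle:  %.3f%% of max%n", 100.0 * reclaimed / maxHeap);
        System.out.printf("remaining headroom:   %d MB%n", (maxHeap - usedAfter) / (1024 * 1024));
    }
}

That works out to about 97.3% of the heap still in use, roughly 0.02% reclaimed per 10-second collection, and about 113 MB of headroom, with each successive log line showing slightly more memory in use than the last.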