Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-12-03 Thread Paulo Ricardo Motta Gomes
Thanks a lot for the help Graham and Robert! Will try increasing heap and see how it goes. Here are my gc settings, if they're still helpful (they're mostly the defaults): -Xms6G -Xmx6G -Xmn400M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenu
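As a rough aid for reading the flags quoted above, the sketch below works out the approximate generation sizes implied by -Xmx6G, -Xmn400M and -XX:SurvivorRatio=8. The function name `heap_layout_mb` is hypothetical, and the numbers are nominal: the JVM aligns region sizes, so real values can differ slightly.

```python
def heap_layout_mb(heap_mb, young_mb, survivor_ratio):
    """Approximate HotSpot generation sizes in MB (illustrative helper, not a JVM API).

    With -XX:SurvivorRatio=N, eden is N times the size of ONE survivor space,
    and the young generation holds eden plus two survivor spaces.
    """
    survivor_each = young_mb / (survivor_ratio + 2)
    eden = survivor_each * survivor_ratio
    old = heap_mb - young_mb  # whatever is not young gen is old gen
    return {"eden": eden, "survivor_each": survivor_each, "old": old}


# Values taken from the flags in the message: -Xmx6G -Xmn400M -XX:SurvivorRatio=8
layout = heap_layout_mb(heap_mb=6 * 1024, young_mb=400, survivor_ratio=8)
for region, size in layout.items():
    print(f"{region}: {size:.0f} MB")
```

Under these assumptions the young generation is a small fraction of the 6 GB heap, which is the part of the configuration the young gen sizing suggestion elsewhere in the thread refers to.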

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-12-02 Thread Jason Wee
ack and many thanks for the tips and help..
jason
On Wed, Dec 3, 2014 at 4:49 AM, Robert Coli wrote:
> On Mon, Dec 1, 2014 at 11:07 PM, Jason Wee wrote:
>> Hi Rob, any recommended documentation on describing
>> explanation/configuration of the JVM heap and permanent generation ? We
>> stucke

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-12-02 Thread Robert Coli
On Mon, Dec 1, 2014 at 11:07 PM, Jason Wee wrote:
> Hi Rob, any recommended documentation on describing
> explanation/configuration of the JVM heap and permanent generation ? We
> stucked in this same situation too. :(
The archives of this list are chock full of explorations of various cases.

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-12-01 Thread Jason Wee
Hi Rob, is there any recommended documentation explaining the configuration of the JVM heap and permanent generation? We are stuck in this same situation too. :(
Jason
On Tue, Dec 2, 2014 at 3:42 AM, Robert Coli wrote:
> On Fri, Nov 28, 2014 at 12:55 PM, Paulo Ricardo Motta Gomes < paulo.mo.

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-12-01 Thread Robert Coli
On Fri, Nov 28, 2014 at 12:55 PM, Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com> wrote:
> We restart the whole cluster every 1 or 2 months, to avoid machines
> getting into this crazy state. We tried tuning GC size and parameters,
> different cassandra versions (1.1, 1.2, 2.0), but t

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-11-28 Thread graham sanderson
I should note that the young gen size is just a tuning suggestion, not directly related to your problem at hand. You might want to make sure you don’t have issues with key/row cache. Also, I’m assuming that your extra load isn’t hitting tables that you wouldn’t normally be hitting. > On Nov 28

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-11-28 Thread graham sanderson
Your GC settings would be helpful, though you can make a guesstimate by eyeballing (assuming settings are the same across all 4 images). Bursty load can be a big cause of old gen fragmentation (as small working set objects tend to get spilled (promoted) along with memtable slabs which aren’t flush

Re: Nodes get stuck

2013-08-21 Thread Robert Coli
On Wed, Aug 21, 2013 at 10:47 AM, Robert Coli wrote:
> On Tue, Aug 20, 2013 at 11:35 PM, Keith Wright wrote:
>> Still looking for help! We have stopped almost ALL traffic to the
>> cluster and still some nodes are showing almost 1000% CPU for cassandra
>> with no iostat activity. We were run

Re: Nodes get stuck

2013-08-21 Thread Robert Coli
On Tue, Aug 20, 2013 at 11:35 PM, Keith Wright wrote:
> Still looking for help! We have stopped almost ALL traffic to the cluster
> and still some nodes are showing almost 1000% CPU for cassandra with no
> iostat activity. We were running cleanup on one of the nodes that was not
> showing load

Re: Nodes get stuck

2013-08-21 Thread Sylvain Lebresne
m.java:82)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
> at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:297)
> at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:244)
> at org.xerial.snappy.SnappyOutputStream.write(Sn

Re: Nodes get stuck

2013-08-20 Thread Keith Wright
Still looking for help! We have stopped almost ALL traffic to the cluster and still some nodes are showing almost 1000% CPU for cassandra with no iostat activity. We were running cleanup on one of the nodes that was not showing load spikes however now when I attempt to stop cleanup there via