Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Robert, thanks for these references! We're not using DTCS, so 9056 and 8243 seem out, but I'll take a look at 9577. (I also looked at the referenced thread on this list, which seems to have some interesting data.)
On Wed, Jul 22, 2015 at 5:33 PM, Robert Coli wrote:
> On Wed, Jul 22, 2015 at 2:55 PM,

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Robert Coli
On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng wrote:
> nodetool still reports the node as being healthy, and it does respond to some local queries; however, the CPU is pegged at 100%. One common thread (heh) each time this happens is that there always seems to be one or more compaction threa

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Aiman Parvaiz
I faced something similar in the past, and the reason the nodes became intermittently unresponsive was long GC pauses. That's why I wanted to bring this to your attention, in case GC pauses are a potential cause.
> On Jul 22, 2015, at 4:32 PM, Bryan Cheng wrote:
>
> Aiman,
>
> Y

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Aiman, Your post made me look back at our data a bit. The most recent occurrence of this incident was not preceded by any abnormal GC activity; however, the previous occurrence (which took place a few days ago) did correspond to a massive, order-of-magnitude increase in both ParNew and CMS collect

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Hi Aiman, We previously had issues with GC, but since upgrading to 2.1.7 things seem a lot healthier. We collect GC statistics through collectd via the garbage collector MBean. ParNew GCs report sub-500ms collection time on average (I believe accumulated per minute?) and CMS peaks at about 300ms
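On the "accumulated per minute?" question: the JVM's GarbageCollectorMXBean exposes CollectionCount and CollectionTime as running totals since JVM start, so a per-interval figure has to be derived from the difference between two polls. Whether collectd already does this depends on how the metric is configured, which the thread does not say. The following is a minimal Python sketch of that arithmetic only; the `GcSample` layout, the field names, and the sample numbers are hypothetical and are not taken from the poster's setup.

```python
from typing import List, NamedTuple


class GcSample(NamedTuple):
    """One poll of a collector's cumulative counters (hypothetical layout)."""
    timestamp_s: float       # when the poll was taken, in seconds
    collection_count: int    # total collections since JVM start
    collection_time_ms: int  # total collection time since JVM start, in ms


class GcInterval(NamedTuple):
    """GC activity between two consecutive polls."""
    seconds: float
    collections: int
    collection_time_ms: int
    avg_time_per_collection_ms: float


def gc_intervals(samples: List[GcSample]) -> List[GcInterval]:
    """Turn cumulative GC counters into per-interval deltas."""
    intervals = []
    for prev, cur in zip(samples, samples[1:]):
        collections = cur.collection_count - prev.collection_count
        time_ms = cur.collection_time_ms - prev.collection_time_ms
        if collections < 0 or time_ms < 0:
            # Counters went backwards, e.g. the JVM restarted; skip this pair.
            continue
        avg = time_ms / collections if collections else 0.0
        intervals.append(
            GcInterval(cur.timestamp_s - prev.timestamp_s, collections, time_ms, avg)
        )
    return intervals


if __name__ == "__main__":
    # Made-up polls taken one minute apart.
    polls = [GcSample(0, 10, 400), GcSample(60, 14, 880), GcSample(120, 14, 880)]
    for interval in gc_intervals(polls):
        print(interval)
```

Read this way, a "sub-500ms" value that is a per-minute total spread over several collections means something quite different from a single 500ms collection, so it is worth confirming which of the two the graph shows before ruling GC in or out.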

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Aiman Parvaiz
Hi Bryan, how's GC behaving on these boxes?
On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng wrote:
> Hi there,
>
> Within our Cassandra cluster, we're observing, on occasion, one or two nodes at a time becoming partially unresponsive.
>
> We're running 2.1.7 across the entire cluster.
>
> nodetool

Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Hi there, Within our Cassandra cluster, we're observing, on occasion, one or two nodes at a time becoming partially unresponsive. We're running 2.1.7 across the entire cluster. nodetool still reports the node as being healthy, and it does respond to some local queries; however, the CPU is pegged