Hi Rob,

>> What version of Cassandra? What JVM? Are JNA and Jamm working?

Cassandra 1.0.8, Sun JDK 1.7.0_05-b06, JNA memlock enabled, Jamm works.
>> It sounds like the two nodes that are pathological right now have
>> exhausted the perm gen with actual non-garbage, probably mostly the
>> Bloom filters and the JMX MBeans.

jmap shows that the perm gen is only 40% used.

>> Do you have a "large" number of ColumnFamilies? How large is the data
>> stored per node?

I have very few column families, maybe 30-50. nodetool shows each node has a 5 GB load.

>> Disable swap for cassandra node

I am going to change swappiness to 20%.

Thanks,
Daniel

On Fri, Oct 12, 2012 at 2:02 AM, Rob Coli <rc...@palominodb.com> wrote:
> On Wed, Oct 10, 2012 at 11:04 PM, Daniel Woo <daniel.y....@gmail.com>
> wrote:
> > I am running a mini cluster with 6 nodes, and recently we see very frequent
> > ParNew GCs on two nodes. They take 200-800 ms on average, sometimes
> > 5 seconds. You know, ParNew is a stop-the-world GC, and our client throws
> > a SocketTimeoutException every 3 minutes.
>
> What version of Cassandra? What JVM? Are JNA and Jamm working?
>
> > I checked the load, it seems well balanced, and the two nodes are running on
> > the same hardware: 2 x 4-core Xeon with 16 GB RAM. We give Cassandra a 4 GB
> > heap, including an 800 MB young generation. We did not see any swap usage
> > during the GC. Any idea about this?
>
> It sounds like the two nodes that are pathological right now have
> exhausted the perm gen with actual non-garbage, probably mostly the
> Bloom filters and the JMX MBeans.
>
> > Then I took a heap dump. It shows that 5 instances of JmxMBeanServer hold
> > 500 MB of memory and most of the referenced objects are JMX MBean related.
> > That seems kind of weird to me and looks like a memory leak.
>
> Do you have a "large" number of ColumnFamilies? How large is the data
> stored per node?
>
> =Rob
>
> --
> =Robert Coli
> AIM&GTALK - rc...@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb

--
Thanks & Regards,
Daniel
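The perm gen usage, the JMX MBean count and the ParNew pause times discussed above can also be read from a running node over JMX instead of from a heap dump. This is a minimal sketch, not a Cassandra tool: it assumes the node exposes JMX without authentication on port 7199 (check cassandra-env.sh for the real port), that HotSpot names its pools and collectors in the usual way (a pool containing "Perm Gen", a young collector called "ParNew" under CMS), and the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: reports perm gen usage, the number of registered
// MBeans, and average collection time per collector for a remote JVM.
public class NodeGcReport {

    // Percentage of the pool's maximum in use; -1 when the pool has no
    // defined maximum (MemoryUsage.getMax() returns -1 in that case).
    public static double usedPercent(long used, long max) {
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * used / max;
    }

    // Average elapsed time per collection in ms; 0 if no collection yet
    // (or if the count is undefined, which the JVM reports as -1).
    public static double averagePauseMillis(long collectionTimeMillis, long collectionCount) {
        if (collectionCount <= 0) {
            return 0.0;
        }
        return (double) collectionTimeMillis / collectionCount;
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumption: unauthenticated JMX on port 7199.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();

            // Total MBeans registered in the node's MBean server.
            System.out.println("MBean count: " + conn.getMBeanCount());

            // Perm gen exists only on JDK 7 and earlier; later JDKs print nothing here.
            Set<ObjectName> pools = conn.queryNames(
                    new ObjectName(ManagementFactory.MEMORY_POOL_MXBEAN_DOMAIN_TYPE + ",*"), null);
            for (ObjectName name : pools) {
                MemoryPoolMXBean pool = ManagementFactory.newPlatformMXBeanProxy(
                        conn, name.toString(), MemoryPoolMXBean.class);
                if (pool.getName().contains("Perm Gen")) {
                    MemoryUsage u = pool.getUsage();
                    System.out.printf("%s: %.1f%% used%n",
                            pool.getName(), usedPercent(u.getUsed(), u.getMax()));
                }
            }

            // getCollectionTime() is the approximate accumulated elapsed time in ms.
            Set<ObjectName> gcs = conn.queryNames(
                    new ObjectName(ManagementFactory.GARBAGE_COLLECTOR_MXBEAN_DOMAIN_TYPE + ",*"), null);
            for (ObjectName name : gcs) {
                GarbageCollectorMXBean gc = ManagementFactory.newPlatformMXBeanProxy(
                        conn, name.toString(), GarbageCollectorMXBean.class);
                long count = gc.getCollectionCount();
                System.out.printf("%s: %d collections, avg %.1f ms%n",
                        gc.getName(), count, averagePauseMillis(gc.getCollectionTime(), count));
            }
        }
    }
}
```

A lifetime average hides individual spikes such as the 5-second pauses, so for per-pause detail the GC log (-XX:+PrintGCDetails) is the better source; this report is only a quick cross-check against the jmap numbers.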