Oleg Anastasjev wrote:
>
>> Has anyone experienced this sort of problem? It would be great to hear
>> from anyone who has had experience with this sort of issue and/or
>> suggestions for how to deal with it.
>>
>> Thanks, Eric
>
> Yes, I did. The symptoms you described point to a concurrent GC failure.
> During such a failure, the concurrent GC completely stops the Java program
> (i.e. Cassandra) and does a GC cycle. Other Cassandra nodes discover that
> the node is not responding and consider it dead.
> If the concurrent GC is properly tuned, it should never do a stop-the-world
> GC (that's why it is called concurrent ;-) ).
> There can be several reasons for concurrent GC failures:
> 1. Not enough Java heap - try raising the max Java heap limit.
> 2. Improperly sized Java heap regions.
>
> To help you narrow down the problem, pass the -XX:+PrintGCDetails option to
> the JVM launching the Cassandra node. This will log information about
> internal GC activities. Let it run until the node is thrown out of the
> cluster again, then search the log for the strings "concurrent mode failure"
> or "promotion failed".
>
We did indeed have a problem with our GC settings: the survivor ratio was too low. After changing that, things are better, but we are still seeing GC pauses of 5-10 seconds, which is enough for the node to drop out of the cluster briefly.
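
For anyone following the suggestion in the quoted message, here is a minimal sketch of the log search in Python. It assumes the output produced with -XX:+PrintGCDetails has been captured to a plain-text file; the function name `find_gc_failures` and the command-line handling are made up for illustration and are not part of Cassandra or the JVM. It only looks for the two strings named above and does not try to parse the rest of the GC log.

```python
import sys
from typing import Iterable, List, Tuple

# The two strings the quoted advice says to search for.
FAILURE_MARKERS = ("concurrent mode failure", "promotion failed")


def find_gc_failures(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, marker, line text) for every marker found.

    A line containing both markers is reported once per marker.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for marker in FAILURE_MARKERS:
            if marker in line:
                hits.append((lineno, marker, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1] is the path to the captured GC log (hypothetical, user supplied).
    with open(sys.argv[1], errors="replace") as log_file:
        for lineno, marker, text in find_gc_failures(log_file):
            print(f"{lineno}: [{marker}] {text}")
```

Run it with the path to the captured log as the only argument. Any hit means the collector fell back to a stop-the-world collection at that point, which would match the pauses described above.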