Chris,

The patch has been deployed on the cluster for two days now, and everything has run well since then.
Thank you!

best regards,
韩竹(Zhu Han)

On Sat, Jul 30, 2011 at 3:52 AM, Chris Burroughs <chris.burrou...@gmail.com> wrote:

> Thanks to everyone who responded (I think I learned a few new tricks
> from seeing what you tried and how you monitor). I didn't see any
> patterns in JVM, OS, Cassandra versions, etc.
>
> At this time I'm confident in saying CASSANDRA-2868 (and thus really
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129) is the
> culprit.
>
> On 07/12/2011 09:28 AM, Chris Burroughs wrote:
> > ### Preamble
> >
> > There have been several reports on the mailing list of the JVM running
> > Cassandra using "too much" memory. That is, the resident set size
> > is >> (max java heap size + mmapped segments) and continues to grow
> > until the process swaps, the kernel OOM killer comes along, or
> > performance just degrades too far due to the lack of space for the
> > page cache. It has been unclear from these reports if there is a
> > pattern. My hope here is that by comparing JVM versions, OS versions,
> > JVM configuration, etc., we will find something. Thank you everyone
> > for your time.
> >
> > Some example reports:
> > - http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
> > - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
> > - https://issues.apache.org/jira/browse/CASSANDRA-2868
> > - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
> > - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html
> >
> > For reference, theories include (in no particular order):
> > - memory fragmentation
> > - JVM bug
> > - OS/glibc bug
> > - direct memory
> > - swap induced fragmentation
> > - some other bad interaction of cassandra/jdk/jvm/os/nio-insanity.
> >
> > ### Survey
> >
> > 1. Do you think you are experiencing this problem?
> >
> > 2. Why? (This is a good time to share a graph like
> > http://www.twitpic.com/5fdabn or
> > http://img24.imageshack.us/img24/1754/cassandrarss.png)
> >
> > 2. Are you using mmap? (If yes, be sure to have read
> > http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have
> > used pmap [or another tool] to rule out mmap and top deceiving you.)
> >
> > 3. Are you using JNA? Was mlockall successful (it's in the logs on
> > startup)?
> >
> > 4. Is swap enabled? Are you swapping?
> >
> > 5. What version of Apache Cassandra are you using?
> >
> > 6. What is the earliest version of Apache Cassandra you recall seeing
> > this problem with?
> >
> > 7. Have you tried the patch from CASSANDRA-2654?
> >
> > 8. What JVM and version are you using?
> >
> > 9. What OS and version are you using?
> >
> > 10. What are your JVM flags?
> >
> > 11. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)?
> >
> > 12. Can you characterise how much GC your cluster is doing?
> >
> > 13. Approximately how many reads/writes per unit time is your cluster
> > doing (per node or the whole cluster)?
> >
> > 14. How are your column families configured (key cache size, row cache
> > size, etc.)?