Hi, has anybody done any memory usage analysis for Cassandra?
How much memory does Cassandra need to manage a 300 GB data load? How much extra memory is needed during compaction? Regarding mmap, memory usage is determined by the OS, so it has nothing to do with the JVM heap size, am I right?

I have a Cassandra cluster of 13 nodes, each holding 200~300 GB of data. JVM settings:

JVM_OPTS=" \
        -ea \
        -Xms6G \
        -Xmx6G \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -XX:+PrintGC -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
        -Dcom.sun.management.jmxremote.port=4993 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"

KeysCache settings for the 3 column families are 5,000,000, 1,000,000 and 1,000,000.

Some nodes run for 1 to 2 days, then get very slow because of bad GC performance, and then crash. This happens quite a lot, almost every day.
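One rough way to see how much of the 6 GB heap the key caches alone could take once full is to multiply the configured capacities (5,000,000 + 1,000,000 + 1,000,000 entries) by a per-entry cost. The sketch below does that arithmetic. The per-entry byte counts are hypothetical placeholders, not measured Cassandra figures; the real cost depends on key size and JVM object overhead and would have to be measured (for example from a heap dump). The column family names cf1/cf2/cf3 are also placeholders, since the post does not name them.

```python
# Back-of-envelope estimate of on-heap key cache size when every cache is full.
# ASSUMPTION: the bytes-per-entry values used below are hypothetical
# placeholders, not measured Cassandra numbers.

KEY_CACHE_ENTRIES = {  # capacities from the post; names are placeholders
    "cf1": 5_000_000,
    "cf2": 1_000_000,
    "cf3": 1_000_000,
}

HEAP_BYTES = 6 * 1024**3  # -Xmx6G


def key_cache_bytes(entries: dict[str, int], bytes_per_entry: int) -> int:
    """Total bytes if every cache is filled to its configured capacity."""
    return sum(entries.values()) * bytes_per_entry


def heap_fraction(cache_bytes: int, heap_bytes: int = HEAP_BYTES) -> float:
    """Share of the heap that the given number of bytes represents."""
    return cache_bytes / heap_bytes


if __name__ == "__main__":
    for per_entry in (64, 128, 256):  # hypothetical per-entry sizes in bytes
        total = key_cache_bytes(KEY_CACHE_ENTRIES, per_entry)
        print(f"{per_entry:>4} B/entry -> {total / 1024**2:8.1f} MiB "
              f"({heap_fraction(total):.0%} of a 6 GiB heap)")
```

Plugging in a measured per-entry size instead of the placeholders would show whether the caches leave enough room in the old generation for memtables and compaction.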
Here is a fragment of the gc.log:

(concurrent mode failure): 6014591K->6014591K(6014592K), 25.4846400 secs] 6289343K->6282274K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 25.4848970 secs] [Times: user=37.76 sys=0.12, real=25.49 secs]
69695.771: [Full GC 69695.771: [CMS: 6014591K->6014591K(6014592K), 21.0911470 secs] 6289343K->6282177K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 21.0913910 secs] [Times: user=21.01 sys=0.12, real=21.09 secs]
69716.902: [GC [1 CMS-initial-mark: 6014591K(6014592K)] 6287620K(6289344K), 0.2759980 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
69717.178: [CMS-concurrent-mark-start]
69717.203: [Full GC 69717.203: [CMS69721.345: [CMS-concurrent-mark: 4.152/4.167 secs] [Times: user=16.64 sys=0.01, real=4.17 secs]
 (concurrent mode failure): 6014592K->6014591K(6014592K), 25.3649330 secs] 6289343K->6282200K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 25.3651670 secs] [Times: user=37.67 sys=0.13, real=25.37 secs]
69742.598: [Full GC 69742.598: [CMS: 6014591K->6014592K(6014592K), 21.0942430 secs] 6289343K->6282398K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 21.0944950 secs] [Times: user=21.00 sys=0.12, real=21.10 secs]
69763.721: [Full GC 69763.721: [CMS: 6014592K->6014591K(6014592K), 21.0978230 secs] 6289343K->6282553K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 21.0980600 secs] [Times: user=20.99 sys=0.12, real=21.09 secs]
69784.830: [GC [1 CMS-initial-mark: 6014591K(6014592K)] 6287995K(6289344K), 0.2765360 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
69785.107: [CMS-concurrent-mark-start]
69785.123: [Full GC 69785.123: [CMS69789.244: [CMS-concurrent-mark: 4.132/4.136 secs] [Times: user=16.49 sys=0.03, real=4.13 secs]
 (concurrent mode failure): 6014591K->6014591K(6014592K), 26.0883660 secs] 6289343K->6282549K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 26.0886060 secs] [Times: user=38.28 sys=0.15, real=26.09 secs]

Does anybody have an idea?
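To make the pattern in the gc.log fragment easier to see, the following Python sketch extracts the old-generation numbers (before, after, capacity, pause) from each CMS collection and concurrent mode failure. On these lines the old generation goes from about 6014591K to about 6014591K out of 6014592K, i.e. each 21 to 26 second full GC frees essentially nothing, which suggests the live data really does fill the 6 GB heap. The regex only covers the line shapes that appear in this fragment; it is an assumption that the rest of the log uses the same format.

```python
import re
from dataclasses import dataclass

# Matches the old-generation part of a CMS collection, e.g.
#   [CMS: 6014591K->6014591K(6014592K), 21.0911470 secs]
#   (concurrent mode failure): 6014592K->6014591K(6014592K), 25.3649330 secs]
# "CMS Perm :" and "CMS-initial-mark:" do not match because of the space/dash.
OLD_GEN = re.compile(
    r"(?:CMS|\(concurrent mode failure\)): "
    r"(\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs"
)


@dataclass
class OldGenEvent:
    before_kb: int
    after_kb: int
    capacity_kb: int
    pause_s: float

    @property
    def reclaimed_kb(self) -> int:
        return self.before_kb - self.after_kb

    @property
    def occupancy_after(self) -> float:
        return self.after_kb / self.capacity_kb


def parse(log_text: str) -> list[OldGenEvent]:
    """Return one event per old-generation collection found in the text."""
    return [
        OldGenEvent(int(b), int(a), int(c), float(s))
        for b, a, c, s in OLD_GEN.findall(log_text)
    ]


# Two lines copied from the fragment above.
SAMPLE = """\
69695.771: [Full GC 69695.771: [CMS: 6014591K->6014591K(6014592K), 21.0911470 secs] 6289343K->6282177K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 21.0913910 secs] [Times: user=21.01 sys=0.12, real=21.09 secs]
69717.203: [Full GC 69717.203: [CMS69721.345: [CMS-concurrent-mark: 4.152/4.167 secs] [Times: user=16.64 sys=0.01, real=4.17 secs]
 (concurrent mode failure): 6014592K->6014591K(6014592K), 25.3649330 secs] 6289343K->6282200K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 25.3651670 secs] [Times: user=37.67 sys=0.13, real=25.37 secs]
"""

if __name__ == "__main__":
    for ev in parse(SAMPLE):
        print(f"reclaimed {ev.reclaimed_kb}K in {ev.pause_s:.1f}s, "
              f"old gen {ev.occupancy_after:.4%} full afterwards")
```

Running the same parser over the whole gc.log would show whether the old generation fills up gradually over the 1 to 2 days before the crashes.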