> How much memory does cassandra need to manage 300G of data load? How much
> extra memory will be needed when doing compaction?
For one thing it depends on the data. One thing that scales linearly (but
with a low constant) with the amount of data is the bloom filters. If those
300 GB correspond to 1 billion small values, more memory will be used for
the sstable bloom filters than if they correspond to 1 million large values.
(Back-of-the-envelope numbers at the end of this mail.)

> Regarding mmap, memory usage will be determined by the OS so it has nothing
> to do with the heap size of JVM, am I right?

Yes, though heap size can affect whether the OS starts swapping the JVM out.
(Small demo at the end of this mail.)

> some nodes run for 1 to 2 days, and then get very slow, due to bad gc
> performance, then crash. This happens quite a lot, almost every day.
> Here is a fragment of the gc.log:
>
> (concurrent mode failure): 6014591K->6014591K(6014592K), 25.4846400 secs] 6289343K->6282274K(6289344K), [CMS Perm : 17290K->17287K(28988K)], 25.4848970 secs] [Times: user=37.76 sys=0.12, real=25.49 secs]
> 69695.771: [Full GC 69695.771: [CMS: 6014591K->6014591K(6014592K), 21.0911470 secs] 6289343K->6282177K(6289344K), [CMS Perm : 17287K->17287K(28988K)], 21.0913910 secs] [Times: user=21.01 sys=0.12, real=21.09 secs]

You're running out of heap space. "Concurrent mode failure" means the heap
became full before concurrent marking could complete, and the subsequent
full GCs show that almost no data was freed, indicating that you simply
have far too much live data for your heap size. Either increase the JVM
heap size or adjust Cassandra settings to use less memory (e.g. smaller
memtable sizes, less caching).
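To make the log concrete: the CMS old generation is 6014592K (~5.7 GB), and
its occupancy goes from 6014591K to 6014591K across 21-25 second
stop-the-world collections, i.e. essentially zero bytes were reclaimed. The
heap is full of live data, not garbage, so GC tuning alone will not help;
only a larger -Xmx or a smaller working set will.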
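Back of the envelope on the bloom filter point, as a rough sketch (the ~10
bits per key figure is an assumption on my part, in the right ballpark for
a ~1% false positive rate; the real number depends on how the filters are
sized):

    public class BloomFilterEstimate {
        // sstable bloom filter memory grows with the number of keys,
        // not with the number of bytes stored.
        static long bloomFilterBytes(long keys, double bitsPerKey) {
            return (long) (keys * bitsPerKey / 8.0);
        }

        public static void main(String[] args) {
            // The same 300 GB of data, sliced two ways:
            long smallValues = 1000000000L; // 1 billion small values
            long largeValues = 1000000L;    // 1 million large values
            System.out.println(bloomFilterBytes(smallValues, 10) / (1024 * 1024) + " MB"); // ~1192 MB
            System.out.println(bloomFilterBytes(largeValues, 10) / 1024 + " KB");          // ~1220 KB
        }
    }

Same bytes on disk, three orders of magnitude apart in filter memory.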
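And on the mmap point, a minimal demo (the file path and size are just
examples): the mapped region lives in the process address space and is
backed by the OS page cache, so it never counts against the JVM heap:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapDemo {
        public static void main(String[] args) throws Exception {
            RandomAccessFile f = new RandomAccessFile("/tmp/mmap-demo.bin", "rw");
            f.setLength(64L << 20); // 64 MB backing file
            // Mapping allocates address space, not heap; pages are
            // faulted in and evicted by the OS, not by the JVM.
            MappedByteBuffer buf = f.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, f.length());
            buf.putLong(0, 42L); // touch a page so the OS faults it in
            Runtime rt = Runtime.getRuntime();
            System.out.println("heap used: "
                    + (rt.totalMemory() - rt.freeMemory()) / 1024 + " KB");
            f.close();
        }
    }

The heap number stays small no matter how large the mapping is; what grows
is the process RSS in top. That is also part of why a too-large -Xmx can
hurt here: it leaves less physical memory for the OS to keep those mapped
pages resident.

-- 
/ Peter Schuller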