What are your bloom filter settings on your CFs? Maybe look here: http://www.datastax.com/docs/1.1/operations/tuning#tuning-bloomfilters
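In 1.1 the bloom filters for all SSTables are kept on the JVM heap, so on column families with a lot of rows they can account for a surprising share of it. A quick way to check, as a sketch ("MyKeyspace" and "MyCF" are placeholders for your own names):

    # Per-CF bloom filter heap usage shows up in cfstats:
    nodetool -h localhost cfstats | grep -E 'Column Family|Bloom Filter Space Used'

    # Loosening the false-positive chance shrinks the filters, at the cost
    # of more disk reads on misses:
    cassandra-cli -h localhost
    [default@unknown] use MyKeyspace;
    [default@MyKeyspace] update column family MyCF with bloom_filter_fp_chance = 0.1;

    # The new setting only applies as SSTables are rewritten, e.g.:
    nodetool -h localhost scrub MyKeyspace MyCF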
On Nov 7, 2012, at 4:56 AM, Alain RODRIGUEZ wrote:

> Hi,
>
> We just had an issue in production that we finally solved by upgrading our
> hardware and increasing the heap.
>
> We now have 3 xLarge servers from AWS (15G RAM, 4 CPUs - 8 cores). We added
> them and then removed the old ones.
>
> With the full default configuration, the 0.75 threshold of the 4G heap was
> being reached continuously, so I had to increase the heap to 8G:
>
> Memtable : 2G (manually configured)
> Key cache : 0.1G (min(5% of heap (in MB), 100MB))
> System : 1G (more or less, from the DataStax doc)
>
> It should use about 3G, but it actually uses between 4 and 6G.
>
> So here are my questions:
>
> How can we see how the heap is being used, and monitor it?
> Why is that much memory used in the heap of my new servers?
>
> All settings not specified above are the Cassandra 1.1.2 defaults.
>
> Here is what happened to us before, and why we changed our hardware. If you
> have any clue about what happened, we would be glad to learn and maybe go
> back to our old hardware.
>
> -------------------------------- User experience --------------------------------
>
> We had a Cassandra 1.1.2 2-node cluster with RF 2 and CL.ONE (reads and
> writes) running on 2 m1.Large AWS instances (7.5G RAM, 2 CPUs - 4 cores),
> dedicated to Cassandra only.
>
> cassandra.yaml was configured with the 1.1.2 default options, and in
> cassandra-env.sh I configured a 4G heap with a 200M "new size".
>
> This is the heap usage we expected:
>
> Memtable : 1.4G (1/3 of the heap)
> Key cache : 0.1G (min(5% of heap (in MB), 100MB))
> System : 1G (more or less, from the DataStax doc)
>
> So in theory we were around 2.5G max, out of 3G usable (threshold of 0.75 of
> the heap before memtables are flushed because of memory pressure).
>
> I thought that was fine, given the DataStax documentation:
>
> "Regardless of how much RAM your hardware has, you should keep the JVM heap
> size constrained by the following formula and allow the operating system's
> file cache to do the rest:
> (memtable_total_space_in_mb) + 1GB + (cache_size_estimate)"
>
> After adding a third node and changing the RF from 2 to 3 (to allow using
> CL.QUORUM and still be able to restart a node whenever we want), things went
> really bad, even though I still don't see how either of these operations
> could possibly affect the heap needed.
>
> All 3 nodes reached the 0.75 heap threshold (I tried raising it to 0.85, but
> that was reached too), and usage never came back down. So my cluster started
> flushing constantly, and the load went up because of unceasing compactions.
> This unexpected load produced latency that broke our service for a while.
> Even with the service down, Cassandra was unable to recover.
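On the heap-monitoring question above: a couple of quick checks (a sketch; <pid> stands for whatever the Cassandra JVM's pid is on your box):

    # Current heap usage as reported by Cassandra itself:
    nodetool -h localhost info        # look for the "Heap Memory (MB)" line

    # GC activity over time, sampled every 5 seconds:
    jstat -gcutil <pid> 5000          # old-gen occupancy %, GC counts and times

GCInspector also logs long GC pauses to system.log, and you can get full GC logs by uncommenting the GC logging options in cassandra-env.sh. If old-gen occupancy stays above the flush threshold even after a full GC, whatever is holding the heap is live data (bloom filters, index samples, memtables), not garbage.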