> When we analyzed the heap, almost all of it was memtables.
What were the top classes? 
I would normally expect an OOM in the pre-1.2 days to be the result of bloom 
filters, compaction metadata, and index samples. 

> Is there any known issue with 1.1.5 which causes memtable_total_space_in_mb 
> not to be respected, or not defaulting to 1/3rd of the heap size?
Nothing I can remember. 
We estimate the in-memory size of the memtables using the live ratio. That has 
been pretty accurate for a while now, but you may want to check the change log 
for changes there. 
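As a rough sketch of the arithmetic being discussed, assuming the 8 GB heap from the tests below; the variable names and the live ratio value are illustrative, not Cassandra's actual internals:

```python
# Hedged sketch of the two numbers in this thread; names and the live ratio
# value are illustrative, not Cassandra's real fields.

heap_mb = 8192                            # -Xmx8G, as in the tests described
memtable_total_space_mb = heap_mb // 3    # yaml default: one third of the heap
print(memtable_total_space_mb)            # 2730

# The heap cost of a memtable is estimated as serialized size * live ratio,
# i.e. measured heap bytes consumed per byte actually written:
serialized_mb = 64
live_ratio = 3.0
print(serialized_mb * live_ratio)         # estimated heap footprint: 192.0 MB
```

If the live ratio estimate drifts low, the real heap usage can exceed the configured threshold before a flush is triggered.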

> The latest test was running on high performance 32-core, 128 GB RAM, 7 RAID-0 
> 1TB disks (regular).
With all those cores, grab the TLAB setting from the 1.2 cassandra-env.sh file. 
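If memory serves (check your own copy of the 1.2 cassandra-env.sh for the exact context), the setting referred to is the thread-local allocation block flag:

```shell
# From a 1.2-era cassandra-env.sh (verify against your copy); TLABs give each
# thread its own allocation buffer, which helps on many-core boxes:
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
```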

Cheers


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 1/11/2013, at 2:59 pm, Arindam Barua <aba...@247-inc.com> wrote:

> 
> Thank you for your responses. In another recent test, the heap actually got 
> full, and we got an out of memory error. When we analyzed the heap, almost 
> all of it was memtables. Is there any known issue with 1.1.5 which causes 
> memtable_total_space_in_mb not to be respected, or not defaulting to 1/3rd of 
> the heap size? Or is it possible that the load in the test is so high that 
> Cassandra is not able to keep up with flushing, even though it starts flushing 
> when memtable usage reaches 1/3rd of the heap?
> 
> We recently switched to LeveledCompaction, however, when we got the earlier 
> heap warning, that was running on SizeTiered.
> The latest test was running on high performance 32-core, 128 GB RAM, 7 RAID-0 
> 1TB disks (regular). Earlier tests were run on lesser hardware with the same 
> load, but there was no memory problem. We are running more tests to check if 
> this is always reproducible.
> 
> Answering some of the earlier questions if it helps:
> 
> We have Cassandra 1.1.5 running in production. Upgrading to the latest 1.2.x 
> release is on the roadmap, but until then this needs to be figured out.
> 
>> - How much data do you have per node?
> We are running into these errors while running tests in QA starting with 0 
> load. These are around 4 hr tests which end up adding under 1 GB of data on 
> each node of a 4-node ring, or a 2-node ring.
> 
>> - What is the value of "index_interval" (cassandra.yaml)?
> It's the default value of 128.
> 
> Thanks,
> Arindam
> 
> -----Original Message-----
> From: Aaron Morton [mailto:aa...@thelastpickle.com] 
> Sent: Monday, October 28, 2013 12:09 AM
> To: Cassandra User
> Subject: Re: Heap almost full
> 
>> 1] [14/10/2013:19:15:08 PDT] ScheduledTasks:1:  WARN GCInspector.java (line 
>> 145) Heap is 0.8287082580489245 full.  You may need to reduce memtable 
>> and/or cache sizes.  Cassandra will now flush up to the two largest 
>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold in 
>> cassandra.yaml if you don't want Cassandra to do this automatically
> This means that the CMS GC was unable to free memory quickly; you've not run 
> out, but may do under heavy load. 
> 
> CMS uses CPU resources to do its job; how much CPU do you have available? 
> Check the behaviour of the CMS collector using JConsole or another tool to 
> watch the heap size: you should see a nice saw-tooth graph that gradually 
> grows, then drops quickly to below roughly 3 GB. If the heap does not drop 
> low enough after a CMS collection, you will spend more time in GC. 
> 
> You may also want to adjust flush_largest_memtables_at to .8 to give CMS a 
> chance to do its work. It defaults to .75.
> 
>> In 1.2+ bloomfilters are off-heap, you can use vnodes...
> +1 for 1.2 with off heap bloom filters. 
> 
>> - increasing the heap to 10GB.
> 
> -1 
> Unless you have a node under heavy memory problems, pre 1.2 with 1+billion 
> rows and lots of bloom filters, increasing the heap is not the answer. It 
> will increase the time taken for ParNew CMS and in kicks the problem down the 
> road. 
> 
> Cheers
> 
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 26/10/2013, at 8:32 am, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> 
>> If you are starting with Cassandra I really advise you to start with 1.2.11
>> 
>> In 1.2+ bloomfilters are off-heap, you can use vnodes...
>> 
>> "I summed up the bloom filter usage reported by nodetool cfstats in all the 
>> CFs and it was under 50 MB."
>> 
>> This is quite a small value. Is there any chance of an error in your 
>> conversion from the bytes reported by cfstats?
>> 
>> If you are trying to understand this, could you tell us:
>> 
>> - How much data do you have per node?
>> - What is the value of "index_interval" (cassandra.yaml)?
>> 
>> If you are trying to fix this, you can try :
>> 
>> - changing the "memtable_total_space_in_mb" to 1024
>> - increasing the heap to 10GB.
>> 
>> Hope this will help somehow :).
>> 
>> Good luck
>> 
>> 
>> 2013/10/16 Arindam Barua <aba...@247-inc.com>
>> 
>> 
>> During performance testing being run on our 4 node Cassandra 1.1.5 cluster, 
>> we are seeing warning logs about the heap being almost full - [1]. I'm 
>> trying to figure out why, and how to prevent it.
>> 
>> 
>> 
>> The tests are being run on a Cassandra ring consisting of 4 dedicated boxes 
>> with 32 GB of RAM each.
>> 
>> The heap size is set to 8 GB as recommended.
>> 
>> All the other relevant settings I know of are the default ones:
>> 
>> -          memtable_total_space_in_mb is not set in the yaml, so should 
>> default to 1/3rd the heap size.
>> 
>> -          The key cache should be 100 MB at the most. I checked the key 
>> cache the day after the tests were run via nodetool info, and it reported 
>> 4.5 MB being used.
>> 
>> -          row cache is not being used
>> 
>> -          I summed up the bloom filter usage reported by nodetool cfstats 
>> in all the CFs and it was under 50 MB.
>> 
>> 
>> 
>> The resident size of the cassandra process according to top is 8.4g even 
>> now. I did a heap histogram using jmap, but am not sure how to interpret 
>> those results usefully - [2].
>> 
>> 
>> 
>> Performance test details:
>> 
>> -          The test is write only, and is writing relatively large amount of 
>> data to one CF.
>> 
>> -          There is some other constant background traffic that writes 
>> smaller amounts of data to many CFs and does some reads.
>> 
>> 
>> 
>> The total number of CFs are 114, but quite a few of them are not used.
>> 
>> 
>> 
>> Thanks,
>> 
>> Arindam
>> 
>> 
>> 
>> [1] [14/10/2013:19:15:08 PDT] ScheduledTasks:1:  WARN GCInspector.java (line 
>> 145) Heap is 0.8287082580489245 full.  You may need to reduce memtable 
>> and/or cache sizes.  Cassandra will now flush up to the two largest 
>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold in 
>> cassandra.yaml if you don't want Cassandra to do this automatically
>> 
>> 
>> 
>> [2] Object Histogram:
>> 
>> 
>> 
>> num       #instances    #bytes  Class description
>> --------------------------------------------------------------------------
>> 1:              152855  86035312        int[]
>> 2:              13395   45388008        long[]
>> 3:              49517   9712000 java.lang.Object[]
>> 4:              120094  8415560 char[]
>> 5:              145106  6965088 java.nio.HeapByteBuffer
>> 6:              40525   5891040 * ConstMethodKlass
>> 7:              231258  5550192 java.lang.Long
>> 8:              40525   5521592 * MethodKlass
>> 9:              134574  5382960 java.math.BigInteger
>> 10:             36692   4403040 java.net.SocksSocketImpl
>> 11:             3741    4385048 * ConstantPoolKlass
>> 12:             63875   3538128 * SymbolKlass
>> 13:             104048  3329536 java.lang.String
>> 14:             132636  3183264 org.apache.cassandra.db.DecoratedKey
>> 15:             97466   3118912 java.util.concurrent.ConcurrentHashMap$HashEntry
>> 16:             97216   3110912 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
>> 
>> 
>> 
>> 
>> 
> 
