CMS GC / fragmentation / memtables etc

graham sanderson Tue, 20 May 2014 16:58:28 -0700

So I’ve been tinkering a bit with our CMS config, because we are still seeing 
fairly frequent full compacting GCs due to fragmentation/promotion failure.

As mentioned below, we are usually too fragmented to promote new in-flight 
memtables.

This is likely caused by sudden write spikes (which we do have), though the 
problems don’t generally happen at the time of our largest write spikes. Any 
write spike likely causes both new memtables and many other new objects of 
unknown size to spill into the tenured gen, so spikes cause fragmentation even 
when they don’t cause an immediate GC issue. We have lots of things going on in 
this multi-tenant cluster (GC pauses are of course extra bad, since they cause 
a spike in hinted handoff on other nodes which were already busy, etc.).

Anyway, considering possibilities:

0) Try to make our application behavior more steady-state. This is probably 
possible, but there are lots of other things to think about too (e.g. 
compaction, OpsCenter, repair), which are generally both tunable and 
throttle-able.
1) Play with tweaking the PLAB (promotion-local allocation buffer) configs to 
see if we can ease fragmentation. I’d be curious what the “crud” that is 
getting spilled is in particular; presumably it is larger objects, since it is 
the binary tree dictionary of large free chunks that is affected.
2) Given the above, if we can guarantee even > 24 hours without a full GC, I 
don’t think we’d mind running a regular rolling restart on the servers during 
off hours (note that usually the GCs don’t have a visible impact, but when 
they hit multiple machines at once they can).
3) Zing is seriously an option if it would save us large amounts of tuning and 
constant worry about the “next” thing tweaking the allocation patterns. Does 
anyone have any experience with Zing & Cassandra?
4) Given that we expect periodic bursts of writes, that 
memtable_total_space_in_mb is bounded, and that we are not actually short of 
memory (it just gets fragmented), I’m wondering if anyone has played with 
pinning (up to, or initially?) that many 1MB chunks of memory via SlabAllocator 
and reusing them… They would get promoted once, and then these 1M chunks 
wouldn’t be part of the subsequent promotion hassle. It would probably also 
allow more crud to die in eden under write load, since we wouldn’t be 
allocating these large chunks in eden at the same time. Anyway, I had a little 
look at the code, and the life cycle of memtables is not trivial, but I was 
considering attempting a patch to play with… anyone have any thoughts?

In summary, SlabAllocator helps by allocating and freeing lots of objects at 
the same time; however, any time slabs are allocated under load, we end up 
promoting them along with whatever other live stuff is still in eden. If we 
only do this once and reuse the slabs, we are likely to minimize our promotion 
problem later (at least for these large objects).
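
A minimal Java sketch of idea 4, assuming a fixed memory budget: a hypothetical RecyclingSlabPool allocates a fixed number of 1 MiB slabs once, each memtable copies values into slabs borrowed from the pool using bump-pointer (arena-style) allocation, and a flush hands the slabs back for reuse instead of leaving them to the GC. All class and method names here are made up for illustration and are not Cassandra's SlabAllocator API. A real patch would also have to wait until no reader still references a flushed memtable before recycling its slabs, which is where the non-trivial memtable life cycle comes in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Cassandra's API): a fixed pool of 1 MiB slabs that are
 * allocated once and recycled across memtable flushes instead of re-allocated in eden.
 */
public class RecyclingSlabPool {
    public static final int SLAB_SIZE = 1024 * 1024; // the "1M chunks" discussed in this thread

    /** One slab: a backing array plus a bump pointer (arena-style allocation). */
    public static final class Slab {
        final byte[] data = new byte[SLAB_SIZE];
        private int next = 0; // offset of the first free byte

        /** Returns the offset of a fresh region of len bytes, or -1 if it does not fit. */
        int allocate(int len) {
            if (len > SLAB_SIZE - next) return -1;
            int offset = next;
            next += len;
            return offset;
        }

        void reset() { next = 0; }
    }

    private final ArrayDeque<Slab> free = new ArrayDeque<>();

    /** Allocates every slab up front; slabCount could be derived from memtable_total_space_in_mb (assumption). */
    public RecyclingSlabPool(int slabCount) {
        for (int i = 0; i < slabCount; i++) free.push(new Slab());
    }

    /** Returns a free slab, or null if the whole budget is in use. */
    public Slab acquire() { return free.poll(); }

    /** Resets a slab and makes it available again; the pool keeps it reachable, so it never becomes garbage. */
    public void release(Slab slab) {
        slab.reset();
        free.push(slab);
    }

    public int available() { return free.size(); }

    /** Per-memtable allocator: copies values into slabs borrowed from the pool. */
    public static final class MemtableAllocator {
        private final RecyclingSlabPool pool;
        private final List<Slab> owned = new ArrayList<>();
        private Slab current;

        public MemtableAllocator(RecyclingSlabPool pool) { this.pool = pool; }

        /** Copies value into slab memory and returns a view of it, or null if the pool is exhausted. */
        public ByteBuffer copy(byte[] value) {
            if (value.length > SLAB_SIZE) throw new IllegalArgumentException("value larger than a slab");
            int offset = current == null ? -1 : current.allocate(value.length);
            if (offset < 0) {
                Slab fresh = pool.acquire();
                if (fresh == null) return null; // a real system would flush or block here
                owned.add(fresh);
                current = fresh;
                offset = current.allocate(value.length);
            }
            System.arraycopy(value, 0, current.data, offset, value.length);
            return ByteBuffer.wrap(current.data, offset, value.length).slice();
        }

        /** Call only after a flush, once no reader can still see buffers from these slabs. */
        public void releaseAll() {
            for (Slab s : owned) pool.release(s);
            owned.clear();
            current = null;
        }
    }

    public static void main(String[] args) {
        RecyclingSlabPool pool = new RecyclingSlabPool(4); // 4 MiB budget, allocated once
        MemtableAllocator memtable = new MemtableAllocator(pool);
        ByteBuffer v = memtable.copy("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(v.remaining() + " bytes copied, free slabs: " + pool.available());
        memtable.releaseAll(); // simulate a flush: slabs go back to the pool, not to the GC
        System.out.println("after flush, free slabs: " + pool.available());
    }
}
```

Because the pool never drops its slabs, they should be promoted to the old gen once and then stay put, so later write bursts mostly allocate small, short-lived objects in eden. The trade-off is that the whole memtable budget stays resident even when the node is idle.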

On May 16, 2014, at 9:37 PM, graham sanderson <gra...@vast.com> wrote:

> Excellent - thank you… 
> 
> On May 16, 2014, at 7:08 AM, Samuel CARRIERE <samuel.carri...@urssaf.fr> 
> wrote:
> 
>> Hi,
>> This is arena allocation of memtables. See here for more info: 
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
>> 
>> From:    graham sanderson <gra...@vast.com>
>> To:      dev@cassandra.apache.org, 
>> Date:    16/05/2014 14:03
>> Subject: Things that are about 1M big
>> 
>> So just throwing this out there for those for whom this might ring a bell.
>> 
>> I’m debugging some CMS memory fragmentation issues on 2.0.5, and 
>> interestingly enough most of the objects giving us promotion failures are 
>> of size 131074 (dwords). GC logging obviously doesn’t say what those are, 
>> but I’d wager money they are either 1M big byte arrays, or less likely 
>> 256k entry object arrays backing large maps.
>> 
>> So it’s not strictly critical to solving my problem, but I was wondering if 
>> anyone can think of any heap-allocated C* objects which are (with no 
>> significant changes to the standard Cassandra config) allocated in 1M 
>> chunks. (It would save me scouring the code, or a 9 gig heap dump, if I need 
>> to figure it out!)
>> 
>> Thanks,
>> 
>> Graham
>
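
As a sanity check on the 131074 figure, here is a small Java calculation. It assumes the reported size is in 8-byte heap words on a 64-bit HotSpot JVM with compressed oops (the default for heaps under roughly 32 GB, such as the roughly 9 GB heap implied above), a 16-byte array header and 8-byte object alignment; none of these are stated in the thread. Under those assumptions a 1 MiB byte[] and a 256k-entry Object[] both come out to exactly 131074 words, so the number is consistent with either guess.

```java
/**
 * Converts an array's length into HotSpot heap words, under stated assumptions:
 * 64-bit JVM, 8-byte heap words, compressed oops (4-byte references),
 * a 16-byte array header and 8-byte object alignment.
 */
public class PromotionFailureSize {
    static final int WORD_BYTES = 8;
    static final int ARRAY_HEADER_BYTES = 16; // 12-byte object header + 4-byte length field
    static final int ALIGNMENT_BYTES = 8;

    /** Heap words occupied by an array of `length` elements of `elementBytes` each. */
    public static long arrayWords(long length, int elementBytes) {
        long bytes = ARRAY_HEADER_BYTES + length * elementBytes;
        long aligned = (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES; // round up
        return aligned / WORD_BYTES;
    }

    public static void main(String[] args) {
        long byteArray = arrayWords(1024 * 1024, 1); // byte[1 MiB]
        long refArray = arrayWords(256 * 1024, 4);   // Object[256k], 4-byte compressed references
        System.out.println("byte[1 MiB]:  " + byteArray + " words");
        System.out.println("Object[256k]: " + refArray + " words");
    }
}
```

Both cases are 1048576 bytes of payload plus the 16-byte header, i.e. 1048592 bytes. Header and reference sizes differ without compressed oops, so the exact match depends on that assumption.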


