Hi Dan,

You're welcome, but I must admit you solved it on your own: I was about to advise you to reduce all the JVM settings, the exact opposite of the working solution you found :-). 48 GB is a lot of heap; I would have suggested something like a 26 GB heap and memtables around 4 GB, to try to reduce GC pause times and leave more free memory for page caching. Truth is, without access to the cluster, the best we can do is guess. The operator is the only one with all the needed information ;-).
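For what it's worth, the direction I had in mind looked roughly like this -- purely illustrative values derived from the numbers above, not something tested against your workload:

# cassandra-env.sh -- smaller heap for shorter G1 pauses, leaving ~38 GB of the 64 GB for the OS page cache
MAX_HEAP_SIZE="26G"

# cassandra.yaml -- cap memtables well below the default 1/4 of the heap
memtable_heap_space_in_mb: 4096

Clearly the opposite direction is what actually worked for you, so treat this only as the hypothesis I would have started from.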
If things are running smoothly and efficiently enough, don't try anything else; just stick with the working config, imho. Glad you figured it out while I was out, sorry I missed the follow-up. One extra note on memtable_cleanup_threshold at the very bottom of this mail, below the quoted thread.

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-08 20:11 GMT+01:00 Dan Kinder <dkin...@turnitin.com>:

> Quick follow-up here: so far I've had these nodes stable for about 2 days now with the following (still mysterious) solution: *increase* memtable_heap_space_in_mb to 20GB. This was having issues at the default value of 1/4 heap (12GB in my case, I misspoke earlier and said 16GB). Upping it to 20GB seems to have made the issue go away so far.
>
> Best guess now is that it simply was memtable flush throughput. Playing with memtable_cleanup_threshold further may have also helped, but I didn't want to create small SSTables.
>
> Thanks again for the input @Alain.
>
> On Fri, Mar 4, 2016 at 4:53 PM, Dan Kinder <dkin...@turnitin.com> wrote:
>
>> Hi, thanks for responding Alain. Going to provide more info inline.
>>
>> However, a small update that is probably relevant: while the node was in this state (MemtableReclaimMemory building up), since this cluster is not serving live traffic I temporarily turned off ALL client traffic, and the node still never recovered; MemtableReclaimMemory never went down. Seems like there is one thread doing this reclaiming and it has gotten stuck somehow.
>>
>> Will let you know when I have more results from experimenting... but again, merci.
>>
>> On Thu, Mar 3, 2016 at 2:32 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> Hi Dan,
>>>
>>> I'll try to go through all the elements:
>>>
>>>> seeing this odd behavior happen, seemingly to single nodes at a time
>>>
>>> Is that one node at a time or always the same node? Do you consider your data model fairly, evenly distributed?
>>
>> Of 6 nodes, 2 of them seem to be the recurring culprits. Could be related to a particular data partition.
>>
>>>> The node starts to take more and more memory (instance has 48GB memory on G1GC)
>>>
>>> Do you use a 48 GB heap size, or is that the total amount of memory in the node? Could we have your JVM settings (GC and heap sizes), also memtable size and type (off heap?) and the amount of available memory?
>>
>> Machine spec: 24 virtual cores, 64GB memory, 12 HDD JBOD (yes an absurd number of disks, not my choice)
>>
>> memtable_heap_space_in_mb: 10240 # 10GB (previously left as default which was 16GB and caused the issue more frequently)
>> memtable_allocation_type: heap_buffers
>> memtable_flush_writers: 12
>>
>> MAX_HEAP_SIZE="48G"
>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>> JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>
>>>> Note that there is a decent number of compactions going on as well but that is expected on these nodes and this particular one is catching up from a high volume of writes
>>>
>>> Are *concurrent_compactors* correctly throttled (about 8 on good machines) and is *compaction_throughput_mb_per_sec* high enough to cope with what is thrown at the node?
>>> Using SSD, I often see the latter unthrottled (using the 0 value), but I would try small increments first.
>>
>> concurrent_compactors: 12
>> compaction_throughput_mb_per_sec: 0
>>
>>>> Also interestingly, neither CPU nor disk utilization are pegged while this is going on
>>>
>>> First thing is making sure your memory management is fine. Having information about the JVM and memory usage globally would help. Then, if you are not fully using the resources, you might want to try increasing *concurrent_writes* to a higher value (probably way higher, given the pending requests, but go safely and incrementally, first on a canary node) and monitor tpstats + resources. Hopefully this will help the pending MutationStage count go down. My guess is that the pending requests are messing with the JVM, but it could be the exact contrary as well.
>>
>> concurrent_writes: 192
>>
>> It may be worth noting that the main reads going on are large batch reads, while these writes are happening (akin to analytics jobs).
>>
>> I'm going to look into JVM use a bit more, but otherwise it seems like normal young generation GCs are happening even as this problem surfaces.
>>
>>>> Native-Transport-Requests 25 0 547935519 0 2586907
>>>
>>> About native transport requests being blocked, you can probably mitigate things by increasing native_transport_max_threads: 128 (try doubling it and continue tuning incrementally). Also, an up to date client using native protocol V3 handles connections / threads from clients a lot better; with a heavy throughput like yours, you might want to give this a try.
>>
>> This one is a good idea and I'll probably try increasing it, but I don't really see these backing up much.
>>
>>> What is your current client?
>>> What does "netstat -an | grep -e 9042 -e 9160 | grep ESTABLISHED | wc -l" output? This is the number of clients connected to the node.
>>> Do you have other significant errors or warnings in the logs (other than dropped mutations)? "grep -i -e "ERROR" -e "WARN" /var/log/cassandra/system.log"
>>
>> 435 incoming connections; the only warning is compaction of some large partitions.
>>
>>> As a small conclusion, I would keep an eye on things related to memory management and also try to push Cassandra's limits by increasing default values, since you seem to have resources available, to make sure Cassandra can cope with the high throughput. Pending operations = high memory pressure. Reducing the pending work somehow will probably get you out of trouble.
>>>
>>> Hope this first round of ideas helps.
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2016-03-02 22:58 GMT+01:00 Dan Kinder <dkin...@turnitin.com>:
>>>
>>>> Also should note: Cassandra 2.2.5, CentOS 6.7
>>>>
>>>> On Wed, Mar 2, 2016 at 1:34 PM, Dan Kinder <dkin...@turnitin.com> wrote:
>>>>
>>>>> Hi y'all,
>>>>>
>>>>> I am writing to a cluster fairly fast and seeing this odd behavior happen, seemingly to single nodes at a time. The node starts to take more and more memory (instance has 48GB memory on G1GC). tpstats shows that MemtableReclaimMemory Pending starts to grow first, then later MutationStage builds up as well.
>>>>> By then most of the memory is being consumed, GC is getting longer, the node slows down and everything slows down unless I kill the node. Also, the number of Active MemtableReclaimMemory threads seems to stay at 1. Also interestingly, neither CPU nor disk utilization are pegged while this is going on; it's on JBOD and there is plenty of headroom there. (Note that there is a decent number of compactions going on as well, but that is expected on these nodes and this particular one is catching up from a high volume of writes.)
>>>>>
>>>>> Anyone have any theories on why this would be happening?
>>>>>
>>>>> $ nodetool tpstats
>>>>> Pool Name                    Active   Pending    Completed   Blocked   All time blocked
>>>>> MutationStage                   192    715481    311327142         0                  0
>>>>> ReadStage                         7         0      9142871         0                  0
>>>>> RequestResponseStage              1         0    690823199         0                  0
>>>>> ReadRepairStage                   0         0      2145627         0                  0
>>>>> CounterMutationStage              0         0            0         0                  0
>>>>> HintedHandoff                     0         0          144         0                  0
>>>>> MiscStage                         0         0            0         0                  0
>>>>> CompactionExecutor               12        24        41022         0                  0
>>>>> MemtableReclaimMemory             1       102         4263         0                  0
>>>>> PendingRangeCalculator            0         0           10         0                  0
>>>>> GossipStage                       0         0       148329         0                  0
>>>>> MigrationStage                    0         0            0         0                  0
>>>>> MemtablePostFlush                 0         0         5233         0                  0
>>>>> ValidationExecutor                0         0            0         0                  0
>>>>> Sampler                           0         0            0         0                  0
>>>>> MemtableFlushWriter               0         0         4270         0                  0
>>>>> InternalResponseStage             0         0     16322698         0                  0
>>>>> AntiEntropyStage                  0         0            0         0                  0
>>>>> CacheCleanupExecutor              0         0            0         0                  0
>>>>> Native-Transport-Requests        25         0    547935519         0            2586907
>>>>>
>>>>> Message type       Dropped
>>>>> READ                     0
>>>>> RANGE_SLICE              0
>>>>> _TRACE                   0
>>>>> MUTATION            287057
>>>>> COUNTER_MUTATION         0
>>>>> REQUEST_RESPONSE         0
>>>>> PAGED_RANGE              0
>>>>> READ_REPAIR            149
>>>>
>>>> --
>>>> Dan Kinder
>>>> Principal Software Engineer
>>>> Turnitin – www.turnitin.com
>>>> dkin...@turnitin.com
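PS - the memtable_cleanup_threshold note I mentioned above, as a rough back-of-the-envelope sketch rather than anything verified on your cluster: if I remember correctly, unless you set it explicitly, memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1), and Cassandra flushes the largest memtable once live memtables use more than memtable_cleanup_threshold * memtable_heap_space_in_mb. With the memtable_flush_writers: 12 from your mail, that would give roughly:

memtable_cleanup_threshold ~= 1 / (12 + 1) ~= 0.077
12 GB space (old default) -> flush at ~0.077 * 12288 MB ~= 945 MB of memtables
20 GB space (your fix)    -> flush at ~0.077 * 20480 MB ~= 1576 MB of memtables

If those assumptions hold, raising memtable_heap_space_in_mb gives you fewer, larger flushes, which fits your guess that flush throughput was the bottleneck, and it does so without lowering memtable_cleanup_threshold and producing the small SSTables you wanted to avoid.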