Re: memtable mem usage off by 10?

Benedict Elliott Smith Wed, 04 Jun 2014 08:19:34 -0700

In that case I would assume the problem is that, for some reason, JAMM is
failing to load, so the liveRatio it would ordinarily calculate is
defaulting to 10. Are you using the bundled Cassandra launch scripts?
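A rough way to check this against the numbers reported further down the thread. This is a minimal sketch, not Cassandra code. It assumes memtable heap usage is estimated as the reported (JMX) data size multiplied by liveRatio, and that liveRatio stays at its default of 10 because JAMM never measured it. The function name flush_threshold_mb is hypothetical.

```python
# Hypothetical model (not Cassandra source): assumes estimated memtable heap
# usage = reported data size * liveRatio, with liveRatio stuck at its default
# of 10 because JAMM failed to load and never measured the real ratio.
DEFAULT_LIVE_RATIO = 10.0


def flush_threshold_mb(memtable_total_space_in_mb: float,
                       live_ratio: float = DEFAULT_LIVE_RATIO) -> float:
    """Reported (JMX) memtable data size at which a flush would be triggered."""
    return memtable_total_space_in_mb / live_ratio


if __name__ == "__main__":
    heap_mb = 10 * 1024  # 10GB heap, as in the thread
    # Default in 2.0 is 1/4 of the heap; 4096 and 20480 are the values tried.
    for setting in (heap_mb / 4, 4096, 20480):
        print(f"memtable_total_space_in_mb={setting:.0f} -> "
              f"flush at ~{flush_threshold_mb(setting):.0f} MB reported")
```

Under that assumption the predicted thresholds (~256MB, ~410MB and ~2048MB) line up with the ~250MB, ~400MB and ~2GB observed for the default setting, 4096 and 20480 respectively.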



On 4 June 2014 15:51, Idrén, Johan <johan.id...@dice.se> wrote:

>  I wasn’t supplying it; I was assuming it was using the default. It does
> not exist in my config file. Sorry for the confusion.
>
>
>
>   From: Benedict Elliott Smith <belliottsm...@datastax.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday 4 June 2014 16:36
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>
> Subject: Re: memtable mem usage off by 10?
>
>    Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry,
>> I was going by the documentation. It claims that the property is around in
>> 2.0.
>
> But something else is wrong, as Cassandra will crash if you supply an
> invalid property, implying it's not sourcing the config file you're using.
>  I'm afraid I don't have the context for why it was removed, but it
> happened as part of the 2.0 release.
>
>>
>
> On 4 June 2014 13:59, Jack Krupansky <j...@basetechnology.com> wrote:
>
>>   Yeah, it is in the doc:
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>>
>> And I don’t find a Jira issue mentioning it being removed, so... what’s
>> the full story there?!
>>
>> -- Jack Krupansky
>>
>>  *From:* Idrén, Johan <johan.id...@dice.se>
>> *Sent:* Wednesday, June 4, 2014 8:26 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* RE: memtable mem usage off by 10?
>>
>>
>> Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I
>> was going by the documentation. It claims that the property is around in
>> 2.0.
>>
>>
>>
>> If we skip that, part of my reply still makes sense:
>>
>>
>>
>> Having memtable_total_space_in_mb set to 20480, memtables are flushed at a
>> reported value of ~2GB.
>>
>>
>>
>> With a constant overhead of ~10x, as suggested, this would mean that it
>> used 20GB, which is 2x the size of the heap.
>>
>>
>>
>> That shouldn't work. According to the OS, Cassandra doesn't use more than
>> ~11-12GB.
>>
>>
>>  ------------------------------
>> *From:* Benedict Elliott Smith <belliottsm...@datastax.com>
>> *Sent:* Wednesday, June 4, 2014 2:07 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: memtable mem usage off by 10?
>>
>>  I'm confused: there is no flush_largest_memtables_at property in C* 2.0?
>>
>>
>> On 4 June 2014 12:55, Idrén, Johan <johan.id...@dice.se> wrote:
>>
>>>  Ok, so the overhead is a constant modifier, right.
>>>
>>>
>>>
>>> The 3x I arrived at with the following assumptions:
>>>
>>>
>>>
>>> heap is 10GB
>>>
>>> Default memory for memtable usage is 1/4 of heap in c* 2.0
>>>  max memory used for memtables is 2,5GB (10/4)
>>>
>>> flush_largest_memtables_at is 0.75
>>>
>>> flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the
>>> default)
>>>
>>>
>>>
>>> With an overhead of 10x, it makes sense that my memtable is flushed when
>>> the JMX data says it is at ~250MB, i.e. 2,5GB, i.e. 1/4 of the heap.
>>>
>>>
>>>
>>> After I've set memtable_total_space_in_mb to a value larger than
>>> 7,5GB, it should still not go over 7,5GB on account of
>>> flush_largest_memtables_at (3/4 of the heap).
>>>
>>>
>>>
>>> So I would expect to see memtables flushed to disk once they are
>>> reportedly at around 750MB.
>>>
>>>
>>>
>>> Having memtable_total_space_in_mb set to 20480, memtables are flushed at
>>> a reported value of ~2GB.
>>>
>>>
>>>
>>> With a constant overhead, this would mean that it used 20GB, which is 2x
>>> the size of the heap, instead of 3/4 of the heap as it should be if
>>> flush_largest_memtables_at was being respected.
>>>
>>>
>>>
>>> This shouldn't be possible.
>>>
>>>
>>>  ------------------------------
>>>  *From:* Benedict Elliott Smith <belliottsm...@datastax.com>
>>>  *Sent:* Wednesday, June 4, 2014 1:19 PM
>>>
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: memtable mem usage off by 10?
>>>
>>>   Unfortunately it looks like the heap utilisation of memtables was not
>>> exposed in earlier versions, because they only maintained an estimate.
>>>
>>> The overhead scales linearly with the amount of data in your memtables
>>> (assuming the size of each cell is approx. constant).
>>>
>>> flush_largest_memtables_at is an independent setting to
>>> memtable_total_space_in_mb, and generally has little effect. Ordinarily
>>> sstable flushes are triggered by hitting the memtable_total_space_in_mb
>>> limit. I'm afraid I don't follow where your 3x comes from?
>>>
>>>
>>> On 4 June 2014 12:04, Idrén, Johan <johan.id...@dice.se> wrote:
>>>
>>>>  Aha, ok. Thanks.
>>>>
>>>>
>>>>
>>>> Trying to understand what my cluster is doing:
>>>>
>>>>
>>>>
>>>> cassandra.db.memtable_data_size only gets me the actual data but not
>>>> the memtable heap memory usage. Is there a way to check for heap memory
>>>> usage?
>>>>
>>>>
>>>>  I would expect to hit the flush_largest_memtables_at value, and this
>>>> would be what causes the memtable flush to sstable then? By default 0.75?
>>>>
>>>>
>>>>  Then I would expect the amount of memory used to be at most ~3x
>>>> what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by
>>>> default, max 3/4 before a flush), instead of close to 10x (250MB vs 2GB).
>>>>
>>>>
>>>> This is of course assuming that the overhead scales linearly with the
>>>> amount of data in my table, we're using one table with three cells in this
>>>> case. If it hardly increases at all, then I'll give up I guess :)
>>>>
>>>> At least until 2.1.0 comes out and I can compare.
>>>>
>>>>
>>>>  BR
>>>>
>>>> Johan
>>>>
>>>>
>>>>  ------------------------------
>>>>  *From:* Benedict Elliott Smith <belliottsm...@datastax.com>
>>>>  *Sent:* Wednesday, June 4, 2014 12:33 PM
>>>>
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: memtable mem usage off by 10?
>>>>
>>>>   These measurements tell you the amount of user data stored in the
>>>> memtables, not the amount of heap used to store it, so the same applies.
>>>>
>>>>
>>>> On 4 June 2014 11:04, Idrén, Johan <johan.id...@dice.se> wrote:
>>>>
>>>>>  I'm not measuring memtable size by looking at the sstables on disk,
>>>>> no. I'm looking through the JMX data. So I would believe (or hope) that
>>>>> I'm getting relevant data.
>>>>>
>>>>>
>>>>>
>>>>> If I have a heap of 10GB and set the memtable usage to 20GB, I would
>>>>> expect to hit other problems, but I'm not seeing memory usage over 10GB
>>>>> for the heap, and the machine (which has ~30GB of memory) is showing
>>>>> ~10GB free, with ~12GB used by Cassandra and the rest in caches.
>>>>>
>>>>>
>>>>>
>>>>> Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not
>>>>> idling.
>>>>>
>>>>>
>>>>>
>>>>> BR
>>>>>
>>>>> Johan
>>>>>
>>>>>
>>>>>  ------------------------------
>>>>> *From:* Benedict Elliott Smith <belliottsm...@datastax.com>
>>>>> *Sent:* Wednesday, June 4, 2014 11:56 AM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* Re: memtable mem usage off by 10?
>>>>>
>>>>>   If you are storing small values in your columns, the object
>>>>> overhead is very substantial. So what is 400MB on disk may well be 4GB in
>>>>> memtables, so if you are measuring the memtable size by the resulting
>>>>> sstable size, you are not getting an accurate picture. This overhead has
>>>>> been reduced by about 90% in the upcoming 2.1 release, through tickets
>>>>> 6271 <https://issues.apache.org/jira/browse/CASSANDRA-6271>, 6689
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-6689> and 6694
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-6694>.
>>>>>
>>>>>
>>>>> On 4 June 2014 10:49, Idrén, Johan <johan.id...@dice.se> wrote:
>>>>>
>>>>>>  Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm seeing some strange behavior of the memtables in both 1.2.13 and
>>>>>> 2.0.7: basically, it looks like they're using 10x less memory than they
>>>>>> should, based on the documentation and options.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 10GB heap for both clusters.
>>>>>>
>>>>>> 1.2.x should use 1/3 of the heap for memtables, but it uses at most
>>>>>> ~300MB before flushing.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2.0.7: same, but 1/4 and ~250MB.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In the 2.0.7 cluster I set memtable_total_space_in_mb to 4096,
>>>>>> which then allowed Cassandra to use up to ~400MB for memtables...
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm now running with 20480 for memtable_total_space_in_mb and
>>>>>> cassandra is using ~2GB for memtables.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Soo, off by 10 somewhere? Has anyone else seen this? Can't find a
>>>>>> JIRA for any bug connected to this.
>>>>>>
>>>>>> java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)
>>>>>>
>>>>>>
>>>>>>
>>>>>> BR
>>>>>>
>>>>>> Johan
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
