Alright, here it goes again... Even with mmap_index_only, once RES memory
hit 15 GB, read latency went berserk. This happens in about 12 hours if
disk_access_mode is mmap, and about 48 hours if it is mmap_index_only.

The only load is reads, at about 50 reads/second.

row cache size: 730 MB, row cache hit ratio: 0.75
key cache size: 400 MB, key cache hit ratio: 0.4
heap size (max 8 GB): 6.1-6.9 GB used
no messages in the logs about reducing cache sizes

stats:
vmstat 1: no swapping, but high sys CPU utilization
iostat (looks great): avgqu-sz = 8, await = 7 ms, svctm = 0.6 ms, %util = 15-30
top: VIRT 19.8g, SHR 6.1g, RES 15g, high CPU, buffers 2 MB
cfstats: 70-100 ms; this number used to be 20-30 ms

SHR keeps increasing (owing to mmap, I guess), while at the same time
buffers keeps decreasing: buffers starts as high as 50 MB and goes down to
2 MB. This is very easily reproducible for me. Every time RES hits about
15 GB, the client starts getting timeouts from Cassandra and sys CPU jumps
a lot. All this even though my row cache hit ratio is almost 0.75.

Other than turning off mmap completely, is there any other solution or
setting to avoid a Cassandra restart every couple of days, i.e. something
to keep RES from hitting such a high number? I have been constantly
monitoring RES, and was not seeing issues while it was at 14 GB.
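In case someone can sanity-check them, these are the VM knobs I am
thinking of experimenting with next. The values below are guesses on my
part, not tested recommendations:

  # current values, for reference
  sysctl vm.min_free_kbytes vm.vfs_cache_pressure vm.swappiness

  # keep ~256 MB free so reclaim kicks in earlier and less violently
  sudo sysctl -w vm.min_free_kbytes=262144
  # make the kernel less eager to throw out dentry/inode caches
  sudo sysctl -w vm.vfs_cache_pressure=50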
/G

On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh <gurpreet.si...@gmail.com> wrote:

> Aaron, Ruslan,
> I changed the disk access mode to mmap_index_only, and it has been stable
> ever since, at least for the past 20 hours. Previously, in about 10-12
> hours, as soon as resident memory was full, the client would start timing
> out on all its reads. It looks fine for now; I am going to let it run to
> see how long it lasts and whether the problem comes back.
>
> Aaron,
> yes, I had turned swap off.
>
> The total CPU utilization was at roughly 700%. It looked like kswapd0 was
> using just one CPU, but Cassandra's (jsvc) CPU utilization increased
> quite a bit. top was reporting high system CPU and low user CPU.
> vmstat was not showing swapping. The max Java heap size is 8 GB, of which
> only 4 GB was in use, so the Java heap was doing great; no GC in the
> logs. iostat was doing OK from what I remember; I will have to reproduce
> the issue for the exact numbers.
>
> cfstats latency had gone very high, but that is partly due to the high
> CPU usage.
>
> One thing was clear: SHR was inching higher (due to the mmap) while the
> buffer cache, which started at about 20-25 MB, had shrunk to 2 MB by the
> end, which probably means the page cache was being evicted by kswapd0. Is
> there a way to fix the size of the buffer cache and not let the system
> evict it in favour of mmap?
>
> Also, mmapping the data files causes not just the data actually asked for
> to be read into main memory, but also a bunch of extra pages (readahead),
> which would not be very useful, right? The same readahead on the index
> files would actually be more useful, since there would be more index
> entries in the readahead window, and the index files, being small, would
> not create enough memory pressure to evict the page cache. mmapping the
> data files makes sense if the data size, or at least the hot data set, is
> smaller than RAM; otherwise mmapping just the index would probably be the
> better choice, no? In my case the data size is 85 GB, while available RAM
> is 16 GB (only 8 GB after the heap).
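(Expanding on my own readahead question above: the readahead window is set
per block device, so it is cheap to experiment with. A sketch of what I
plan to try, assuming the raid0 array shows up as /dev/md0 here; blockdev
counts in 512-byte sectors:)

  blockdev --getra /dev/md0           # current readahead, in 512-byte sectors
  sudo blockdev --setra 64 /dev/md0   # shrink to 64 sectors = 32 KB and re-test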
>
> /G
>
> On Fri, Jun 8, 2012 at 11:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Ruslan,
>> Why did you suggest changing the disk_access_mode?
>>
>> Gurpreet,
>> I would leave the disk_access_mode at the default until you have a
>> reason to change it.
>>
>>> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>
>> Is swap disabled?
>>
>>> Gradually, the system cpu becomes high almost 70%, and the client
>>> starts getting continuous timeouts
>>
>> 70% of one core or 70% of all cores?
>> Check the server logs: is there GC activity?
>> Check nodetool cfstats to see the read latency for the CF.
>>
>> Take a look at vmstat to see if you are swapping, and look at iostat to
>> see if IO is the problem:
>> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
>>
>> Thanks Ruslan.
>> I will try the mmap_index_only.
>> Is there any guideline as to when to leave it on auto and when to use
>> mmap_index_only?
>>
>> /G
>>
>> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>>
>>> disk_access_mode: mmap??
>>>
>>> Set disk_access_mode: mmap_index_only in cassandra.yaml.
>>>
>>> 2012/6/8 Gurpreet Singh <gurpreet.si...@gmail.com>:
>>> > Hi,
>>> > I am testing cassandra 1.1 on a 1 node cluster:
>>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>> >
>>> > cassandra 1.1.1
>>> > heap size: 8 gigs
>>> > key cache size in mb: 800 (only 200 MB used so far)
>>> > memtable_total_space_in_mb: 2048
>>> >
>>> > I am running a read workload, about 30 reads/second, no writes at
>>> > all. The system runs fine for roughly 12 hours.
>>> >
>>> > jconsole shows that my heap size has hardly touched 4 gigs.
>>> > top shows:
>>> > SHR increasing slowly from 100 MB to 6.6 GB over these 12 hrs
>>> > RES increasing slowly from 6 GB all the way to 15 GB
>>> > buffers at a healthy 25 MB at some point, going down to 2 MB over
>>> > these 12 hrs
>>> > VIRT staying at 85 GB
>>> >
>>> > I understand that SHR goes up because of mmap, and RES goes up
>>> > because it includes the SHR value as well.
>>> >
>>> > After around 10-12 hrs, the CPU utilization of the system starts
>>> > increasing, and I notice that the kswapd0 process becomes more
>>> > active. Gradually, system CPU reaches almost 70%, and the client
>>> > starts getting continuous timeouts. The fact that buffers went down
>>> > from 20 MB to 2 MB suggests that kswapd0 is probably evicting the
>>> > page cache.
>>> >
>>> > Is there a way to keep kswapd0 from doing this even when there is no
>>> > swap configured?
>>> > This is very easily reproducible for me, and I would like a way out
>>> > of this situation. Do I need to adjust VM memory management settings
>>> > like pagecache or vfs_cache_pressure, things like that?
>>> >
>>> > Just some extra information: JNA is installed and mlockall is
>>> > successful. There is no compaction running.
>>> > I would appreciate any help on this.
>>> > Thanks
>>> > Gurpreet
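PS: to put numbers behind the kswapd0/page-cache theory discussed
throughout this thread, I have started sampling the kernel's reclaim
counters while the node degrades. A rough sketch (counter names vary a
little between kernels):

  # cumulative reclaim activity; sample twice, a few seconds apart, and diff
  grep -E 'pgscan_kswapd|pgsteal' /proc/vmstat

  # plain page cache vs pages mapped into processes (the mmap'd sstables)
  grep -E '^(Buffers|Cached|Mapped):' /proc/meminfo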