Hm, it's very strange. What is the amount of your data? Your Linux kernel
version? Your Java version?
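For example, something like the following will collect them (the data path
is just the default data_file_directories location, adjust it to your
install):

  uname -r                           # kernel version
  java -version                      # JVM version
  du -sh /var/lib/cassandra/data     # on-disk data size, default location
  nodetool -h localhost info         # load, heap usage and uptime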
PS: I can suggest switching disk_access_mode to standard in your case.
PS PS: Also upgrade your Linux to the latest version, and Java HotSpot to
1.6.0_32 (from the Oracle site).

2012/6/13 Gurpreet Singh <gurpreet.si...@gmail.com>:
> Alright, here it goes again...
> Even with mmap_index_only, once the RES memory hit 15 gigs, the read
> latency went berserk. This happens in 12 hours if disk_access_mode is
> mmap, about 48 hours if it is mmap_index_only.
>
> Only reads happening, at 50 reads/second.
> row cache size: 730 MB, row cache hit ratio: 0.75
> key cache size: 400 MB, key cache hit ratio: 0.4
> heap size (max 8 gigs): used 6.1-6.9 gigs
>
> No messages about reducing cache sizes in the logs.
>
> stats:
> vmstat 1: no swapping here, however high sys cpu utilization
> iostat (looks great): avgqu-sz = 8, await = 7 ms, svctm = 0.6,
> %util = 15-30%
> top: VIRT 19.8g, SHR 6.1g, RES 15g, high cpu, buffers 2 MB
> cfstats: 70-100 ms. This number used to be 20-30 ms.
>
> The value of SHR keeps increasing (owing to mmap, I guess), while at the
> same time buffers keep decreasing. buffers start as high as 50 MB and go
> down to 2 MB.
>
> This is very easily reproducible for me. Every time the RES memory hits
> about 15 gigs, the client starts getting timeouts from cassandra and the
> sys cpu jumps a lot. All this even though my row cache hit ratio is
> almost 0.75.
>
> Other than just turning off mmap completely, is there any other solution
> or setting to avoid a cassandra restart every couple of days? Something
> to keep the RES memory from hitting such a high number. I have been
> constantly monitoring the RES, and was not seeing issues when RES was at
> 14 gigs.
> /G
>
> On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh
> <gurpreet.si...@gmail.com> wrote:
>>
>> Aaron, Ruslan,
>> I changed the disk access mode to mmap_index_only, and it has been
>> stable ever since, well at least for the past 20 hours. Previously, in
>> about 10-12 hours, as soon as the resident memory was full, the client
>> would start timing out on all its reads. It looks fine for now; I am
>> going to let it continue to see how long it lasts and if the problem
>> comes again.
>>
>> Aaron,
>> yes, I had turned swap off.
>>
>> The total cpu utilization was at roughly 700%. It looked like kswapd0
>> was using just 1 cpu, but the cassandra (jsvc) cpu utilization
>> increased quite a bit. top was reporting high system cpu and low user
>> cpu. vmstat was not showing swapping. The max java heap size is 8 gigs
>> while only 4 gigs was in use, so the java heap was doing great; no GC
>> in the logs. iostat was doing ok from what I remember; I will have to
>> reproduce the issue for the exact numbers.
>>
>> cfstats latency had gone very high, but that is partly due to the high
>> cpu usage.
>>
>> One thing was clear: the SHR was inching higher (due to the mmap),
>> while the buffer cache, which started at about 20-25 MB, reduced to
>> 2 MB by the end, which probably means that the page cache was being
>> evicted by kswapd0. Is there a way to fix the size of the buffer cache
>> and not let the system evict it in favour of mmap?
>>
>> Also, mmapping the data files would basically cause not only the data
>> (asked for) to be read into main memory, but also a bunch of extra
>> pages (readahead), which would not be very useful, right? The same
>> thing for the index would actually be more useful, as there would be
>> more index entries in the readahead part, and the index files being
>> small wouldn't cause so much memory pressure that the page cache would
>> be evicted. mmapping the data files would make sense if the data size
>> is smaller than the RAM, or the hot data set is smaller than the RAM;
>> otherwise just the index would probably be a better thing to mmap, no?
>> In my case the data size is 85 gigs, while the available RAM is
>> 16 gigs (only 8 gigs after the heap).
>>
>> /G
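There is no setting that simply pins the buffer cache at a fixed size, but
two of the things mentioned above are worth experimenting with before
giving up on mmap entirely: the vfs_cache_pressure sysctl and the readahead
on the data volume. Treat the values below as starting points only, not a
known fix, and note that /dev/md0 is just an example name for the raid0
device:

  # lower values make the kernel hold on to dentry/inode caches longer
  # (the default is 100); purely an experiment, not a guaranteed fix
  sudo sysctl -w vm.vfs_cache_pressure=50

  # shrink readahead on the data volume so one random read maps in fewer
  # useless pages; 128 sectors = 64 KB, /dev/md0 is only an example device
  sudo blockdev --setra 128 /dev/md0
  sudo blockdev --getra /dev/md0   # verify the new value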
>>
>> On Fri, Jun 8, 2012 at 11:44 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>>
>>> Ruslan,
>>> Why did you suggest changing the disk_access_mode?
>>>
>>> Gurpreet,
>>> I would leave the disk_access_mode at the default until you have a
>>> reason to change it.
>>>
>>>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>>
>>> Is swap disabled?
>>>
>>>> > Gradually,
>>>> > the system cpu becomes high almost 70%, and the client starts
>>>> > getting continuous timeouts
>>>
>>> 70% of one core or 70% of all cores?
>>> Check the server logs: is there GC activity?
>>> Check nodetool cfstats to see the read latency for the cf.
>>>
>>> Take a look at vmstat to see if you are swapping, and look at iostat
>>> to see if io is the problem:
>>> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
>>>
>>> Thanks Ruslan.
>>> I will try the mmap_index_only.
>>> Is there any guideline as to when to leave it on auto and when to use
>>> mmap_index_only?
>>>
>>> /G
>>>
>>> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov
>>> <ruslan.usi...@gmail.com> wrote:
>>>>
>>>> disk_access_mode: mmap??
>>>>
>>>> Set disk_access_mode: mmap_index_only in cassandra.yaml.
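For reference, it is a one-line change. The path below assumes a package
install (adjust it to your layout), and if the line is not present in
cassandra.yaml it can simply be added; valid values are auto, mmap,
mmap_index_only and standard. Restart the node afterwards.

  # check what the node is configured with now
  grep -n 'disk_access_mode' /etc/cassandra/cassandra.yaml

  # the line to set (or add) in cassandra.yaml:
  #   disk_access_mode: mmap_index_only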
>>>>
>>>> 2012/6/8 Gurpreet Singh <gurpreet.si...@gmail.com>:
>>>> > Hi,
>>>> > I am testing cassandra 1.1 on a 1-node cluster:
>>>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>>> >
>>>> > cassandra 1.1.1
>>>> > heap size: 8 gigs
>>>> > key cache size in mb: 800 (only 200 mb used till now)
>>>> > memtable_total_space_in_mb: 2048
>>>> >
>>>> > I am running a read workload, about 30 reads/second, no writes at
>>>> > all. The system runs fine for roughly 12 hours.
>>>> >
>>>> > jconsole shows that my heap size has hardly touched 4 gigs.
>>>> > top shows:
>>>> > SHR increasing slowly from 100 mb to 6.6 gigs in these 12 hrs
>>>> > RES increasing slowly from 6 gigs all the way to 15 gigs
>>>> > buffers at a healthy 25 mb at some point, going down to 2 mb in
>>>> > these 12 hrs
>>>> > VIRT staying at 85 gigs
>>>> >
>>>> > I understand that SHR goes up because of mmap, and RES goes up
>>>> > because it includes the SHR value as well.
>>>> >
>>>> > After around 10-12 hrs, the cpu utilization of the system starts
>>>> > increasing, and I notice that the kswapd0 process becomes more
>>>> > active. Gradually, the system cpu becomes high, almost 70%, and the
>>>> > client starts getting continuous timeouts. The fact that the
>>>> > buffers went down from 20 mb to 2 mb suggests that kswapd0 is
>>>> > probably evicting the pagecache.
>>>> >
>>>> > Is there a way out of this, to avoid kswapd0 starting to do things
>>>> > even when there is no swap configured?
>>>> > This is very easily reproducible for me, and I would like a way out
>>>> > of this situation. Do I need to adjust vm memory management stuff
>>>> > like pagecache, vfs_cache_pressure, things like that?
>>>> >
>>>> > Just some extra information: jna is installed, mlockall is
>>>> > successful, and there is no compaction running.
>>>> > I would appreciate any help on this.
>>>> > Thanks
>>>> > Gurpreet
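When it reproduces, it helps to capture the same handful of numbers every
time so the good and bad periods can be compared directly. Nothing below is
a fix, it just gathers in one place the figures already being discussed
(the process may show up as jsvc rather than java, depending on how
cassandra is started):

  # swap and reclaim activity, 5 one-second samples
  vmstat 1 5

  # per-device queue depth, await and utilisation
  iostat -x 5 3

  # read latency as seen by cassandra itself
  nodetool -h localhost cfstats

  # VIRT/RES/SHR of the cassandra process
  top -b -n 1 | grep -Ei 'jsvc|java'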