Sorry, I was mistaken; here is the right string:

INFO [main] 2012-06-14 02:03:14,520 CLibrary.java (line 109) JNA mlockall successful
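If you want a check that does not depend on log lines at all, the kernel reports locked memory per process in /proc/<pid>/status. A minimal sketch (a hypothetical helper, not part of Cassandra; it assumes a Linux /proc filesystem):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Prints the VmLck line from /proc/<pid>/status, which reports how much of
    // the process's address space is locked with mlock()/mlockall(). A non-zero
    // value after startup means mlockall really took effect.
    public class CheckMlock {
        public static void main(String[] args) throws IOException {
            String pid = (args.length > 0) ? args[0] : "self";
            BufferedReader r = new BufferedReader(new FileReader("/proc/" + pid + "/status"));
            try {
                String line;
                while ((line = r.readLine()) != null) {
                    if (line.startsWith("VmLck")) {
                        System.out.println(line); // e.g. "VmLck:  8388608 kB"
                    }
                }
            } finally {
                r.close();
            }
        }
    }

Run it with Cassandra's pid as the argument.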
2012/6/15 ruslan usifov <ruslan.usi...@gmail.com>:
> 2012/6/14 Gurpreet Singh <gurpreet.si...@gmail.com>:
>> JNA is installed. swappiness was 0. vfs_cache_pressure was 100. Two questions on this:
>> 1. Is there a way to find out if mlockall really worked, other than the "mlockall successful" log message?
>
> Yes, you must see something like this (from our test server):
>
> INFO [main] 2012-06-14 02:03:14,745 DatabaseDescriptor.java (line 233) Global memtable threshold is enabled at 512MB
>
>> 2. Does cassandra mlock only the jvm heap, or also the mmapped memory?
>
> Cassandra obviously mlocks only the heap; it doesn't lock the mmapped sstables.
>
>> I disabled mmap completely, and things look so much better. Latency is, surprisingly, half of what I see when mmap is enabled. It's funny that I keep reading tall claims about mmap, but in practice a lot of people have problems with it, especially when it uses up all the memory. We have tried mmap for different purposes in our company before, and finally ended up disabling it, because it just doesn't handle things right when memory is low. Maybe /proc/sys/vm needs to be configured right, but that's not the easiest of configurations to get right.
>>
>> Right now, I am handling only 80 gigs of data. The kernel version is 2.6.26, the Java version 1.6.0_21.
>> /G
>>
>> On Wed, Jun 13, 2012 at 8:42 PM, Al Tobey <a...@ooyala.com> wrote:
>>> I would check /etc/sysctl.conf and get the values of /proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure.
>>>
>>> If you don't have JNA enabled (which Cassandra uses to fadvise) and swappiness is at its default of 60, the Linux kernel will happily swap out your heap for cache space. Set swappiness to 1 or 'swapoff -a', and kswapd shouldn't be doing much unless you have a too-large heap or some other app using up memory on the system.
>>>
>>> On Wed, Jun 13, 2012 at 11:30 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>>>> Hm, it's very strange. What is the amount of your data? Your Linux kernel version? Java version?
>>>>
>>>> PS: I can suggest switching disk_access_mode to standard in your case.
>>>> PS PS: Also upgrade your Linux to the latest, and Java HotSpot to 1.6.0_32 (from the Oracle site).
>>>>
>>>> 2012/6/13 Gurpreet Singh <gurpreet.si...@gmail.com>:
>>>>> Alright, here it goes again...
>>>>> Even with mmap_index_only, once the RES memory hit 15 gigs, the read latency went berserk. This happens in 12 hours if disk_access_mode is mmap, and in about 48 hrs if it is mmap_index_only.
>>>>>
>>>>> only reads happening, at 50 reads/second
>>>>> row cache size: 730 mb, row cache hit ratio: 0.75
>>>>> key cache size: 400 mb, key cache hit ratio: 0.4
>>>>> heap size (max 8 gigs): used 6.1-6.9 gigs
>>>>>
>>>>> No messages about reducing cache sizes in the logs.
>>>>>
>>>>> stats:
>>>>> vmstat 1: no swapping here, but high sys cpu utilization
>>>>> iostat (looks great): avgqu-sz = 8, await = 7 ms, svctm = 0.6 ms, %util = 15-30%
>>>>> top: VIRT 19.8g, SHR 6.1g, RES 15g, high cpu, buffers 2mb
>>>>> cfstats: 70-100 ms. This number used to be 20-30 ms.
>>>>>
>>>>> The value of SHR keeps increasing (owing to mmap, I guess), while at the same time buffers keep decreasing. buffers start as high as 50 mb and go down to 2 mb.
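To watch that page-cache shrinkage from the side, something like the following standalone monitor could help (a rough sketch; it assumes nothing beyond Linux's /proc/meminfo and is not Cassandra code):

    import java.io.BufferedReader;
    import java.io.FileReader;

    // Samples the Buffers: and Cached: lines of /proc/meminfo once a minute,
    // so the eviction described above can be lined up with latency spikes.
    public class WatchPageCache {
        public static void main(String[] args) throws Exception {
            while (true) {
                BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"));
                try {
                    String line;
                    while ((line = r.readLine()) != null) {
                        if (line.startsWith("Buffers:") || line.startsWith("Cached:")) {
                            System.out.println(System.currentTimeMillis() + " " + line);
                        }
                    }
                } finally {
                    r.close();
                }
                Thread.sleep(60000L); // one sample per minute
            }
        }
    }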
>>>>> This is very easily reproducible for me. Every time the RES memory hits about 15 gigs, the client starts getting timeouts from cassandra and the sys cpu jumps a lot. All this even though my row cache hit ratio is almost 0.75.
>>>>>
>>>>> Other than turning off mmap completely, is there any other solution or setting that avoids a cassandra restart every couple of days, something to keep the RES memory from hitting such a high number? I have been constantly monitoring the RES and was not seeing issues when RES was at 14 gigs.
>>>>> /G
>>>>>
>>>>> On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh <gurpreet.si...@gmail.com> wrote:
>>>>>> Aaron, Ruslan,
>>>>>> I changed the disk access mode to mmap_index_only, and it has been stable ever since, at least for the past 20 hours. Previously, in about 10-12 hours, as soon as the resident memory was full, the client would start timing out on all its reads. It looks fine for now; I am going to let it continue, to see how long it lasts and whether the problem comes back.
>>>>>>
>>>>>> Aaron,
>>>>>> Yes, I had turned swap off.
>>>>>>
>>>>>> The total cpu utilization was at roughly 700%. It looked like kswapd0 was using just 1 cpu, but cassandra (jsvc) cpu utilization increased quite a bit. top was reporting high system cpu and low user cpu. vmstat was not showing swapping. The max java heap size is 8 gigs, while only 4 gigs was in use, so the java heap was doing great; no gc in the logs. iostat was doing ok from what I remember; I will have to reproduce the issue for the exact numbers.
>>>>>>
>>>>>> cfstats latency had gone very high, but that is partly due to the high cpu usage.
>>>>>>
>>>>>> One thing was clear: SHR was inching higher (due to the mmap), while the buffer cache, which started at about 20-25 mb, was down to 2 mb by the end, which probably means the pagecache was being evicted by kswapd0. Is there a way to fix the size of the buffer cache and not let the system evict it in favour of mmap?
>>>>>>
>>>>>> Also, mmapping data files would cause not only the data asked for to be read into main memory, but also a bunch of extra pages (readahead), which would not be very useful, right? The same thing for the index would actually be more useful, as there would be more index entries in the readahead part, and the index files, being small, wouldn't cause enough memory pressure to evict the page cache. mmapping the data files would make sense if the data size, or at least the hot data set, is smaller than the RAM; otherwise just the index would probably be a better thing to mmap, no? In my case the data size is 85 gigs, while the available RAM is 16 gigs (only 8 gigs after heap).
>>>>>>
>>>>>> /G
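The arithmetic behind that readahead point, with the numbers quoted above, as a toy illustration (the percentage is rough; nothing here is measured):

    // Back-of-the-envelope: how much of the data files can stay resident.
    public class CacheFit {
        public static void main(String[] args) {
            double ramGb = 16.0, heapGb = 8.0;
            double cacheGb = ramGb - heapGb;  // roughly what is left for page cache
            double dataGb = 85.0;             // sstable data files, per this thread
            System.out.printf("data resident at best: %.0f%%%n", 100 * cacheGb / dataGb);
            // ~9%: a readahead page of *data* is rarely reused before being
            // evicted, so it mostly pushes out hotter pages. Index files are a
            // small fraction of this size, so readahead there lands on entries
            // that have a real chance of being hit again.
        }
    }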
>>>>>>
>>>>>> On Fri, Jun 8, 2012 at 11:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>>>> Ruslan,
>>>>>>> Why did you suggest changing the disk_access_mode?
>>>>>>>
>>>>>>> Gurpreet,
>>>>>>> I would leave the disk_access_mode at the default until you have a reason to change it.
>>>>>>>
>>>>>>>> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>>>>>>
>>>>>>> Is swap disabled?
>>>>>>>
>>>>>>>> Gradually, the system cpu becomes high, almost 70%, and the client starts getting continuous timeouts
>>>>>>>
>>>>>>> 70% of one core or 70% of all cores?
>>>>>>> Check the server logs: is there GC activity?
>>>>>>> Check nodetool cfstats to see the read latency for the cf.
>>>>>>>
>>>>>>> Take a look at vmstat to see if you are swapping, and look at iostat to see if io is the problem:
>>>>>>> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> -----------------
>>>>>>> Aaron Morton
>>>>>>> Freelance Developer
>>>>>>> @aaronmorton
>>>>>>> http://www.thelastpickle.com
>>>>>>>
>>>>>>> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
>>>>>>>
>>>>>>> Thanks Ruslan.
>>>>>>> I will try the mmap_index_only.
>>>>>>> Is there any guideline as to when to leave it on auto and when to use mmap_index_only?
>>>>>>>
>>>>>>> /G
>>>>>>>
>>>>>>> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>>>>>>>> disk_access_mode: mmap??
>>>>>>>>
>>>>>>>> Set disk_access_mode: mmap_index_only in cassandra.yaml.
>>>>>>>>
>>>>>>>> 2012/6/8 Gurpreet Singh <gurpreet.si...@gmail.com>:
>>>>>>>>> Hi,
>>>>>>>>> I am testing cassandra 1.1 on a 1 node cluster:
>>>>>>>>> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>>>>>>>>
>>>>>>>>> cassandra 1.1.1
>>>>>>>>> heap size: 8 gigs
>>>>>>>>> key cache size in mb: 800 (used only 200mb till now)
>>>>>>>>> memtable_total_space_in_mb: 2048
>>>>>>>>>
>>>>>>>>> I am running a read workload, about 30 reads/second, no writes at all. The system runs fine for roughly 12 hours.
>>>>>>>>>
>>>>>>>>> jconsole shows that my heap size has hardly touched 4 gigs.
>>>>>>>>> top shows:
>>>>>>>>> SHR increasing slowly from 100 mb to 6.6 gigs in these 12 hrs
>>>>>>>>> RES increasing slowly from 6 gigs all the way to 15 gigs
>>>>>>>>> buffers at a healthy 25 mb at some point, going down to 2 mb in these 12 hrs
>>>>>>>>> VIRT staying at 85 gigs
>>>>>>>>>
>>>>>>>>> I understand that SHR goes up because of mmap, and RES goes up because it includes the SHR value as well.
>>>>>>>>>
>>>>>>>>> After around 10-12 hrs, the cpu utilization of the system starts increasing, and I notice that the kswapd0 process becomes more active. Gradually, the system cpu becomes high, almost 70%, and the client starts getting continuous timeouts. The fact that the buffers went down from 20 mb to 2 mb suggests that kswapd0 is probably swapping out the pagecache.
>>>>>>>>>
>>>>>>>>> Is there a way to avoid kswapd0 kicking in even when there is no swap configured? This is very easily reproducible for me, and I would like a way out of this situation. Do I need to adjust vm memory management settings like pagecache and vfs_cache_pressure, things like that?
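For reference, the current values of those vm tunables can be read straight out of /proc; a minimal sketch (standard Linux paths are assumed, nothing Cassandra-specific):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Prints the two vm tunables discussed in this thread. Defaults are
    // typically swappiness = 60 and vfs_cache_pressure = 100.
    public class VmTunables {
        static String read(String path) throws IOException {
            BufferedReader r = new BufferedReader(new FileReader(path));
            try { return r.readLine(); } finally { r.close(); }
        }
        public static void main(String[] args) throws IOException {
            System.out.println("swappiness = " + read("/proc/sys/vm/swappiness"));
            System.out.println("vfs_cache_pressure = " + read("/proc/sys/vm/vfs_cache_pressure"));
        }
    }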
>>>>>>>>>
>>>>>>>>> Just some extra information: jna is installed, mlockall is successful, and there is no compaction running.
>>>>>>>>> I would appreciate any help on this.
>>>>>>>>> Thanks
>>>>>>>>> Gurpreet
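For completeness, the mlockall call behind the log line at the top of this thread boils down to a small JNA binding. A sketch in the spirit of Cassandra's CLibrary (it assumes JNA is on the classpath; the MCL_* constants are the Linux values):

    import com.sun.jna.LastErrorException;
    import com.sun.jna.Native;

    // Locks the process's pages in RAM so the kernel cannot swap them out.
    // Needs CAP_IPC_LOCK (root) or a large enough memlock rlimit to succeed.
    public class MlockallSketch {
        static { Native.register("c"); } // bind the native method below to libc

        private static final int MCL_CURRENT = 1; // lock pages mapped now
        private static final int MCL_FUTURE  = 2; // and pages mapped later

        private static native int mlockall(int flags) throws LastErrorException;

        public static void main(String[] args) {
            try {
                mlockall(MCL_CURRENT); // returns 0 on success
                System.out.println("mlockall successful");
            } catch (LastErrorException e) {
                System.out.println("mlockall failed, errno " + e.getErrorCode());
            }
        }
    }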