> I recently tried to debug a similar idiopathic problem with a
> colleague.
> We weren't able to figure out what was causing the problem, partially
> because the system took about a week to get into the state where a lot
> of swap was in use.  Once it was there, there were only a few obvious
> signs of problematic behavior.  Like you, I'm having a hard time
> determining whether this is the result of an intentional policy change
> in the OS, or a subtle bug that has arisen from unknown causes.  In my
> colleague's case, disabling swap improved his performance a lot, but
> I'm assuming that's not an option in your configuration.

Two things:
 - I think it is a policy change, or a bug in the kernel, as our usage
of the system has not changed from when we ran Solaris 10 proper.
 - Disabling swap in Solaris doesn't work for us, because MySQL won't
even start if the buffer_pool is set past ~10G.
>
> The next step we were going to take would be to limit the ARC.  It's
> interesting that it didn't seem to help in your case.
We were swapping more when the ARC was unlimited.  Limiting it just
seems to reduce how much gets swapped out, or at least how many of the
swapped-out pages are re-requested soon afterward.
>
> In your case, it looks like a few processes in the system are using a
> lot of memory; however, in his case it was less clear what was
> consuming all of the memory.  We've postulated that it's a misbehaving
> daemon, but we haven't been able to prove it yet.
Really just one process: MySQL.  We may have a misbehaving process, like
"pkg refresh" or something that chews up a couple hundred MB of RAM
periodically, but that doesn't seem like it would chew up 18G of swap.

>
> If you have time to look at this further, there are some additional
> options to the commands that you've been using that might be helpful.
>
> There's a -p option to vmstat that shows the paging statistics.  If you
> run with this option, it's really easy to see when pageout or swapout
> are writing pages to swap as the apo column will show when anon pages
> are written out.
>
This is useful, but I think I'm in the same boat as you were/are.  The
swap usage creeps up really slowly over 3-4 days, so it's not simple to
catch the culprit in the act.
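
Since the growth is too slow to watch by hand, one option is to log
vmstat -p output for days and filter it afterward.  Below is a minimal
sketch of such a filter, not something we have run here.  It assumes the
output has a header row containing the column name "apo" (as described
above) and locates the column from that header rather than hard-coding a
position.  The function name is made up for illustration.

```python
def anon_pageout_rows(lines):
    """Yield (line_number, apo) for each sample row whose apo value is > 0.

    The apo column index is taken from the most recent header row that
    contains the literal field "apo"; rows seen before any such header,
    rows that are too short, and rows with a non-numeric value in that
    position are skipped.
    """
    apo_index = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if "apo" in fields:
            # Header row: remember where the apo column sits.
            apo_index = fields.index("apo")
            continue
        if apo_index is None or len(fields) <= apo_index:
            continue
        try:
            apo = int(fields[apo_index])
        except ValueError:
            continue
        if apo > 0:
            yield number, apo
```

Feeding it the lines of a saved log (or sys.stdin when piping vmstat -p
into the script) gives the samples where anon pages were actually being
written out, which narrows down when to look at the process list.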

>
> If pmap -x isn't working for you, there might be another option.
> There's a -S option to pmap that shows the swap allocations.  It may
> not provide as much detail as -x, but it should give you a good idea
> of how much swap each process is using.
>
We noticed the same problem with -S as with -x: it hangs the target
process, as if you had paused truss while attached to it.  Maybe I'll
try it on one of the slave MySQL instances.

Thanks.

> It may also be beneficial to take a look at how the kernel is using
> memory.  You can do this by running the following as root:
>
My results:
# mdb -k
::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     765235              2989    6%
ZFS File Data               39798               155    0%
Anon                     11488452             44876   91%
Exec and libs                6907                26    0%
Page cache                    220                 0    0%
Free (cachelist)             5259                20    0%
Free (freelist)            274795              1073    2%

Total                    12580666             49143
Physical                 12580665             49143
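
As a sanity check on those numbers, here is a small sketch that
recomputes the MB and %Tot columns from the page counts.  It assumes a
4 KiB page size, which is inferred from the output (765235 pages shown
as 2989 MB) rather than read from the system, and it assumes truncating
integer division, which happens to reproduce the figures above; how
::memstat actually rounds is not something I have confirmed.

```python
PAGE_SIZE = 4096  # bytes; assumption inferred from the memstat output

# Page counts copied from the ::memstat output above.
pages = {
    "Kernel": 765235,
    "ZFS File Data": 39798,
    "Anon": 11488452,
    "Exec and libs": 6907,
    "Page cache": 220,
    "Free (cachelist)": 5259,
    "Free (freelist)": 274795,
}


def summarize(page_counts, page_size=PAGE_SIZE):
    """Return {category: (megabytes, percent_of_total)} using truncation."""
    total = sum(page_counts.values())
    return {
        name: (count * page_size // (1024 * 1024), count * 100 // total)
        for name, count in page_counts.items()
    }


summary = summarize(pages)
for name, (mb, pct) in summary.items():
    print(f"{name:<20} {mb:>8} MB {pct:>3}%")
```

The point it makes is the same as the table: anonymous memory is about
44876 MB, roughly 91% of physical memory, so whatever is growing is
almost certainly process heap/stack rather than the kernel or the ARC.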

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
