> I recently tried to debug a similar idiopathic problem with a colleague.
> We weren't able to figure out what was causing the problem, partially
> because the system took about a week to get into the state where a lot
> of swap was in use. Once it was there, there were only a few obvious
> signs of problematic behavior. Like you, I'm having a hard time
> determining whether this is the result of an intentional policy change
> in the OS, or a subtle bug that has arisen from unknown causes. In my
> colleague's case, disabling swap improved his performance a lot, but I'm
> assuming that's not an option in your configuration.

Two things:

- I think it is either a policy change or a bug in the kernel, as our
  usage of the system has not changed from when we ran Solaris 10 proper.
- Disabling swap in Solaris doesn't work for us, because MySQL won't even
  start if the buffer_pool is set past ~10G.

>
> The next step we were going to take would be to limit the ARC. It's
> interesting that it didn't seem to help in your case.

We were swapping more when the ARC was unlimited; limiting it just seems
to swap out less, or to swap out fewer pages that will be re-requested
soon.

>
> In your case, it looks like a few processes in the system are using a
> lot of memory; however, in his case it was less clear what was consuming
> all of the memory. We've postulated that it's a misbehaving daemon, but
> we haven't been able to prove it yet.

Really just one process: MySQL. We may have a misbehaving process, like
"pkg refresh" or something that chews up a couple hundred MB of RAM
periodically, but that doesn't seem like it would chew up 18G of swap.

>
> If you have time to look at this further, there are some additional
> options to the commands that you've been using that might be helpful.
>
> There's a -p option to vmstat that shows the paging statistics. If you
> run with this option, it's really easy to see when pageout or swapout
> are writing pages to swap as the apo column will show when anon pages
> are written out.
>

This is useful, but I think I'm in the same boat as you were/are. The
swap usage creeps up really slowly over 3-4 days, so it's not easy to
see what the culprit is.

>
> If pmap -x isn't working for you, there might be another option.
> There's a -S option to pmap that shows the swap allocations. It may not
> provide as much detail as -x, but it should give you a good idea of how
> much swap each process is using.
>

We noticed the same problem with -S as with -x: it hangs the process, as
if you had paused truss while attached. Maybe I'll try it on one of the
slave MySQL instances.
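
Because the swap creep takes 3-4 days, one way to use the vmstat -p
suggestion is to leave a small logger running that records only the
intervals where the apo column is non-zero. What follows is a minimal
sketch in Python, not a tested tool. It assumes the usual Solaris
`vmstat -p` column names (api, apo, apf, and so on); the function names,
the 60-second interval, and the log file name are placeholders chosen for
illustration.

```python
import subprocess
import time


def parse_vmstat_p(lines):
    """Yield one dict per data row of `vmstat -p` output, keyed by column name.

    Assumes the column-name row contains "apo"; the group-label row
    ("memory page executable ...") and any other non-numeric rows are skipped.
    """
    header = None
    for line in lines:
        fields = line.split()
        if "apo" in fields:
            # Column-name row; vmstat repeats it periodically.
            header = fields
        elif (header and len(fields) == len(header)
              and all(f.isdigit() for f in fields)):
            yield dict(zip(header, map(int, fields)))


def log_anon_pageouts(interval=60, logfile="vmstat-p.log"):
    """Append a timestamped line whenever anon pages are written out.

    Note: the first row vmstat prints is an average since boot, so it may
    show a non-zero apo that does not reflect the current interval.
    """
    proc = subprocess.Popen(["vmstat", "-p", str(interval)],
                            stdout=subprocess.PIPE, text=True)
    with open(logfile, "a") as log:
        for row in parse_vmstat_p(proc.stdout):
            if row["apo"] > 0:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                log.write(f"{stamp} apo={row['apo']} api={row['api']} "
                          f"free={row['free']}\n")
                log.flush()

# To run the logger: log_anon_pageouts(interval=60)
```

The timestamps in the log could then be lined up against cron jobs or
something like "pkg refresh" to see whether the pageouts coincide with any
periodic activity.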
Thanks.

> It may also be beneficial to take a look at how the kernel is using
> memory. You can do this by running the following as root:
>

My results:

# mdb -k
::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     765235              2989    6%
ZFS File Data               39798               155    0%
Anon                     11488452             44876   91%
Exec and libs                6907                26    0%
Page cache                    220                 0    0%
Free (cachelist)             5259                20    0%
Free (freelist)            274795              1073    2%

Total                    12580666             49143
Physical                 12580665             49143

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org