On Tue, May 12, 2009 at 09:41:41AM -0700, Ethan Erchinger wrote:
> Hi all,
> 
> I'm having trouble determining what is using a large amount of swap on a
> few of our OpenSolaris systems.  These systems run MySQL (the 5.0.65
> version that came with snv_101) and have 48G of RAM and 24G of swap.
> The MySQL instances are configured to use a 36G InnoDB buffer pool.
> With the other overhead that MySQL has (a ridiculous amount), we're
> seeing the following detail:
> 
> $ prstat -c -s size
>    PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
>  28524 mysql      45G   39G sleep   59   -3 144:13:04 7.0% mysqld/586
>    357 root       30M 4288K sleep   59    0   1:11:05 0.0% fmd/19
>      7 root       15M  844K sleep   59    0   0:01:48 0.0% svc.startd/11
>      9 root       13M  756K sleep   59    0   0:00:33 0.0% svc.configd/26
>  17617 root       11M 6724K sleep   59    0   0:27:16 0.1% perl/1
>     78 root     9488K 1500K sleep   59    0   0:00:25 0.0% devfsadm/6
>  26902 ethan    8352K 5368K sleep   59    0   0:00:00 0.0% sshd/1
>  17646 nobody   8120K 2568K sleep   59    0   0:01:38 0.0% gmond/1
>    122 daemon   6948K 3952K sleep   59    0   0:00:05 0.0% kcfd/3
>    440 root     6808K 3456K sleep   59    0   0:05:01 0.0% intrd/1
>  26901 root     6740K 3724K sleep   59    0   0:00:00 0.0% sshd/1
>    409 smmsp    6416K 1448K sleep   59    0   0:00:03 0.0% sendmail/1
>    410 root     6284K 2036K sleep   59    0   0:00:32 0.0% sendmail/1
>  17927 root     5488K 3376K sleep   59    0   0:01:58 0.0% nagmon.pl/1
>    117 root     5264K 1272K sleep   59    0   0:00:00 0.0% syseventd/16
> Total: 38 processes, 720 lwps, load averages: 0.97, 1.14, 1.35
> 
> I've included top because it shows slightly different detail, but the
> two are consistent.
> $ top -b -o size
> load averages:  1.40,  1.25,  1.33;               up 20+16:47:22   09:26:14
> 38 processes: 37 sleeping, 1 on cpu
> CPU states: 82.6% idle, 11.9% user,  5.6% kernel,  0.0% iowait,  0.0% swap
> Kernel: 33954 ctxsw, 526 trap, 22651 intr, 31375 syscall, 341 flt
> Memory: 48G phys mem, 783M free mem, 24G total swap, 8885M free swap
> 
>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  28524 mysql     584  59   -3   45G   39G sleep  144.3H 14.34% mysqld
>    357 root       19  59    0   30M 4284K sleep   71:06  0.00% fmd
>      7 root       11  59    0   15M  844K sleep    1:48  0.00% svc.startd
>      9 root       26  59    0   13M  756K sleep    0:33  0.00% svc.configd
>  17617 root        1  59    0   11M 6720K sleep   27:17  0.00% perl
>     78 root        6  59    0 9488K 1500K sleep    0:25  0.00% devfsadm
>  26902 ethan       1  59    0 8352K 5364K sleep    0:00  0.00% sshd
>  17646 nobody      1  59    0 8120K 2568K sleep    1:38  0.01% gmond
>    122 daemon      3  59    0 6948K 3952K sleep    0:05  0.00% kcfd
>    440 root        1  59    0 6808K 3456K sleep    5:02  0.00% intrd
>  26901 root        1  59    0 6740K 3724K sleep    0:00  0.00% sshd
>    409 smmsp       1  59    0 6416K 1448K sleep    0:03  0.00% sendmail
>    410 root        1  59    0 6284K 2036K sleep    0:32  0.00% sendmail
>    117 root       16  59    0 5264K 1272K sleep    0:00  0.00% syseventd
>  27431 nobody      1  59    0 5152K 2364K sleep    0:00  0.11% zpool
>    417 root        1  59    0 4308K 1640K sleep    0:00  0.00% sshd
>    377 root       11  59    0 4268K 1676K sleep    0:24  0.00% syslogd
>   2156 daemon      1  59    0 4060K 1616K sleep    0:00  0.00% statd
>  28496 root        1  59    0 3996K  956K sleep    0:00  0.00% mysqld_safe
>  27430 ethan       1  59    0 3812K 1880K cpu/0    0:00  0.06% top
>  26905 ethan       1  59    0 3504K 2384K sleep    0:00  0.00% bash
>    354 daemon      1  59    0 3256K 1184K sleep    0:00  0.00% rpcbind
>   2161 daemon      2  60  -20 2980K 1400K sleep    0:00  0.00% lockd
>    272 root        1 100  -20 2872K 1488K sleep    1:35  0.00% xntpd
>     15 dladm       6  59    0 2780K  440K sleep    0:00  0.00% dlmgmtd
>      1 root        1  59    0 2664K  692K sleep    0:06  0.00% init
> 
> $ vmstat 2 2
> kthr      memory            page            disk          faults      cpu
>  r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
>  0 0 0 34033300 17894104 40 232 79 85 98 0 75 5 31 30 30 12532 11892 19573 3 3 93
>  0 0 0 17014080 810356 139 1015 2 0 0 0  0  1 45 46 52 20671 24345 32825 4 5 91
> 
> 
> As you can see, even though the resident size of MySQL is < 40G, the
> system is still using close to 16G of swap.  At first we thought the
> ZFS ARC was causing this, but we've limited it to 2G via this
> /etc/system setting:
> set zfs:zfs_arc_max = 0x80000000 #2G
> 
> # arcstat.pl
>     Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
> 09:31:55  330M  248M     75  248M   75    2K   18   14M   17     1G    1G
> 09:31:56   513   441     85   441   85     0    0   140   66     1G    1G
> 
> Running pmap on mysqld shows the same detail; total usage is about 45G:
> 
> # pmap 28524
> 28524:  /usr/mysql/5.0/bin/amd64/mysqld
> 0000000000400000       6548K r-x--  /usr/mysql/5.0/bin/amd64/mysqld
> 0000000000A74000       1832K rw---  /usr/mysql/5.0/bin/amd64/mysqld
> 0000000000C3E000      44816K rw---    [ heap ]
> 0000000003802000   39352336K rw---    [ heap ]
> 0000000965606000    1707892K rw---    [ heap ]
> 00000009CD9E3000    4781200K rw---    [ heap ]
> 0000000AF1707000     780208K rw---    [ heap ]
> FFFFFD7FEA200000       2048K rwx--    [ anon ]
> ...
> FFFFFD7FFFDF3000         52K rw---    [ stack ]
>          total     46715044K
> 
> We see periodic traffic to the rpool disks (where the swap zvol sits),
> but that disk usage is not terribly high or concerning, though we
> think the paging does cause slowness in MySQL.  The bigger question
> is: what's using all the swap?  I can't find a process that needs that
> much memory.  This problem didn't occur on very similar systems when
> we were running Sol10u5.
> 
> I feel like I'm missing something simple. Anyone have ideas?

I recently tried to debug a similar idiopathic problem with a colleague.
We weren't able to figure out what was causing it, partly because the
system took about a week to get into the state where a lot of swap was
in use.  Once it was there, there were only a few obvious signs of
problematic behavior.  Like you, I'm having a hard time determining
whether this is the result of an intentional policy change in the OS or
a subtle bug that has arisen from unknown causes.  In my colleague's
case, disabling swap improved his performance a lot, but I'm assuming
that's not an option in your configuration.
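
(For reference, disabling swap on a live system is just a matter of
deleting the swap device.  On a zvol-backed config like yours that
would look something like this; the path is a guess based on your
rpool setup:

# swap -d /dev/zvol/dsk/rpool/swap

Anything already paged out gets pulled back into physical memory when
you do that, so you'd want plenty of free RAM first.)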

The next step we were going to take was to limit the ARC.  It's
interesting that it didn't seem to help in your case.
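
One thing worth double-checking is that the cap actually took effect.
arcstat.pl just reads the arcstats kstats, so something like this
should show the live ARC size next to the configured maximum:

$ kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max

If c_max doesn't come back as 2147483648 (2G), the /etc/system setting
didn't take, e.g. because the box hasn't been rebooted since the change.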

In your case, it looks like a few processes are using a lot of memory;
in his case, it was less clear what was consuming it all.  We've
postulated that it's a misbehaving daemon, but we haven't been able to
prove that yet.

If you have time to look at this further, there are some additional
options to the commands you've been using that might be helpful.

There's a -p option to vmstat that breaks the paging statistics out by
page type.  If you run with it, it's easy to see when pageout or the
swapper is writing pages to swap, since the apo column counts anonymous
pages being written out.

$ vmstat -p 1
     memory           page          executable      anonymous      filesystem 
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 20650604 3063604 78 491 0  0   0    0    0    0    0    0    0   56    0    0
 20375652 2077816 377 1270 0 0  0    0    0    0    0    0    0    0    0    0
 20375652 2065968 0 343 0   0   0    0    0    0    0    0    0    0    0    0
 20288612 1969196 13 390 0  0   0    0    0    0    0    0    0    0    0    0
 20291820 1971112 40 489 0  0   0    0    0    0    0    0    0    0    0    0
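
If you want to tie that activity back to specific processes, the DTrace
vminfo provider has probes that correspond to those same columns.
Anonymous page-ins fire in the context of the thread taking the fault,
so a one-liner like this (let it run a while, then Ctrl-C) should at
least show who's being hurt by the paging:

# dtrace -n 'vminfo:::anonpgin { @[execname] = count(); }'

(anonpgout is less useful for attribution, since the pushes mostly
happen in the pageout daemon's context.)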


The swap(1M) command has -s and -l options that are useful for getting
a quick baseline of current swap usage and the physical allocations.
Note that -s accounts for virtual swap (physical memory plus the swap
devices), so its numbers won't line up exactly with the per-device
view from -l:

$ swap -sh
total: 1.0G allocated + 259M reserved = 1.3G used, 19G available

$ swap -lh
swapfile             dev    swaplo   blocks     free
/dev/dsk/c4t1d0s1   28,257       4K      20G      20G

If pmap -x isn't working for you, there's another option: pmap -S,
which shows the swap reservation for each mapping.  It may not provide
as much detail as -x, but it should give you a good idea of how much
swap each process is using.

$ pmap -S 106017
106017: /usr/lib/fm/fmd/fmd
 Address  Kbytes    Swap Mode Mapped File
08044000      16      16 rw---    [ stack ]
08050000     248       - r-x--  fmd
0809E000       4       4 rw---  fmd
0809F000  169496  169496 rw---    [ heap ]
FC180000       4       4 rw---    [ anon ]
FC190000      64      64 rwx--    [ anon ]
FC1B0000       4       4 rwx--    [ anon ]
FC2DB000      16      16 rw--R    [ stack tid=24 ]
<...>
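
To turn that into a system-wide picture, you can sum the Swap column
over every process.  A rough sketch (it assumes the last field of
pmap's "total" line is the swap figure, and it needs to run as root so
pmap can inspect every process):

# for p in /proc/[0-9]*
  do
    pid=`basename $p`
    swp=`pmap -S $pid 2>/dev/null | awk '/total/ {print $NF}'`
    echo "$swp $pid `ps -o comm= -p $pid`"
  done | sort -n | tail

Keep in mind this is reserved swap, a virtual figure, so it won't
necessarily match what's actually sitting on the swap device, but it
should tell you whether any process's reservations account for the
~16G you're seeing.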

It may also be beneficial to take a look at how the kernel is using
memory.  You can do this by running the following as root:

# mdb -k
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     300079              1172   57%
ZFS File Data               52813               206   10%
Anon                        53397               208   10%
Exec and libs                2073                 8    0%
Page cache                  51586               201   10%
Free (cachelist)            14507                56    3%
Free (freelist)             47442               185    9%

Total                      521897              2038
Physical                   521896              2038

This should show you how the memory is currently allocated between the
kernel and other parts of the system.
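
While you're in mdb, ::swapinfo should also list each swap device
along with its size and free pages (I'm going from memory on the exact
columns), which makes a nice cross-check against the swap -l output:

> ::swapinfo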

HTH,

-j
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
