On Tue, May 12, 2009 at 09:41:41AM -0700, Ethan Erchinger wrote:
> Hi all,
>
> I'm having trouble determining what is using a large amount of swap on a
> few of our OpenSolaris systems.  These systems run MySQL, the 5.0.65
> version that came with snv_101, have 48G of ram, and 24G of swap.  The
> MySQL instances are configured to use a 36G innodb buffer pool.  With
> the other (ridiculous amount) of overhead that MySQL has, we're seeing
> the following detail:
>
> $ prstat -c -s size
>    PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
>  28524 mysql      45G   39G sleep   59   -3 144:13:04 7.0% mysqld/586
>    357 root       30M 4288K sleep   59    0   1:11:05 0.0% fmd/19
>      7 root       15M  844K sleep   59    0   0:01:48 0.0% svc.startd/11
>      9 root       13M  756K sleep   59    0   0:00:33 0.0% svc.configd/26
>  17617 root       11M 6724K sleep   59    0   0:27:16 0.1% perl/1
>     78 root     9488K 1500K sleep   59    0   0:00:25 0.0% devfsadm/6
>  26902 ethan    8352K 5368K sleep   59    0   0:00:00 0.0% sshd/1
>  17646 nobody   8120K 2568K sleep   59    0   0:01:38 0.0% gmond/1
>    122 daemon   6948K 3952K sleep   59    0   0:00:05 0.0% kcfd/3
>    440 root     6808K 3456K sleep   59    0   0:05:01 0.0% intrd/1
>  26901 root     6740K 3724K sleep   59    0   0:00:00 0.0% sshd/1
>    409 smmsp    6416K 1448K sleep   59    0   0:00:03 0.0% sendmail/1
>    410 root     6284K 2036K sleep   59    0   0:00:32 0.0% sendmail/1
>  17927 root     5488K 3376K sleep   59    0   0:01:58 0.0% nagmon.pl/1
>    117 root     5264K 1272K sleep   59    0   0:00:00 0.0% syseventd/16
> Total: 38 processes, 720 lwps, load averages: 0.97, 1.14, 1.35
>
> I include top because it has a bit different detail, but they are
> consistent.
>
> $ top -b -o size
> load averages:  1.40,  1.25,  1.33;  up 20+16:47:22  09:26:14
> 38 processes: 37 sleeping, 1 on cpu
> CPU states: 82.6% idle, 11.9% user, 5.6% kernel, 0.0% iowait, 0.0% swap
> Kernel: 33954 ctxsw, 526 trap, 22651 intr, 31375 syscall, 341 flt
> Memory: 48G phys mem, 783M free mem, 24G total swap, 8885M free swap
>
>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  28524 mysql     584  59   -3   45G   39G sleep  144.3H 14.34% mysqld
>    357 root       19  59    0   30M 4284K sleep   71:06  0.00% fmd
>      7 root       11  59    0   15M  844K sleep    1:48  0.00% svc.startd
>      9 root       26  59    0   13M  756K sleep    0:33  0.00% svc.configd
>  17617 root        1  59    0   11M 6720K sleep   27:17  0.00% perl
>     78 root        6  59    0 9488K 1500K sleep    0:25  0.00% devfsadm
>  26902 ethan       1  59    0 8352K 5364K sleep    0:00  0.00% sshd
>  17646 nobody      1  59    0 8120K 2568K sleep    1:38  0.01% gmond
>    122 daemon      3  59    0 6948K 3952K sleep    0:05  0.00% kcfd
>    440 root        1  59    0 6808K 3456K sleep    5:02  0.00% intrd
>  26901 root        1  59    0 6740K 3724K sleep    0:00  0.00% sshd
>    409 smmsp       1  59    0 6416K 1448K sleep    0:03  0.00% sendmail
>    410 root        1  59    0 6284K 2036K sleep    0:32  0.00% sendmail
>    117 root       16  59    0 5264K 1272K sleep    0:00  0.00% syseventd
>  27431 nobody      1  59    0 5152K 2364K sleep    0:00  0.11% zpool
>    417 root        1  59    0 4308K 1640K sleep    0:00  0.00% sshd
>    377 root       11  59    0 4268K 1676K sleep    0:24  0.00% syslogd
>   2156 daemon      1  59    0 4060K 1616K sleep    0:00  0.00% statd
>  28496 root        1  59    0 3996K  956K sleep    0:00  0.00% mysqld_safe
>  27430 ethan       1  59    0 3812K 1880K cpu/0    0:00  0.06% top
>  26905 ethan       1  59    0 3504K 2384K sleep    0:00  0.00% bash
>    354 daemon      1  59    0 3256K 1184K sleep    0:00  0.00% rpcbind
>   2161 daemon      2  60  -20 2980K 1400K sleep    0:00  0.00% lockd
>    272 root        1 100  -20 2872K 1488K sleep    1:35  0.00% xntpd
>     15 dladm       6  59    0 2780K  440K sleep    0:00  0.00% dlmgmtd
>      1 root        1  59    0 2664K  692K sleep    0:06  0.00% init
>
> $ vmstat 2 2
>  kthr      memory            page            disk          faults      cpu
>  r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
>  0 0 0 34033300 17894104 40 232 79 85 98 0 75 5 31 30 30 12532 11892 19573 3 3 93
>  0 0 0 17014080 810356 139 1015 2 0 0 0 0 1 45 46 52 20671 24345 32825 4 5 91
>
> As you can see, even though the resident size of MySQL is < 40G, the
> system is still using close to 16G of swap.  At first we thought that
> ZFS arc cache was causing this to happen, but we've limited it to 2G,
> via:
>
> set zfs:zfs_arc_max = 0x80000000   #2G
>
> # arcstat.pl
>     Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
> 09:31:55  330M  248M     75  248M   75    2K   18   14M   17     1G    1G
> 09:31:56   513   441     85   441   85     0    0   140   66     1G    1G
>
> Running pmap on mysqld shows the same detail, total usage is about 45G:
>
> # pmap 28524
> 28524:  /usr/mysql/5.0/bin/amd64/mysqld
> 0000000000400000       6548K r-x--  /usr/mysql/5.0/bin/amd64/mysqld
> 0000000000A74000       1832K rw---  /usr/mysql/5.0/bin/amd64/mysqld
> 0000000000C3E000      44816K rw---    [ heap ]
> 0000000003802000   39352336K rw---    [ heap ]
> 0000000965606000    1707892K rw---    [ heap ]
> 00000009CD9E3000    4781200K rw---    [ heap ]
> 0000000AF1707000     780208K rw---    [ heap ]
> FFFFFD7FEA200000       2048K rwx--    [ anon ]
> ...
> FFFFFD7FFFDF3000         52K rw---    [ stack ]
>          total     46715044K
>
> We see periodic traffic to the rpool disks (where the swap zvol sits),
> but that disk usage is not terribly high, or concerning, though we think
> it does cause slowness in MySQL when paging.  The bigger question is,
> what's using all the swap?  I cannot find a process that needs that kind
> of RAM.  This problem didn't occur on very similar systems, when we were
> running Sol10u5.
>
> I feel like I'm missing something simple.  Anyone have ideas?
I recently tried to debug a similar idiopathic problem with a colleague.
We weren't able to figure out what was causing it, partially because the
system took about a week to get into the state where a lot of swap was in
use, and once it was there, there were only a few obvious signs of
problematic behavior.  Like you, I'm having a hard time determining
whether this is the result of an intentional policy change in the OS, or
a subtle bug that has arisen from unknown causes.

In my colleague's case, disabling swap improved his performance a lot,
but I'm assuming that's not an option in your configuration.  The next
step we were going to take was to limit the ARC; it's interesting that
that didn't seem to help in your case.

In your case, it looks like a few processes in the system are using a lot
of memory; in his case it was less clear what was consuming all of the
memory.  We've postulated that it's a misbehaving daemon, but we haven't
been able to prove it yet.

If you have time to look at this further, there are some additional
options to the commands that you've been using that might be helpful.

There's a -p option to vmstat that shows the paging statistics.  If you
run with this option, it's easy to see when pageout or swapout are
writing pages to swap, since the apo column shows when anon pages are
written out.

$ vmstat -p 1
     memory           page          executable      anonymous      filesystem
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 20650604 3063604 78 491  0   0   0    0    0    0    0    0    0   56    0    0
 20375652 2077816 377 1270 0   0   0    0    0    0    0    0    0    0    0    0
 20375652 2065968  0 343   0   0   0    0    0    0    0    0    0    0    0    0
 20288612 1969196 13 390   0   0   0    0    0    0    0    0    0    0    0    0
 20291820 1971112 40 489   0   0   0    0    0    0    0    0    0    0    0    0

The swap(1M) command has -l and -s options that are useful for getting a
quick baseline of the current swap usage and physical allocation:

$ swap -sh
total: 1.0G allocated + 259M reserved = 1.3G used, 19G available
$ swap -lh
swapfile             dev    swaplo   blocks     free
/dev/dsk/c4t1d0s1   28,257      4K      20G      20G

If pmap -x isn't working for you, there might be another option.  pmap
has a -S option that shows the swap allocations.  It may not provide as
much detail as -x, but it should give you a good idea of how much swap
each process is using.

$ pmap -S 106017
106017: /usr/lib/fm/fmd/fmd
 Address  Kbytes     Swap Mode   Mapped File
08044000      16       16 rw---    [ stack ]
08050000     248        - r-x--  fmd
0809E000       4        4 rw---  fmd
0809F000  169496   169496 rw---    [ heap ]
FC180000       4        4 rw---    [ anon ]
FC190000      64       64 rwx--    [ anon ]
FC1B0000       4        4 rwx--    [ anon ]
FC2DB000      16       16 rw--R    [ stack tid=24 ]
<...>

It may also be beneficial to take a look at how the kernel is using
memory.  You can do this by running the following as root:

# mdb -k
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     300079              1172   57%
ZFS File Data               52813               206   10%
Anon                        53397               208   10%
Exec and libs                2073                 8    0%
Page cache                  51586               201   10%
Free (cachelist)            14507                56    3%
Free (freelist)             47442               185    9%

Total                      521897              2038
Physical                   521896              2038

This should show you how memory is currently allocated between the
kernel and the rest of the system.

HTH,
-j
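
P.S.  If you want to turn pmap -S into an answer to "which process is
reserving all of the swap?", something like the rough, untested sketch
below might help: it sums the Swap column for every process and ranks
the results.  It assumes the per-mapping Swap value is the third field
of each pmap -S line (as in the output above), that non-swap mappings
show "-" there, and that the summary line starts with "total"; adjust
the awk if your output differs.  Run it as root so pmap can read every
process.

#!/bin/sh
# Rough sketch (untested): sum the Swap column of pmap -S for every
# process and print the top consumers, largest first.
# Assumptions: Swap is field 3 of each mapping line, "-" means no swap,
# and the summary line begins with "total".  Run as root.
for pid in `ls /proc`; do
        kb=`pmap -S $pid 2>/dev/null | awk '
                $1 != "total" && $3 ~ /^[0-9]+$/ { sum += $3 }
                END { print sum + 0 }'`
        if [ "$kb" -gt 0 ] 2>/dev/null; then
                echo "$kb KB  `ps -o comm= -p $pid 2>/dev/null` ($pid)"
        fi
done | sort -rn | head -20

Anything that shows far more swap reserved than its RSS would suggest is
a good candidate for a closer look.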