On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote:
> Ahoy. This morning, I awoke to the following on one of my servers:
>
> pid 59630 (httpd), uid 80, was killed: out of swap space
> pid 59341 (find), uid 0, was killed: out of swap space
> pid 23134 (irssi), uid 1001, was killed: out of swap space
> pid 49332 (sshd), uid 1001, was killed: out of swap space
> pid 69074 (httpd), uid 0, was killed: out of swap space
> pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
> ...
>
> And so on.
>
> The machine is:
>
> FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2:
> Thu Dec 2 11:39:21 EST 2010
> sp...@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS amd64
>
> 10:13AM up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00
>
> The memory line from top intrigued me:
>
> Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free
>
> The machine has 8 gigs of memory, and I don't know what all that
> wired memory is being used for. There is a large-ish (6 x 1.5-TB)
> ZFS RAID-Z2 on it which has had a disk in the UNAVAIL state for a
> few months:

The ZFS ARC is what's responsible for your large wired count.

How much swap space do you have? You excluded that line from top.
"swapinfo" would also be helpful, but would indicate the same thing.

If you lack swap (which is a bad idea for a lot of reasons), then the
machine running out of available memory for userspace (a process which
grew too large, thus impacting others which were trying to malloc() at
the time) would make sense.

Can you please provide /boot/loader.conf and /etc/sysctl.conf?

> # zpool status
>   pool: home
>  state: DEGRADED
> status: One or more devices could not be used because the label is
>         missing or invalid. Sufficient replicas exist for the pool to
>         continue functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         home        DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    UNAVAIL      0    85    11  experienced I/O failures
>
> errors: No known data errors

I would also recommend fixing ada5; I'm not sure why any SA would let a
bad disk sit in a machine for "a few months". Though, hopefully, this
doesn't cause extra memory usage or something odd behind the scenes (in
the kernel). I'm going to assume the two things are completely
unrelated.

> "vmstat -m" and "vmstat -z" output:
>
> http://acm.poly.edu/~spawk/vmstat-m.txt
> http://acm.poly.edu/~spawk/vmstat-z.txt
>
> Anyone have a clue? I know it's just going to happen again if I
> reboot the machine. It is still up in case there are diagnostics for
> me to run.

The above vmstat data won't be too helpful since you need to see what's
going on "over time" and not what the values are right now. There may
be one of them that indicates available userspace vs. available kmem.

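As a rough illustration of what "over time" could look like, here is a
minimal Python sketch that turns a top(1) "Mem:" line into byte counts
and a timestamped log row. The function names (parse_mem_line, log_row)
are made up for this example, the sample line is the one quoted earlier
in this thread, and the 8 GB total is taken from the description of the
machine. How the line gets captured periodically (cron, a shell loop,
etc.) is left open; this only covers the parsing side and is not meant
as a finished tool.

```python
import re
import time

# Binary multipliers for the size suffixes that appear in the Mem: line.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_mem_line(line):
    """Parse a top(1)-style 'Mem:' line into a {label: bytes} dict."""
    fields = {}
    for size, unit, label in re.findall(r"(\d+)([KMGT])\s+(\w+)", line):
        fields[label] = int(size) * _UNITS[unit]
    return fields


def log_row(line, now=None):
    """Format one timestamped, tab-separated sample (sizes in MB)."""
    mem = parse_mem_line(line)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    cols = [f"{label}={size // 1024**2}M" for label, size in mem.items()]
    return "\t".join([stamp] + cols)


# Sample taken from the original report; 8 GB is the stated RAM size.
sample = ("Mem: 16M Active, 48M Inact, 6996M Wired, "
          "229M Cache, 828M Buf, 605M Free")
mem = parse_mem_line(sample)
wired_share = mem["Wired"] / (8 * 1024**3)

print(log_row(sample))
print(f"Wired is {wired_share:.0%} of 8 GB")
```

Appending one such row to a file every minute or so would at least show
whether Wired creeps up steadily or jumps right before processes start
getting killed.
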
Basically what you need is the equivalent of Solaris sar(1), so that
you can see memory usage of processes/etc. over time and find out
whether something went malloc-crazy and kept growing. If the kernel
itself ran out, you'd be seeing a panic.

Sorry if these ideas/comments seem like a ramble, I've been up all
night trying to decode a circa-1992 font routine in 65816 assembly,
heh. :-)

-- 
| Jeremy Chadwick                                  j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |