Are you running compression on the file systems that you're rsync'ing to? That'll drive up the load average pretty high, and it's in the kernel (from what I can tell).
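If so, it's quick to confirm from the shell. A rough sketch (the pool and dataset names below are just placeholders, not from your setup):

    # show the compression setting for every dataset in the pool
    zfs get -r compression tank

    # if gzip is the culprit, the default lzjb algorithm is much cheaper on CPU
    # (at the cost of a lower compression ratio)
    zfs set compression=lzjb tank/export/home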
In particular, I've seen gzip compression on ZFS file systems bump the load average over 60 when running multiple parallel rsyncs over SSH, while prstat/top show little userland CPU usage. We're running on 2 cores (8 threads per core) of an UltraSPARC T2 (using LDOMs) and it handles the load nicely - the domain is still acceptably responsive. I can see how a dual-core x86 machine would get swamped by such a load. We're running Solaris 10, not OpenSolaris, so it's also possible there's a regression somewhere in there.

Scott Duckworth, Systems Programmer II
Clemson University School of Computing

On Tue, May 12, 2009 at 10:10 PM, Rince <rincebr...@gmail.com> wrote:
> Hi world,
> I have a 10-disk RAID-Z2 system with 4 GB of DDR2 RAM and a 3 GHz Core 2 Duo.
>
> It's exporting ~280 filesystems over NFS to about half a dozen machines.
>
> Under some loads (in particular, any attempts to rsync between another
> machine and this one over SSH), the machine's load average sometimes
> goes insane (27+), and it appears to all be in kernel-land (as nothing
> in userland reports more than 5% CPU usage, and top reports 50%+ CPU usage).
>
> I say 27+ because when the load spikes this high, the machine stops
> responding to any meaningful commands. Console login will take a
> username and password then hang forever without printing anything. SSH
> login will block forever without prompting for user or password.
> Machine responds to ping. NFS drops.
>
> snv_113, this has occurred since the RAID-Z2 was created (b102).
>
> I have no idea how to instrument this, as it doesn't appear to be
> panicking, or running out of RAM (as far as I can see from the last
> responses of top and prstat), and I don't know how to ask dtrace about
> where I'm mostly spending my time. I read one or two guides, but I
> don't follow how the output of it is meaningful.
>
> I'm sending this to zfs-discuss as I can't replicate this problem
> unless I'm doing heavy I/O on ZFS.
>
> (Final note - this 10-disk pool is serviced by an ARC 1280ML, and
> during the time the kernel is heavily under load, zpool iostat -v is
> reporting no more than 1 MB/s per disk, and almost always to the tune
> of 128 KB/s.)
>
> - Rich
>
> --
>
> The generation of random numbers is too important to be left to chance.
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
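On the dtrace question in the quoted message: when the time is going to the kernel rather than userland, the profile provider is usually the easiest way to see where. A rough one-liner to try as root (the sampling rate and 30-second window are arbitrary choices, nothing specific to your setup):

    # sample kernel stacks ~997 times/sec per CPU for 30s, then print the hottest ones
    dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'

If gzip compression is what's eating the box, the gzip/deflate routines should dominate the stacks it prints.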