Hi world, I have a 10-disk RAID-Z2 system with 4 GB of DDR2 RAM and a 3 GHz Core 2 Duo.
It's exporting ~280 filesystems over NFS to about half a dozen machines. Under some loads (in particular, any attempt to rsync between another machine and this one over SSH), the machine's load average sometimes climbs to 27 or higher. The time all appears to be spent in the kernel: no userland process reports more than 5% CPU usage, yet top reports 50%+ CPU usage overall.

I say "27+" because once the load spikes that high, the machine stops responding to any meaningful commands, so I can't see how far it goes. Console login accepts a username and password, then hangs forever without printing anything. SSH login blocks forever without prompting for a username or password. The machine still responds to ping, but NFS drops.

The box is running snv_113, and this has been happening since the RAID-Z2 pool was created (on b102). I don't know how to instrument this. It doesn't appear to be panicking or running out of RAM, as far as I can tell from the last output of top and prstat before it hangs. I also don't know how to ask DTrace where the system is spending most of its time. I've read one or two guides, but I don't understand how to interpret the output. I'm sending this to zfs-discuss because I can't reproduce the problem unless I'm doing heavy I/O on ZFS.

(Final note: this 10-disk pool is served by an ARC 1280ML. While the kernel is under heavy load, zpool iostat -v reports no more than 1 MB/s per disk, and usually around 128 KB/s.)

- Rich

-- 
The generation of random numbers is too important to be left to chance.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
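One commonly used way to ask DTrace where the kernel is spending its time is to sample kernel stacks with the profile provider. This is an assumption on our part, not something tried in the report above. For example, you could run `dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }' > /tmp/kstacks.out` and stop it with Ctrl-C. It would likely need to be started before the spike, since new logins hang. The raw output is a long list of stacks, each followed by its sample count, which is hard to read by eye.

The sketch below is a hypothetical Python helper. The names `kstacks.py`, `parse_stacks`, `summarize` and `format_report` are made up for illustration. It assumes that output format and totals the samples per innermost (leaf) function and per kernel module, so that a large share under `zfs`, or in a lock routine, would stand out. Idle CPUs are also sampled in the kernel, so expect idle-loop functions near the top as well.

```python
"""kstacks.py: hypothetical helper that summarizes DTrace kernel-stack samples.

Assumed input: the aggregation printed by
    dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }'
redirected to a file and stopped with Ctrl-C. Each entry is assumed to be
a block of frame lines (innermost frame first) followed by a count line.
"""
import re
import sys
from collections import Counter

# A line containing only an integer closes one aggregation entry (its sample count).
COUNT_RE = re.compile(r"^\s*(\d+)\s*$")


def parse_stacks(text):
    """Yield (frames, count) for each stack; frames[0] is the innermost frame."""
    frames = []
    for line in text.splitlines():
        stripped = line.strip()
        match = COUNT_RE.match(line)
        if match:
            if frames:
                yield frames, int(match.group(1))
            frames = []
        elif stripped and not stripped.startswith(("dtrace:", "^C")):
            frames.append(stripped)  # e.g. 'zfs`zio_wait+0x2f'
        # Blank lines, the 'dtrace: description ...' banner and '^C' are skipped.


def summarize(text):
    """Count samples per leaf function and per kernel module."""
    by_function = Counter()
    by_module = Counter()
    for frames, count in parse_stacks(text):
        leaf = frames[0].split("+", 1)[0]  # drop the '+0x..' offset
        by_function[leaf] += count
        by_module[leaf.split("`", 1)[0]] += count  # 'zfs`zio_wait' -> 'zfs'
    return by_function, by_module


def format_report(text, top=10):
    """Render the busiest modules and leaf functions with their share of samples."""
    by_function, by_module = summarize(text)
    total = sum(by_module.values()) or 1  # avoid dividing by zero on empty input
    lines = ["module share:"]
    for name, n in by_module.most_common(top):
        lines.append(f"  {name:<30} {n:>8} {100.0 * n / total:6.1f}%")
    lines.append("leaf function share:")
    for name, n in by_function.most_common(top):
        lines.append(f"  {name:<30} {n:>8} {100.0 * n / total:6.1f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            print(format_report(f.read()))
    else:
        print("usage: python3 kstacks.py DTRACE_OUTPUT_FILE", file=sys.stderr)
```

Run it as `python3 kstacks.py /tmp/kstacks.out`. Attributing each sample to the leaf frame shows where the CPU actually was. If that view is inconclusive, summing over every frame in each stack instead would show which callers lead there.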