Todd H. Poole wrote:
> Hmm... I'm leaning away a bit from the hardware, but just in case you've
> got an idea, the machine is as follows:
>
> CPU: AMD Athlon X2 4850e 2.5GHz Socket AM2 45W Dual-Core Processor Model
> ADH4850DOBOX
> (http://www.newegg.com/Product/Product.aspx?Item=N82E16819103255)
>
> Motherboard: GIGABYTE GA-MA770-DS3 AM2+/AM2 AMD 770 ATX All Solid
> Capacitor AMD Motherboard
> (http://www.newegg.com/Product/Product.aspx?Item=N82E16813128081)
..

> The reason why I don't think there's a hardware issue is because before I
> got OpenSolaris up and running, I had a fully functional install of
> openSuSE 11.0 running (with everything similar to the original server) to
> make sure that none of the components were damaged during shipping from
> Newegg. Everything worked as expected.

Yes, but you're running a new operating system, new filesystem... that's
a mountain of difference right in front of you.

A few commands that you could provide the output from include:

(these two show any FMA-related telemetry)

    fmadm faulty
    fmdump -v

(this shows your storage controllers and what's connected to them)

    cfgadm -lav

You'll also find messages in /var/adm/messages which might prove useful
to review.

Apart from that, your description of what you're doing to simulate
failure is

"however, whenever I unplug the SATA cable from one of the drives (to
simulate a catastrophic drive failure) while doing moderate reading from
the zpool (such as streaming HD video), not only does the video hang on
the remote machine (which is accessing the zpool via NFS), but the server
running OpenSolaris seems to either hang, or become incredibly
unresponsive."

First and foremost, for me, this is a stupid thing to do. You've got
common-or-garden PC hardware which almost *definitely* does not support
hot plug of devices. Which is what you're telling us that you're doing.
Would you try this with your pci/pci-e cards in this system? I think not.

If you absolutely must do something like this, then please use what's
known as "coordinated hotswap" using the cfgadm(1m) command.

Viz:

(detect fault in disk c2t3d0, in some way)

    # cfgadm -c unconfigure c2::dsk/c2t3d0
    # cfgadm -c disconnect c2::dsk/c2t3d0

(go and swap the drive, plug in new drive with same cable)

    # zpool replace -f poolname c2t3d0

What this will do is tell the kernel to do things in the right order,
and - for zpool - tell it to do an in-place replacement of device
c2t3d0 in your pool.
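If it helps to have the sequence written down before touching any cables,
here is a minimal dry-run sketch that only prints the commands above for a
given pool, controller and disk. It runs nothing itself. The function name
hotswap_plan and its argument order are made up for illustration, and it
assumes the attachment point has the same controller::dsk/disk form as in
the example above; check the cfgadm -lav output for the real Ap_Id on your
system before running anything by hand.

```shell
#!/bin/sh
# Dry-run sketch: print the coordinated hotswap sequence for one disk.
# hotswap_plan is a hypothetical helper, not a Solaris command; it only
# echoes the commands and never executes cfgadm or zpool.
# Arguments: pool name, controller (e.g. c2), disk (e.g. c2t3d0).
hotswap_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: hotswap_plan POOL CONTROLLER DISK" >&2
        return 1
    fi
    pool=$1
    ctrl=$2
    disk=$3
    # Assumed attachment point format, copied from the example above.
    ap_id="${ctrl}::dsk/${disk}"

    echo "cfgadm -c unconfigure ${ap_id}"
    echo "cfgadm -c disconnect ${ap_id}"
    echo "# swap the drive, plug the new one in with the same cable"
    echo "zpool replace -f ${pool} ${disk}"
}

# Example: print the plan for disk c2t3d0 on controller c2 in pool "poolname".
hotswap_plan poolname c2 c2t3d0
```

Printing rather than executing is deliberate: each step should be checked
against what cfgadm and zpool status report before moving on to the next.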
There are manpages and admin guides you could have a look through, too:

http://docs.sun.com/app/docs/coll/40.17     (manpages)
http://docs.sun.com/app/docs/coll/47.23     (system admin collection)
http://docs.sun.com/app/docs/doc/817-2271   ZFS admin guide
http://docs.sun.com/app/docs/doc/819-2723   devices + filesystems guide

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp
http://www.jmcp.homeunix.com/blog
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss