Howdy James,

While responding to halstead's post (see below), I had to restart several times to complete some testing. I'm not sure whether that matters for these commands, but I wanted to put it out there anyway.

> A few commands that you could provide the output from
> include:
>
>
> (these two show any FMA-related telemetry)
> fmadm faulty
> fmdump -v

This is the output from both commands:

[EMAIL PROTECTED]:~# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 27 01:07:08 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a  ZFS-8000-FD    Major

Fault class : fault.fs.zfs.vdev.io
Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.
Response    : The device has been offlined and marked as faulted. An attempt
              will be made to activate a hot spare if available.
Impact      : Fault tolerance of the pool may be compromised.
Action      : Run 'zpool status -x' and replace the bad device.

[EMAIL PROTECTED]:~# fmdump -v
TIME                 UUID                                 SUNW-MSG-ID
Aug 27 01:07:08.2040 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io

        Problem in: zfs://pool=mediapool/vdev=bfaa3595c0bf719
           Affects: zfs://pool=mediapool/vdev=bfaa3595c0bf719
               FRU: -
          Location: -

> (this shows your storage controllers and what's
> connected to them)
> cfgadm -lav

This is the output from cfgadm -lav (each entry spans two lines, matching the two header lines):

[EMAIL PROTECTED]:~# cfgadm -lav
Ap_Id  Receptacle  Occupant  Condition  Information
When  Type  Busy  Phys_Id
usb2/1  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED]:1
usb2/2  connected  configured  ok  Mfg: Microsoft Product: Microsoft 3-Button Mouse with IntelliEye(TM) NConfigs: 1 Config: 0 <no cfg str descr>
unavailable  usb-mouse  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED]:2
usb3/1  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],2:1
usb3/2  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],2:2
usb4/1  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],3:1
usb4/2  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],3:2
usb5/1  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],4:1
usb5/2  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],4:2
usb6/1  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:1
usb6/2  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:2
usb6/3  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:3
usb6/4  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:4
usb6/5  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:5
usb6/6  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:6
usb6/7  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:7
usb6/8  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:8
usb6/9  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:9
usb6/10  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:10
usb7/1  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],1:1
usb7/2  empty  unconfigured  ok
unavailable  unknown  n  /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],1:2

You'll notice that the only thing listed is my USB mouse... is that expected?

> You'll also find messages in /var/adm/messages which
> might prove
> useful to review.

If you really want, I can list the output from /var/adm/messages, but it doesn't seem to add anything new to what I've already copied and pasted.

> First and foremost, for me, this is a stupid thing to
> do. You've got common-or-garden PC hardware which almost
> *definitely* does not support hot plug of devices. Which is what you're
> telling us that you're doing. Would try this with your pci/pci-e
> cards in this system? I think not.

I would if I had some sort of set-up that supposedly promised me redundant PCI/PCI-E cards... You might think it's stupid, but how else could one be sure that the back-up PCI/PCI-E card would take over when the primary one died? Unplugging one of them seems like a fine test to me. It's definitely the worst-case scenario, and if the rig survives that, then I _know_ I would be able to rely on it for redundancy should one of the cards fail (which would most likely occur in a less spectacular fashion than a quick yank anyways).

> If you absolutely must do something like this, then
> please use what's known as "coordinated hotswap" using the
> cfgadm(1m) command.
>
>
> Viz:
>
> (detect fault in disk c2t3d0, in some way)
>
> # cfgadm -c unconfigure c2::dsk/c2t3d0
> # cfgadm -c disconnect c2::dsk/c2t3d0
>
> (go and swap the drive, plugin new drive with same
> cable)
>
> # zpool replace -f poolname c2t3d0
>
>
> What this will do is tell the kernel to do things in
> the right order, and - for zpool - tell it to do an
> in-place replacement of device c2t3d0 in your pool.

Thanks for the command listings. They'll certainly prove useful if I should ever find myself in a situation where I have to manually swap a disk like you described.

Unfortunately though, I'm with Miles Nordin (see below) on this one: I don't want to warn OpenSolaris of what I'm about to do... That would defeat the purpose of the test. Even technologies (like S.M.A.R.T.) that are designed to give you a bit of a heads-up are not very reliable at all, as Heikki Suonsivu and Google have noted (research.google.com/archive/disk_failures.pdf).

And I want this test to be as rough as it gets. I don't want to play nice with this system... I want to drag it through the most torturous worst-case scenario tests I can imagine, and if it survives with all my test data intact, then (and only then) will I begin to trust it.

> http://docs.sun.com/app/docs/coll/40.17 (manpages)
> http://docs.sun.com/app/docs/coll/47.23 (system admin collection)
> http://docs.sun.com/app/docs/doc/817-2271 ZFS admin guide
> http://docs.sun.com/app/docs/doc/819-2723 devices + filesystems guide

Oohh... Thank you. Good links. I'm bookmarking these for future reading. They'll definitely be helpful if we end up choosing to deploy OpenSolaris + ZFS for our media servers.

-Todd

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss