Howdy James,

While responding to halstead's post (see below), I had to restart several times 
to complete some testing. I'm not sure whether that affects the output of these 
commands, but I wanted to mention it just in case.

> A few commands that you could provide the output from
> include:
> 
> 
> (these two show any FMA-related telemetry)
> fmadm faulty
> fmdump -v

This is the output from both commands:

[EMAIL PROTECTED]:~# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 27 01:07:08 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a  ZFS-8000-FD    Major

Fault class : fault.fs.zfs.vdev.io
Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.
Response    : The device has been offlined and marked as faulted.  An attempt
              will be made to activate a hot spare if available.
Impact      : Fault tolerance of the pool may be compromised.
Action      : Run 'zpool status -x' and replace the bad device.



[EMAIL PROTECTED]:~# fmdump -v
TIME                 UUID                                 SUNW-MSG-ID
Aug 27 01:07:08.2040 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a ZFS-8000-FD
 100%  fault.fs.zfs.vdev.io

       Problem in: zfs://pool=mediapool/vdev=bfaa3595c0bf719
          Affects: zfs://pool=mediapool/vdev=bfaa3595c0bf719
              FRU: -
         Location: -


> (this shows your storage controllers and what's
> connected to them) cfgadm -lav

This is the output from cfgadm -lav

[EMAIL PROTECTED]:~# cfgadm -lav
Ap_Id                          Receptacle   Occupant     Condition  Information
When         Type         Busy     Phys_Id
usb2/1                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED]:1
usb2/2                         connected    configured   ok
Mfg: Microsoft  Product: Microsoft 3-Button Mouse with IntelliEye(TM)
NConfigs: 1  Config: 0  <no cfg str descr>
unavailable  usb-mouse    n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED]:2
usb3/1                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],2:1
usb3/2                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],2:2
usb4/1                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],3:1
usb4/2                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],3:2
usb5/1                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],4:1
usb5/2                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],4:2
usb6/1                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:1
usb6/2                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:2
usb6/3                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:3
usb6/4                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:4
usb6/5                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:5
usb6/6                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:6
usb6/7                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:7
usb6/8                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:8
usb6/9                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:9
usb6/10                        empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],5:10
usb7/1                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],1:1
usb7/2                         empty        unconfigured ok
unavailable  unknown      n        /devices/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],1:2

You'll notice that only USB ports are listed, and the only device attached to 
any of them is my USB mouse. None of the disks or their controllers show up. Is 
that expected?
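
To double-check that I'm not just overlooking an entry in that long listing, 
something like the following could filter the shorter cfgadm -al output down to 
anything that is not a USB port. This is only a sketch: the function name is 
made up, and it assumes that USB attachment point IDs always start with "usb", 
as they do in the listing above, and that the first line of output is the 
column header.

```shell
# Read `cfgadm -al` output on stdin and print every row whose Ap_Id
# (first column) does not start with "usb". The header row is skipped.
# non_usb_attachment_points is a hypothetical helper name.
non_usb_attachment_points() {
    awk 'NR > 1 && $1 !~ /^usb/'
}

# Intended use on the affected host (left as a comment so the sketch
# itself has no side effects):
#   cfgadm -al | non_usb_attachment_points
```

If that prints nothing, it would at least confirm that no disk attachment 
points are visible to cfgadm on this machine.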


> You'll also find messages in /var/adm/messages which
> might prove
> useful to review.

If you really want, I can list the output from /var/adm/messages, but it 
doesn't seem to add anything new to what I've already copied and pasted.
 
> First and foremost, for me, this is a stupid thing to
> do. You've got common-or-garden PC hardware which almost
> *definitely* does not support hot plug of devices. Which is what you're
> telling us that you're doing. Would you try this with your pci/pci-e
> cards in this system? I think not.

I would if I had some sort of setup that promised me redundant PCI/PCI-E 
cards. You might think it's stupid, but how else could one be sure that the 
backup PCI/PCI-E card would take over when the primary one died?

Unplugging one of them seems like a fine test to me. It's definitely the 
worst-case scenario, and if the rig survives that, then I _know_ I can rely on 
it for redundancy should one of the cards fail (which would most likely happen 
in a less spectacular fashion than a quick yank anyway).

> If you absolutely must do something like this, then
> please use what's known as "coordinated hotswap" using the
> cfgadm(1m) command.
> 
> 
> Viz:
> 
> (detect fault in disk c2t3d0, in some way)
> 
> # cfgadm -c unconfigure c2::dsk/c2t3d0
> # cfgadm -c disconnect c2::dsk/c2t3d0
> 
> (go and swap the drive, plugin new drive with same
> cable)
> 
> # zpool replace -f poolname c2t3d0
> 
> 
> What this will do is tell the kernel to do things in
> the right order, and - for zpool - tell it to do an
> in-place replacement of device c2t3d0 in your pool.
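
For my own notes, the sequence quoted above could be wrapped in a small dry-run 
helper that only prints the commands for a given pool, controller and disk, so 
they can be reviewed before anything is typed as root. This is a sketch rather 
than a tested procedure: hotswap_plan is a name I made up, and it assumes the 
attachment point always has the <controller>::dsk/<disk> form used in the 
example.

```shell
# Print (do not run) the coordinated hot-swap sequence quoted above.
# Arguments: pool name, controller, disk. All three are supplied by the
# caller; hotswap_plan itself is a hypothetical helper name.
hotswap_plan() {
    pool=$1
    ctrl=$2
    disk=$3
    ap_id="${ctrl}::dsk/${disk}"

    echo "cfgadm -c unconfigure ${ap_id}"
    echo "cfgadm -c disconnect ${ap_id}"
    echo "# swap the drive, plugging the new one into the same cable"
    echo "zpool replace -f ${pool} ${disk}"
}

# Reproduces the example from the quoted text.
hotswap_plan poolname c2 c2t3d0
```

Printing instead of executing keeps the ordering visible (unconfigure, 
disconnect, swap, replace) without touching the system.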

Thanks for the command listings - they'll certainly prove useful if I ever 
find myself in a situation where I have to manually swap a disk like you 
described. Unfortunately though, I'm with Miles Nordin (see below) on this one 
- I don't want to warn OpenSolaris of what I'm about to do... That would defeat 
the purpose of the test. A real failure won't warn the system first either: 
even technologies like S.M.A.R.T., which are designed to give you a bit of a 
heads-up, are not very reliable predictors, as Heikki Suonsivu and Google have 
noted (research.google.com/archive/disk_failures.pdf).

And I want this test to be as rough as it gets. I don't want to play nice with 
this system... I want to drag it through the most torturous worst-case 
scenarios I can imagine, and if it survives with all my test data intact, then 
(and only then) will I begin to trust it.

> http://docs.sun.com/app/docs/coll/40.17 (manpages)
> http://docs.sun.com/app/docs/coll/47.23 (system admin collection)
> http://docs.sun.com/app/docs/doc/817-2271 ZFS admin guide
> http://docs.sun.com/app/docs/doc/819-2723 devices + filesystems guide

Oohh... Thank you. Good Links. I'm bookmarking these for future reading. 
They'll definitely be helpful if we end up choosing to deploy OpenSolaris + ZFS 
for our media servers.

-Todd
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss