On 05/05/10 10:56, Harald Schmalzbauer wrote:
Harald Schmalzbauer wrote on 05.05.2010 14:41 (localtime):
Hello,

One drive of my mirror failed today, but 'zpool status' shows it as "online". Every process using a ZFS mount hangs. 'zpool offline /dev/ad1' also hangs indefinitely.
...
Sorry, I made an error with zpool create. Somehow the little word "mirror" must have been lost, so the pool wasn't a mirror but a stripe. Then of course I can't take one vdev offline. Sorry for the noise.

But I took the opportunity to do some tests with that failing drive and created a _real_ mirror. Creating it works without failures, but using the mirror again leads to:
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached

Now zpool reports the vdev ad1 as still online, although it has been detached and 'atacontrol list' no longer shows it:

zpool status
  pool: URUBAmirrorP1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        URUBAmirrorP1  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad1     ONLINE       3  302K     0
            ad2     ONLINE       0     0     0

errors: No known data errors

atacontrol list
ATA channel 2:
    Master:  ad0 <TRANSCEND/20090520> SATA revision 1.x
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present
ATA channel 4:
    Master:  ad2 <SAMSUNG HD154UI/1AG01118> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master:  ad3 <ST3750640NS/3.AEG> SATA revision 1.x
    Slave:       no device present

How should such a failure be handled?
Do I have to manually mark the drive offline for zpool?

Thanks,

-Harry

You may want to try a newer controller driver such as ahci(4) if possible. Otherwise, building the kernel with ATA_CAM may accomplish something similar. I'm not sure, but I suspect that the newer ATA/CAM system feeds the proper notifications back to ZFS.
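
As a rough sketch of the two options above (assuming a FreeBSD 8.x system whose controller is supported by ahci(4); check the man page for your hardware first), the configuration would look something like this:

```
# /boot/loader.conf: load the CAM-based AHCI driver at boot
ahci_load="YES"

# Alternative, in a custom kernel configuration file (not loader.conf):
# options ATA_CAM
```

Either change takes effect only after a reboot (and, for ATA_CAM, a kernel rebuild), so it is best tried on a test machine first.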

I use many drives on the siis(4) driver, which is CAM-enabled, and haven't had any issues. However, I have not had an outright drive failure. I do recall testing situations where we would yank a working drive, and I seem to remember it working correctly after the last set of CAM improvements.

It may not be something you can try on a production system, but if you can experiment, it's worth a shot. Note that your device names WILL change to adaX instead of adX. I would definitely recommend that you label the disks with glabel(8) and create the zpool/vdevs on the glabel devices instead, to avoid any future problems with device numbering.
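
To illustrate the glabel(8) suggestion, here is a minimal sketch. The pool name "tank", the disks ada1 and ada2, and the helper functions run and label_and_mirror are all made up for the example. By default the script only prints the commands it would run, so it is safe to try anywhere.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Set DRY_RUN=0 to really execute them (needs root on FreeBSD and
# DESTROYS existing data on the named disks).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: label each disk, then build a mirror from the labels.
# Usage: label_and_mirror <pool> <disk> <disk> [...]
label_and_mirror() {
    pool="$1"; shift
    vdevs=""
    i=0
    for disk in "$@"; do
        # Creates /dev/label/<pool>d<i>, which survives device renumbering.
        run glabel label "${pool}d${i}" "$disk"
        vdevs="$vdevs label/${pool}d${i}"
        i=$((i + 1))
    done
    # The "mirror" keyword matters: without it the disks are striped.
    # $vdevs is deliberately unquoted so it splits into separate arguments.
    run zpool create "$pool" mirror $vdevs
}

label_and_mirror tank ada1 ada2
```

In dry-run mode this prints two glabel commands followed by "zpool create tank mirror label/tankd0 label/tankd1". Because the pool is built on the label/ names, it should not matter whether the disks later show up as adX or adaX.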

Steve
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"