On 05/05/10 10:56, Harald Schmalzbauer wrote:
Harald Schmalzbauer wrote on 05.05.2010 14:41 (localtime):
Hello,
one drive of my mirror failed today, but 'zpool status' shows it as
"online".
Every process using a ZFS mount hangs. 'zpool offline /dev/ad1' also
hangs indefinitely.
...
Sorry, I made a mistake with zpool create. Somehow the little word
"mirror" must have been lost, so the pool wasn't a mirror but a
stripe. Then of course I can't take one vdev offline. Sorry for the
noise.
But I took the opportunity to do some tests with that failing drive
and created a _real_ mirror. That works without failures, but using
the mirror again leads to:
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached
Now zpool reports the vdev ad1 as still online, although it has been
detached and 'atacontrol list' doesn't show it anymore:
zpool status
  pool: URUBAmirrorP1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the
        errors using 'zpool clear' or replace the device with
        'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        URUBAmirrorP1    ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            ad1          ONLINE       3  302K     0
            ad2          ONLINE       0     0     0

errors: No known data errors
atacontrol list
ATA channel 2:
    Master:  ad0 <TRANSCEND/20090520> SATA revision 1.x
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present
ATA channel 4:
    Master:  ad2 <SAMSUNG HD154UI/1AG01118> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master:  ad3 <ST3750640NS/3.AEG> SATA revision 1.x
    Slave:       no device present
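
The mismatch between the two listings above (ad1 is ONLINE for zpool
but absent from 'atacontrol list') can be spotted mechanically. The
following is only a rough sketch: it parses text pasted from the two
commands, and the function names (online_disks, attached_disks,
stale_online) and the adN/adaN name pattern are assumptions made for
illustration, not part of any FreeBSD tool.

```python
import re

# Sample text copied from the 'zpool status' config section in this thread.
ZPOOL_STATUS = """\
        NAME             STATE     READ WRITE CKSUM
        URUBAmirrorP1    ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            ad1          ONLINE       3  302K     0
            ad2          ONLINE       0     0     0
"""

# Sample text copied from 'atacontrol list' in this thread.
ATACONTROL_LIST = """\
ATA channel 2:
    Master:  ad0 <TRANSCEND/20090520> SATA revision 1.x
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present
ATA channel 4:
    Master:  ad2 <SAMSUNG HD154UI/1AG01118> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master:  ad3 <ST3750640NS/3.AEG> SATA revision 1.x
    Slave:       no device present
"""

# Assumption: leaf vdevs are plain disks named adN or adaN.
DISK_NAME = re.compile(r"ada?\d+")


def online_disks(zpool_status):
    """Return the disk names that the zpool config table marks ONLINE."""
    disks = set()
    for line in zpool_status.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "ONLINE" and DISK_NAME.fullmatch(fields[0]):
            disks.add(fields[0])
    return disks


def attached_disks(atacontrol_list):
    """Return the disk names that occupy a Master or Slave slot."""
    pattern = r"^\s*(?:Master|Slave):\s+(ada?\d+)\b"
    return set(re.findall(pattern, atacontrol_list, re.MULTILINE))


def stale_online(zpool_status, atacontrol_list):
    """Disks zpool still calls ONLINE although the ATA layer lost them."""
    return sorted(online_disks(zpool_status) - attached_disks(atacontrol_list))


if __name__ == "__main__":
    print(stale_online(ZPOOL_STATUS, ATACONTROL_LIST))
```

With the output from this thread the script reports ad1, the drive
that was detached.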
How should such a failure be handled?
Do I have to manually mark the drive offline for zpool?
Thanks,
-Harry
You may want to try a newer controller driver such as ahci(4), if
possible. Otherwise, building the kernel with ATA_CAM may accomplish
something similar. I'm not sure, but my guess is that the newer ATA/CAM
system feeds the proper notifications back to ZFS.
I use many drives on the siis(4) driver, which is CAM-enabled, and
haven't had any issues. However, I have not had an outright drive
failure. I do recall testing situations where we would yank a working
drive, and I seem to remember it working correctly after the last set of
CAM improvements.
It may not be something you can try on a production system, but if you
can experiment, it's worth a shot. Note that your device names WILL
change to adaX instead of adX. I would definitely recommend that you
label the disks with glabel(8) and create the zpool/vdevs on the glabel
devices instead, to circumvent any future problems associated with
device numbering.
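
As a rough illustration of the glabel suggestion, the sketch below only
builds and prints the command lines; it does not run anything. The pool
name "tank", the label names disk0/disk1, and the helper
label_and_mirror_commands are made up for the example. It assumes a
fresh pool on disks whose contents can be discarded, and it includes
the "mirror" keyword so that the pool does not end up as a stripe.

```python
def label_and_mirror_commands(pool, disks):
    """Build glabel/zpool command lines for a labelled mirror.

    disks maps a (hypothetical) label name to the current device node,
    e.g. {"disk0": "ad1", "disk1": "ad2"}. Nothing is executed here.
    """
    if len(disks) < 2:
        raise ValueError("a mirror needs at least two disks")
    # One 'glabel label <name> <device>' per disk; the label then shows
    # up as /dev/label/<name> regardless of adX/adaX renumbering.
    cmds = [["glabel", "label", label, "/dev/" + dev] for label, dev in disks.items()]
    # Create the pool on the label devices; 'mirror' makes it a mirror vdev.
    cmds.append(["zpool", "create", pool, "mirror"] + ["label/" + label for label in disks])
    return cmds


if __name__ == "__main__":
    for cmd in label_and_mirror_commands("tank", {"disk0": "ad1", "disk1": "ad2"}):
        print(" ".join(cmd))
```

Printing first and running the commands by hand leaves room to
double-check the device nodes before anything is written to the disks.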
Steve