Mark Grant wrote:
Yeah, this is my main concern with moving from my cheap Linux server with no
redundancy to ZFS RAID on OpenSolaris. I don't really want to pay twice
as much for the 'enterprise' disks, which appear to be exactly the same
drives with a firmware flag set to limit read retries, but I also don't
want to lose all my data because a sector fails and the drive hangs for a
minute trying to relocate it, causing the file system to fall over.
I haven't found a definitive answer as to whether this will kill a ZFS RAID
like it kills traditional hardware RAID or whether ZFS will recover after the
drive stops attempting to relocate the sector. At least with a single drive
setup the OS will eventually get an error response and the other files on the
disk will be readable when I copy them over to a new drive.
The issue is excessive error recovery times INTERNAL to the hard drive.
So, the worst-case scenario is that ZFS marks the drive as "bad" during a
write, causing the zpool to become degraded. It's not going to lose your
data. It just may cause a "premature" marking of a drive as bad.
None of this kills a RAID (ZFS, traditional SW Raid, or HW Raid). It
doesn't cause data corruption. The issue is sub-optimal disk fault
determination.
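For what it's worth, on many consumer drives you can check (and sometimes cap) the internal error-recovery timeout yourself via SCT Error Recovery Control, which is the same knob the "enterprise" firmware flag sets. A rough sketch using smartmontools, assuming the drive actually supports SCT ERC and that /dev/rdsk/c1t2d0 is your device (both are assumptions; substitute your own device path):

```shell
# Query the current SCT Error Recovery Control timeouts (read, write).
# Drives that don't support SCT ERC will report an error here.
smartctl -l scterc /dev/rdsk/c1t2d0

# Cap internal error recovery at 7.0 seconds for reads and writes
# (values are in units of 100 ms). This setting is typically lost
# on power cycle, so it needs to be reapplied at boot.
smartctl -l scterc,70,70 /dev/rdsk/c1t2d0
```

If the drive refuses the command, you're stuck with whatever recovery time the firmware uses, which is the situation described above.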
If you suspect that the drive really isn't bad, you can simply re-add it
to the zpool and have it resilvered, which should take considerably less
time than a full-drive resilver.
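As a sketch of what that looks like in practice (pool name "tank" and device c1t2d0 are placeholders for your own):

```shell
# See which device ZFS has faulted and why
zpool status -v tank

# Bring the device back online; ZFS resilvers only the data
# written while it was offline, not the whole disk
zpool online tank c1t2d0

# Clear the accumulated error counters once the resilver completes
zpool clear tank c1t2d0
```

If the drive keeps getting kicked out after this, that's a strong hint it really is failing.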
That said, if your drive really is taking 10-15 seconds to remap bad
sectors, maybe you _should_ replace it.
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss