Mark Grant wrote:
Yeah, this is my main concern with moving from my cheap Linux server with no 
redundancy to ZFS RAID on OpenSolaris; I don't really want to have to pay twice 
as much to buy the 'enterprise' disks which appear to be exactly the same 
drives with a flag set in the firmware to limit read retries, but I also don't 
want to lose all my data because a sector fails and the drive hangs for a 
minute trying to relocate it, causing the file system to fall over.

I haven't found a definitive answer as to whether this will kill a ZFS RAID 
like it kills traditional hardware RAID or whether ZFS will recover after the 
drive stops attempting to relocate the sector. At least with a single drive 
setup the OS will eventually get an error response and the other files on the 
disk will be readable when I copy them over to a new drive.

The issue is excessive error-recovery times INTERNAL to the hard drive. The worst-case scenario is that ZFS marks the drive as "bad" during a write, leaving the zpool degraded. It's not going to lose your data; it may just cause a "premature" marking of a drive as bad. None of this kills a RAID (ZFS, traditional SW RAID, or HW RAID), and it doesn't cause data corruption. The issue is sub-optimal disk fault determination.

If you suspect the drive really isn't bad, you can simply re-add it to the zpool and have it resilvered, which should take considerably less time than a full-drive resilver.
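A rough sketch of that recovery, assuming a hypothetical pool named "tank" and device "c1t2d0" (substitute your own names from `zpool status`):

```shell
# See which device ZFS has faulted and whether the pool is degraded
zpool status tank

# If the drive was only transiently slow (not actually failing),
# clear the error counters and bring the device back online.
zpool clear tank
zpool online tank c1t2d0

# ZFS then resilvers only the data written while the device was
# out, not the whole disk; watch progress with:
zpool status tank
```

Since ZFS tracks which transactions the device missed, this partial resilver is usually quick compared to replacing the disk outright.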


That said, if your drive really is taking 10-15 seconds to remap bad sectors, maybe you _should_ replace it.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss