>>>>> "bh" == Brandon High <bh...@freaks.com> writes:

    bh> Recent versions no longer support enabling TLER or ERC.  To
    bh> the best of my knowledge, Samsung and Hitachi drives all
    bh> support CCTL, which is yet another name for the same thing.

Once again I have to ask: has anyone actually found these features to
make a verified positive difference with ZFS?

Some of those things you cannot even set on Solaris, because the
channel to the drive through an LSI controller isn't sufficiently
transparent to pass smartctl's commands, and the settings don't
survive reboots anyway.  Brandon, have you actually set it yourself,
or are you just aggregating forum discussion?
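
For reference, the knob under discussion is SCT Error Recovery
Control, which smartctl exposes as 'scterc' when the controller
actually passes SMART commands through.  A minimal sketch of driving
it from a script (the device path is hypothetical, and the setting is
lost at the next power cycle, so it would have to be reapplied at
every boot, where it can be set at all):

```python
import subprocess

def scterc_query_args(dev):
    # Build the smartctl invocation that reads the current SCT ERC timers.
    return ["smartctl", "-l", "scterc", dev]

def scterc_set_args(dev, read_ds, write_ds):
    # Build the invocation that sets the read/write recovery limits.
    # Units are 100 ms, so 70 means 7.0 seconds.
    return ["smartctl", "-l", "scterc,%d,%d" % (read_ds, write_ds), dev]

if __name__ == "__main__":
    # Hypothetical Solaris device path; with many LSI HBAs this fails
    # outright because SMART commands aren't passed through.
    dev = "/dev/rdsk/c0t0d0"
    subprocess.run(scterc_query_args(dev))
    subprocess.run(scterc_set_args(dev, 70, 70))
```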

The experience so far that I've read here has been:

 * if a drive goes bad completely

   + zfs will mark the drive unavailable after a delay that depends on
     the controller driver you're using, with lengths like 60 seconds,
     180 seconds, 2 hours, or forever.  The delay is not sane or
     reasonable with all controllers, and even if redundancy is
     available ZFS will patiently wait for the controller: the timer
     lives in the controller driver, which is part of the Solaris
     code, not in ZFS.  Best case, the zpool will freeze until the
     delay is up, but there are application timeouts and iSCSI
     initiator-target timeouts, too---getting the equivalent of an NFS
     hard mount is hard these days (even with NFS, in some people's
     experience).

   + the delay is different if the system's running when the drive
     fails, or if it's trying to boot up.  For example iSCSI will
     ``patiently wait'' forever for a drive to appear while booting
     up, but will notice after 180 seconds while running.

   + because the disk is completely bad, TLER, ERC, CCTL, whatever you
     call it, doesn't apply.  The drive might never answer commands at
     all.  The timer is not in the drive: the drive is bad starting
     now, continuing forever.

 * if a drive goes partially bad (large and increasing numbers of
   latent sector errors, which for me happens more often than
   bad-completely):

   + the zpool becomes unusably slow

   + it stays unusably slow until you use 'iostat' or 'fmdump' to find
     the marginal drive and offline it

   + TLER, ERC, CCTL only changes the slowness ratio from
     7 ms : 30000 ms per read to 7 ms : 7000 ms.  In other words, it's
     unusably slow with or without the feature.
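
To put numbers on that second case, the sketch below works the
slowdown arithmetic and shows the kind of check you end up doing by
hand against 'iostat -xn' output (the device names and service times
are invented for illustration):

```python
# Rough per-read latencies from the discussion above, in milliseconds:
HEALTHY_MS = 7          # normal read
ERC_CAPPED_MS = 7_000   # read hitting a latent sector error, ERC capped at 7 s
NO_ERC_MS = 30_000      # same read with no ERC cap

# A 1000x slowdown with the feature, ~4300x without: unusable either way.
slowdown_with_erc = ERC_CAPPED_MS / HEALTHY_MS    # 1000.0
slowdown_without_erc = NO_ERC_MS / HEALTHY_MS     # ~4285.7

# Finding the marginal drive means eyeballing per-device average
# service times (the asvc_t column of 'iostat -xn'); made-up sample:
asvc_t = {"c0t0d0": 6.8, "c0t1d0": 7.1, "c0t2d0": 5900.0, "c0t3d0": 7.4}
worst = max(asvc_t, key=asvc_t.get)
print(worst)  # the candidate for 'zpool offline'
```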

AFAICT the feature is useful as a workaround for buggy RAID card
firmware and nothing else.  It's a cost differentiator, and you're
swallowing it hook, line and sinker.

If you know otherwise please correct me, but the discussion here so
far doesn't match what I've learned about ZFS and Solaris exception
handling.

That said, to reword Don Marti, ``uninformed Western Digital bashing
is better than no Western Digital bashing at all.''
