Re: [zfs-discuss] Raidz vdev size... again.

Miles Nordin Tue, 28 Apr 2009 13:55:06 -0700

>>>>> "kn" == Kees Nuyt <k.n...@zonnet.nl> writes:


    kn> Some high availability storage systems overcome this decay by
    kn> not just reading, but also writing all blocks during a
    kn> scrub.

sounds like a good idea but harder in the ZFS model where the software
isn't the proprietary work of the only permitted integrator.

 * it'd be harmful to do this on SSDs.  it might also be a really
   good idea to do it on SSDs.  who knows yet.

 * optimizing the overall system depends on intimate knowledge of, and
   control over the release binding of, drive firmware and its
   errata/quirks/decisions

   * it may be wasteful to do read/rewrite on an ordinary magnetic
     drive because if you just do a read, the drive should notice a
     decaying block and rewrite it without being told specifically,
     maybe.  though from netapp's paper, they say they disable many of
     these features in their SCSI drives, including bad block
     remapping, and delegate them to the layer of their own software
     right above the drive

   * there's an ``offline self test'' in SMART where the drive is
     supposed to scrub itself, possibly including bad block remapping
     and marginal sector rewriting.  If this feature worked it could
     possibly accomplish scrubs with better QoS (less interference to
     real read/writes) and no controller-to-storage bandwidth wastage,
     compared to actually reading and rewriting through the
     controller, or possibly several layers above the controller
     through fanouts and such.

   * drives with caches may suppress overwrites to sectors containing
     what the cache says is already in those sectors.  I think I heard
     on this list that SCSI has commands to bypass the cache for reads
     and others to bypass it for writes, but SATA does not, or the
     commands could be broken because no one else uses them.  You have
     to have some business relationship with the drive company before
     they will admit what their proprietary firmware really does, much
     less alter it to your wishes, even if your wish is merely that it
     complies, or behaves like it did yesterday.  Every tiny piece of
     software that remains proprietary eventually turns into a blob
     that does someone else's bidding and fucks with you.
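
To make the read/rewrite idea concrete, here is a minimal sketch of one
pass that reads every block and writes the same bytes back in place.
Everything in it is an assumption for illustration: an ordinary file
stands in for the drive, the name rewrite_scrub and the 4096-byte block
size are made up, and it does nothing about the cache problem above (a
drive is free to suppress a write it thinks is redundant).  It is not
how ZFS or any vendor's scrub works.

```python
import os

BLOCK_SIZE = 4096  # assumed block size; real devices and pools differ


def rewrite_scrub(path, block_size=BLOCK_SIZE):
    """Read each block of `path` and write the same bytes back in place.

    Returns the number of blocks rewritten. `path` is an ordinary file
    standing in for a block device (an assumption for this sketch).
    """
    rewritten = 0
    with open(path, "r+b") as f:
        offset = 0
        while True:
            f.seek(offset)
            data = f.read(block_size)
            if not data:
                break  # end of file reached
            f.seek(offset)  # go back and overwrite what was just read
            f.write(data)
            rewritten += 1
            offset += len(data)
        # Push the writes out of the OS cache; the drive's own cache may
        # still decide the overwrite is redundant.
        f.flush()
        os.fsync(f.fileno())
    return rewritten
```

Every block crosses the controller twice here (once in, once out), which
is the bandwidth cost the SMART self-test bullet is comparing against.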

In the end, though, I bet we may end up with this feature on ZFS in
the disguise of a ``defragmenter''.  If the defragmenter will promise
to rewrite every block to a new spot, not just the ones it pleases,
this will do the job of your ``write scrub'' and also solve the drive
caching problem.
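
A toy model of what ``rewrite every block to a new spot'' would mean,
as a sketch only.  ToyPool, relocate_all and the integer addresses are
hypothetical names invented here, not ZFS internals: each logical block
has a pointer holding a physical address plus a checksum, and the pass
verifies each block and then writes it to a freshly allocated address.
Because the new address never held that data, a drive cache has nothing
to compare against and cannot treat the write as redundant.

```python
import hashlib


class ToyPool:
    """Toy copy-on-write store: every write lands at a fresh address."""

    def __init__(self):
        self.disk = {}      # physical address -> bytes
        self.pointers = {}  # logical block id -> (address, sha256 hex digest)
        self.next_addr = 0

    def write(self, block_id, data):
        addr = self.next_addr  # never reuse an address in this toy model
        self.next_addr += 1
        self.disk[addr] = data
        old = self.pointers.get(block_id)
        self.pointers[block_id] = (addr, hashlib.sha256(data).hexdigest())
        if old is not None:
            del self.disk[old[0]]  # free the old copy after repointing

    def read(self, block_id):
        addr, digest = self.pointers[block_id]
        data = self.disk[addr]
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError("checksum mismatch on block %r" % (block_id,))
        return data

    def relocate_all(self):
        """Verify every block, then rewrite it to a new address."""
        moved = 0
        for block_id in list(self.pointers):
            self.write(block_id, self.read(block_id))
            moved += 1
        return moved
```

The point of the promise is the loop in relocate_all: it walks every
pointer, not only the blocks a space-optimizing defragmenter would
choose to move.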

    kn> In those systems, scrubbing is done semi-continuously in the
    kn> background, not on user/admin demand.

which ones?  name names. :) I thought netapp's two papers said they
are doing it ``every Sunday'' or something.

but, yeah, asking the admin to initiate it manually means if it makes
the array uselessly slow you blame the admin rather than the software
stack.  linux ubifs (NAND flash) scrubs are also mandatory/unsupervised.


