Re: [OpenIndiana-discuss] Zfs stability "Scrubs"

Jim Klimov Sat, 13 Oct 2012 06:02:08 -0700

2012-10-13 7:26, Michael Stapleton wrote:

The VAST majority of data centers are not storing data in storage that
does checksums to verify data, that is just the reality. Regular backups
and site replication rule.


And this actually concerns me... We help maintain some deployments
built by customers, including professional arrays like the Sun StorageTek
6140 serving a few LUNs to directly attached servers (so it happens).

The arrays are black boxes to us - we don't know whether they use
something block-checksummed similar to ZFS inside, or can only
protect against whole-disk failures, when a device just stops
responding.

We still have little idea which of these configs would hold the data
of a ZFS pool more safely, and which should give more performance:
* if we use the array with its internal RAID6, and the client
  computer makes a pool over the single LUN
* a couple of RAID6 array boxes in a mirror provided by the arrays'
  firmware (independently of the client computers, which see an MPxIO
  target LUN), and the computer makes a pool over the single
  multi-pathed LUN
* a couple of RAID6 array boxes in a mirror provided by ZFS
  (two independent LUNs mirrored by computer)
* serve LUNs from each disk in JBOD manner from the one or two
  arrays, and have ZFS construct pools over that.

Having expensive hardware RAIDs (already available at the customer's
site) serving as JBODs is kind of overkill - any well-built JBOD
costing a fraction of this array could suffice. But since data
integrity is known to be provided by ZFS and not known to be
really provided by black-box appliances, downgrading the arrays
to JBODs might be better. Who knows? (We don't; advice is welcome.)



There are several more things to think about:

1) Redundant configs without knowledge of which side of the mirror
   is good, or which permutation of RAID blocks yields the correct
   answer, are basically useless, and they can propagate errors by
   overwriting a good copy of the data with a corrupted one,
   without knowing which is which.

   For example, take a root mirror. You find that your OS can't
   boot. You can try to split the mirror into two separate disks,
   fsck each of them and if one is still correct, recreate the
   mirror using it as base (first half). Even if both disks give
   some errors, these might be in different parts of the data, so
   you have a chance of reconstructing the data using these two
   halves and/or backups. However, if your simplistic RAID just
   copies data from disk1 to disk2 in case of any discrepancies
   and unclean shutdowns, you're roughly 50% likely to corrupt a
   good disk2 with bad data from disk1.

   This setup assumed that bit-rot never occurred or was too rare,
   bus/RAM errors never happened or were ruled out by CRC/ECC,
   and instead disks died altogether, instantly becoming bricks
   (which could be quite true in the old days, and can still be
   probable with expensive enterprise hardware). Basically, this
   assumed that data written from a process was the same data that
   hit the disk platters and the same data that was returned upon
   reads (unless an IO error/deviceMissing were reported) - in that
   case old RAIDs could indeed propagate assumed-good data onto
   replacement disk(s) during reconstruction of the array.
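
   To make the difference concrete, here is a minimal sketch (a toy
   model, not how any real RAID firmware or ZFS is implemented). Disks
   are lists of byte blocks; `blind_resync` and `checksum_repair` are
   hypothetical names, and the separately stored per-block checksums
   stand in for what ZFS keeps in its block pointers.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Digest stored apart from the data block itself (toy model)."""
    return hashlib.sha256(block).hexdigest()


def blind_resync(disk1, disk2):
    """Simplistic mirror: on any discrepancy, copy disk1 over disk2."""
    return list(disk1), list(disk1)


def checksum_repair(disk1, disk2, expected):
    """Per block, keep whichever copy matches the stored checksum."""
    repaired = []
    for b1, b2, want in zip(disk1, disk2, expected):
        if checksum(b1) == want:
            repaired.append(b1)
        elif checksum(b2) == want:
            repaired.append(b2)
        else:
            repaired.append(None)  # both copies bad: go to backups
    return repaired


# Known-good data and the checksums recorded when it was written.
good = [b"boot", b"kernel", b"config"]
expected = [checksum(b) for b in good]

# Each half of the mirror rots in a different block.
disk1 = [b"boot", b"kXrnel", b"config"]
disk2 = [b"bo0t", b"kernel", b"config"]

blind1, blind2 = blind_resync(disk1, disk2)
repaired = checksum_repair(disk1, disk2, expected)

print("blind resync kept the damage:", blind2 != good)
print("checksum repair recovered all:", repaired == good)
```

   In this toy case the blind copy overwrites the good "kernel" block
   on disk2 with the damaged one from disk1, while the checksum-guided
   pass recovers every block because the errors sit in different places.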

2) Backups and replicas without means to verify them (checksums
   or at least three-way comparisons at some level) are also
   tainted, because you don't really know if what you read from
   them ever matches what you wrote to them (perhaps several years
   ago, counting from the moment the data was written onto RAID
   originally).
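
   The two verification options mentioned above can be sketched roughly
   as follows. This is an illustration under assumptions: the function
   names are hypothetical, the digest is assumed to have been recorded
   at write time and kept apart from the backup, and whole copies are
   compared rather than individual blocks.

```python
import hashlib
from collections import Counter


def matches_recorded_digest(data: bytes, recorded_digest: str) -> bool:
    """Check a restored copy against a digest saved when it was written."""
    return hashlib.sha256(data).hexdigest() == recorded_digest


def majority_vote(copies):
    """Return the content a strict majority of copies agree on, else None."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count * 2 > len(copies) else None


original = b"payroll-2009"
recorded = hashlib.sha256(original).hexdigest()

# Three replicas, one of which has silently rotted.
replicas = [b"payroll-2009", b"payroll-2O09", b"payroll-2009"]

print(majority_vote(replicas) == original)
print([matches_recorded_digest(r, recorded) for r in replicas])
```

   With only two copies and no checksum, `majority_vote` returns None on
   a mismatch, which is exactly the "which side is good?" problem from
   point 1.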

My few cents,
//Jim

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
