Eric D. Mudama wrote:
On Tue, Dec 29 at  9:16, Brad wrote:
The disk cost of a raidz pool of mirrors is identical to the disk cost
of raid10.

ZFS can't do a raidz of mirrors or a mirror of raidzs. Members of a mirror or raidz[123] vdev must be fundamental devices (i.e., files or whole drives).



"This winds up looking similar to RAID10 in layout, in that you're
striping across a lot of disks that each consists of a mirror, though
the checksumming rules are different. Performance should also be
similar, though it's possible RAID10 may give slightly better random
read performance at the expense of some data quality guarantees, since
I don't believe RAID10 normally validates checksums on returned data
if the device didn't return an error. In normal practice, RAID10 and
a pool of mirrored vdevs should benchmark against each other within
your margin of error."

It's interesting to know that ZFS's implementation of raid10
doesn't have checksumming built in.

I don't believe I said this.  I am reasonably certain that all
zpool/zfs layouts validate checksums, even if built with no
redundancy.  The "RAID10-similar" layout in ZFS is an array of
mirrors: you build a bunch of 2-device mirrored vdevs and add them
all into a single pool.  ZFS then stripes writes across all of the mirrors.
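As a rough illustration of that striped-mirrors layout, here is a toy model in Python. All names here are mine, not ZFS code, and real ZFS allocation is dynamic rather than simple round-robin:

```python
# Toy model of a "pool of mirrors" (RAID10-like): each vdev is a
# 2-disk mirror, and the pool stripes blocks across the vdevs.

class Mirror:
    def __init__(self):
        self.disks = [{}, {}]          # two member disks: block_id -> data

    def write(self, block_id, data):
        for disk in self.disks:        # every write goes to both sides
            disk[block_id] = data

    def read(self, block_id):
        return self.disks[0][block_id]

class Pool:
    def __init__(self, nmirrors):
        self.vdevs = [Mirror() for _ in range(nmirrors)]

    def write(self, block_id, data):
        # Round-robin striping; ZFS actually load-balances dynamically.
        self.vdevs[block_id % len(self.vdevs)].write(block_id, data)

    def read(self, block_id):
        return self.vdevs[block_id % len(self.vdevs)].read(block_id)
```

The disk cost works out the same as RAID10: half the raw capacity, since every block lands on both sides of one mirror.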


Yes. PLEASE be careful - checksumming and redundancy are DIFFERENT concepts.

In ZFS, EVERYTHING is checksummed - the data blocks, and the metadata. This is separate from redundancy. Regardless of the zpool layout (mirrors, raidz, or no redundancy), ZFS stores a checksum of every object; this checksum is used to determine whether the object has been corrupted. The check is performed on every /read/.

Should the checksum show that an object is corrupt, one of two things happens. If your zpool has some form of redundancy for that object, ZFS rereads the object from the redundant side of the mirror, or reconstructs the data using parity; it then rewrites the object to another place in the zpool and eliminates the "bad" object. If there is no redundancy, ZFS fails to return the data and logs an error message to the syslog.
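That self-healing read path can be sketched in a few lines of Python. The function name and structure are my own illustration (real ZFS stores checksums in the parent block and uses fletcher or sha256; I use sha256 here, and "heal" in place where ZFS would rewrite the block elsewhere):

```python
import hashlib

def read_block(copies):
    """Sketch of ZFS's self-healing read path.

    `copies` is a list of (data, stored_checksum) pairs, one per
    redundant copy of the block (e.g. the two sides of a mirror).
    """
    for data, stored in copies:
        if hashlib.sha256(data).digest() == stored:
            # Good copy found: heal any sibling that fails its checksum.
            # (ZFS rewrites the data elsewhere and drops the bad block;
            # overwriting in place is a simplification.)
            for i, (other, other_sum) in enumerate(copies):
                if hashlib.sha256(other).digest() != other_sum:
                    copies[i] = (data, stored)
            return data
    # No copy passed its checksum: fail the read and log an error.
    raise IOError("checksum mismatch on every copy of block")
```

With a two-sided mirror where one side has bit-rotted, the read succeeds and repairs the bad side; with no redundancy, the same corruption turns into a hard read error.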

In the case of metadata, even in a non-redundant zpool, some metadata is stored multiple times, so there is a chance you will be able to recover/reconstruct metadata which fails its checksum.

In short, checksumming is how ZFS /detects/ data corruption, and redundancy is how ZFS /fixes/ it. Checksumming is /always/ present, while redundancy depends on the pool layout and options (cf. the "copies" property).
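The metadata ditto blocks and the "copies" property are the same basic trick: store a block more than once, each copy independently checksummed, so a single bad copy is survivable even with no pool-level redundancy. A rough sketch (function names and the dict-as-disk are mine, not ZFS internals):

```python
import hashlib

def write_with_copies(disk, block_id, data, copies=2):
    """Sketch of the "copies" property / metadata ditto blocks:
    store `copies` checksummed duplicates of one logical block at
    different locations on a single, non-redundant device."""
    csum = hashlib.sha256(data).digest()
    for i in range(copies):
        disk[(block_id, i)] = (data, csum)

def read_with_copies(disk, block_id, copies=2):
    """Return the first stored copy that passes its checksum."""
    for i in range(copies):
        data, csum = disk[(block_id, i)]
        if hashlib.sha256(data).digest() == csum:
            return data
    raise IOError("all copies of block failed checksum")
```

This is why a corrupt directory block on a single-disk pool can sometimes still be read back, while a plain data block (at the default copies=1) cannot.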



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
