On 22-May-09, at 5:24 PM, Frank Middleton wrote:

There have been a number of threads here on the reliability of ZFS in the face of flaky hardware. ZFS certainly runs well on decent (e.g., SPARC) hardware, but isn't it reasonable to expect it to run well on something
less well engineered? I am a real ZFS fan, and I'd hate to see folks
trash it because it appears to be unreliable.

In an attempt to bolster the proposition that there should at least be
an option to buffer the data before checksumming and writing, we've
been doing a lot of testing on presumed flaky (cheap) hardware, with a
peculiar result - see below.

On 04/21/09 12:16, casper....@sun.com wrote:

And so what? You can't write two different checksums; I mean, we're
mirroring the data so it MUST BE THE SAME. (A different checksum would
be wrong: I don't think ZFS will allow different checksums for
different sides of a mirror.)

Unless it does a read after write on each disk, how would it know that
the checksums are the same? If the data is damaged before the checksum
is calculated, then it is no worse than the ufs/ext3 case. If the data
+ checksum is damaged whilst the (single) checksum is being calculated,
or afterwards, then the file is already lost before it is even written!
There is a significant probability that this could occur on a machine
with no ECC. Evidently memory concerns /are/ an issue.

Yes, the important thing is to *detect* them; no system can run reliably with bad memory, and that includes any system with ZFS. Doing nutty things like calculating the checksum twice does not buy anything of value here.

If the memory is this bad then applications will be dying all over the place, compilers will be segfaulting, and databases will be writing bad data even before it reaches ZFS.
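
To make the window-of-corruption argument concrete, here is a toy model in
Python (illustrative only; this is not how the ZFS code is structured): one
checksum is computed for a block, and the same in-memory buffer is then
handed to both halves of the mirror.

import hashlib

def write_mirrored(data, flip=None):
    """Toy model of a mirrored write: a single checksum covers the block,
    and the same in-memory buffer is then handed to both halves."""
    buf = bytearray(data)
    if flip == "before-checksum":
        buf[0] ^= 0x01                       # glitch before the checksum is computed
    checksum = hashlib.sha256(buf).digest()  # the one checksum both copies share
    if flip == "after-checksum":
        buf[0] ^= 0x01                       # glitch after checksumming, before the write
    return checksum, bytes(buf), bytes(buf)  # identical data goes to each half

def verify(checksum, copy):
    return hashlib.sha256(copy).digest() == checksum

data = b"some file contents" * 100

# Glitch before the checksum: both halves verify clean, but the data is
# silently wrong -- no worse than the ufs/ext3 case.
cksum, a, b = write_mirrored(data, flip="before-checksum")
print(verify(cksum, a), verify(cksum, b), a == data)   # True True False

# Glitch after the checksum: both halves carry the same bad data, both
# fail verification, and the mirror has nothing good left to heal from.
cksum, a, b = write_mirrored(data, flip="after-checksum")
print(verify(cksum, a), verify(cksum, b))              # False False

A glitch that hits just one half after the data is on disk is, by contrast,
exactly the case a mirror does recover from.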

This thread http://opensolaris.org/jive/thread.jspa?messageID=338148
even suggests including a memory diagnostic with the distribution CD
(Fedora already does so).

Absolutely, memory diags are essential. And you should certainly run them if you see unexpected behaviour that has no other obvious cause.


Memory diagnostics just test memory. Disk diagnostics just test disks.
ZFS keeps disks pretty busy, so perhaps it loads the power supply
to the point where it heats up and memory glitches are more likely.

Your logic is rather tortuous. If the hardware is that crappy then there's not much ZFS can do about it.

It might also explain why errors don't really begin until ~15 minutes
after the busy time starts.

You might argue that this problem could only affect systems doing a
lot of disk I/O, and that such systems probably have ECC memory. But
doing an OS install is the one time when a consumer-grade computer
does a *lot* of disk I/O for quite a long time and is hence vulnerable.
Ironically, the OpenSolaris installer does not allow for ZFS mirroring
at install time, the one time when it might be really important!
Now that sounds like a more useful RFE, especially since it would be
relatively easy to implement. Anaconda does it...

A Solaris install writes almost 4*10^10 bits. For soft-error rates,
see the Cypress figures quoted in Wikipedia's ECC article and
http://www.edn.com/article/CA454636.html. Statistically expected
random memory glitches could plausibly explain the error rate that is
occurring.
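
For a rough sense of scale, here is a back-of-envelope estimate in Python;
the FIT rate, memory size and install time are assumptions picked for
illustration, not measurements.

# Back-of-envelope soft-error arithmetic; the figures are assumptions
# chosen for illustration, not measurements.
FIT_PER_MBIT = 5000           # assumed upsets per 10^9 device-hours per Mbit
DRAM_MBIT    = 2 * 8 * 1024   # 2 GB of non-ECC DRAM, expressed in Mbit
INSTALL_HRS  = 1.0            # assumed length of the install's busy period

expected_flips = FIT_PER_MBIT * DRAM_MBIT * INSTALL_HRS / 1e9
print("expected bit flips anywhere in DRAM per install: %.3f" % expected_flips)
# ~0.08 with these numbers, i.e. very roughly one install in a dozen sees
# a flip somewhere in memory.  Only flips landing in buffers in flight to
# ZFS would surface as checksum errors, but over many installs (or with a
# higher soft-error rate) that is not negligible.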

You are assuming that the error is the memory being modified after
computing the checksums; I would say that that is unlikely; I think
it's a bit more likely that the data gets corrupted when it's handled
by the disk controller or the disk itself. (The data is continuously
re-written by the DRAM controller.)

See below for an example where a checksum error occurs without the
disk subsystem being involved. There seems to be no plausible
explanation other than an improbable bug in x86 ZFS itself.

It would have been nice if we were able to recover the contents of the file; if you also know what was supposed to be there, you can diff and
then we can find out what was wrong.

"file" on those files resulted in "bus error". Is there a way to actually
read a file reported by ZFS as unrecoverable to do just that (and to
separately retrieve the copy from each half of the mirror)?
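
For the "diff the two copies" part, assuming a copy can somehow be
extracted from each half (e.g. with zdb, or by reading the right offsets
from each device), a minimal comparison like this sketch would show exactly
which bytes and bits disagree; the file names are hypothetical.

def diff_copies(copy_a, copy_b, limit=20):
    """List offsets where two recovered copies differ, with the XOR of the
    differing bytes (a single set bit in the XOR means a one-bit flip)."""
    diffs = []
    for off, (x, y) in enumerate(zip(copy_a, copy_b)):
        if x != y:
            diffs.append((off, x, y, x ^ y))
            if len(diffs) >= limit:
                break
    return diffs

# Hypothetical usage, with half_a.bin / half_b.bin pulled from each side:
# with open("half_a.bin", "rb") as f: a = f.read()
# with open("half_b.bin", "rb") as f: b = f.read()
# for off, x, y, xor in diff_copies(a, b):
#     print("offset %d: %02x vs %02x (xor %s)" % (off, x, y, format(xor, "08b")))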

Maybe this should be a new thread, but I suspect the following
proves that the problem must be memory, which raises the question
of how memory glitches can cause fatal ZFS checksum errors.

Of course they can; but they will also break anything else on the machine.

...

If memory that can pass diagnostics for 24 hours at a
stretch can cause glitches in huge data streams, then IMO it
behooves ZFS to defend itself against them. Buffering disk
I/O on machines with no ECC seems like reasonably cheap
insurance against a whole class of errors, and could make
ZFS usable on PCs that, although they work fine with ext3,

How can a machine with bad memory "work fine with ext3"?

--Toby

fail annoyingly with ZFS. Ironically this wouldn't fix the
peculiar recv problem, which nonetheless seems to point
to memory glitches as a source of errors.
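
Concretely, the "cheap insurance" idea amounts to something like the sketch
below (Python, idea only -- not ZFS code; write_fn here is a hypothetical
stand-in for whatever actually issues the mirrored write).

import hashlib

class VerifyMismatch(Exception):
    pass

def checked_write(buf, write_fn, retries=2):
    """Take a private snapshot of the block, checksum both the original
    buffer and the snapshot, and only issue the write if the two passes
    agree.  A bit flip in the window between checksumming and writing then
    forces a retry instead of being committed, identically corrupted, to
    both halves of the mirror."""
    for _ in range(retries + 1):
        snapshot = bytes(buf)                       # private copy of the block
        first = hashlib.sha256(buf).digest()        # pass 1: the original buffer
        second = hashlib.sha256(snapshot).digest()  # pass 2: the snapshot
        if first == second:
            write_fn(snapshot, second)              # same verified copy to every half
            return second
    raise VerifyMismatch("buffer changed between checksum passes -- suspect RAM")

A flip that happens before the snapshot is taken is still invisible, and the
objection above stands that truly bad RAM breaks far more than ZFS; a scheme
like this only narrows the window, it does not close it.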

-- Frank






_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
