On 22-May-09, at 5:24 PM, Frank Middleton wrote:

There have been a number of threads here on the reliability of ZFS in the face of flaky hardware. ZFS certainly runs well on decent (e.g., SPARC) hardware, but isn't it reasonable to expect it to run well on something
less well engineered? I am a real ZFS fan, and I'd hate to see folks
trash it because it appears to be unreliable.

In an attempt to bolster the proposition that there should at least be
an option to buffer the data before checksumming and writing, we've
been doing a lot of testing on presumed flaky (cheap) hardware, with a
peculiar result - see below.

On 04/21/09 12:16, casper....@sun.com wrote:

And so what? You can't write two different checksums; I mean, we're
mirroring the data so it MUST BE THE SAME. (A different checksum would
be wrong: I don't think ZFS will allow different checksums for
different sides of a mirror.)

Unless it does a read after write on each disk, how would it know that
the checksums are the same? If the data is damaged before the checksum
is calculated, then it is no worse than the ufs/ext3 case. If the data
+ checksum is damaged whilst the (single) checksum is being calculated,
or afterwards, then the file is already lost before it is even written!
There is a significant probability that this could occur on a machine
with no ECC. Evidently memory concerns /are/ an issue.

Yes, the important thing is to *detect* them; no system can run reliably with bad memory, and that includes any system with ZFS. Doing nutty things like calculating the checksum twice does not buy anything of value here.

If the memory is this bad then applications will be dying all over the place, compilers will be segfaulting, and databases will be writing bad data even before it reaches ZFS.
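
To make the window-of-corruption argument concrete, here is a toy model in
Python (illustrative only; this is not how the ZFS code is structured): one
checksum is computed for a block, and the same in-memory buffer is then
handed to both halves of the mirror.

import hashlib

def write_mirrored(data, flip=None):
    """Toy model of a mirrored write: a single checksum covers the block,
    and the same in-memory buffer is then handed to both halves."""
    buf = bytearray(data)
    if flip == "before-checksum":
        buf[0] ^= 0x01                       # glitch before the checksum is computed
    checksum = hashlib.sha256(buf).digest()  # the one checksum both copies share
    if flip == "after-checksum":
        buf[0] ^= 0x01                       # glitch after checksumming, before the write
    return checksum, bytes(buf), bytes(buf)  # identical data goes to each half

def verify(checksum, copy):
    return hashlib.sha256(copy).digest() == checksum

data = b"some file contents" * 100

# Glitch before the checksum: both halves verify clean, but the data is
# silently wrong -- no worse than the ufs/ext3 case.
cksum, a, b = write_mirrored(data, flip="before-checksum")
print(verify(cksum, a), verify(cksum, b), a == data)   # True True False

# Glitch after the checksum: both halves carry the same bad data, both
# fail verification, and the mirror has nothing good left to heal from.
cksum, a, b = write_mirrored(data, flip="after-checksum")
print(verify(cksum, a), verify(cksum, b))              # False False

A glitch that hits just one half after the data is on disk is, by contrast,
exactly the case a mirror does recover from.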

This thread http://opensolaris.org/jive/thread.jspa?messageID=338148
even suggests including a memory diagnostic with the distribution CD
(Fedora already does so).

Absolutely, memory diags are essential. And you should certainly run them if you see unexpected behaviour that has no other obvious cause.


Memory diagnostics just test memory. Disk diagnostics just test disks.
ZFS keeps disks pretty busy, so perhaps it loads the power supply
to the point where it heats up and memory glitches are more likely.

Your logic is rather tortuous. If the hardware is that crappy then there's not much ZFS can do about it.

It might also explain why errors don't really begin until ~15 minutes
after the busy time starts.

You might argue that this problem could only affect systems doing a
lot of disk I/O, and that such systems probably have ECC memory. But
doing an OS install is the one time when a consumer-grade computer
does a *lot* of disk I/O for quite a long time and is hence vulnerable.
Ironically, the OpenSolaris installer does not allow for ZFS mirroring
at install time, the one time when it might be really important!
Now that sounds like a more useful RFE, especially since it would be
relatively easy to implement. Anaconda does it...

A Solaris install writes almost 4*10^10 bits. For soft-error rates,
see the Cypress figures quoted in Wikipedia's ECC article and
http://www.edn.com/article/CA454636.html. Statistically expected
random memory glitches could plausibly explain the error rate that is
occurring.
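
For a rough sense of scale, here is a back-of-envelope estimate in Python;
the FIT rate, memory size and install time are assumptions picked for
illustration, not measurements.

# Back-of-envelope soft-error arithmetic; the figures are assumptions
# chosen for illustration, not measurements.
FIT_PER_MBIT = 5000           # assumed upsets per 10^9 device-hours per Mbit
DRAM_MBIT    = 2 * 8 * 1024   # 2 GB of non-ECC DRAM, expressed in Mbit
INSTALL_HRS  = 1.0            # assumed length of the install's busy period

expected_flips = FIT_PER_MBIT * DRAM_MBIT * INSTALL_HRS / 1e9
print("expected bit flips anywhere in DRAM per install: %.3f" % expected_flips)
# ~0.08 with these numbers, i.e. very roughly one install in a dozen sees
# a flip somewhere in memory.  Only flips landing in buffers in flight to
# ZFS would surface as checksum errors, but over many installs (or with a
# higher soft-error rate) that is not negligible.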

You are assuming that the error is the memory being modified after
computing the checksums; I would say that that is unlikely; I think
it's a bit more likely that the data gets corrupted when it's handled
by the disk controller or the disk itself. (The data is continuously
re-written by the DRAM controller.)

See below for an example where a checksum error occurs without the
disk subsystem being involved. There seems to be no plausible
explanation other than an improbable bug in x86 ZFS itself.

It would have been nice if we were able to recover the contents of the file; if you also know what was supposed to be there, you can diff and
then we can find out what was wrong.

"file" on those files resulted in "bus error". Is there a way to actually
read a file reported by ZFS as unrecoverable to do just that (and to
separately retrieve the copy from each half of the mirror)?
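
For the "diff the two copies" part, assuming a copy can somehow be
extracted from each half (e.g. with zdb, or by reading the right offsets
from each device), a minimal comparison like this sketch would show exactly
which bytes and bits disagree; the file names are hypothetical.

def diff_copies(copy_a, copy_b, limit=20):
    """List offsets where two recovered copies differ, with the XOR of the
    differing bytes (a single set bit in the XOR means a one-bit flip)."""
    diffs = []
    for off, (x, y) in enumerate(zip(copy_a, copy_b)):
        if x != y:
            diffs.append((off, x, y, x ^ y))
            if len(diffs) >= limit:
                break
    return diffs

# Hypothetical usage, with half_a.bin / half_b.bin pulled from each side:
# with open("half_a.bin", "rb") as f: a = f.read()
# with open("half_b.bin", "rb") as f: b = f.read()
# for off, x, y, xor in diff_copies(a, b):
#     print("offset %d: %02x vs %02x (xor %s)" % (off, x, y, format(xor, "08b")))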

Maybe this should be a new thread, but I suspect the following
proves that the problem must be memory, which raises the question
of how memory glitches can cause fatal ZFS checksum errors.

Of course they can; but they will also break anything else on the machine.

...

If memory that can pass diagnostics for 24 hours at a
stretch can cause glitches in huge data streams, then IMO it
behooves ZFS to defend itself against them. Buffering disk
I/O on machines with no ECC seems like reasonably cheap
insurance against a whole class of errors, and could make
ZFS usable on PCs that, although they work fine with ext3,

How can a machine with bad memory "work fine with ext3"?

--Toby

fail annoyingly with ZFS. Ironically this wouldn't fix the
peculiar recv problem, which nonetheless seems to point
to memory glitches as a source of errors.
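
Concretely, the "cheap insurance" idea amounts to something like the sketch
below (Python, idea only -- not ZFS code; write_fn here is a hypothetical
stand-in for whatever actually issues the mirrored write).

import hashlib

class VerifyMismatch(Exception):
    pass

def checked_write(buf, write_fn, retries=2):
    """Take a private snapshot of the block, checksum both the original
    buffer and the snapshot, and only issue the write if the two passes
    agree.  A bit flip in the window between checksumming and writing then
    forces a retry instead of being committed, identically corrupted, to
    both halves of the mirror."""
    for _ in range(retries + 1):
        snapshot = bytes(buf)                       # private copy of the block
        first = hashlib.sha256(buf).digest()        # pass 1: the original buffer
        second = hashlib.sha256(snapshot).digest()  # pass 2: the snapshot
        if first == second:
            write_fn(snapshot, second)              # same verified copy to every half
            return second
    raise VerifyMismatch("buffer changed between checksum passes -- suspect RAM")

A flip that happens before the snapshot is taken is still invisible, and the
objection above stands that truly bad RAM breaks far more than ZFS; a scheme
like this only narrows the window, it does not close it.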

-- Frank






_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
