Re: [zfs-discuss] Errors on mirrored drive

Toby Thain Tue, 26 May 2009 08:42:39 -0700


On 25-May-09, at 11:16 PM, Frank Middleton wrote:

On 05/22/09 21:08, Toby Thain wrote:

Yes, the important thing is to *detect* them, no system can run reliably
with bad memory, and that includes any system with ZFS. Doing nutty
things like calculating the checksum twice does not buy anything of
value here.


All memory is "bad" if it doesn't have ECC. There are only varying
degrees of badness. Calculating the checksum twice on its own would
be nutty, as you say, but doing so on a separate copy of the data
might prevent unrecoverable errors

I don't see this at all. The kernel reads the application buffer. How
does reading it twice buy you anything?? It sounds like you are
assuming 1) the buffer includes faulty RAM; and 2) the faulty RAM
reads differently each time. Doesn't that seem statistically unlikely
to you? And even if you really are chasing this improbable scenario,
why make ZFS do the job of a memory tester?
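
To make the point concrete, here is a minimal sketch in Python. It is an illustration only, not ZFS code: CRC32 stands in for the block checksum (ZFS itself uses fletcher or SHA-256 variants), and flip_bit is a hypothetical helper that models a RAM cell which consistently returns the wrong value. If the buffer is already corrupt when the kernel reads it, checksumming it a second time gives the same answer as the first.

```python
import zlib


def checksum(buf: bytes) -> int:
    # Assumption: CRC32 is only a stand-in for the filesystem's block checksum.
    return zlib.crc32(buf)


def flip_bit(buf: bytes, byte_index: int, bit: int) -> bytes:
    # Hypothetical helper: model a memory fault by flipping one bit.
    out = bytearray(buf)
    out[byte_index] ^= 1 << bit
    return bytes(out)


# What the application intended to write.
original = b"application data" * 4

# What actually sits in the faulty buffer when the kernel reads it.
in_ram = flip_bit(original, 5, 3)

# Checksumming the same (already corrupt) buffer twice.
first = checksum(in_ram)
second = checksum(in_ram)

# The two passes agree, so the second pass reveals nothing.
same_twice = first == second

# The corruption is only visible against the pristine data,
# which the kernel never had access to.
differs_from_original = first != checksum(original)

print("two passes agree:", same_twice)
print("differs from intended data:", differs_from_original)
```

A second pass would only help if the faulty cell returned different values on successive reads, which is the scenario questioned above.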

after writes to mirrored drives.
You can't detect memory errors if you don't have ECC. But you can
try to mitigate them. Not doing so makes ZFS less reliable than
the memory it is running on. The problem is that ZFS makes any file
with a bad checksum inaccessible, even if one really doesn't care
if the data has been corrupted. A workaround might be a way to allow
such files to be readable despite the bad checksum...


I am not sure what you are trying to say here.

...
How can a machine with bad memory "work fine with ext3"?
It does. It works fine with ZFS too. Just really annoying unrecoverable
files every now and then on mirrored drives. This shouldn't happen even
with lousy memory and wouldn't (doesn't) with ECC. If there was a way
to examine the files and their checksums, I would be surprised if they
were different (If they were, it would almost certainly be the controller
or the PCI bus itself causing the problem). But I speculate that it is
predictable memory hits.

You're making this harder than it really is. Run a memory test. If it
fails, take the machine out of service until it's fixed. There's no
reasonable way to keep running faulty hardware.


--Toby


-- Frank


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
