On 14-Nov-07, at 7:06 AM, can you guess? wrote:

> ...
>
>>>> And how about FAULTS?
>>>> hw/firmware/cable/controller/ram/...
>>>
>>> If you had read either the CERN study or what I already said about
>>> it, you would have realized that it included the effects of such
>>> faults.
>>
>>
>> ...and ZFS is the only prophylactic available.
>
> You don't *need* a prophylactic if you're not having sex:  the CERN  
> study found *no* clear instances of faults that would occur in  
> consumer systems and that could be attributed to the kinds of  
> errors that ZFS can catch and more conventional file systems can't.

Hmm, that's odd, because I've certainly had such faults myself. (Bad
RAM is a very common one that nobody even thinks to check.)

--Toby

>   It found faults in the interaction of its add-on RAID controller  
> (not a normal 'consumer' component) with its WD disks, it found  
> single-bit errors that appeared to correlate with ECC RAM errors  
> (i.e., likely occurred in RAM rather than at any point where ZFS  
> would be involved), it found block-sized errors that appeared to  
> correlate with misplaced virtual memory allocation (again, outside  
> ZFS's sphere of influence).
>
>>
>>
>>>
>>> ...
>>>
>>>>>> but I had a box that was randomly corrupting blocks during
>>>>>> DMA.  The errors showed up when doing a ZFS scrub and
>>>>>> I caught the problem in time.
>>>>>
>>>>> Yup - that's exactly the kind of error that ZFS and WAFL do a
>>>>> perhaps uniquely good job of catching.
>>>>
>>>> WAFL can't catch all: It's distantly isolated from
>>>> the CPU end.
>>>
>>> WAFL will catch everything that ZFS catches, including the kind of
>>> DMA error described above:  it contains validating information
>>> outside the data blocks just as ZFS does.
>>
>> Explain how it can do that, when it is isolated from the application
>> by several layers including the network?
>
> Darrell covered one aspect of this (i.e., that ZFS couldn't either  
> if it were being used in a server), but there's another as well:   
> as long as the NFS messages between client RAM and server RAM are  
> checksummed in RAM on both ends, then that extends the checking all  
> the way to client RAM (the same place where local ZFS checks end)  
> save for any problems occurring *in* RAM at one end or the other  
> (and ZFS can't deal with in-RAM problems either:  all it can do is  
> protect the data until it gets to RAM).
>
> - bill
>
>
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
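
For anyone following the quoted thread, here is a rough sketch of what "validating information outside the data blocks" amounts to. This is only an illustration in Python, not ZFS or WAFL code: the names write_block, read_block and scrub, the in-memory dicts standing in for a disk, and the choice of SHA-256 as the checksum are all assumptions made for the example.

```python
import hashlib

# Hypothetical in-memory "disk". Data blocks and their checksums are kept
# in separate stores, so damage to a block does not also damage the
# information used to validate it.
blocks = {}
checksums = {}

def write_block(addr, data):
    """Store a block and, separately, a checksum of the data as seen in RAM."""
    checksums[addr] = hashlib.sha256(data).digest()
    blocks[addr] = data

def read_block(addr):
    """Return the block, or raise if it no longer matches its checksum."""
    data = blocks[addr]
    if hashlib.sha256(data).digest() != checksums[addr]:
        raise IOError(f"checksum mismatch at block {addr}")
    return data

def scrub():
    """Walk every block and return the addresses that fail validation."""
    bad = []
    for addr in sorted(blocks):
        if hashlib.sha256(blocks[addr]).digest() != checksums[addr]:
            bad.append(addr)
    return bad

if __name__ == "__main__":
    write_block(0, b"good data")
    write_block(1, b"more data")
    blocks[1] = b"more dat\x00"   # simulate corruption after the checksum was taken
    print(scrub())
```

Note that the checksum is computed from whatever is in RAM at write time, so corruption that happens in RAM before that point would pass validation. That is the limitation described in the quoted text above.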

