Ian Collins wrote:
> Al Hopper wrote:
>   
>> On Sat, Nov 15, 2008 at 9:26 AM, Richard Elling <[EMAIL PROTECTED]> wrote:
>>   
>>     
>>> dick hoogendijk wrote:
>>>     
>>>       
>>>> On Sat, 15 Nov 2008 18:49:17 +1300
>>>> Ian Collins <[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>>       
>>>>         
>>>>> [EMAIL PROTECTED] wrote:
>>>>>
>>>>>         
>>>>>           
>>>>>>  > WD Caviar Black drive [...] Intel E7200 2.53GHz 3MB L2
>>>>>>  > The P45 based boards are a no-brainer
>>>>>>
>>>>>> 16G of DDR2-1066 with P45 or
>>>>>>   8G of ECC DDR2-800 with 3210 based boards
>>>>>>
>>>>>> That is the question.
>>>>>>
>>>>>>
>>>>>>           
>>>>>>             
>>>>> I guess the answer is how valuable is your data?
>>>>>
>>>>>         
>>>>>           
>>>> I disagree. The answer is: go for the 16G and make backups. The 16G
>>>> system will work far more "easy" and I may be lucky but in the past
>>>> years I did not have ZFS issues with my non-ECC ram ;-)
>>>>
>>>>       
>>>>         
>>> You are lucky.  I recommend ECC RAM for any data that you care
>>> about.  Remember, if there is a main memory corruption, that may
>>> impact the data that ZFS writes which will negate any on-disk
>>> redundancy.  And yes, this does occur -- check the archives for the
>>> tales of woe.
>>>     
>>>       
>> I agree with your recommendation Richard.  OTOH I've built/used a
>> bunch of systems over several years that were mostly non ECC equipped
>> and only lost one DIMM along the way.  So I guess I've been lucky also
>> - but IMHO the failure rate for RAM these days is pretty small[1].
>> I've also been around hundreds of SPARC boxes and, again, very few
>> RAM failures (one is all that I can remember).
>>
>>   
>>     
> I think the situation will change with the current expansion in RAM
> sizes.  Five years ago with mainly 32 bit x86 systems, 4G of ram was a
> lot (even on most Sparc boxes).  Today 32 and 64GB are becoming common. 
> Desktop systems have seen similar growth.
>   

Let's do some math.  A generally accepted Soft Error Rate (SER) for a DRAM
chip is 1,000 FITs (failures per billion device-hours), which works out to
an Annualized Failure Rate (AFR) of 0.88%.  If a non-ECC DIMM has 8 chips
then its AFR is 7%, or 14% for a 16-chip DIMM.  My desktop has 4 DIMMs at
16 chips each, so I should expect an AFR of 56%.  Since these are soft
errors, a RAM test program may not detect them.
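
The arithmetic above can be sketched as follows.  This is only an
illustration: it assumes the 1,000 FIT figure applies per DRAM chip, that
chips fail independently, and that rates simply add.  The function names
are made up for this example, and the "at least one error" probability
uses a Poisson assumption that is not part of the original figures.

```python
import math

HOURS_PER_YEAR = 24 * 365   # 8,760 hours
FIT_HOURS = 1e9             # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_afr(fit, chips=1):
    """Expected failures per year for `chips` devices (linear sum of rates)."""
    return fit * chips * HOURS_PER_YEAR / FIT_HOURS


def prob_at_least_one(expected_per_year):
    """Chance of one or more errors in a year, assuming Poisson arrivals."""
    return 1.0 - math.exp(-expected_per_year)


per_chip = fit_to_afr(1000)           # one DRAM chip
dimm_8 = fit_to_afr(1000, 8)          # 8-chip non-ECC DIMM
dimm_16 = fit_to_afr(1000, 16)        # 16-chip DIMM
desktop = fit_to_afr(1000, 4 * 16)    # 4 DIMMs at 16 chips each

for label, afr in [("chip", per_chip), ("8-chip DIMM", dimm_8),
                   ("16-chip DIMM", dimm_16), ("desktop", desktop)]:
    print(f"{label:>12}: AFR {afr:.2%}, "
          f"P(>=1 error/year) {prob_at_least_one(afr):.2%}")
```

Note that the 56% figure is the expected number of soft errors per year
(0.56).  Under the Poisson assumption, that corresponds to roughly a 43%
chance of seeing at least one error in a given year.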

ECC will dramatically reduce the system-level effects of SERs.  Extended ECC
will further reduce this by about 2 orders of magnitude.

> ZFS also uses system RAM in a way it hasn't been used before.  Memory
> that would have been unused or holding static pages is now churning
> rapidly, in a way similar to memory testers like memtest86. Random patterns
> are cycling through RAM like never before, greatly increasing the chances
> for hitting a bad bit or addressing error.  I've had RAM faults that
> have taken hours with memtest86 to hit the trigger bit pattern that
> would have gone unnoticed for years if I hadn't seen data corruption
> with ZFS.
>
> ZFS may turn out to be the ultimate RAM soak tester!
>   

:-)  no, not really.  SERs are more of a problem for idle DRAM because the
probability of a SER affecting you is a function of the time the data has
been sitting in RAM waiting to be affected by upsets.
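
A minimal sketch of that time dependence, assuming upsets arrive as a
Poisson process at a constant rate and that a 1,000 FIT rate applies to
the chip holding the data.  The function name and the fraction_of_chip
parameter are hypothetical, chosen only for illustration.

```python
import math

FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def upset_probability(fit, resident_hours, fraction_of_chip=1.0):
    """Chance that data sitting in RAM is hit by at least one soft error.

    Assumes a constant upset rate spread uniformly over the chip, so data
    occupying a fraction of the chip sees that fraction of the rate.
    """
    rate_per_hour = fit * fraction_of_chip / FIT_HOURS
    return 1.0 - math.exp(-rate_per_hour * resident_hours)


# The longer the data stays resident, the higher the exposure.
for hours in (1, 24, 24 * 30, 24 * 365):
    print(f"{hours:>5} h resident: {upset_probability(1000, hours):.6%}")
```

Data that is rewritten frequently is exposed only for a short window,
while data that sits untouched for months accumulates exposure.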

Note: there are some studies suggesting a correlation between SERs and
hard faults.  In practice, it doesn't really matter why or how the fault
occurred; the solution is ECC, Extended ECC, or memory mirroring.
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
