Ian Collins wrote:
> Al Hopper wrote:
>> On Sat, Nov 15, 2008 at 9:26 AM, Richard Elling <[EMAIL PROTECTED]> wrote:
>>> dick hoogendijk wrote:
>>>> On Sat, 15 Nov 2008 18:49:17 +1300
>>>> Ian Collins <[EMAIL PROTECTED]> wrote:
>>>>> [EMAIL PROTECTED] wrote:
>>>>>> > WD Caviar Black drive [...] Intel E7200 2.53GHz 3MB L2
>>>>>> > The P45 based boards are a no-brainer
>>>>>>
>>>>>> 16G of DDR2-1066 with P45 or
>>>>>> 8G of ECC DDR2-800 with 3210 based boards
>>>>>>
>>>>>> That is the question.
>>>>>
>>>>> I guess the answer is how valuable is your data?
>>>>
>>>> I disagree. The answer is: go for the 16G and make backups. The 16G
>>>> system will work far more "easy" and I may be lucky but in the past
>>>> years I did not have ZFS issues with my non-ECC ram ;-)
>>>
>>> You are lucky. I recommend ECC RAM for any data that you care
>>> about. Remember, if there is a main memory corruption, that may
>>> impact the data that ZFS writes which will negate any on-disk
>>> redundancy. And yes, this does occur -- check the archives for the
>>> tales of woe.
>>
>> I agree with your recommendation Richard. OTOH I've built/used a
>> bunch of systems over several years that were mostly non ECC equipped
>> and only lost one DIMM along the way. So I guess I've been lucky also
>> - but IMHO the failure rate for RAM these days is pretty small[1].
>> I've also been around hundreds of SPARC boxes and, again, very few
>> RAM failures (one is all that I can remember).
>
> I think the situation will change with the current expansion in RAM
> sizes. Five years ago with mainly 32 bit x86 systems, 4G of ram was a
> lot (even on most Sparc boxes). Today 32 and 64GB are becoming common.
> Desktop systems have seen similar growth.
>
Let's do some math. A generally accepted Soft Error Rate (SER) for DRAMs is 1,000 FITs (failures per 10^9 device-hours) per chip. With 8,760 hours in a year, that is an Annualized Failure Rate (AFR) of about 0.88% per chip. If a non-ECC DIMM has 8 chips, your AFR is about 7%, or 14% for 16-chip DIMMs. My desktop has 4 DIMMs with 16 chips each, so I should expect an AFR of 56%. Since these are soft errors, a RAM test program may not detect them. ECC will dramatically reduce the system-level effects of SERs, and Extended ECC will reduce them by about another 2 orders of magnitude.

> ZFS also uses system RAM in a way it hasn't been used before. Memory
> that would have been unused or holding static pages is now churning
> rapidly, in a way similar to memory testers like memtest86. Random patterns
> are cycling through RAM like never before, greatly increasing the chances
> for hitting a bad bit or addressing error. I've had RAM faults that
> have taken hours with memtest86 to hit the trigger bit pattern that
> would have gone unnoticed for years if I hadn't seen data corruption
> with ZFS.
>
> ZFS may turn out to be the ultimate RAM soak tester!
> :-)

No, not really. SERs are more of a problem for idle DRAM, because the probability of an SER affecting you is a function of how long the data has been sitting in RAM, exposed to upsets.

Note: there are some studies suggesting a correlation between SERs and hard faults. In practice, it doesn't really matter why or how the fault occurred: the solution is ECC, Extended ECC, or memory mirroring.

 -- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
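For readers who want to redo the soft-error arithmetic in the reply above, here is a minimal sketch. It assumes a FIT means failures per 10^9 device-hours, a 8,760-hour year, and that per-chip rates simply add across chips, which is the linear scaling behind the 7%, 14% and 56% figures. The function names are hypothetical, chosen for illustration. The `prob_at_least_one` helper is an added assumption not in the original message: it treats the summed rate as the mean of a Poisson process to estimate the chance of seeing at least one upset in a year.

```python
import math

# Hours in a (non-leap) year, used to annualize a FIT rate.
HOURS_PER_YEAR = 24 * 365  # 8,760


def fit_to_annual_rate(fit: float) -> float:
    """Convert FITs (failures per 10**9 device-hours) to expected failures per year."""
    return fit * HOURS_PER_YEAR / 1e9


def system_annual_rate(fit_per_chip: float, chips_per_dimm: int, dimms: int) -> float:
    """Sum the per-chip rate over every DRAM chip (simple linear scaling)."""
    return fit_to_annual_rate(fit_per_chip) * chips_per_dimm * dimms


def prob_at_least_one(expected_per_year: float) -> float:
    """Assumed Poisson model: probability of one or more upsets in a year."""
    return 1.0 - math.exp(-expected_per_year)


if __name__ == "__main__":
    fit = 1000.0  # per-chip SER figure quoted in the message
    for label, chips, dimms in [
        ("one 8-chip DIMM", 8, 1),
        ("one 16-chip DIMM", 16, 1),
        ("four 16-chip DIMMs", 16, 4),
    ]:
        rate = system_annual_rate(fit, chips, dimms)
        print(f"{label}: {rate:.1%} expected upsets/year, "
              f"{prob_at_least_one(rate):.1%} chance of at least one")
```

The linear sum reproduces the message's figures (about 0.88% per chip, 7%, 14% and 56%). Under the assumed Poisson model, the chance of at least one upset on the four-DIMM desktop comes out somewhat lower, roughly 43%. Either way, the point about unprotected RAM stands.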