On Jul 24, 2009, at 3:18 AM, Michael McCandless wrote:

I've read in numerous threads that it's important to use ECC RAM in a
ZFS file server.

It is important to use ECC RAM.  The embedded market and
server market demand ECC RAM. It is only the el-cheapo PC
market that does not. Going back to some of the early studies
by IBM on errors in PC memory, it is really a shame that the
market has not moved on.

My question is: is there any technical reason, in ZFS's design, that
makes it particularly important for ZFS to require ECC RAM?

No.

Is ZFS especially vulnerable, more so than other filesystems, to bit
errors in RAM?

No.  Except that ZFS actually does check data integrity. So ZFS can
detect if you had a problem.  Other file systems can be blissfully
ignorant of data corruption.
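
To illustrate the difference, here is a minimal sketch of a block store
that verifies a checksum on every read. It is a toy model, not ZFS code:
the names ChecksummedStore, read_unchecked and flip_bit are made up for
this example, and SHA-256 is used only because it is in the Python
standard library.

```python
import hashlib


class ChecksummedStore:
    """Toy block store that keeps a digest separately from each block.

    Hypothetical illustration only; this is not how ZFS is implemented.
    """

    def __init__(self):
        self.blocks = {}     # key -> bytearray holding the block data
        self.checksums = {}  # key -> digest computed at write time

    def write(self, key, data: bytes) -> None:
        self.blocks[key] = bytearray(data)
        self.checksums[key] = hashlib.sha256(data).digest()

    def read(self, key) -> bytes:
        """Verified read: raises if the data no longer matches its digest."""
        data = bytes(self.blocks[key])
        if hashlib.sha256(data).digest() != self.checksums[key]:
            raise IOError(f"checksum mismatch in block {key!r}")
        return data

    def read_unchecked(self, key) -> bytes:
        """Models a file system without checksums: returns whatever is stored."""
        return bytes(self.blocks[key])


def flip_bit(store: ChecksummedStore, key, bit_index: int) -> None:
    """Simulate a single bit error in the stored copy of a block."""
    store.blocks[key][bit_index // 8] ^= 1 << (bit_index % 8)


if __name__ == "__main__":
    store = ChecksummedStore()
    store.write("blk0", b"hello zfs")
    flip_bit(store, "blk0", 3)

    # The unchecked path hands back corrupted data without complaint.
    print(store.read_unchecked("blk0"))

    # The verified path notices the damage.
    try:
        store.read("blk0")
    except IOError as err:
        print("detected:", err)
```

Both reads see the same damaged bytes. The only difference is that the
verified read reports the damage instead of returning it silently.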

For example, if the wrong bit flips at the wrong time, could I lose my
entire RAID-Z pool instead of, say, corrupting one file's contents or
metadata?  Is there such a possibility?

Not likely, but I don't think anyone has done such low-level
analysis to prove it.

(Assume the rest of the hardware stack "behaves", eg an fsync to the
drive won't return until the bytes are written to stable storage).

I had assumed that a bit error from RAM would only have a localized
effect (eg, corrupt the contents or metadata of file or directory)
each time it "struck", but now I'm wondering if the failure could be
global because of something in ZFS's design, and that's why the
recommendation for ECC RAM is always so "strong".

IMHO, the reason this gets discussed on zfs-discuss so frequently
is because ZFS detects data corruption and people start to
speculate about the source.

NB many hard disk drives and controllers have only parity-protected
memory. So even if your main memory is ECC, it is unlikely that the
entire data path is ECC-protected.
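
The distinction between parity and ECC can be sketched as follows. A
single parity bit can tell you that one bit flipped, but not which one,
so the error cannot be repaired. An error-correcting code can locate and
fix it. The sketch uses the textbook Hamming(7,4) code as a stand-in;
real ECC memory generally uses a wider code, so treat this purely as an
illustration. The function names are made up for this example.

```python
def parity_bit(bits):
    """Even-parity bit over a list of 0/1 values."""
    return sum(bits) % 2


def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword.

    Codeword positions 1..7 hold: p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, or 0
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = [1, 0, 1, 1]

    # Parity: the flip is detected, but there is no way to locate it.
    stored_parity = parity_bit(word)
    damaged = list(word)
    damaged[2] ^= 1
    print("parity detects error:", parity_bit(damaged) != stored_parity)

    # Hamming(7,4): the flip is located and corrected.
    codeword = hamming74_encode(word)
    codeword[4] ^= 1
    print("ECC recovers data:", hamming74_decode(codeword) == word)
```

With parity alone the best a device can do is report the error. With a
correcting code the original data comes back intact for any single-bit
flip in the codeword.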


Some of the posts in this thread ("Another user loses his pool..."):

 http://opensolaris.org/jive/thread.jspa?threadID=108213&tstart=0

make me think ZFS may in fact "require" ECC RAM.

The root cause of this thread's woes has absolutely nothing to
do with ECC RAM. It has everything to do with VirtualBox configuration.
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
