On Jul 24, 2009, at 3:18 AM, Michael McCandless wrote:
I've read in numerous threads that it's important to use ECC RAM in a ZFS file server.
It is important to use ECC RAM. The embedded market and server market demand ECC RAM. It is only the el-cheapo PC market that does not. Going back to some of the early studies by IBM on errors in PC memory, it is really a shame that the market has not moved on.
My question is: is there any technical reason, in ZFS's design, that makes it particularly important for ZFS to require ECC RAM?
No.
Is ZFS especially vulnerable, moreso than other filesystems, to bit errors in RAM?
No. Except that ZFS actually does check data integrity. So ZFS can detect if you had a problem. Other file systems can be blissfully ignorant of data corruption.
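To make the detection point concrete, here is a minimal sketch of checksum-based verification. It is not ZFS code: SHA-256 from Python's standard library stands in for whatever checksum the file system uses, and the helper names (`checksum`, `flip_bit`, `verify`) are made up for illustration. The idea is only that a checksum stored alongside a block lets a reader notice a single flipped bit, whereas a file system that stores no checksum would hand back the damaged block without complaint.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Return a hex digest for a data block (SHA-256 as a stand-in)."""
    return hashlib.sha256(block).hexdigest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of the block with exactly one bit inverted."""
    corrupted = bytearray(block)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def verify(block: bytes, stored: str) -> bool:
    """True if the block still matches the checksum recorded at write time."""
    return checksum(block) == stored


# Hypothetical block contents, checksummed when "written".
original = b"important data block" * 4
stored = checksum(original)

# Simulate a single bit error somewhere between write and read.
damaged = flip_bit(original, 13)

print("clean block verifies:  ", verify(original, stored))
print("damaged block verifies:", verify(damaged, stored))
```

Note that the sketch says nothing about where the bit flipped. It only shows that the mismatch is noticed, which is why detection tends to start speculation about the source.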
For example, if the wrong bit flips at the wrong time, could I lose my entire RAID-Z pool instead of, say, corrupting one file's contents or metadata? Is there such a possibility?
Not likely, but I don't think anyone has done such low-level analysis to prove it.
(Assume the rest of the hardware stack "behaves", eg an fsync to the drive won't return until the bytes are written to stable storage). I had assumed that a bit error from RAM would only have a localized effect (eg, corrupt the contents or metadata of file or directory) each time it "struck", but now I'm wondering if the failure could be global because of something in ZFS's design, and that's why the recommendation for ECC RAM is always so "strong".
IMHO, the reason this gets discussed on zfs-discuss so frequently is that ZFS detects data corruption and people start to speculate about the source. NB many hard disk drives and controllers have only parity-protected memory. So even if your main memory is ECC, it is unlikely that the entire data path is ECC-protected.
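For readers unsure of the difference between parity-protected and ECC memory, a toy sketch follows. It is an illustration under stated assumptions, not a model of any real controller: a single even-parity bit can tell you that one bit changed but not which one, while an error-correcting code can locate and repair it. The textbook Hamming(7,4) code is used here as a small stand-in; real ECC DIMMs use wider codes, and all function names below are hypothetical.

```python
from itertools import product


def parity_bit(bits):
    """Even parity: 1 if the number of set bits is odd, else 0."""
    return sum(bits) % 2


def parity_ok(bits, stored_parity):
    """Detects a single-bit error, but gives no hint which bit is wrong."""
    return parity_bit(bits) == stored_parity


def hamming_encode(data):
    """Encode 4 data bits as a 7-bit Hamming(7,4) codeword.

    Layout (1-based positions): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_decode(codeword):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Parity only: the error is noticed, but the data cannot be repaired.
word = [1, 0, 1, 1]
stored = parity_bit(word)
bad_word = [1, 0, 0, 1]  # one bit flipped
print("parity detects the flip:", not parity_ok(bad_word, stored))

# Hamming(7,4): any single flipped bit in the codeword is corrected.
code = hamming_encode(word)
code[5] ^= 1  # flip one bit "in memory"
print("ECC recovers the data:  ", hamming_decode(code) == word)
```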
Some of the posts in this thread ("Another user loses his pool..."): http://opensolaris.org/jive/thread.jspa?threadID=108213&tstart=0 make me think ZFS may in fact "require" ECC RAM.
The root cause of this thread's woes has absolutely nothing to do with ECC RAM. It has everything to do with VirtualBox configuration.
-- richard