Re: [zfs-discuss] The importance of ECC RAM for ZFS

Ian Collins Fri, 24 Jul 2009 16:06:08 -0700

Frank Middleton wrote:

On 07/24/09 04:35 PM, Bob Friesenhahn wrote:

 Regardless, it [VirtualBox] has committed a crime.

But ZFS is a journalled file system!

Even a journalled file system has to trust the journal. If the storage says the journal is committed and it isn't, all bets are off.

The issue we see here with ZFS appears to be the lack of a means of rewinding to a known sane state when this happens.

The whole question of the requirement for ECC depends on your
tolerance for loss of files vs. errors in files. As Richard
Elling points out, there are other sources of error (e.g.,
no checking of PCI parity). But that isn't relevant to the ECC
on main memory question. You can disable checksumming, and then
ZFS is no worse in this regard than any other file system; bad
files get read and you either notice or you don't, but you won't
lose any because of fatal checksum errors and you still have all
the other great features of ZFS,

That's probably the root of the issues we see here: ZFS does a great job of telling you when something is irrevocably broken, but doesn't (yet) offer a means of fixing the problem. I guess ZFS is a bit like a single-bit parity scheme that reports, but does not correct, (gross) errors. When these are used in an on-the-wire protocol, bad packets can either be dropped or retransmitted. With a file system, only the former option is available; the original is lost.
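To make the parity analogy concrete, here is a minimal sketch in Python of a detect-only scheme. It is an illustration only and has nothing to do with how ZFS actually computes its checksums; the names parity_bit and check are made up for this example. The point is that the check can say "this is broken" but carries no information about which bit to fix, and that it misses an even number of flipped bits entirely.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if the number of set bits in data is odd, else 0."""
    ones = sum(bin(b).count("1") for b in data)
    return ones & 1


def check(data: bytes, stored_parity: int) -> bool:
    """Report whether data still matches the parity recorded at write time."""
    return parity_bit(data) == stored_parity


# Hypothetical payload; the parity is recorded when the data is "written".
original = b"zfs block"
stored = parity_bit(original)

# Flip a single bit to simulate corruption somewhere on the round trip.
one_flip = bytearray(original)
one_flip[3] ^= 0x04
one_flip = bytes(one_flip)

# Flip a second bit: the parity matches again, so the damage goes unnoticed.
two_flips = bytearray(one_flip)
two_flips[5] ^= 0x01
two_flips = bytes(two_flips)

intact_ok = check(original, stored)     # data unchanged, check passes
one_flip_ok = check(one_flip, stored)   # error detected, but not locatable
two_flips_ok = check(two_flips, stored) # error present, yet check passes

print(intact_ok, one_flip_ok, two_flips_ok)
```

On the wire, a failed check would trigger a drop or a retransmit. On disk without a redundant copy there is nothing to retransmit from, which is the situation described above.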

Transmission protocols are always designed to manage data errors. Filesystems have traditionally been designed to ignore them, assuming the round trip from CPU to storage and back is 100% reliable. ZFS has changed the rules.


--
Ian.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
