Frank Middleton wrote:
> On 07/24/09 04:35 PM, Bob Friesenhahn wrote:
>> Regardless, it [VirtualBox] has committed a crime.
>
> But ZFS is a journalled file system!

Even a journalled file system has to trust the journal. If the storage says the journal is committed and it isn't, all bets are off.
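
To make that concrete, here is a toy model in Python. It is only a sketch: the Disk class, its "honest" flag and the block names are all made up for illustration and have nothing to do with how VirtualBox or ZFS are actually implemented. It just shows that a flush which is acknowledged but never persisted leaves the file system with no journal after a power loss.

```python
class Disk:
    """Hypothetical disk with a volatile write cache (illustration only)."""

    def __init__(self, honest):
        self.honest = honest  # does flush() really persist the cache?
        self.cache = {}       # volatile: lost on power failure
        self.platter = {}     # persistent storage

    def write(self, block, data):
        self.cache[block] = data

    def flush(self):
        # An honest device persists its cache before acknowledging.
        if self.honest:
            self.platter.update(self.cache)
            self.cache.clear()
        return True  # a dishonest device reports success as well

    def power_loss(self):
        self.cache.clear()

    def read(self, block):
        return self.cache.get(block, self.platter.get(block))


def journalled_update(disk):
    """Write a commit record and wait for the device to acknowledge it."""
    disk.write("journal", "commit: data=new")
    return disk.flush()


def survives_power_loss(honest):
    """Return (flush acknowledged, journal still present after power loss)."""
    disk = Disk(honest)
    acked = journalled_update(disk)
    disk.power_loss()
    return acked, disk.read("journal") is not None


print(survives_power_loss(True))   # (True, True)
print(survives_power_loss(False))  # (True, False)
```

In both cases the file system was told the commit succeeded; only the honest device still has the record afterwards.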

The issue we see here with ZFS appears to be the lack of a means of rewinding to a known sane state when this happens.

> The whole question of the requirement for ECC depends on your
> tolerance for loss of files vs. errors in files. As Richard
> Elling points out, there are other sources of error (e.g.,
> no checking of PCI parity). But that isn't relevant to the ECC
> on main memory question. You can disable checksumming, and then
> ZFS is no worse in this regard than any other file system; bad
> files get read and you either notice or you don't, but you won't
> lose any because of fatal checksum errors and you still have all
> the other great features of ZFS,

That's probably the root of the issues we see here: ZFS does a great job of telling you when something is irrevocably broken, but doesn't (yet) offer a means of fixing the problem. I guess ZFS is a bit like a single-bit parity scheme that reports, but does not correct, (gross) errors. When such a scheme is used in an on-the-wire protocol, bad packets can either be dropped or retransmitted. With a file system, only the former option is available; the original is lost.
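
A rough sketch of the analogy in Python, for anyone who wants to see it spelled out. This is plain even parity over a byte string, not the checksums ZFS really uses, and every function name here is invented for the example. The receiver can ask the sender again because the sender still holds the original; the single-copy read has nothing to go back to.

```python
def parity_bit(data: bytes) -> int:
    """Even parity: 1 if the number of set bits in data is odd."""
    return sum(bin(b).count("1") for b in data) % 2


def store(data: bytes):
    """Return the data together with its parity bit."""
    return data, parity_bit(data)


def check(data: bytes, parity: int) -> bool:
    """Detect (but do not locate or correct) a single-bit error."""
    return parity_bit(data) == parity


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate corruption by flipping one bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def receive(sender, corrupt_first_attempt: bool):
    """Wire protocol: on a parity error, ask the sender to retransmit."""
    for attempt in range(2):
        data, parity = sender()
        if attempt == 0 and corrupt_first_attempt:
            data = flip_bit(data, 3)
        if check(data, parity):
            return data
    return None


def read_file(stored_data: bytes, stored_parity: int):
    """Single stored copy: the error is reported, but there is no resend."""
    if check(stored_data, stored_parity):
        return stored_data
    return None


original = b"zfs"
data, parity = store(original)
damaged = flip_bit(data, 3)

print(receive(lambda: store(original), True))  # b'zfs' (second attempt)
print(read_file(damaged, parity))              # None (detected, not fixed)
```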

Transmission protocols are always designed to manage data errors. Filesystems have traditionally been designed to ignore them, assuming the round trip from CPU to storage and back is 100% reliable. ZFS has changed the rules.

--
Ian.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss