On 07/24/09 04:35 PM, Bob Friesenhahn wrote:
> Regardless, it [VirtualBox] has committed a crime.
But ZFS is a journalled file system! Any hardware can lose a flush; it's just more likely in a VM, especially when anything Microsoft is involved, and the whole point of journalling is to prevent things like this happening. However, the issue is moot since CR 6667683 is being addressed.

Here's a related thought - does it make sense to mirror ZFS on iSCSI if the host drives are themselves ZFS mirrors?

The whole question of the requirement for ECC depends on your tolerance for loss of files vs. errors in files. As Richard Elling points out, there are other sources of error (e.g., no checking of PCI parity), but that isn't relevant to the question of ECC on main memory. You can disable checksumming, and then ZFS is no worse in this regard than any other file system: bad files get read and you either notice or you don't, but you won't lose any because of fatal checksum errors, and you still have all the other great features of ZFS.

If you don't mirror, all bets are off. You should set copies=2 or higher and cross your fingers. You should also disable file checksumming in ZFS and in that sense degenerate to the behavior of lesser file systems.

However, mirroring doesn't buy you much here, because ZFS evidently doesn't double buffer the write before calculating the checksum. A stray bit flip can therefore cause metadata or data corruption, leaving a mirrored file with an unrecoverable checksum failure (of course, there are many other reasons to mirror).

The real question is - what is the probability of this occurring? IMO the typical SOHO user has a 1 in 10 to 1 in 100 chance of this happening in a year of reasonably constant operation (a few dozen writes/day). I believe that this can be mitigated by setting copies=2, a good idea anyway if you have biggish disks since, as Richard Elling has pointed out in his excellent blogs, if you need to resilver after a disk failure you have a rather large possibility of a disk read error causing file loss, and copies=2 also mitigates this.
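
To make the arithmetic behind an estimate like "1 in 10 to 1 in 100 per year" explicit, here is a rough Python sketch. It is only an illustration under assumptions that are not in the estimate itself: every write is treated as an independent trial with the same small chance of being hit, "a few dozen writes/day" is taken to mean 36, and the function names are made up for this example. The annual figures are the opinion-based guesses quoted above, not measurements.

```python
import math

DAYS_PER_YEAR = 365


def per_write_probability(annual_probability, writes_per_day):
    """Back out the per-write probability implied by an annual estimate.

    Assumes independent, identically likely writes (a simplification).
    Solves 1 - (1 - p)**n == annual_probability for p.
    """
    n = writes_per_day * DAYS_PER_YEAR
    return -math.expm1(math.log1p(-annual_probability) / n)


def probability_over_period(per_write, writes_per_day, days=DAYS_PER_YEAR):
    """Probability of at least one bad write in `days` days of operation."""
    n = writes_per_day * days
    return -math.expm1(n * math.log1p(-per_write))


if __name__ == "__main__":
    light_load = 36    # assumed reading of "a few dozen writes/day"
    heavy_load = 3600  # hypothetical workload 100x busier, for comparison

    for annual in (0.10, 0.01):  # the "1 in 10" and "1 in 100" guesses
        p = per_write_probability(annual, light_load)
        scaled = probability_over_period(p, heavy_load)
        print(f"annual estimate {annual:.2f}: per-write p = {p:.3e}, "
              f"same p at {heavy_load} writes/day gives {scaled:.3f} per year")
```

Under these assumptions the annual risk climbs quickly with write volume, so the same per-write rate that looks tolerable on a lightly used box is a different matter on a busy one.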

Note that fixing CR 6667683 should, hopefully, eliminate any possibility of losing an entire mirrored or raidz pool.

So, it seems to me ZFS has a definite dependency on ECC for reliable operation. However, for non-commercial uses (i.e., less than an hour or so a day of writes) the probability of losing a file is fairly small and can be mitigated still further by setting copies=2. But to eliminate the possibility entirely, you must have ECC. You should also make sure that the buses have at least parity, if not ECC, and that this is actually checked - maybe Richard can comment on this, since I believe he thinks this is a more likely source of errors.

HTH -- Frank

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss