My understanding of the root cause of these issues is that the vast majority happen on consumer-grade hardware that reports to ZFS that writes have reached stable storage when in fact they are still sitting in the drive's volatile write cache.
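To put that assumption in concrete terms, here is a minimal userland sketch in C of the guarantee that ZFS (and anything else that calls fsync()) is relying on. This is my own illustration, not ZFS source code, and the /tank path is made up:

/* Minimal userland sketch (my illustration, not ZFS code) of the durability
 * contract at issue: once fsync() returns success, the caller assumes the
 * data is on stable storage.  ZFS makes the same kind of assumption when it
 * asks a drive to flush its write cache before committing a transaction
 * group.  A drive that acknowledges the flush while the blocks are still in
 * its volatile cache breaks that contract, and software cannot detect it. */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* /tank/important.dat is a made-up path on a hypothetical ZFS pool. */
    const char *path = "/tank/important.dat";
    const char buf[] = "data that must survive a power cut\n";

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    if (write(fd, buf, sizeof buf - 1) != (ssize_t)(sizeof buf - 1)) {
        perror("write");
        return EXIT_FAILURE;
    }

    /* fsync() should only return once the filesystem has pushed the data to
     * the disk and the disk has reported its cache flushed.  If the disk
     * lies at that step, this "durable" write can still vanish on power loss. */
    if (fsync(fd) != 0) {
        perror("fsync");
        return EXIT_FAILURE;
    }

    printf("write reported durable\n");
    close(fd);
    return EXIT_SUCCESS;
}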
When a drive does this, ZFS believes the data is safely on disk, but a power cut or crash at the wrong moment can cause severe problems with the pool. This, I think, is the reason for the comments about this being a system-engineering problem rather than a design problem: ZFS was designed on the assumption that the disks tell the truth, and it is up to the administrator to build the server from components that accurately report their status.

However, while the majority of these cases involve consumer hardware, the BBC have reported hitting the problem with Sun T2000 servers and commodity SATA drives, so unless somebody from Sun can say otherwise, I feel there is still some risk of this occurring on Sun hardware.

I find the ZFS marketing and documentation very misleading in that they completely ignore the risk of losing the entire pool unless you are careful about the hardware you use, which leads to a lot of stories like this one from enthusiasts and early adopters. I also believe ZFS needs recovery tools as a matter of urgency, to protect its reputation if nothing else.