My understanding of the root cause of these issues is that the vast majority happen on consumer-grade hardware that reports to ZFS that writes have reached stable storage when in fact they are still sitting in the drive's volatile write cache.
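To put that assumption in concrete terms, here is a minimal userland sketch in C of the guarantee that ZFS (and anything else that calls fsync()) is relying on. This is my own illustration, not ZFS source code, and the /tank path is made up:

/* Minimal userland sketch (my illustration, not ZFS code) of the durability
 * contract at issue: once fsync() returns success, the caller assumes the
 * data is on stable storage.  ZFS makes the same kind of assumption when it
 * asks a drive to flush its write cache before committing a transaction
 * group.  A drive that acknowledges the flush while the blocks are still in
 * its volatile cache breaks that contract, and software cannot detect it. */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* /tank/important.dat is a made-up path on a hypothetical ZFS pool. */
    const char *path = "/tank/important.dat";
    const char buf[] = "data that must survive a power cut\n";

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    if (write(fd, buf, sizeof buf - 1) != (ssize_t)(sizeof buf - 1)) {
        perror("write");
        return EXIT_FAILURE;
    }

    /* fsync() should only return once the filesystem has pushed the data to
     * the disk and the disk has reported its cache flushed.  If the disk
     * lies at that step, this "durable" write can still vanish on power loss. */
    if (fsync(fd) != 0) {
        perror("fsync");
        return EXIT_FAILURE;
    }

    printf("write reported durable\n");
    close(fd);
    return EXIT_SUCCESS;
}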
When a drive does this, ZFS believes the data is safely on disk, but a power cut or crash at the wrong moment can cause severe problems with the pool. This, I think, is the reason for the comments about this being a system-engineering problem rather than a design problem: ZFS was designed on the assumption that the disks tell the truth, and it is up to the administrator to build the server from components that accurately report their status.

However, while the majority of these cases involve consumer hardware, the BBC have reported hitting the problem with Sun T2000 servers and commodity SATA drives, so unless somebody from Sun can say otherwise, I feel there is still some risk of this occurring on Sun hardware.

I find the ZFS marketing and documentation very misleading in that they completely ignore the risk of losing the entire pool unless you are careful about the hardware you use, which leads to a lot of stories like this one from enthusiasts and early adopters. I also believe ZFS needs recovery tools as a matter of urgency, to protect its reputation if nothing else.