So, I had a fun ZFS learning experience a few months ago.  A server of mine 
suddenly dropped off the network, or so it seemed.  It was an OpenSolaris 
2008.05 box serving up samba shares from a ZFS pool, but it noticed too many 
checksum errors and so decided it was time to take the pool down so as to save 
the (apparently) dying disk from further damage.  Seemed inconvenient at the 
time, but in hindsight that's a cool feature.  I haven't actually found any 
problems with the drive (an SSD), which has worked fine ever since.  Bit rot?  
Power failure (we had a lot of those for a while)?  Who knows.  At first I was 
afraid my ZFS pool had corrupted itself until I realized that it was a unique 
feature of ZFS actually protecting me from further damage rather than ZFS 
itself being the problem.

At any rate, in this case, the corruption managed to make it over to my backup 
server replicated with SNDR.  One of the corrupted blocks happened to be 
referenced by every single one of my daily snapshots going back nearly a year.  
I had no mirrored storage and copies set to 1.  Arguably a bad setup, I'm sure, 
but that's why I had a replicated server.  At any rate, I didn't care about the 
file referencing the corrupt block.  I would just as well have deleted it, but 
it was still referenced by all the snapshots.  It was a crisis at the time, so 
I just switched over to my replicated server (in case the drive on the primary 
server actually was bad) and deleted the files containing corrupt blocks and 
then deleted all the snapshots so ZFS would quit unmounting the pool and I 
could get going again.  Things have been fine ever since, but I still wonder: 
is there something different that I could have done to get rid of the corrupt 
blocks without losing all my snapshots?  (I could have restored them from 
backup, but it would have taken forever.)  I guess I could 
just do clones and then have the capability of deleting stuff, but then I don't 
believe I'd be able to back the thing up - if I don't do incremental zfs 
send/recv, the backup takes over 24 hours since there are so many snapshots, and 
I wouldn't think clones work with incremental zfs send/recv (especially if you 
start deleting files willy-nilly).  Am I just missing something altogether, or 
is restoring from backup the only option?
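
For anyone unsure what I mean by incremental zfs send/recv: each backup run 
sends only the delta between one snapshot and the next, roughly as in the 
sketch below.  The dataset names (tank/share, backup/share) and snapshot 
names are made up for illustration, the helper itself is hypothetical, and it 
only builds the command strings rather than running anything.

```python
def incremental_send_commands(dataset, snapshots, target):
    """Build one `zfs send -i` pipeline per consecutive snapshot pair.

    `snapshots` is assumed to be ordered oldest to newest. Nothing is
    executed here; the commands are returned as plain strings.
    """
    commands = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        # -i sends only the blocks that changed between prev and cur
        commands.append(
            f"zfs send -i {dataset}@{prev} {dataset}@{cur} | zfs recv {target}"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical dataset and snapshot names, for illustration only.
    for cmd in incremental_send_commands(
        "tank/share", ["day1", "day2", "day3"], "backup/share"
    ):
        print(cmd)
```

Each step depends on the previous snapshot still existing on both sides, 
which is why I'm wary of anything that breaks the chain.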
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
