Miles Nordin writes:
>>>>>> "r" == Ross  <[EMAIL PROTECTED]> writes:
> 
>      r> Tom wrote "There was a problem with the SAS bus which caused
>      r> various errors including the inevitable kernel panic".  It's
>      r> the various errors part that catches my eye,
> 
> yeah, possibly, but there are checksums on the SAS bus, and its
> confirmation of what CDB's have completed should always be accurate.

But there's more to it than that: behind the SAS bus there's a storage 
controller with its own cache, and loads of disks behind that. Even 
though there are checksums on the SAS bus, and the storage controller 
should not have lost or damaged anything in its cache, there's still 
the possibility that a disk silently drops a write on the floor or 
misdirects it, or that the storage controller itself is configured in 
such a way that it does not guarantee data protection at all times...
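
To make that concrete, here is a toy Python sketch (not ZFS code, all 
names invented) of why a transport checksum cannot notice a silently 
dropped write, while a content checksum stored in the parent block can:

import hashlib

disk = {7: b"old contents"}              # block 7 still holds stale data

def write_block(addr, data, dropped=False):
    # the SAS/transport checksum only covers the transfer itself; if the
    # drive acks the command but never commits the data, nothing on the
    # wire ever looks wrong
    if not dropped:
        disk[addr] = data

def read_block(addr, expected_sha):
    data = disk[addr]
    if hashlib.sha256(data).hexdigest() != expected_sha:
        raise IOError("stale or misdirected block %d" % addr)
    return data

new_data = b"new contents"
ptr = {"addr": 7, "sha": hashlib.sha256(new_data).hexdigest()}
write_block(7, new_data, dropped=True)   # the drive silently drops the write
try:
    read_block(ptr["addr"], ptr["sha"])
except IOError as e:
    print("detected:", e)                # the parent's checksum catches it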

> If the problem was ``another machine booted up, and I told the other
> machine to 'zpool import -f' '' then maybe you have some point.  but
> just tripping over a cable shouldn't qualify as weird, nor should
> Erik's problem of the FC array losing power or connectivity.  These
> are both within the ``unclean shutdown'' category handled by UFS+log,
> FFS+softdep, ext3, reiser, xfs, vxfs, jfs, HFS+, ...

Does forcefully removing power count as an unclean shutdown? If so, I 
do it several times a day to my notebook with a ZFS root. I'm typing 
this on that very machine, which booted just fine from ZFS after yet 
another unclean shutdown.


>      r> Can fsck always recover a disk?  Or if the corruption is
>      r> severe enough, are there times when even that fails?  
> 
> This question is obviously silly.  write zeroes over the disk, and now
> the corruption is severe enough.  However fsck can always recover a
> disk from a kernel panic, or a power failure of the host or of the
> disks, because these things don't randomly scribble over the disk.

I have an image of a UFS filesystem which passes fsck just fine but 
then panics the system as soon as writes are started.


> Reports of zpool
> corruption on single vdev's mounted over SAN's would benefit from
> further investigation, or at least a healthily-suspicious scientific
> attitude that encourages someone to investigate this if it happens in
> more favorable conditions, such as inside Sun, or to someone with a
> support contract and enough time to work on a case (maybe Tom?),

The problem is that such reports often do not include enough detail, 
and investigating them can take a lot of time and yield nothing...

> or
> someone who knows ZFS well like Pavel.  Also, there is enough concern
> for people designing paranoid systems to approach them with the view,
> ``ZFS is not always-consistent-on-disk unless it has working
> redundancy''

Again, always-consistent-on-disk is not related to redundancy. On-disk 
consistency is achieved by never writing new blocks over currently 
allocated ones, regardless of the redundancy of the underlying vdevs. 
If the underlying vdevs are redundant, you have a better chance of 
surviving corruption of data already stored on disk.
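
A very rough sketch of that copy-on-write argument (an assumed model, 
not the actual ZFS internals): live blocks are never overwritten, so 
until the top-level pointer is flipped the old tree stays intact, with 
or without redundancy:

blocks = {0: "uberblock -> block 1", 1: "old data"}   # consistent state
free = [2, 3, 4]

def cow_update(new_data, crash_before_commit=False):
    addr = free.pop(0)                   # step 1: allocate a fresh block
    blocks[addr] = new_data              # step 2: write the new data there
    if crash_before_commit:
        return                           # crash here: old tree still valid
    blocks[0] = "uberblock -> block %d" % addr   # step 3: atomic pointer flip

cow_update("new data", crash_before_commit=True)
print(blocks[0])   # still "uberblock -> block 1", i.e. the old consistent state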

> Now there is another tool Anton mentioned, a recovery tool or forensic
> tool:  one that leaves the filesystem unmounted, treats the disks as
> read-only, and tries to copy data out of it onto a new filesystem.  If
> there were going to be a separate tool---say, something to handle disks
> that have been scribbled on, or fixes for problems that are really
> tricky or logically inappropriate to deal with on the mounted
> filesystem---I think a forensic/recovery tool makes more sense than an
> fsck.  If this odd stuff isn't supposed to happen, and it has happened
> anyway, you want a tool you can run more than once.  You want the
> chance to improve the tool and run it again, or to try an older
> version of the tool if the current one keeps crashing.

Reads in ZFS can be broadly classified into two types:

- reads that are not critical from the ZFS perspective, i.e. reads of 
user data and the associated metadata, where ZFS can safely return an 
I/O error in case of a checksum failure;

- reads that are critical from the ZFS perspective, i.e. reads of ZFS 
metadata required to perform writes; depending on the context it may 
be impossible to return an I/O error, so ZFS either has to panic or to 
act according to the failmode property setting (see the sketch below).
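
A minimal sketch of that decision (function and argument names are 
mine, this is not the ZFS source), using the documented failmode 
values wait, continue and panic:

import errno

def handle_read_failure(critical, failmode="wait"):
    if not critical:
        return -errno.EIO       # user data: just report the I/O error
    if failmode == "panic":
        raise SystemError("panic: unreadable metadata needed for a write")
    if failmode == "continue":
        return -errno.EIO       # best effort: hand back EIO even here
    # failmode == "wait": suspend I/O until the device comes back
    raise RuntimeError("pool I/O suspended, waiting for the device")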

So for some cases of pool corruption on non-redundant pools (and even 
on redundant ones, where all redundant copies are corrupted, e.g. after 
a simultaneous import of a pool from two hosts) it may be enough to 
import the pool in a purely read-only mode that never tries to write 
anything into the pool (and hence never has to read the metadata 
required to do so) in order to save all the data that can still be 
read. There's an RFE for this feature, but I do not have the number 
handy.
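
The salvage step itself could then be a dumb best-effort copy that 
skips whatever no longer passes checksums (a hedged sketch; the mount 
points and the read-only import are assumptions, not an existing tool):

import os, shutil

def salvage(src_root, dst_root):
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.join(dst_root, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(dst_root, rel, name))
            except (IOError, OSError) as e:
                # a checksum failure comes back as EIO; note it, keep going
                print("skipped %s: %s" % (src, e))

# salvage("/readonly_imported_pool", "/new_pool/recovered")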


victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
