Peter Jeremy wrote:
I have a zpool on a JBOD SE3320 that I was using for data with Solaris 10 (the root/usr/var filesystems were all UFS). Unfortunately, we had a bit of a mixup with SCSI cabling and I believe we created a SCSI target clash. The system was unloaded and nothing happened until I ran "zpool status", at which point things broke. After correcting all the cabling, Solaris panicked before reaching single-user mode.
Do you have a crash dump of this panic saved?
Sun Support could only suggest restoring from backups - but unfortunately, we do not have backups of some of the data that we would like to recover. Since OpenSolaris has a much newer version of ZFS, I thought I would give OpenSolaris a try and it looks slightly more promising, though I still can't access the pool. The following is using snv125 on a T2000.

r...@als253:~# zpool import -F data
Nov 17 15:26:46 opensolaris zfs: WARNING: can't open objset for data/backup
r...@als253:~# zpool status -v data
  pool: data
 state: FAULTED
status: An intent log record could not be read.
        Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        data         FAULTED      0     0     3  bad intent log
          raidz2-0   DEGRADED     0     0    18
            c2t8d0   FAULTED      0     0     0  too many errors
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c3t8d0   ONLINE       0     0     0
            c3t9d0   ONLINE       0     0     0
            c3t10d0  ONLINE       0     0     0
            c3t11d0  ONLINE       0     0     0
            c3t12d0  DEGRADED     0     0     0  too many errors
            c3t13d0  ONLINE       0     0     0
r...@als253:~# zpool online data c2t8d0
Nov 17 15:28:42 opensolaris zfs: WARNING: can't open objset for data/backup
cannot open 'data': pool is unavailable
r...@als253:~# zpool clear data
cannot clear errors for data: one or more devices is currently unavailable
r...@als253:~# zpool clear -F data
cannot open '-F': name must begin with a letter
The -F option is a new one, added with the pool recovery support, so it will only be available starting with build 128.
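For illustration, on a build that has the recovery support, the same commands you tried above are accepted. This is only a minimal sketch, using the pool name from your output; whether recovery actually succeeds depends on the on-disk state. The first form attempts recovery on the already-imported, faulted pool, the other attempts recovery at import time:

  # zpool clear -F data

  # zpool export data
  # zpool import -F data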
r...@als253:~# zpool status data
  pool: data
 state: FAULTED
status: One or more devices are faulted in response to persistent errors.
        There are insufficient replicas for the pool to continue functioning.
action: Destroy and re-create the pool from a backup source. Manually marking
        the device repaired using 'zpool clear' may allow some data to be
        recovered.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        data         FAULTED      0     0     1  corrupted data
          raidz2-0   FAULTED      0     0     6  corrupted data
            c2t8d0   FAULTED      0     0     0  too many errors
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c3t8d0   ONLINE       0     0     0
            c3t9d0   ONLINE       0     0     0
            c3t10d0  ONLINE       0     0     0
            c3t11d0  ONLINE       0     0     0
            c3t12d0  DEGRADED     0     0     0  too many errors
            c3t13d0  ONLINE       0     0     0
r...@als253:~#

Annoyingly, data/backup is not a filesystem I'm especially worried about - I'd just like to get access to the other filesystems on it.
I think it should be possible, at least in read-only mode. I cannot tell whether full recovery will be possible, but there is at least a good chance of getting some data back.
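As a hedged sketch (the -n modifier is part of the same recovery support, so again this needs build 128 or later): a dry run of the recovery reports whether the pool can be made importable and roughly how much recent data would have to be discarded, without modifying anything on disk:

  # zpool import -nF data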
You can try build 128 as soon as it becomes available, or you can build BFU archives from source and apply them to your build 125 BE.
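Roughly, that second path looks like the outline below. This is only a sketch; the workspace path, the environment file name and the archive directory are assumptions that depend on how your ON workspace is set up (on the T2000 you would use the sparc archives):

  $ nightly ./opensolaris.sh
  # bfu /path/to/ws/archives/sparc/nightly

After the bfu completes, reboot into the BFU'd boot environment and retry the import there.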
Is it possible to hack the pool to make data/backup just disappear? For that matter: 1) Why is the whole pool faulted when n-2 vdevs are online?
RAID-Z2 should survive two disk failures. But in this case, as you mention, there was some misconfiguration on the storage side that may have caused a SCSI target clash.
ZFS verifies checksums, and in this case it looks like some critical metadata block(s) in the most recent pool state fail checksum verification, so corruption is present on some of the online disks too. But with one disk faulted and another degraded, ZFS is not able to identify which other disk has a problem by using combinatorial reconstruction.
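If you want a bit more visibility into the per-device state, one thing you could look at (just a suggestion, not required for the recovery) is the ZFS label on each device with zdb. It prints the vdev tree and the txg each label was last updated at, which can hint at which disks fell behind. The s0 suffix below is an assumption about how the disks were labelled when the pool was created:

  # zdb -l /dev/dsk/c2t8d0s0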
2) Given that metadata is triplicated, where did the objset go?
Metadata replication helps to protect against failures localized in space, but as all copies of metadata are written at the same time, it cannot protect against failures localized in time.
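As an aside, the same ditto-block mechanism can be extended to user data with the copies property; like the metadata copies, though, the extra copies are written at the same time, so this adds redundancy in space, not in time, and would not have helped here. Example only, using a dataset name from this thread:

  # zfs set copies=2 data/backup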
regards,
victor