Peter Jeremy wrote:
I have a zpool on a JBOD SE3320 that I was using for data with Solaris 10 (the root/usr/var filesystems were all UFS). Unfortunately, we had a bit of a mixup with SCSI cabling and I believe we created a SCSI target clash. The system was unloaded and nothing happened until I ran "zpool status", at which point things broke. After correcting all the cabling, Solaris panicked before reaching single-user mode.
Do you have a crash dump of this panic saved?
Sun Support could only suggest restoring from backups - but unfortunately, we do not have backups of some of the data that we would like to recover. Since OpenSolaris has a much newer version of ZFS, I thought I would give OpenSolaris a try and it looks slightly more promising, though I still can't access the pool. The following is using snv125 on a T2000.

r...@als253:~# zpool import -F data
Nov 17 15:26:46 opensolaris zfs: WARNING: can't open objset for data/backup
r...@als253:~# zpool status -v data
  pool: data
 state: FAULTED
status: An intent log record could not be read.
        Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        data         FAULTED      0     0     3  bad intent log
          raidz2-0   DEGRADED     0     0    18
            c2t8d0   FAULTED      0     0     0  too many errors
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c3t8d0   ONLINE       0     0     0
            c3t9d0   ONLINE       0     0     0
            c3t10d0  ONLINE       0     0     0
            c3t11d0  ONLINE       0     0     0
            c3t12d0  DEGRADED     0     0     0  too many errors
            c3t13d0  ONLINE       0     0     0
r...@als253:~# zpool online data c2t8d0
Nov 17 15:28:42 opensolaris zfs: WARNING: can't open objset for data/backup
cannot open 'data': pool is unavailable
r...@als253:~# zpool clear data
cannot clear errors for data: one or more devices is currently unavailable
r...@als253:~# zpool clear -F data
cannot open '-F': name must begin with a letter
The -F option is a new one, added with the pool recovery support, so it will only be available starting with build 128.
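For illustration, on a build that has the recovery support, the same commands you tried above are accepted. This is only a minimal sketch, using the pool name from your output; whether recovery actually succeeds depends on the on-disk state. The first form attempts recovery on the already-imported, faulted pool, the other attempts recovery at import time:

  # zpool clear -F data

  # zpool export data
  # zpool import -F data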
r...@als253:~# zpool status data
  pool: data
 state: FAULTED
status: One or more devices are faulted in response to persistent errors.
        There are insufficient replicas for the pool to continue functioning.
action: Destroy and re-create the pool from a backup source. Manually marking
        the device repaired using 'zpool clear' may allow some data to be
        recovered.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        data         FAULTED      0     0     1  corrupted data
          raidz2-0   FAULTED      0     0     6  corrupted data
            c2t8d0   FAULTED      0     0     0  too many errors
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c3t8d0   ONLINE       0     0     0
            c3t9d0   ONLINE       0     0     0
            c3t10d0  ONLINE       0     0     0
            c3t11d0  ONLINE       0     0     0
            c3t12d0  DEGRADED     0     0     0  too many errors
            c3t13d0  ONLINE       0     0     0
r...@als253:~#

Annoyingly, data/backup is not a filesystem I'm especially worried about - I'd just like to get access to the other filesystems on it.
I think it should be possible, at least in read-only mode. I cannot tell whether full recovery will be possible, but there is at least a good chance of getting some data back.
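As a hedged sketch (the -n modifier is part of the same recovery support, so again this needs build 128 or later): a dry run of the recovery reports whether the pool can be made importable and roughly how much recent data would have to be discarded, without modifying anything on disk:

  # zpool import -nF data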
You can try build 128 as soon as it becomes available, or you can build BFU archives from source and apply them to your build 125 BE.
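Roughly, that second path looks like the outline below. This is only a sketch; the workspace path, the environment file name and the archive directory are assumptions that depend on how your ON workspace is set up (on the T2000 you would use the sparc archives):

  $ nightly ./opensolaris.sh
  # bfu /path/to/ws/archives/sparc/nightly

After the bfu completes, reboot into the BFU'd boot environment and retry the import there.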
Is it possible to hack the pool to make data/backup just disappear? For that matter: 1) Why is the whole pool faulted when n-2 vdevs are online?
RAID-Z2 should survive two disk failures. But in this case, as you mention, there was some misconfiguration on the storage side that may have caused a SCSI target clash.
ZFS verifies checksums, and in this case it looks like some critical metadata block(s) in the most recent pool state fail checksum verification, so corruption is present on some of the online disks too. But with one disk faulted and another degraded, ZFS is not able to identify which other disk has a problem by using combinatorial reconstruction.
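If you want a bit more visibility into the per-device state, one thing you could look at (just a suggestion, not required for the recovery) is the ZFS label on each device with zdb. It prints the vdev tree and the txg each label was last updated at, which can hint at which disks fell behind. The s0 suffix below is an assumption about how the disks were labelled when the pool was created:

  # zdb -l /dev/dsk/c2t8d0s0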
2) Given that metadata is triplicated, where did the objset go?
Metadata replication helps to protect against failures localized in space, but as all copies of metadata are written at the same time, it cannot protect against failures localized in time.
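As an aside, the same ditto-block mechanism can be extended to user data with the copies property; like the metadata copies, though, the extra copies are written at the same time, so this adds redundancy in space, not in time, and would not have helped here. Example only, using a dataset name from this thread:

  # zfs set copies=2 data/backup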
regards,
victor