SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Apr 17 12:25:49 PDT 2007
PLATFORM: SUNW,Sun-Fire-880, CSN: -, HOSTNAME: twinkie
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: ce624168-b522-e35b-d4e8-a8e4b9169ad1
DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for
more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: The pool data is unavailable
REC-ACTION: Run 'zpool status -x' and either attach the missing device or
restore from backup.
twinkie># zpool status -x
all pools are healthy
twinkie># zpool status -v
  pool: tank
 state: FAULTED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           UNAVAIL      0     0     0  insufficient replicas
          raidz2       UNAVAIL      0     0     0  corrupted data
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0
-->>I'm current on all patches as of yesterday:
SunOS twinkie 5.10 Generic_125100-04 sun4u sparc SUNW,Sun-Fire-880
-->>I'll put the rest in an attachment as this will be a long post.
-->>This problem was with a V880 with attached storage, an A5200. The 880 is
configured with 2 disk bays, and the A5200 is connected via fibre to the 2nd
channel on the PCI-X dual host bus adapter.
-->>I have to admit I probably, unknowingly at the time, caused my own problem;
however, I've now seen the zpool status output out of sync on 2 different occasions.
1st time:
>---------------------------------<
SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Mon Apr 16 15:36:14 PDT 2007
PLATFORM: SUNW,Sun-Fire-880, CSN: -, HOSTNAME: twinkie
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 9448ced6-4dea-c3ba-e13a-f9028f14b328
DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for
more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: The pool data is unavailable
REC-ACTION: Run 'zpool status -x' and either attach the missing device or
restore from backup.
twinkie console login: root
[twinkie] # bash
twinkie># zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0

errors: No known data errors
twinkie># zpool status -x
all pools are healthy
-->>I checked SunSolve for an explanation and found someone with a similar
issue running 10 11/06 at kernel rev -36, so I patched the system to current
and didn't see the error again.
2nd time:
-------------------------
-->>Then, looking for a reason for (and solution to) wait time with Oracle, I
upgraded the firmware on my disks. After applying the firmware and rebooting I
got errors on the console:
WARNING: /pci@…,600000/pci@…/SUNW,qlc@…/fp@…,0/ssd@w…,0 (ssd5):
        Corrupt label; wrong magic number
WARNING: /pci@…,600000/pci@…/SUNW,qlc@…/fp@…,0/ssd@w…,0 (ssd3):
        Corrupt label; wrong magic number
-->>Upon logging in I ran format:
twinkie># format
Searching for disks...done
c2t9d0: configured with capacity of 33.92GB
c2t13d0: configured with capacity of 33.92GB
-->>I used format to label the disks (sketched below) and then ran zpool status.
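-->>For the archives, the label step went roughly like this; it's a sketch from
memory rather than a verbatim transcript (same again for c2t13d0), and note it's
plain format, which writes an SMI label by default. That detail turns out to
matter later.
twinkie># format c2t9d0
format> label
Ready to label disk, continue? y
format> quit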
twinkie># zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     UNAVAIL      0     0     0  cannot open
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    UNAVAIL      0     0     0  cannot open
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0

errors: No known data errors
-->>So I attempted to online one of the disks:
twinkie># zpool online tank c2t9d0
Bringing device c2t9d0 online
twinkie># zpool status
  pool: tank
 state: FAULTED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 19 errors on Tue Apr 17 12:08:25 2007
config:

        NAME           STATE     READ WRITE CKSUM
        tank           UNAVAIL      0     0     0  insufficient replicas
          raidz2       UNAVAIL      0     0     0  corrupted data
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0    32     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0    24     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0
-->>This is when the system panicked and rebooted.
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Apr 17 12:08:25 PDT 2007
PLATFORM: SUNW,Sun-Fire-880, CSN: -, HOSTNAME: twinkie
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: a96cb915-8e49-65b9-d575-b0c8ba271891
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more
information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
panic[cpu1]/thread=2a100e4dcc0: assertion failed: 0 == dmu_buf_hold_array(os,
object, offset, size, FALSE, FTAG, &numbufs, &dbp), file:
../../common/fs/zfs/dmu.c, line: 394
000002a100e4d560 genunix:assfail+74 (7b64e330, 7b64e380, 18a, 183d800, 11ed400,
0)
%l0-3: 0000000000000000 000000000000000f 000000000000000a 0000000000000000
%l4-7: 00000000011ed400 0000000000000000 000000000186fc00 0000000000000000
000002a100e4d610 zfs:zfsctl_ops_root+b1a9fb0 (300043021a8, f, 11a450, 10,
30009089000, 30007f37640)
%l0-3: 0000000000000001 000000000000000f 0000000000000007 0000000000000502
%l4-7: 0000030000074300 0000000000000000 0000000000000501 0000000000000000
000002a100e4d6e0 zfs:space_map_sync+278 (300036366f8, 3, 300036364a0, 10, 2, 48)
%l0-3: 0000000000000010 0000030009089000 0000030009089010 0000030009089048
%l4-7: 00007fffffffffff 0000000000007fff 0000000000000006 0000000000000010
000002a100e4d7d0 zfs:metaslab_sync+200 (30003636480, 805de, 8, 30007f37640,
30004583040, 30003774dc0)
%l0-3: 00000300043021a8 00000300036364b8 00000300036364a0 0000030003636578
%l4-7: 00000300036366f8 0000030003636698 00000300036367b8 0000000000000006
000002a100e4d890 zfs:vdev_sync+90 (30004583040, 805de, 805dd, 30003636480,
30004583288, d)
%l0-3: 00000000018a7550 0000000000000007 0000030003774ea8 0000000000000002
%l4-7: 0000030004583040 0000030003774dc0 0000000000000000 0000000000000000
000002a100e4d940 zfs:spa_sync+1d0 (30003774dc0, 805de, 1, 0, 2a100e4dcc4, 1)
%l0-3: 0000030003774f80 0000030003774f90 0000030003774ea8 000003000851b880
%l4-7: 0000000000000000 0000030004582b00 00000300043c6740 0000030003774f40
000002a100e4da00 zfs:txg_sync_thread+134 (300043c6740, 805de, 0, 2a100e4dab0,
300043c6850, 300043c6852)
%l0-3: 00000300043c6860 00000300043c6810 0000000000000000 00000300043c6818
%l4-7: 00000300043c6856 00000300043c6854 00000300043c6808 00000000000805de
-->>Upon reboot, 'zpool status -x' and 'zpool status -v' still didn't agree:
twinkie># zpool status -v
  pool: tank
 state: FAULTED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           UNAVAIL      0     0     0  insufficient replicas
          raidz2       UNAVAIL      0     0     0  corrupted data
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0
twinkie># zpool status -x
all pools are healthy
-->>I tried clearing the faults in fmadm and rebooting, with the same results:
I was unable to bring the pool back online using any zpool commands.
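-->>For reference, clearing the faults went roughly like this; a sketch from
memory, with the UUID taken from the ZFS-8000-CS message at the top of this post:
twinkie># fmadm faulty
twinkie># fmadm repair ce624168-b522-e35b-d4e8-a8e4b9169ad1
twinkie># init 6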
-->>With the help of a friend (thanks, Rick) we were able to determine that the
2 disks I had labeled carried SMI labels instead of EFI labels. So after
repartitioning the 2 disks and relabeling them with EFI (sketched below), the
proper labels were attached. Once that was done I cleared the fmadm faults and
rebooted. Thank goodness the 1.3TB pool came back online.
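-->>The relabel has to be done with format -e to get the EFI choice; the
session looked roughly like this (prompts paraphrased from memory, and repeated
for the second disk):
twinkie># format -e c2t9d0
format> label
[0] SMI Label
[1] EFI Label
Specify Label type[0]: 1
Ready to label disk, continue? yes
format> quit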
-->>I'll be adding spares and redoing this, but for now I don't have to
recreate the entire thing.
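-->>When I get to the spares it should just be something along these lines
(device names hypothetical), plus a scrub to check the pool over after all this
excitement:
twinkie># zpool add tank spare c2t14d0 c2t15d0
twinkie># zpool scrub tank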