2011-11-08 22:30, Jim Klimov wrote:
Hello all,
I have an oi_148a PC with a single root disk, and it has
recently started failing to boot: it hangs after the copyright
message, whichever of my GRUB menu options I choose.
Thanks to my wife's sister, who is my hands and eyes near
the problematic PC, here's some ZDB output from this rpool:
# zpool import
pool: rpool
id: 17995958177810353692
state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
see: http://www.sun.com/msg/ZFS-8000-EY
config:
rpool ONLINE
c4t1d0s0 ONLINE
So here it is: a single-device "rpool".
There are some on-disk errors, so some of the zdb walks fail:
root@openindiana:~# time zdb -bb -e 17995958177810353692
Traversing all blocks to verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file ../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)
real 0m12.184s
user 0m0.367s
sys 0m0.474s
root@openindiana:~# time zdb -bsvc -e 17995958177810353692
Traversing all blocks to verify checksums and verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file ../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)
real 0m12.019s
user 0m0.360s
sys 0m0.458s
However, "-bsvL" and "-bsvcL" (with checksum checks) do finish;
results of the latter test (more complete) are listed below:
root@openindiana:~# time zdb -bsvcL -e 17995958177810353692
Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 50 reading <182, 19177, 0, 1>
DVA[0]=<0:a8c8e600:20000> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=20000L/20000P birth=82L/82P fill=1
cksum=3401f5fe522b:109ee10ba48ed38c:e7f49c220f7b8bc:ff405ef051b91e65 --
skipping
zdb_blkptr_cb: Got error 50 reading <182, 19202, 0, 1>
DVA[0]=<0:a9030a00:20000> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=20000L/20000P birth=82L/82P fill=1
cksum=11c4c738b0ba:7bb81bce3313913:8f85a7abf1b9e34:58e8746d63119393 --
skipping
zdb_blkptr_cb: Got error 50 reading <182, 24924, 0, 0>
DVA[0]=<0:b1aaec00:14a00> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=14a00L/14a00P birth=85L/85P fill=1
cksum=270679cd905d:6119a969a134566:6f0f7da64c4d2d90:3ab86aa985abef02 --
skipping
zdb_blkptr_cb: Got error 50 reading <182, 24944, 0, 0>
DVA[0]=<0:b1cdf000:10800> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=10800L/10800P birth=85L/85P fill=1
cksum=1ebb4d1ae9f5:3cf5f42afa9a332:757613fc2d2de7b3:5f197017333a4f89 --
skipping
zdb_blkptr_cb: Got error 50 reading <493, 947, 0, 165>
DVA[0]=<0:b3efc200:20000> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=20000L/20000P birth=26691L/26691P fill=1
cksum=2cdc2ae22d10:b33d31bcbc0d8da:f1571c9975e151b0:a037073594569635 --
skipping
Error counts:
errno count
50 5
block traversal size 11986202624 != alloc 11986203136 (unreachable 512)
bp count: 405927
bp logical: 15030449664 avg: 37027
bp physical: 12995855872 avg: 32015 compression: 1.16
bp allocated: 13172434944 avg: 32450 compression: 1.14
bp deduped: 1186232320 ref>1: 12767 deduplication: 1.09
SPA allocated: 11986203136 used: 56.17%
Blocks LSIZE PSIZE ASIZE avg comp %Total Type
- - - - - - - unallocated
2 32K 4K 12.0K 6.00K 8.00 0.00 object directory
3 1.50K 1.50K 4.50K 1.50K 1.00 0.00 object array
1 16K 1.50K 4.50K 4.50K 10.67 0.00 packed nvlist
- - - - - - - packed nvlist size
197 24.2M 1.87M 5.61M 29.2K 12.92 0.04 bpobj
- - - - - - - bpobj header
- - - - - - - SPA space map header
1.27K 6.79M 3.25M 9.8M 7.70K 2.09 0.08 SPA space map
8 144K 144K 144K 18.0K 1.00 0.00 ZIL intent log
26.6K 426M 91.1M 182M 6.86K 4.67 1.45 DMU dnode
75 150K 39.0K 80.0K 1.07K 3.85 0.00 DMU objset
- - - - - - - DSL directory
23 12.0K 11.5K 34.5K 1.50K 1.04 0.00 DSL directory child map
21 11.5K 10.5K 31.5K 1.50K 1.10 0.00 DSL dataset snap map
49 707K 79.5K 239K 4.87K 8.89 0.00 DSL props
- - - - - - - DSL dataset
- - - - - - - ZFS znode
- - - - - - - ZFS V0 ACL
321K 12.0G 10.5G 10.5G 33.4K 1.14 85.46 ZFS plain file
26.8K 41.5M 19.1M 38.2M 1.42K 2.17 0.30 ZFS directory
18 17.5K 9.00K 18.0K 1K 1.94 0.00 ZFS master node
50 84.5K 25.0K 50.0K 1K 3.38 0.00 ZFS delete queue
12.1K 1.50G 1.50G 1.50G 127K 1.00 12.22 zvol object
1 1K 512 1K 1K 2.00 0.00 zvol prop
- - - - - - - other uint8[]
- - - - - - - other uint64[]
- - - - - - - other ZAP
- - - - - - - persistent error log
2 256K 44.0K 132K 66.0K 5.82 0.00 SPA history
- - - - - - - SPA history offsets
1 512 512 1.50K 1.50K 1.00 0.00 Pool properties
- - - - - - - DSL permissions
- - - - - - - ZFS ACL
- - - - - - - ZFS SYSACL
- - - - - - - FUID table
- - - - - - - FUID table size
2 2K 1K 3.00K 1.50K 2.00 0.00 DSL dataset next clones
- - - - - - - scan work queue
146 103K 73.0K 146K 1K 1.40 0.00 ZFS user/group used
- - - - - - - ZFS user/group quota
1 512 512 1.50K 1.50K 1.00 0.00 snapshot refcount tags
7.14K 28.6M 17.5M 52.6M 7.37K 1.63 0.42 DDT ZAP algorithm
2 32K 4K 12.0K 6.00K 8.00 0.00 DDT statistics
- - - - - - - System attributes
18 9.00K 9.00K 18.0K 1K 1.00 0.00 SA master node
18 27.0K 9.00K 18.0K 1K 3.00 0.00 SA attr registration
44 704K 77.0K 154K 3.50K 9.14 0.00 SA attr layouts
- - - - - - - scan translations
- - - - - - - deduplicated block
133 71.0K 66.5K 200K 1.50K 1.07 0.00 DSL deadlist map
- - - - - - - DSL deadlist map hdr
3 2.50K 1.50K 4.50K 1.50K 1.67 0.00 DSL dir clones
27 3.38M 122K 365K 13.5K 28.44 0.00 bpobj subobj
144 1.42M 228K 683K 4.74K 6.37 0.01 deferred free
4 130K 130K 130K 32.5K 1.00 0.00 dedup ditto
396K 14.0G 12.1G 12.3G 31.7K 1.16 100.00 Total
                     capacity      operations    bandwidth     ---- errors ----
description          used   avail  read   write  read   write  read  write  cksum
rpool                11.2G  8.71G  308    0      11.2M  0      0     0      5
/dev/dsk/c4t1d0s0    11.2G  8.71G  308    0      11.2M  0      0     0      10
real 38m56.588s
user 4m15.708s
sys 0m56.255s
I see a non-empty deferred-free list and, apparently,
blocks with checksum errors. If I read this right, four
blocks are from old generations (TXGs 82 and 85?), and
one is more recent (26691). What else does a trained eye
see which I don't?
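
To make the "which TXGs are affected" reading above easier to check,
here is a minimal Python sketch that summarizes the error records.
It is only an illustration, not part of zdb: the regular expression
and the helper name parse_errors are my own, and I am assuming the
bracketed tuple is <objset, object, level, blkid> and that "birth="
carries the birth TXG. The sample text is trimmed from the output
above.

```python
import re
from collections import Counter

# Trimmed copy of the error records from the zdb -bsvcL run above.
SAMPLE = """\
zdb_blkptr_cb: Got error 50 reading <182, 19177, 0, 1>
contiguous unique single size=20000L/20000P birth=82L/82P fill=1
zdb_blkptr_cb: Got error 50 reading <182, 19202, 0, 1>
contiguous unique single size=20000L/20000P birth=82L/82P fill=1
zdb_blkptr_cb: Got error 50 reading <182, 24924, 0, 0>
contiguous unique single size=14a00L/14a00P birth=85L/85P fill=1
zdb_blkptr_cb: Got error 50 reading <182, 24944, 0, 0>
contiguous unique single size=10800L/10800P birth=85L/85P fill=1
zdb_blkptr_cb: Got error 50 reading <493, 947, 0, 165>
contiguous unique single size=20000L/20000P birth=26691L/26691P fill=1
"""

# Assumption: the tuple is <objset, object, level, blkid>; the block id
# is kept as a string because its numeric base is not confirmed here.
RECORD = re.compile(
    r"Got error (?P<errno>\d+) reading "
    r"<(?P<objset>\d+), (?P<object>\d+), (?P<level>-?\d+), (?P<blkid>[0-9a-f]+)>"
    r".*?birth=(?P<birth>\d+)L/",
    re.DOTALL,
)


def parse_errors(text):
    """Return one dict per 'Got error' record found in the zdb output."""
    return [
        {
            "errno": int(m["errno"]),
            "objset": int(m["objset"]),
            "object": int(m["object"]),
            "level": int(m["level"]),
            "blkid": m["blkid"],
            "birth_txg": int(m["birth"]),
        }
        for m in RECORD.finditer(text)
    ]


errors = parse_errors(SAMPLE)
by_txg = Counter(e["birth_txg"] for e in errors)
by_objset = Counter(e["objset"] for e in errors)

for txg, count in sorted(by_txg.items()):
    print(f"birth TXG {txg}: {count} block(s)")
for objset, count in sorted(by_objset.items()):
    print(f"objset {objset}: {count} block(s)")
```

Grouped this way, four of the five bad blocks fall in one object set
and share the two early birth TXGs, while the fifth is in a different
object set and is much newer.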
According to "zdb -l" below, current TXG numbers are in
the 560 million range...
root@openindiana:~# zdb -l /dev/dsk/c4t1d0s0
--------------------------------------------
LABEL 0
--------------------------------------------
version: 28
name: 'rpool'
state: 0
txg: 560647931
pool_guid: 17995958177810353692
hostid: 13583512
hostname: ''
top_guid: 3656218981390172871
guid: 3656218981390172871
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3656218981390172871
path: '/dev/dsk/c4t1d0s0'
devid: 'id1,sd@SATA_____ST3808110AS_________________5LR557KB/a'
phys_path: '/pci@0,0/pci8086,2847@1c,4/pci1043,81e4@0/disk@1,0:a'
whole_disk: 0
metaslab_array: 30
metaslab_shift: 27
ashift: 9
asize: 21430272000
is_log: 0
DTL: 4098
create_txg: 4
--------------------------------------------
LABEL 1
--------------------------------------------
version: 28
name: 'rpool'
state: 0
txg: 560647931
pool_guid: 17995958177810353692
hostid: 13583512
hostname: ''
top_guid: 3656218981390172871
guid: 3656218981390172871
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3656218981390172871
path: '/dev/dsk/c4t1d0s0'
devid: 'id1,sd@SATA_____ST3808110AS_________________5LR557KB/a'
phys_path: '/pci@0,0/pci8086,2847@1c,4/pci1043,81e4@0/disk@1,0:a'
whole_disk: 0
metaslab_array: 30
metaslab_shift: 27
ashift: 9
asize: 21430272000
is_log: 0
DTL: 4098
create_txg: 4
--------------------------------------------
LABEL 2
--------------------------------------------
version: 28
name: 'rpool'
state: 0
txg: 560647931
pool_guid: 17995958177810353692
hostid: 13583512
hostname: ''
top_guid: 3656218981390172871
guid: 3656218981390172871
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3656218981390172871
path: '/dev/dsk/c4t1d0s0'
devid: 'id1,sd@SATA_____ST3808110AS_________________5LR557KB/a'
phys_path: '/pci@0,0/pci8086,2847@1c,4/pci1043,81e4@0/disk@1,0:a'
whole_disk: 0
metaslab_array: 30
metaslab_shift: 27
ashift: 9
asize: 21430272000
is_log: 0
DTL: 4098
create_txg: 4
--------------------------------------------
LABEL 3
--------------------------------------------
version: 28
name: 'rpool'
state: 0
txg: 560647931
pool_guid: 17995958177810353692
hostid: 13583512
hostname: ''
top_guid: 3656218981390172871
guid: 3656218981390172871
vdev_children: 1
vdev_tree:
type: 'disk'
id: 0
guid: 3656218981390172871
path: '/dev/dsk/c4t1d0s0'
devid: 'id1,sd@SATA_____ST3808110AS_________________5LR557KB/a'
phys_path: '/pci@0,0/pci8086,2847@1c,4/pci1043,81e4@0/disk@1,0:a'
whole_disk: 0
metaslab_array: 30
metaslab_shift: 27
ashift: 9
asize: 21430272000
is_log: 0
DTL: 4098
create_txg: 4
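
As a sanity check on the numbers above, here is a small Python sketch
that cross-checks the label values against the zdb summary. The
constants are copied from the output; the assumption (mine, not
confirmed from the source code) is that usable pool space is the
number of whole metaslabs that fit into asize, times the metaslab
size.

```python
# Values copied from the vdev label and the zdb summary above.
ASIZE = 21430272000          # asize from the label
METASLAB_SHIFT = 27          # metaslab_shift from the label
ASHIFT = 9                   # ashift from the label
SPA_ALLOCATED = 11986203136  # "SPA allocated" / "alloc"
TRAVERSED = 11986202624      # "block traversal size"

metaslab_size = 1 << METASLAB_SHIFT
# Assumption: only whole metaslabs count towards usable pool space.
metaslab_count = ASIZE // metaslab_size
pool_space = metaslab_count * metaslab_size

used_pct = 100.0 * SPA_ALLOCATED / pool_space
used_gib = SPA_ALLOCATED / 2**30
avail_gib = (pool_space - SPA_ALLOCATED) / 2**30

# The "unreachable" figure compared with one sector at this ashift.
unreachable = SPA_ALLOCATED - TRAVERSED
sector = 1 << ASHIFT

print(f"metaslabs: {metaslab_count} x {metaslab_size} bytes")
print(f"used: {used_pct:.2f}% ({used_gib:.1f}G), avail: {avail_gib:.2f}G")
print(f"unreachable: {unreachable} bytes, sector size: {sector} bytes")
```

Under that assumption the sketch gives 159 metaslabs, about 56.17%
used and about 8.71G available, which agrees with what zdb printed,
and the 512 unreachable bytes come out as exactly one sector at
ashift 9.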
Any ideas as to whether this rpool can be recovered into
a mountable state, or is recreating it my only option now? ;)
In particular, I'm currently testing with a LiveUSB of oi_148a,
as that is what they have at the broken PC. Should we
expect zpool import and fixup to work better with
oi_151a, oi_dev, or Solaris 11 (Express or Release)?
It might be problematic to write another bootable
device remotely, so if no related code has changed...
//Jim