2011-12-02 18:25, Steve Gonczi wrote:
Hi Jim,
Try running "zdb -bbbbb poolname".
This should report any leaked or double-allocated blocks.
(It may or may not complete; it tends to run out of memory and crash on
large datasets.)
I would be curious what zdb reports, and whether you are able to run it
without crashing with "out of memory".
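Roughly speaking, the leak check boils down to comparing the set of
blocks the traversal can reach against the set the space maps say is
allocated. Here is a toy model of that comparison (my sketch in C, not
zdb's actual code; the arrays are made-up state for illustration):

/*
 * Toy model of zdb's leak report, not actual zdb code.
 * Walk every reachable block pointer and count references, then
 * compare against the allocator's view:
 *   allocated but never referenced -> leaked block
 *   referenced but marked free     -> double free / bad space map
 *   referenced more than once      -> dedup, or a double allocation
 */
#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 8

int main(void) {
    bool allocated[NBLOCKS]  = { 1, 1, 1, 0, 1, 0, 1, 1 }; /* space maps */
    int  referenced[NBLOCKS] = { 1, 1, 0, 0, 1, 1, 2, 1 }; /* traversal  */

    for (int b = 0; b < NBLOCKS; b++) {
        if (allocated[b] && referenced[b] == 0)
            printf("block %d: leaked (allocated but unreachable)\n", b);
        if (!allocated[b] && referenced[b] > 0)
            printf("block %d: referenced but marked free\n", b);
        if (referenced[b] > 1)
            printf("block %d: multiply referenced (dedup or double alloc)\n", b);
    }
    return (0);
}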
OK, when/if it completes scrubbing the pool, I'll try that.
But it is likely to fail, unless oi_151a has some new failsafe
workarounds for such failures.
In the meantime, here are copies of zdb walks which I did a couple of
weeks ago while repairing (and finally replacing) the rpool on this box.
At the time the box was booted from an oi_148a LiveUSB. Some of the
walks (those with leak-checking enabled) never completed:
root@openindiana:~# time zdb -bb -e 1601233584937321596
Traversing all blocks to verify nothing leaked ...
(box hung: LAN Disconnected; RAM/SWAP used up according to "vmstat 1")
root@openindiana:~# time zdb -bsvc -e 1601233584937321596
Traversing all blocks to verify checksums and verify nothing leaked ...
Assertion failed: zio_wait(zio_claim(0L, zcb->zcb_spa, refcnt ? 0 :
spa_first_txg(zcb->zcb_spa), bp, 0L, 0L, ZIO_FLAG_CANFAIL)) == 0 (0x2 ==
0x0), file ../zdb.c, line 1950
Abort
real 7197m41.288s
user 291m39.256s
sys 25m48.133s
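For what it's worth, the "(0x2 == 0x0)" in that assertion is just the
errno returned through zio_wait(zio_claim(...)): 2 is ENOENT, which I
read as "the traversal reached a block that the space maps consider
free". A rough model of that claim pass (assumptions and names mine,
not the real zdb/zio code):

/*
 * Rough model of the claim-based leak check; assumptions mine.
 * zdb "claims" every block pointer it reaches; a claim that fails
 * (modeled here as ENOENT, errno 2, matching the 0x2 above) trips
 * the assertion and aborts the whole walk.
 */
#include <assert.h>
#include <errno.h>
#include <stdio.h>

static int allocated[8] = { 1, 1, 1, 0, 1, 1, 1, 1 }; /* pretend space map */

static int claim(int blk) {
    if (!allocated[blk])
        return (ENOENT);        /* reachable, yet marked free */
    allocated[blk] = 0;         /* consume the claim */
    return (0);
}

int main(void) {
    int reachable[] = { 0, 1, 2, 4, 3 }; /* block 3 is reachable but "free" */

    for (int i = 0; i < 5; i++) {
        int err = claim(reachable[i]);
        printf("claim(block %d) = %d\n", reachable[i], err);
        assert(err == 0);       /* aborts on block 3, like zdb.c line 1950 */
    }
    return (0);
}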
This took most of a week just to fail.
A walk with leak checks disabled (-L) took half a day and found some
discrepancies and "unreachable" blocks:
root@openindiana:~# time zdb -bsvL -e 1601233584937321596
Traversing all blocks ...
block traversal size 9044729487360 != alloc 9044729499648 (unreachable 12288)

        bp count:             85245222
        bp logical:      8891466103808   avg: 104304
        bp physical:     7985508591104   avg:  93676   compression: 1.11
        bp allocated:   12429007810560   avg: 145802   compression: 0.72
        bp deduped:      3384278323200   ref>1: 13909855   deduplication: 1.27
        SPA allocated:   9044729499648   used: 75.64%
 Blocks   LSIZE   PSIZE   ASIZE     avg    comp  %Total  Type
      -       -       -       -       -       -       -  unallocated
      2     32K      4K   72.0K   36.0K    8.00    0.00  object directory
      3   1.50K   1.50K    108K   36.0K    1.00    0.00  object array
      2     32K   2.50K   72.0K   36.0K   12.80    0.00  packed nvlist
      -       -       -       -       -       -       -  packed nvlist size
  7.80K    988M    208M   1.12G    147K    4.75    0.01  bpobj
      -       -       -       -       -       -       -  bpobj header
      -       -       -       -       -       -       -  SPA space map header
   183K    753M    517M   6.49G   36.3K    1.46    0.06  SPA space map
     22   1020K   1020K   1.58M   73.6K    1.00    0.00  ZIL intent log
   933K   14.6G   3.11G   25.2G   27.6K    4.69    0.22  DMU dnode
  1.75K   3.50M    896K   42.0M   24.0K    4.00    0.00  DMU objset
      -       -       -       -       -       -       -  DSL directory
    390    243K    200K   13.7M   36.0K    1.21    0.00  DSL directory child map
    388    298K    208K   13.6M   36.0K    1.43    0.00  DSL dataset snap map
    715   10.2M   1.14M   25.1M   36.0K    8.92    0.00  DSL props
      -       -       -       -       -       -       -  DSL dataset
      -       -       -       -       -       -       -  ZFS znode
      -       -       -       -       -       -       -  ZFS V0 ACL
  76.1M   8.06T   7.25T   11.2T    150K    1.11   98.67  ZFS plain file
  2.17M   2.76G   1.33G   52.7G   24.3K    2.08    0.46  ZFS directory
    341    314K    171K   7.99M   24.0K    1.84    0.00  ZFS master node
    857   25.5M   1.16M   20.1M   24.1K   21.94    0.00  ZFS delete queue
      -       -       -       -       -       -       -  zvol object
      -       -       -       -       -       -       -  zvol prop
      -       -       -       -       -       -       -  other uint8[]
      -       -       -       -       -       -       -  other uint64[]
      -       -       -       -       -       -       -  other ZAP
      -       -       -       -       -       -       -  persistent error log
     33   4.02M    763K   4.46M    139K    5.39    0.00  SPA history
      -       -       -       -       -       -       -  SPA history offsets
      1     512     512   36.0K   36.0K    1.00    0.00  Pool properties
      -       -       -       -       -       -       -  DSL permissions
  17.1K   12.7M   8.63M    411M   24.0K    1.48    0.00  ZFS ACL
      -       -       -       -       -       -       -  ZFS SYSACL
      5   80.0K   5.00K    120K   24.0K   16.00    0.00  FUID table
      -       -       -       -       -       -       -  FUID table size
  1.37K    723K    705K   49.3M   36.0K    1.03    0.00  DSL dataset next clones
      -       -       -       -       -       -       -  scan work queue
  2.69K   2.57M   1.36M   64.6M   24.0K    1.89    0.00  ZFS user/group used
      -       -       -       -       -       -       -  ZFS user/group quota
      -       -       -       -       -       -       -  snapshot refcount tags
  1.87M   7.48G   4.41G   67.4G   36.0K    1.70    0.58  DDT ZAP algorithm
      2     32K   4.50K   72.0K   36.0K    7.11    0.00  DDT statistics
     21   10.5K   10.5K    504K   24.0K    1.00    0.00  System attributes
    288    144K    144K   6.75M   24.0K    1.00    0.00  SA master node
    288    432K    144K   6.75M   24.0K    3.00    0.00  SA attr registration
    576   9.00M   1008K   13.5M   24.0K    9.14    0.00  SA attr layouts
      -       -       -       -       -       -       -  scan translations
      -       -       -       -       -       -       -  deduplicated block
  1.84K   4.73M   1.20M   66.3M   36.0K    3.95    0.00  DSL deadlist map
      -       -       -       -       -       -       -  DSL deadlist map hdr
     94   68.0K   50.0K   3.30M   36.0K    1.36    0.00  DSL dir clones
     11   1.38M   49.5K    792K   72.0K   28.44    0.00  bpobj subobj
      -       -       -       -       -       -       -  deferred free
    815   22.0M   10.3M   23.8M   29.9K    2.13    0.00  dedup ditto
  81.3M   8.09T   7.26T   11.3T    142K    1.11  100.00  Total
                        capacity     operations     bandwidth    ---- errors ----
description            used avail   read  write   read  write   read write cksum
pool                  8.23T 2.65T     88      0   416K      0      0     0     0
  raidz2              8.23T 2.65T     88      0   416K      0      0     0     0
    /dev/dsk/c6t0d0s0                  8      0   495K      0      0     0     0
    /dev/dsk/c6t1d0s0                 13      0   528K      0      0     0     0
    /dev/dsk/c6t2d0s0                  9      0   476K      0      0     0     0
    /dev/dsk/c6t3d0s0                  8      0   493K      0      0     0     0
    /dev/dsk/c6t4d0s0                 13      0   528K      0      0     0     0
    /dev/dsk/c6t5d0s0                  9      0   479K      0      0     0     0
real 635m10.257s
user 313m16.279s
sys 5m5.792s
----
Just for posterity, I had some interesting zdb failures on
that non-redundant rpool which I finally recreated with
copies=2:
root@openindiana:~# time zdb -bb -e 17995958177810353692
Traversing all blocks to verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file
../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)
real 0m12.184s
user 0m0.367s
sys 0m0.474s
root@openindiana:~# time zdb -bsvc -e 17995958177810353692
Traversing all blocks to verify checksums and verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file
../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)
real 0m12.019s
user 0m0.360s
sys 0m0.458s
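As I understand that assertion, the space map code keeps free segments
sorted and non-overlapping, and it fires when an incoming range overlaps
a segment that is already there, which is the signature of a double
free. A minimal model (assumptions mine, not the real space_map.c),
reusing the offsets from the log:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct seg { uint64_t start, end; } seg_t;

static seg_t map[16];   /* pretend space map: sorted, non-overlapping */
static int   nsegs;

static void seg_add(uint64_t start, uint64_t end) {
    for (int i = 0; i < nsegs; i++) {
        /* a new range must not overlap any existing free segment */
        assert(end <= map[i].start || start >= map[i].end);
    }
    map[nsegs++] = (seg_t){ start, end };
}

int main(void) {
    seg_add(0x79e1dc00, 0x79e22600);    /* range freed once: fine */
    printf("first free ok\n");
    seg_add(0x79e1dc00, 0x79e22600);    /* freed again: assertion fires */
    return (0);
}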
The rpool did have 5 checksum errors, all in files (also encountered
while cpio'ing data off it), but it could not be imported read-write
(the box hung quickly).
root@openindiana:~# time zdb -bsvcL -e 17995958177810353692
Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 50 reading <182, 19177, 0, 1>
DVA[0]=<0:a8c8e600:20000> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=20000L/20000P birth=82L/82P fill=1
cksum=3401f5fe522b:109ee10ba48ed38c:e7f49c220f7b8bc:ff405ef051b91e65 --
skipping
zdb_blkptr_cb: Got error 50 reading <182, 19202, 0, 1>
DVA[0]=<0:a9030a00:20000> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=20000L/20000P birth=82L/82P fill=1
cksum=11c4c738b0ba:7bb81bce3313913:8f85a7abf1b9e34:58e8746d63119393 --
skipping
zdb_blkptr_cb: Got error 50 reading <182, 24924, 0, 0>
DVA[0]=<0:b1aaec00:14a00> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=14a00L/14a00P birth=85L/85P fill=1
cksum=270679cd905d:6119a969a134566:6f0f7da64c4d2d90:3ab86aa985abef02 --
skipping
zdb_blkptr_cb: Got error 50 reading <182, 24944, 0, 0>
DVA[0]=<0:b1cdf000:10800> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=10800L/10800P birth=85L/85P fill=1
cksum=1ebb4d1ae9f5:3cf5f42afa9a332:757613fc2d2de7b3:5f197017333a4f89 --
skipping
zdb_blkptr_cb: Got error 50 reading <493, 947, 0, 165>
DVA[0]=<0:b3efc200:20000> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=20000L/20000P birth=26691L/26691P fill=1
cksum=2cdc2ae22d10:b33d31bcbc0d8da:f1571c9975e151b0:a037073594569635 --
skipping
Error counts:

        errno  count
           50      5
block traversal size 11986202624 != alloc 11986203136 (unreachable 512)

        bp count:            405927
        bp logical:     15030449664   avg: 37027
        bp physical:    12995855872   avg: 32015   compression: 1.16
        bp allocated:   13172434944   avg: 32450   compression: 1.14
        bp deduped:      1186232320   ref>1: 12767   deduplication: 1.09
        SPA allocated:  11986203136   used: 56.17%
...
The error you see could be due to a double allocation.
This should never happen, but it evidently did: two different files and
DDT entries end up referencing the same physical block on disk. The
second object mistakenly thinks the block is free, while it is actually
in use by the first object. When the second user writes the block, the
first DDT entry's checksum no longer matches the actual block content.
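If memory serves, the "error 50" in your log is EBADE, which ZFS reuses
as ECKSUM, so the read path noticed exactly this kind of mismatch. A toy
timeline of the scenario (illustrative only, nothing here is real ZFS
code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static char     disk[4][8];     /* four pretend disk blocks */
static uint32_t ddt_cksum_a;    /* checksum the DDT recorded for file A */

static uint32_t cksum(const char *p) {  /* stand-in for fletcher4 */
    uint32_t c = 0;
    for (int i = 0; i < 8; i++)
        c = c * 31 + (uint8_t)p[i];
    return (c);
}

int main(void) {
    int x = 2;                          /* the doubly-allocated block "X" */

    memcpy(disk[x], "fileA..", 8);      /* file A writes X */
    ddt_cksum_a = cksum(disk[x]);       /* DDT remembers A's checksum */

    /* BUG: the allocator thinks X is free and hands it to file B. */
    memcpy(disk[x], "fileB..", 8);      /* B overwrites X in place */

    /* Later, reading A through its DDT entry: */
    if (cksum(disk[x]) != ddt_cksum_a)
        printf("checksum mismatch on A -> ECKSUM, like 'error 50' above\n");
    return (0);
}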
Interesting theory, sounds feasible.
I forget which metadata file is object ID 0; I think it is the meta-dnode
file.
It is kinda important.
Kinda makes sense ;(
Steve
Thanks,
//Jim