Hello Robert,
> Which would happen if you have a problem with HW and you're getting
> wrong checksums on both sides of your mirrors. Maybe PS?
>
> Try memtest anyway or sunvts
Unfortunately, SunVTS doesn't run on non-Sun/OEM hardware, and memtest requires 
more downtime than I can afford right now.
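
In the meantime, the only low-impact check I can think of is to see whether FMA 
or the disk drivers have logged any hardware complaints (I'm not sure this 
would catch a flaky PSU, but it costs no downtime):

# fmdump -eV | more
# iostat -En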

However, I've made some interesting observations, and I can now reproduce the 
crash. It seems that I have one or more bad checksums, and ZFS panics each time 
it tries to read one of them. Below are two cases.


Case 1: I got a checksum error confined to one mirror (not striped over both); 
this time it was the checksum of a regular file rather than <0x0>. I tried to 
read the file twice: the first attempt returned an I/O error, the second caused 
a panic. Here's the log:




core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
 
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     2
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     2
            c2d1    ONLINE       0     0     4
            c1d1    ONLINE       0     0     4
 
errors: Permanent errors have been detected in the following files:
 
        box5:<0x0>
        /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# ll /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
-rw-------   1 user group       489 Apr 20  2006 
/u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
cat: input error on 
/u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file: I/O error

core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
 
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     4
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     4
            c2d1    ONLINE       0     0     8
            c1d1    ONLINE       0     0     8
 
errors: Permanent errors have been detected in the following files:
 
        box5:<0x0>
        /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
(Kernel Panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 
addr=fffffe80882b7000)
...
(after system boot up)
core# rm /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
 
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
 
errors: Permanent errors have been detected in the following files:
 
        box5:<0x0>
        box5:<0x4a049a>

core# mdb unix.17 vmcore.17
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp 
ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp 
crypto ptm ]
> ::status
debugging crash dump vmcore.17 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 
addr=fffffe80882b7000
dump content: kernel pages only
> ::stack
fletcher_2_native+0x13()
zio_checksum_verify+0x27()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_cache_read+0x14c()
vdev_disk_io_start+0x135()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
zio_vdev_io_start+0x116()
zio_next_stage+0x65()
zio_ready+0xec()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_ready+0x15()
zio_next_stage_async+0xae()
zio_nowait+9()
arc_read+0x414()
dbuf_read_impl+0x1a0()
dbuf_read+0x95()
dmu_buf_hold_array_by_dnode+0x217()
dmu_buf_hold_array+0x81()
dmu_read_uio+0x49()
zfs_read+0x13c()
fop_read+0x31()
read+0x188()
read32+0xe()
_sys_sysenter_post_swapgs+0x14b()
> ::msgbuf
MESSAGE                                                              
...
 
panic[cpu0]/thread=ffffffff8b98e240:
BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000
 
 
cat:
#pf Page fault
Bad kernel fault at addr=0xfffffe80882b7000
pid=17993, pc=0xfffffffff1192923, sp=0xfffffe8001112588, eflags=0x10287
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
cr2: fffffe80882b7000 cr3: df609000 cr8: c
        rdi: fffffe80882b7000 rsi: fffffe80884ae200 rdx: fffffe80011125c0
        rcx: 30fceb4cd1a3ca3f  r8: 93e4f7b6a3198113  r9: 73457f74abfdcdf5
        rax:   dc3bf3d311149f rbx: ffffffff8f68a700 rbp: fffffe8001112630
        r10: ffffffff8ecb5c30 r11:                0 r12: fffffe80884ae200
        r13:               c0 r14:           200200 r15: fffffe80882ae000
        fsb: ffffffff80000000 gsb: fffffffffbc24e40  ds:               43
         es:               43  fs:                0  gs:              1c3
        trp:                e err:                0 rip: fffffffff1192923
         cs:               28 rfl:            10287 rsp: fffffe8001112588
         ss:               30
 
fffffe80011123a0 unix:real_mode_end+71e1 ()
fffffe8001112480 unix:trap+5e6 ()
fffffe8001112490 unix:cmntrap+140 ()
fffffe8001112630 zfs:zfsctl_ops_root+30d9707b ()
fffffe8001112650 zfs:zio_checksum_verify+27 ()
fffffe8001112660 zfs:zio_next_stage+65 ()
fffffe8001112690 zfs:zio_wait_for_children+49 ()
fffffe80011126a0 zfs:zio_wait_children_done+15 ()
fffffe80011126b0 zfs:zio_next_stage+65 ()
fffffe80011126f0 zfs:zio_vdev_io_assess+84 ()
fffffe8001112700 zfs:zio_next_stage+65 ()
fffffe80011127e0 zfs:vdev_cache_read+14c ()
fffffe8001112820 zfs:vdev_disk_io_start+135 ()
fffffe8001112830 zfs:vdev_io_start+12 ()
fffffe8001112860 zfs:zio_vdev_io_start+7b ()
fffffe8001112870 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112880 zfs:zio_nowait+9 ()
fffffe80011128e0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011128f0 zfs:vdev_io_start+12 ()
fffffe8001112920 zfs:zio_vdev_io_start+7b ()
fffffe8001112930 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112940 zfs:zio_nowait+9 ()
fffffe80011129a0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011129d0 zfs:zio_vdev_io_start+116 ()
fffffe80011129e0 zfs:zio_next_stage+65 ()
fffffe8001112a00 zfs:zio_ready+ec ()
fffffe8001112a10 zfs:zio_next_stage+65 ()
fffffe8001112a40 zfs:zio_wait_for_children+49 ()
fffffe8001112a50 zfs:zio_wait_children_ready+15 ()
fffffe8001112a60 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112a70 zfs:zio_nowait+9 ()
fffffe8001112af0 zfs:arc_read+414 ()
fffffe8001112b70 zfs:dbuf_read_impl+1a0 ()
fffffe8001112bb0 zfs:zfsctl_ops_root+30d812dd ()
fffffe8001112c20 zfs:dmu_buf_hold_array_by_dnode+217 ()
fffffe8001112c70 zfs:dmu_buf_hold_array+81 ()
fffffe8001112cd0 zfs:zfsctl_ops_root+30d84641 ()
fffffe8001112d30 zfs:zfs_read+13c ()
fffffe8001112d80 genunix:fop_read+31 ()
fffffe8001112eb0 genunix:read+188 ()
fffffe8001112ec0 genunix:read32+e ()
fffffe8001112f10 unix:brand_sys_sysenter+1f2 ()
 
syncing file systems...
 14
 10
 8
...
 8
 done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
> ^D


-----------------------------------------

Case 2: At this point I started thinking about how to reproduce the error. 
Obviously, I need to read all blocks in order to stumble upon the bad 
checksum(s), so I decided to back up the whole box5 pool with zfs send/receive 
to another pool I had in the system, box7. I got very interesting results:



# zpool status -v
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
 
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
 
errors: Permanent errors have been detected in the following files:
 
        box5:<0x0>
 
  pool: box7
 state: ONLINE
 scrub: none requested
config:
 
        NAME        STATE     READ WRITE CKSUM
        box7        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
 
errors: No known data errors

# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
box5   175G   282G   175G  /u02
box7  4.26G   681G  4.26G  /box7

# zfs snapshot box5@backup1
# zfs send box5@backup1 | zfs receive box7/backup1 &
[1] 11056

# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
box5           175G   282G   175G  /u02
box5@backup1  2.07M      -   175G  -
box7          4.26G   681G  4.26G  /box7
box7/backup1  1.50K   681G  1.50K  /box7/backup1
(Kernel Panic. See panic message below)
...
(after system boot up)
# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
box5           175G   282G   175G  /u02
box5@backup1  24.2M      -   175G  -
box7          6.35G   679G  6.35G  /box7

# mdb unix.20 vmcore.20
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp 
ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp 
crypto ptm ]
> ::status
debugging crash dump vmcore.20 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode] 
4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE 
contiguous birth=10088934 fill=19 
cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50
dump content: kernel pages only
> ::stack
vpanic()
0xfffffffff11b1af4()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_mirror_io_done+0xc1()
zio_vdev_io_done+0x14()
taskq_thread+0xbc()
thread_start+8()
> ::msgbuf
MESSAGE
...
 
panic[cpu0]/thread=fffffe800081dc80:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode] 
4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE 
contiguous birth=10088934 fill=19 
cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50
 
 
fffffe800081dac0 zfs:zfsctl_ops_root+30db624c ()
fffffe800081dad0 zfs:zio_next_stage+65 ()
fffffe800081db00 zfs:zio_wait_for_children+49 ()
fffffe800081db10 zfs:zio_wait_children_done+15 ()
fffffe800081db20 zfs:zio_next_stage+65 ()
fffffe800081db60 zfs:zio_vdev_io_assess+84 ()
fffffe800081db70 zfs:zio_next_stage+65 ()
fffffe800081dbd0 zfs:vdev_mirror_io_done+c1 ()
fffffe800081dbe0 zfs:zio_vdev_io_done+14 ()
fffffe800081dc60 genunix:taskq_thread+bc ()
fffffe800081dc70 unix:thread_start+8 ()
 
syncing file systems...
 5
 3
 1
...
 1
 done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
>


You can see the checksum type fletcher4 (with lzjb compression?) in the panic 
message. However, all of the ZFS file systems here use fletcher2 without 
compression, so this looks like a corrupted checksum. I ran the backup 
(send/receive) test described above three times; every time I got the same 
error, and each time the system panicked approximately 45 minutes after the 
start.

From all this I conclude that the panic happens when "zfs send" tries to read 
the same bad checksum. The bad checksum could be the result of a software or 
hardware bug; either way, it's obvious that ZFS cannot properly handle and 
repair this checksum problem.
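
In principle a scrub should force a read of every block and trip over the same 
bad checksum, though given the panics above I'm reluctant to try it on this 
box:

# zpool scrub box5
# zpool status -v box5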

I also tried to run zdb, but it ran out of memory:

# zdb box5
    version=4
    name='box5'
    state=0
    txg=11057710
    pool_guid=989471958079851180
    vdev_tree
        type='root'
        id=0
        guid=989471958079851180
        children[0]
                type='mirror'
                id=0
                guid=9879820675701757866
                metaslab_array=16
                metaslab_shift=31
                ashift=9
                asize=250045530112
                children[0]
                        type='disk'
                        id=0
                        guid=6440359594760637663
                        path='/dev/dsk/c1d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=125
                children[1]
                        type='disk'
                        id=1
                        guid=80751879044845160
                        path='/dev/dsk/c2d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=23
        children[1]
                type='mirror'
                id=1
                guid=4781215615782453677
                whole_disk=0
                metaslab_array=13
                metaslab_shift=31
                ashift=9
                asize=250045530112
                children[0]
                        type='disk'
                        id=0
                        guid=4849048357332929360
                        path='/dev/dsk/c2d1s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=21
                children[1]
                        type='disk'
                        id=1
                        guid=15140711491939156235
                        path='/dev/dsk/c1d1s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=19
Uberblock
 
        magic = 0000000000bab10c
        version = 4
        txg = 11061627
        guid_sum = 5267891425222527909
        timestamp = 1209925302 UTC = Sun May  4 23:21:42 2008
 
Dataset mos [META], ID 0, cr_txg 4, 176M, 201 objects
Dataset box5@backup1 [ZPL], ID 199, cr_txg 11060758, 175G, 4997266 objects
Dataset box5 [ZPL], ID 5, cr_txg 4, 175G, 4998785 objects
 
Traversing all blocks to verify checksums and verify nothing leaked ...

out of memory -- generating core dump
Abort (core dumped)
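
My next idea (untested) is to throw more virtual memory at zdb before the 
traversal phase; this won't help if zdb is hitting a 32-bit address-space 
limit rather than genuinely running out of swap:

# mkfile 4g /var/swapfile
# swap -a /var/swapfile
# zdb box5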


So what am I supposed to do now with all these 5 million objects? I cannot even 
back them up.
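
Unless someone has a better idea, my only remaining fallback is a plain 
file-level copy (cpio in pass mode), hoping it skips past anything unreadable 
with an I/O error instead of tripping the same panic:

# cd /u02 && find . -depth -print | cpio -pdm /box7/filecopy

(/box7/filecopy is just a scratch directory on the healthy pool.)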


Regards,
Rustam.
 
 