Hello Robert,

> Which would happen if you have a problem with HW and you're getting
> wrong checksums on both sides of your mirrors. Maybe PS?
>
> Try memtest anyway or sunvts

Unfortunately, SunVTS doesn't run on non-Sun/OEM hardware, and memtest
requires more downtime than I can afford right now.
However, I have some interesting observations, and I can now reproduce the
crash. It seems I have one or more bad checksums, and ZFS crashes every time
it tries to read them. Below are two cases.

Case 1:

I got a checksum error that was not striped across both mirrors; this time it
pointed at a real file rather than <0x0>. I tried to read the file twice: the
first attempt returned an I/O error, the second caused a panic. Here's the log:

core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     2
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     2
            c2d1    ONLINE       0     0     4
            c1d1    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        box5:<0x0>
        /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# ll /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
-rw-------   1 user  group  489 Apr 20  2006 /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
cat: input error on /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file: I/O error

core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     4
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     4
            c2d1    ONLINE       0     0     8
            c1d1    ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        box5:<0x0>
        /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
(Kernel panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000)
...
(after the system booted back up)
core# rm /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        box5:<0x0>
        box5:<0x4a049a>

core# mdb unix.17 vmcore.17
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp
ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp
crypto ptm ]
> ::status
debugging crash dump vmcore.17 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000
dump content: kernel pages only
> ::stack
fletcher_2_native+0x13()
zio_checksum_verify+0x27()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_cache_read+0x14c()
vdev_disk_io_start+0x135()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
zio_vdev_io_start+0x116()
zio_next_stage+0x65()
zio_ready+0xec()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_ready+0x15()
zio_next_stage_async+0xae()
zio_nowait+9()
arc_read+0x414()
dbuf_read_impl+0x1a0()
dbuf_read+0x95()
dmu_buf_hold_array_by_dnode+0x217()
dmu_buf_hold_array+0x81()
dmu_read_uio+0x49()
zfs_read+0x13c()
fop_read+0x31()
read+0x188()
read32+0xe()
_sys_sysenter_post_swapgs+0x14b()
> ::msgbuf
MESSAGE
...
panic[cpu0]/thread=ffffffff8b98e240:
BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000

cat:
#pf Page fault
Bad kernel fault at addr=0xfffffe80882b7000
pid=17993, pc=0xfffffffff1192923, sp=0xfffffe8001112588, eflags=0x10287
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
cr2: fffffe80882b7000  cr3: df609000  cr8: c
rdi: fffffe80882b7000  rsi: fffffe80884ae200  rdx: fffffe80011125c0
rcx: 30fceb4cd1a3ca3f   r8: 93e4f7b6a3198113   r9: 73457f74abfdcdf5
rax:   dc3bf3d311149f  rbx: ffffffff8f68a700  rbp: fffffe8001112630
r10: ffffffff8ecb5c30  r11: 0                 r12: fffffe80884ae200
r13: c0                r14: 200200            r15: fffffe80882ae000
fsb: ffffffff80000000  gsb: fffffffffbc24e40   ds: 43
 es: 43                 fs: 0                  gs: 1c3
trp: e                 err: 0                 rip: fffffffff1192923
 cs: 28                rfl: 10287             rsp: fffffe8001112588
 ss: 30

fffffe80011123a0 unix:real_mode_end+71e1 ()
fffffe8001112480 unix:trap+5e6 ()
fffffe8001112490 unix:cmntrap+140 ()
fffffe8001112630 zfs:zfsctl_ops_root+30d9707b ()
fffffe8001112650 zfs:zio_checksum_verify+27 ()
fffffe8001112660 zfs:zio_next_stage+65 ()
fffffe8001112690 zfs:zio_wait_for_children+49 ()
fffffe80011126a0 zfs:zio_wait_children_done+15 ()
fffffe80011126b0 zfs:zio_next_stage+65 ()
fffffe80011126f0 zfs:zio_vdev_io_assess+84 ()
fffffe8001112700 zfs:zio_next_stage+65 ()
fffffe80011127e0 zfs:vdev_cache_read+14c ()
fffffe8001112820 zfs:vdev_disk_io_start+135 ()
fffffe8001112830 zfs:vdev_io_start+12 ()
fffffe8001112860 zfs:zio_vdev_io_start+7b ()
fffffe8001112870 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112880 zfs:zio_nowait+9 ()
fffffe80011128e0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011128f0 zfs:vdev_io_start+12 ()
fffffe8001112920 zfs:zio_vdev_io_start+7b ()
fffffe8001112930 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112940 zfs:zio_nowait+9 ()
fffffe80011129a0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011129d0 zfs:zio_vdev_io_start+116 ()
fffffe80011129e0 zfs:zio_next_stage+65 ()
fffffe8001112a00 zfs:zio_ready+ec ()
fffffe8001112a10 zfs:zio_next_stage+65 ()
fffffe8001112a40 zfs:zio_wait_for_children+49 ()
fffffe8001112a50 zfs:zio_wait_children_ready+15 ()
fffffe8001112a60 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112a70 zfs:zio_nowait+9 ()
fffffe8001112af0 zfs:arc_read+414 ()
fffffe8001112b70 zfs:dbuf_read_impl+1a0 ()
fffffe8001112bb0 zfs:zfsctl_ops_root+30d812dd ()
fffffe8001112c20 zfs:dmu_buf_hold_array_by_dnode+217 ()
fffffe8001112c70 zfs:dmu_buf_hold_array+81 ()
fffffe8001112cd0 zfs:zfsctl_ops_root+30d84641 ()
fffffe8001112d30 zfs:zfs_read+13c ()
fffffe8001112d80 genunix:fop_read+31 ()
fffffe8001112eb0 genunix:read+188 ()
fffffe8001112ec0 genunix:read32+e ()
fffffe8001112f10 unix:brand_sys_sysenter+1f2 ()

syncing file systems... 14 10 8 ... 8 done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
> ^D

-----------------------------------------
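(A side note before Case 2: to enumerate every file that trips over a bad
checksum, rather than waiting for an application to hit one, I'm considering
something along the lines of the rough C sketch below. It just walks a
mountpoint with nftw(3C) and reads each regular file once, printing the paths
that come back with EIO. The program name, the 1 MB buffer size and the
default /u02 path are all mine, and the code is untested -- treat it as an
idea only, especially since, as shown above, it was the *second* read of the
bad block that panicked the box.)

/*
 * eioscan.c -- rough, untested sketch: walk a mounted dataset and read every
 * regular file once, logging paths that fail with EIO (which is what ZFS
 * returned to cat on the corrupted file above).
 *
 * Build (hypothetical): cc -o eioscan eioscan.c
 */
#define _XOPEN_SOURCE 500       /* may be needed for nftw on some systems */

#include <errno.h>
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static char buf[1 << 20];       /* 1 MB read buffer */

static int
check_file(const char *path, const struct stat *sb, int type, struct FTW *ftwbuf)
{
        int fd;
        ssize_t n;

        (void) sb;
        (void) ftwbuf;

        if (type != FTW_F)              /* regular files only */
                return (0);

        if ((fd = open(path, O_RDONLY)) < 0) {
                (void) fprintf(stderr, "open %s: %s\n", path, strerror(errno));
                return (0);
        }

        /* read the whole file once; stop at the first error */
        while ((n = read(fd, buf, sizeof (buf))) > 0)
                ;
        if (n < 0 && errno == EIO)
                (void) printf("EIO: %s\n", path);   /* candidate bad checksum */

        (void) close(fd);
        return (0);                     /* keep walking */
}

int
main(int argc, char **argv)
{
        const char *root = (argc > 1) ? argv[1] : "/u02";  /* my mountpoint */

        return (nftw(root, check_file, 64, FTW_PHYS));
}

The idea is that a single pass would give a list of candidate files to restore
from backup before anything re-reads the bad blocks.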
Case 2:

At this point I started thinking about how to reproduce the error. Obviously,
I need to read all the blocks in order to stumble upon the bad checksum(s), so
I decided to back up the whole box5 pool with zfs send/receive to another pool
I had in the system, box7. I got very interesting results:

# zpool status -v
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        box5:<0x0>

  pool: box7
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box7        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors

# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
box5                175G   282G   175G  /u02
box7               4.26G   681G  4.26G  /box7

# zfs snapshot [EMAIL PROTECTED]
# zfs send [EMAIL PROTECTED] | zfs receive box7/backup1 &
[1] 11056

# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
box5                175G   282G   175G  /u02
[EMAIL PROTECTED]  2.07M      -   175G  -
box7               4.26G   681G  4.26G  /box7
box7/backup1       1.50K   681G  1.50K  /box7/backup1

(Kernel panic. See the panic message below.)
...
(after the system booted back up)

# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
box5                175G   282G   175G  /u02
[EMAIL PROTECTED]  24.2M      -   175G  -
box7               6.35G   679G  6.35G  /box7

# mdb unix.20 vmcore.20
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp
ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp
crypto ptm ]
> ::status
debugging crash dump vmcore.20 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode]
4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE
contiguous birth=10088934 fill=19
cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50
dump content: kernel pages only
> ::stack
vpanic()
0xfffffffff11b1af4()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_mirror_io_done+0xc1()
zio_vdev_io_done+0x14()
taskq_thread+0xbc()
thread_start+8()
> ::msgbuf
MESSAGE
...
panic[cpu0]/thread=fffffe800081dc80:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode]
4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE
contiguous birth=10088934 fill=19
cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50

fffffe800081dac0 zfs:zfsctl_ops_root+30db624c ()
fffffe800081dad0 zfs:zio_next_stage+65 ()
fffffe800081db00 zfs:zio_wait_for_children+49 ()
fffffe800081db10 zfs:zio_wait_children_done+15 ()
fffffe800081db20 zfs:zio_next_stage+65 ()
fffffe800081db60 zfs:zio_vdev_io_assess+84 ()
fffffe800081db70 zfs:zio_next_stage+65 ()
fffffe800081dbd0 zfs:vdev_mirror_io_done+c1 ()
fffffe800081dbe0 zfs:zio_vdev_io_done+14 ()
fffffe800081dc60 genunix:taskq_thread+bc ()
fffffe800081dc70 unix:thread_start+8 ()

syncing file systems... 5 3 1 ... 1 done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
>

You can see the checksum type fletcher4 (with lzjb compression?) in the panic
message. However, all of the ZFS file systems here use fletcher2 without
compression, so this looks like an obviously corrupted checksum.
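(For anyone wondering what the four cksum=a:b:c:d words in that panic are:
below is a rough user-land sketch of the fletcher4 sum as I understand it from
the OpenSolaris sources -- four 64-bit running sums over the 32-bit words of
the buffer. It is only an illustration, not the actual kernel code, and the
test values in main() are made up. Also, if I understand the on-disk format
correctly, dnode metadata may use fletcher4/lzjb regardless of the file
system's checksum and compression settings, so the checksum type alone may not
be the smoking gun here.)

/*
 * fletcher4.c -- illustrative sketch of the fletcher4 checksum (four 64-bit
 * accumulators over 32-bit words), printed in the same a:b:c:d form as the
 * cksum= quadruple in the panic message. Not the ZFS kernel implementation.
 */
#include <stdint.h>
#include <stdio.h>

static void
fletcher4(const void *buf, size_t size, uint64_t cksum[4])
{
        const uint32_t *ip = buf;
        const uint32_t *ipend = ip + (size / sizeof (uint32_t));
        uint64_t a = 0, b = 0, c = 0, d = 0;

        for (; ip < ipend; ip++) {
                a += *ip;       /* simple sum of the 32-bit words */
                b += a;         /* sum of sums                    */
                c += b;         /* ...and so on; the higher-order */
                d += c;         /* sums make it order-sensitive   */
        }
        cksum[0] = a;
        cksum[1] = b;
        cksum[2] = c;
        cksum[3] = d;
}

int
main(void)
{
        uint32_t data[4] = { 1, 2, 3, 4 };      /* made-up test input */
        uint64_t ck[4];

        fletcher4(data, sizeof (data), ck);
        (void) printf("%llx:%llx:%llx:%llx\n",
            (unsigned long long)ck[0], (unsigned long long)ck[1],
            (unsigned long long)ck[2], (unsigned long long)ck[3]);
        return (0);
}

(fletcher2, which my file systems use for data, is the same idea over 64-bit
words with two interleaved pairs of sums.)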
I have run the backup (send/receive) test described above three times. Every
time I got the same error, and each time the system panicked approximately 45
minutes after the start.

From all this I conclude that the panic happens whenever "zfs send" tries to
read the same bad checksum. The bad checksum could be the result of a software
or hardware bug; either way, it is obvious that ZFS cannot properly handle and
fix this checksum problem.

I also tried to run zdb, but it ran out of memory:

# zdb box5
    version=4
    name='box5'
    state=0
    txg=11057710
    pool_guid=989471958079851180
    vdev_tree
        type='root'
        id=0
        guid=989471958079851180
        children[0]
                type='mirror'
                id=0
                guid=9879820675701757866
                metaslab_array=16
                metaslab_shift=31
                ashift=9
                asize=250045530112
                children[0]
                        type='disk'
                        id=0
                        guid=6440359594760637663
                        path='/dev/dsk/c1d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=125
                children[1]
                        type='disk'
                        id=1
                        guid=80751879044845160
                        path='/dev/dsk/c2d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=23
        children[1]
                type='mirror'
                id=1
                guid=4781215615782453677
                whole_disk=0
                metaslab_array=13
                metaslab_shift=31
                ashift=9
                asize=250045530112
                children[0]
                        type='disk'
                        id=0
                        guid=4849048357332929360
                        path='/dev/dsk/c2d1s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=21
                children[1]
                        type='disk'
                        id=1
                        guid=15140711491939156235
                        path='/dev/dsk/c1d1s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        whole_disk=1
                        DTL=19
Uberblock
        magic = 0000000000bab10c
        version = 4
        txg = 11061627
        guid_sum = 5267891425222527909
        timestamp = 1209925302 UTC = Sun May  4 23:21:42 2008

Dataset mos [META], ID 0, cr_txg 4, 176M, 201 objects
Dataset [EMAIL PROTECTED] [ZPL], ID 199, cr_txg 11060758, 175G, 4997266 objects
Dataset box5 [ZPL], ID 5, cr_txg 4, 175G, 4998785 objects
Traversing all blocks to verify checksums and verify nothing leaked ...
out of memory -- generating core dump
Abort (core dumped)

So what am I supposed to do now with all these 5 million objects? I cannot
even back them up.

Regards,
Rustam.