On 27-Dec-06, at 9:45 PM, George Wilson wrote:

Siegfried,

Can you provide the panic string that you are seeing? We should be able to pull out the persistent error log information from the corefile. You can take a look at the spa_get_errlog() function as a starting point.
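For reference, a rough sketch of getting started on the dump (this assumes savecore wrote it to the default /var/crash/FServe directory and that the dump is complete; adjust the unix.N/vmcore.N pair as needed):

# cd /var/crash/FServe
# mdb unix.0 vmcore.0
> ::status        (panic string and dump summary)
> $C              (stack of the panicking thread)
> ::spa -v        (pool and vdev addresses, as in the example further down)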


This is the panic string that I am seeing:

Dec 26 18:55:51 FServe unix: [ID 836849 kern.notice]
Dec 26 18:55:51 FServe ^Mpanic[cpu1]/thread=fffffe8000929c80:
Dec 26 18:55:51 FServe genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe8000929980 addr=ffffff00b3e621f0
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe unix: [ID 839527 kern.notice] sched:
Dec 26 18:55:51 FServe unix: [ID 753105 kern.notice] #pf Page fault
Dec 26 18:55:51 FServe unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff00b3e621f0
Dec 26 18:55:51 FServe unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff3eaa2b0, sp=0xfffffe8000929a78, eflags=0x10282
Dec 26 18:55:51 FServe unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
Dec 26 18:55:51 FServe unix: [ID 354241 kern.notice] cr2: ffffff00b3e621f0 cr3: a3ec000 cr8: c
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rdi: fffffe80dd69ad40 rsi: ffffff00b3e62040 rdx: 0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rcx: ffffffff9c6bd6ce r8: 1 r9: ffffffff
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rax: ffffff00b3e62208 rbx: ffffff00b3e62040 rbp: fffffe8000929ab0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] r10: ffffffff982421c8 r11: 1 r12: ffffff00b3e62208
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] r13: ffffffff81204468 r14: 1c8 r15: fffffe80dd69ad40
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] fsb: ffffffff80000000 gsb: ffffffff80f1d000 ds: 43
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] es: 43 fs: 0 gs: 1c3
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff3eaa2b0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] cs: 28 rfl: 10282 rsp: fffffe8000929a78
Dec 26 18:55:51 FServe unix: [ID 266532 kern.notice] ss: 30
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929890 unix:real_mode_end+6ad1 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929970 unix:trap+d77 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929980 unix:cmntrap+13f ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ab0 zfs:vdev_queue_offset_compare+0 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ae0 genunix:avl_add+1f ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929b60 zfs:vdev_queue_io_to_issue+1ec ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ba0 zfs:zfsctl_ops_root+33bc48b1 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929bc0 zfs:vdev_disk_io_done+11 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929bd0 zfs:vdev_io_done+12 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929be0 zfs:zio_vdev_io_done+1b ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929c60 genunix:taskq_thread+bc ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929c70 unix:thread_start+8 ()
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe genunix: [ID 672855 kern.notice] syncing file systems...
Dec 26 18:55:51 FServe genunix: [ID 733762 kern.notice]  3
Dec 26 18:55:52 FServe genunix: [ID 904073 kern.notice]  done
Dec 26 18:55:53 FServe genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 1719074816, content: kernel


Additionally, though perhaps unrelated, I came across this while looking through the logs:

Dec 26 17:53:00 FServe marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx0: error on port 1:
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] EDMA self disabled
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] command request queue parity error
Dec 26 17:53:00 FServe marvell88sx: [ID 131198 kern.info]       SErrors:
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] Recovered communication error
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] PHY ready change
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] 10-bit to 8-bit decode error
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] Disparity error

This happened right before a system hang. I have another strange problem: if I send certain files over the network (CIFS or NFS), the machine slows to a crawl until it is effectively hung. This is reproducible every time with the same "special" files, but it happens only over the network, never locally. I have already posted about this in network-discuss and am currently investigating the issue.


Additionally, you can open the corefile with mdb and examine the vdev error stats. Here's an example (hopefully the formatting doesn't get messed up):


Excellent information, thanks! It looks like there are no read/write/cksum errors.

I now at least have a way of checking the scrub results until the panic is fixed (hopefully someday).
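In the meantime, something like the following might work as a stopgap to capture the status while the scrub is still running (just a sketch; "test" stands in for the real pool name, and the one-minute interval is arbitrary):

# while true; do zpool status -v test >> /var/tmp/scrub-status.log; sync; sleep 60; done

Since the log file lives on disk and is synced on each pass, the last entry should show how far the scrub got right before the panic.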


Siegfried



> ::spa -v
ADDR                 STATE NAME
0000060004473680    ACTIVE test

    ADDR             STATE     AUX          DESCRIPTION
    0000060004bcb500 HEALTHY   -            root
    0000060004bcafc0 HEALTHY   -              /dev/dsk/c0t2d0s0

> 0000060004bcb500::vdev -re
ADDR             STATE     AUX          DESCRIPTION
0000060004bcb500 HEALTHY   -            root

                 READ     WRITE     FREE    CLAIM    IOCTL
    OPS             0         0        0        0        0
    BYTES           0         0        0        0        0
    EREAD         0
    EWRITE        0
    ECKSUM        0

0000060004bcafc0 HEALTHY   -              /dev/dsk/c0t2d0s0

                 READ     WRITE     FREE    CLAIM    IOCTL
    OPS          0x17     0x1d2        0        0        0
    BYTES    0x19c000  0x11da00        0        0        0
    EREAD         0
    EWRITE        0
    ECKSUM        0

This will show you any read/write/cksum errors.

Thanks,
George


Siegfried Nikolaivich wrote:
Hello All,
I am wondering if there is a way to save the scrub results right before the scrub completes. After upgrading to Solaris 10U3, I still have ZFS panicking right as the scrub completes. The scrub results seem to be "cleared" when the system boots back up, so I never get a chance to see them.
Does anyone know of a simple way?
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
