On 27-Dec-06, at 9:45 PM, George Wilson wrote:

Siegfried,

Can you provide the panic string that you are seeing? We should be able to pull out the persistent error log information from the corefile. You can take a look at the spa_get_errlog() function as a starting point.
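For reference, a rough sketch of getting started on the dump (this assumes savecore wrote it to the default /var/crash/FServe directory and that the dump is complete; adjust the unix.N/vmcore.N pair as needed):

# cd /var/crash/FServe
# mdb unix.0 vmcore.0
> ::status        (panic string and dump summary)
> $C              (stack of the panicking thread)
> ::spa -v        (pool and vdev addresses, as in the example further down)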


This is the panic string that I am seeing:

Dec 26 18:55:51 FServe unix: [ID 836849 kern.notice]
Dec 26 18:55:51 FServe ^Mpanic[cpu1]/thread=fffffe8000929c80:
Dec 26 18:55:51 FServe genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe8000929980 addr=ffffff00b3e621f0
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe unix: [ID 839527 kern.notice] sched:
Dec 26 18:55:51 FServe unix: [ID 753105 kern.notice] #pf Page fault
Dec 26 18:55:51 FServe unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff00b3e621f0
Dec 26 18:55:51 FServe unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff3eaa2b0, sp=0xfffffe8000929a78, eflags=0x10282
Dec 26 18:55:51 FServe unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
Dec 26 18:55:51 FServe unix: [ID 354241 kern.notice] cr2: ffffff00b3e621f0 cr3: a3ec000 cr8: c
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rdi: fffffe80dd69ad40 rsi: ffffff00b3e62040 rdx: 0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rcx: ffffffff9c6bd6ce r8: 1 r9: ffffffff
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rax: ffffff00b3e62208 rbx: ffffff00b3e62040 rbp: fffffe8000929ab0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] r10: ffffffff982421c8 r11: 1 r12: ffffff00b3e62208
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] r13: ffffffff81204468 r14: 1c8 r15: fffffe80dd69ad40
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] fsb: ffffffff80000000 gsb: ffffffff80f1d000 ds: 43
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] es: 43 fs: 0 gs: 1c3
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff3eaa2b0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] cs: 28 rfl: 10282 rsp: fffffe8000929a78
Dec 26 18:55:51 FServe unix: [ID 266532 kern.notice] ss: 30
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929890 unix:real_mode_end+6ad1 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929970 unix:trap+d77 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929980 unix:cmntrap+13f ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ab0 zfs:vdev_queue_offset_compare+0 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ae0 genunix:avl_add+1f ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929b60 zfs:vdev_queue_io_to_issue+1ec ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ba0 zfs:zfsctl_ops_root+33bc48b1 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929bc0 zfs:vdev_disk_io_done+11 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929bd0 zfs:vdev_io_done+12 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929be0 zfs:zio_vdev_io_done+1b ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929c60 genunix:taskq_thread+bc ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929c70 unix:thread_start+8 ()
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe genunix: [ID 672855 kern.notice] syncing file systems...
Dec 26 18:55:51 FServe genunix: [ID 733762 kern.notice]  3
Dec 26 18:55:52 FServe genunix: [ID 904073 kern.notice]  done
Dec 26 18:55:53 FServe genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 1719074816, content: kernel


Additionally, though perhaps unrelated, I came across this while looking through the logs:

Dec 26 17:53:00 FServe marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx0: error on port 1:
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] EDMA self disabled
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] command request queue parity error
Dec 26 17:53:00 FServe marvell88sx: [ID 131198 kern.info]       SErrors:
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] Recovered communication error
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] PHY ready change
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] 10-bit to 8-bit decode error
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] Disparity error

This happened right before a system hang. I have another strange problem: if I send certain files over the network (CIFS or NFS), the machine slows to a crawl until it is effectively hung. This is reproducible every time with the same "special" files, but it happens only over the network, never locally. I have already posted about this in network-discuss and am currently investigating the issue.


Additionally, you can open the corefile with mdb and examine the vdev error stats. Here's an example (hopefully the formatting doesn't get messed up):


Excellent information, thanks! It looks like there are no read/write/cksum errors.

I now at least have a way of checking the scrub results until the panic is fixed (hopefully someday).
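In the meantime, something like the following might work as a stopgap to capture the status while the scrub is still running (just a sketch; "test" stands in for the real pool name, and the one-minute interval is arbitrary):

# while true; do zpool status -v test >> /var/tmp/scrub-status.log; sync; sleep 60; done

Since the log file lives on disk and is synced on each pass, the last entry should show how far the scrub got right before the panic.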


Siegfried



> ::spa -v
ADDR                 STATE NAME
0000060004473680    ACTIVE test

    ADDR             STATE     AUX          DESCRIPTION
    0000060004bcb500 HEALTHY   -            root
    0000060004bcafc0 HEALTHY   -              /dev/dsk/c0t2d0s0

> 0000060004bcb500::vdev -re
ADDR             STATE     AUX          DESCRIPTION
0000060004bcb500 HEALTHY   -            root

                 READ     WRITE     FREE    CLAIM    IOCTL
    OPS             0         0        0        0        0
    BYTES           0         0        0        0        0
    EREAD         0
    EWRITE        0
    ECKSUM        0

0000060004bcafc0 HEALTHY   -              /dev/dsk/c0t2d0s0

                 READ     WRITE     FREE    CLAIM    IOCTL
    OPS          0x17     0x1d2        0        0        0
    BYTES    0x19c000  0x11da00        0        0        0
    EREAD         0
    EWRITE        0
    ECKSUM        0

This will show you any read/write/cksum errors.

Thanks,
George


Siegfried Nikolaivich wrote:
Hello All,
I am wondering if there is a way to save the scrub results right before the scrub completes. After upgrading to Solaris 10U3, I still have ZFS panicking right as the scrub completes. The scrub results seem to be "cleared" when the system boots back up, so I never get a chance to see them.
Does anyone know of a simple way?
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
