Hi all,

Since Google can be your friend, and thanks to this good article by Ben Rockwood at http://cuddletech.com/blog/pivot/entry.php?id=965, I have new information, and hopefully someone can see something interesting in it.

Based on what I can understand, a thread (ffffff001f7f3c60) running on CPU 4 caused a panic ("Freeing a free IOMMU page: paddr=0xccca2000"), and this thread belongs to a process called zpool-TEST.
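For anyone who wants to follow the same steps, the thread-to-process mapping can be double-checked directly in the dump. A sketch of the relevant mdb dcmds, run against the same unix.0/vmcore.0 pair (the dcmds are standard mdb; the exact output will of course depend on your dump):

```
mdb -k unix.0 vmcore.0

> ::panicinfo                      ! confirms the panicking CPU and thread address
> ffffff001f7f3c60::findstack -v   ! full stack of the panic thread, with frame args
> ::ps ! grep TEST                 ! locate the zpool-TEST process in the process list
```

The thread address printed by ::panicinfo should match the THREAD column that ::cpuinfo shows for the panicking CPU, which is how I tied the panic to zpool-TEST below.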
Now things get more "dark" to me, since the pools available on the system are:

zpool list (filtered info)
NAME
RAID10
RAIDZ2
rpool

So I have no zpool called TEST; however, I did have one in the past, and I exported that pool and imported it under a different name:

2010-02-23.08:33:47 zpool export TEST
2010-02-23.08:34:05 zpool import TEST RAID10

Now, can this rename lead to this type of error, or am I completely wrong?

Thanks in advance for all your time,
Bruno

Detailed info:

mdb -k unix.0 vmcore.0
mdb: warning: dump is from SunOS 5.11 snv_132; dcmds and macros may not match kernel implementation
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd sockfs ip hook neti sctp arp usba uhci fctl stmf md lofs idm nfs random sppp fcip cpc crypto logindmux ptm nsctl ufs ipc ]

> ::status
debugging crash dump vmcore.0 (64-bit) from san01
operating system: 5.11 snv_132 (i86pc)
panic message: Freeing a free IOMMU page: paddr=0xccca2000
dump content: kernel pages only

> ::stack
vpanic()
iommu_page_free+0xcb(ffffff04e3da5000, ccca2000)
iommu_free_page+0x15(ffffff04e3da5000, ccca2000)
iommu_setup_level_table+0xa0(ffffff054406d000, ffffff0543b99000, 8)
iommu_setup_page_table+0xa0(ffffff054406d000, 100c000)
iommu_map_page_range+0x6a(ffffff054406d000, 100c000, 3c2329000, 3c2329000, 2)
iommu_map_dvma+0x50(ffffff054406d000, 100c000, 3c2329000, 1000, ffffff001f7f31d0)
intel_iommu_map_sgl+0x22f(ffffff0553b43e00, ffffff001f7f31d0, 41)
rootnex_coredma_bindhdl+0x11e(ffffff04e3ef5cb0, ffffff04e607f540, ffffff0553b43e00, ffffff001f7f31d0, ffffff0553efdc50, ffffff0553efdbf8)
rootnex_dma_bindhdl+0x36(ffffff04e3ef5cb0, ffffff04e607f540, ffffff0553b43e00, ffffff001f7f31d0, ffffff0553efdc50, ffffff0553efdbf8)
ddi_dma_buf_bind_handle+0x117(ffffff0553b43e00, ffffff055860cd00, a, 0, 0, ffffff0553efdc50)
scsi_dma_buf_bind_attr+0x48(ffffff0553efdb90, ffffff055860cd00, a, 0, 0)
scsi_init_cache_pkt+0x2d0(ffffff05456302e0, 0, ffffff055860cd00, a, 20, 0)
scsi_init_pkt+0x5c(ffffff05456302e0, 0, ffffff055860cd00, a, 20, 0)
vhci_bind_transport+0x54d(ffffff0543191c58, ffffff055d2f8968, 40000, 0)
vhci_scsi_init_pkt+0x160(ffffff0543191c58, 0, ffffff055860cd00, a, 20, 0)
scsi_init_pkt+0x5c(ffffff0543191c58, 0, ffffff055860cd00, a, 20, 0)
sd_setup_rw_pkt+0x12a(ffffff0543b9d080, ffffff001f7f3688, ffffff055860cd00, 40000, fffffffff7a91b80, ffffff0543b9d080)
sd_initpkt_for_buf+0xad(ffffff055860cd00, ffffff001f7f36f8)
sd_start_cmds+0x197(ffffff0543b9d080, 0)
sd_core_iostart+0x186(4, ffffff0543b9d080, ffffff055860cd00)
sd_mapblockaddr_iostart+0x306(3, ffffff0543b9d080, ffffff055860cd00)
sd_xbuf_strategy+0x50(ffffff055860cd00, ffffff0544cf0a00, ffffff0543b9d080)
xbuf_iostart+0x1e5(ffffff04f21cce80)
ddi_xbuf_qstrategy+0xd3(ffffff055860cd00, ffffff04f21cce80)
sdstrategy+0x101(ffffff055860cd00)
bdev_strategy+0x75(ffffff055860cd00)
ldi_strategy+0x59(ffffff04f29a4df8, ffffff055860cd00)
vdev_disk_io_start+0xd0(ffffff055c2379a0)
zio_vdev_io_start+0x17d(ffffff055c2379a0)
zio_execute+0x8d(ffffff055c2379a0)
vdev_queue_io_done+0x92(ffffff055c2fe680)
zio_vdev_io_done+0x62(ffffff055c2fe680)
zio_execute+0x8d(ffffff055c2fe680)
taskq_thread+0x248(ffffff0543a086a0)
thread_start+8()

> ::msgbuf
panic[cpu4]/thread=ffffff001f7f3c60:
Freeing a free IOMMU page: paddr=0xccca2000

ffffff001f7f2e90 rootnex:iommu_page_free+cb ()
ffffff001f7f2eb0 rootnex:iommu_free_page+15 ()
ffffff001f7f2f10 rootnex:iommu_setup_level_table+a0 ()
ffffff001f7f2f50 rootnex:iommu_setup_page_table+a0 ()
ffffff001f7f2fd0 rootnex:iommu_map_page_range+6a ()
ffffff001f7f3020 rootnex:iommu_map_dvma+50 ()
ffffff001f7f30e0 rootnex:intel_iommu_map_sgl+22f ()
ffffff001f7f3180 rootnex:rootnex_coredma_bindhdl+11e ()
ffffff001f7f31c0 rootnex:rootnex_dma_bindhdl+36 ()
ffffff001f7f3260 genunix:ddi_dma_buf_bind_handle+117 ()
ffffff001f7f32c0 scsi:scsi_dma_buf_bind_attr+48 ()
ffffff001f7f3350 scsi:scsi_init_cache_pkt+2d0 ()
ffffff001f7f33d0 scsi:scsi_init_pkt+5c ()
ffffff001f7f3480 scsi_vhci:vhci_bind_transport+54d ()
ffffff001f7f3500 scsi_vhci:vhci_scsi_init_pkt+160 ()
ffffff001f7f3580 scsi:scsi_init_pkt+5c ()
ffffff001f7f3660 sd:sd_setup_rw_pkt+12a ()
ffffff001f7f36d0 sd:sd_initpkt_for_buf+ad ()
ffffff001f7f3740 sd:sd_start_cmds+197 ()

> ::panicinfo
             cpu                4
          thread ffffff001f7f3c60
         message Freeing a free IOMMU page: paddr=0xccca2000
             rdi fffffffff78ede80
             rsi ffffff001f7f2e10
             rdx         ccca2000
             rcx                1
              r8 ffffff001f7f2d60
              r9 ffffff001f7f2e60
             rax                0
             rbx                3
             rbp ffffff001f7f2e50
             r10 ffffff0561edd000
             r11 ffffff0000003000
             r12 fffffffff78ede80
             r13 ffffff04e3da5000
             r14                0
             r15         ccca2000
          fsbase                0
          gsbase ffffff04f32e0000
              ds               4b
              es               4b
              fs                0
              gs              1c3
          trapno                0
             err                0
             rip fffffffffb862550
              cs               30
          rflags              246
             rsp ffffff001f7f2d58
              ss               38
          gdt_hi                0
          gdt_lo         b00001ef
          idt_hi                0
          idt_lo         20000fff
             ldt                0
            task               70
             cr0         8005003b
             cr2         fe6e971b
             cr3          4000000
             cr4              6f8

> ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc2f9e0  1f    1    0  -1   no    no t-0    ffffff001e805c60 (idle)
                       |    |
            RUNNING <--+    +--> PRI THREAD           PROC
              READY              60  ffffff00202a2c60 sched
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffff04f32e8040  1f    0    0  99   no    no t-0    ffffff001fbadc60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  2 ffffff04f32e6b00  1f    0    0  99   no    no t-0    ffffff001fbc5c60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  3 ffffff04f32e1500  1f    1    0  -1   no    no t-0    ffffff001f0e3c60 (idle)
                       |    |
            RUNNING <--+    +--> PRI THREAD           PROC
              READY              60  ffffff001e985c60 sched
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  4 fffffffffbc3a000  1b    0    0  99   no    no t-0    ffffff001f7f3c60 zpool-TEST
                       |
            RUNNING <--+
              READY
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  5 ffffff04f32dcac0  1f    0    0  99   no    no t-0    ffffff001f7d5c60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  6 ffffff04f3897b00  1f    0    0 104   no    no t-0    ffffff001f413c60 sched
                       |    |
            RUNNING <--+    +--> PIL THREAD
              READY              5   ffffff001f413c60
           QUIESCED              -   ffffff001ff99c60 sched
             EXISTS

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  7 ffffff04f3894500  1f    0    0  99   no    no t-0    ffffff001f7e1c60 zpool-TEST
                       |
            RUNNING <--+
              READY
           QUIESCED
             EXISTS
             ENABLE

On 13-4-2010 11:42, Bruno Sousa wrote:
> Hi all,
>
> Recently one of the servers, a Dell R710, attached to 2 J4400s, started
> to crash quite often.
> Finally I got a message in /var/adm/messages that might point to
> something useful, but I don't have the expertise to start
> troubleshooting this problem, so any help would be highly valuable.
>
> Best regards,
> Bruno
>
>
> The significant messages are:
>
> Apr 13 11:12:04 san01 savecore: [ID 570001 auth.error] reboot after
> panic: Freeing a free IOMMU page: paddr=0xccca2000
> Apr 13 11:12:04 san01 savecore: [ID 385089 auth.error] Saving compressed
> system crash dump in /var/crash/san01/vmdump.0
>
> I also noticed other "interesting" messages like:
>
> Apr 13 11:11:10 san01 unix: [ID 378719 kern.info] NOTICE: cpu_acpi: _PSS
> package evaluation failed for with status 5 for CPU 0.
> Apr 13 11:11:10 san01 unix: [ID 388705 kern.info] NOTICE: cpu_acpi:
> error parsing _PSS for CPU 0
> Apr 13 11:11:10 san01 unix: [ID 928200 kern.info] NOTICE: SpeedStep
> support is being disabled due to errors parsing ACPI P-state objects
> exported by BIOS
>
> Apr 13 11:10:50 san01 scsi: [ID 243001 kern.info]
> /p...@0,0/pci8086,3...@4/pci1028,1...@0 (mpt0):
> Apr 13 11:10:50 san01 DMA restricted below 4GB boundary due to errata
>
> Apr 13 11:11:32 san01 scsi: [ID 243001 kern.info]
> /p...@0,0/pci8086,3...@9/pci1000,3...@0 (mpt2):
> Apr 13 11:11:32 san01 DMA restricted below 4GB boundary due to errata
>
>
> Relevant specs of the machine:
>
> SunOS san01 5.11 snv_134 i86pc i386 i86pc Solaris
>
> rpool boot drives attached to a Dell SAS6/iR Integrated RAID Controller
> (mpt0, firmware version v0.25.47.0 (IR))
> 2 HBA LSI 1068E, each connected to a J4400 JBOD (mpt1, firmware version
> v1.26.0.0 (IT))
>
> multipath enabled and working
>
> 2 quad-cores, 16 GB RAM
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
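On the rename question above: as far as I understand it, exporting a pool and re-importing it under a new name is a fully supported operation, and the per-pool kernel process is named zpool-<poolname> after the name in effect at import time. A sketch of the rename and one way to inspect its trace afterwards (zpool export/import/history are standard commands; the exact history output is illustrative):

```
# Rename a pool by exporting it and importing it under a new name:
zpool export TEST
zpool import TEST RAID10    # first arg is the old name, second the new name

# The pool's command history survives the rename, so the early
# entries should still show commands that were issued against "TEST":
zpool history RAID10
```

If the rename were somehow related to the panic, I would expect the zpool-<name> threads in ::cpuinfo to carry the new name rather than the old one, so the zpool-TEST processes in the dump are exactly what puzzles me.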
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss