Looks like flaky or broken hardware to me.  It could be a
power supply issue; those tend to rear their ugly heads when
workloads get heavy, and they are usually the easiest part to
replace.
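
One way to check whether the errors follow a particular controller or
port is to tally the warnings. Below is a minimal Python sketch, assuming
the console output has been saved as text. The function name
count_parity_errors is hypothetical, and the pattern is based only on the
marvell88sx warnings quoted below, so other message layouts are not
handled.

```python
import re
import sys
from collections import Counter

# Matches the first line of the two-line warning quoted in this thread, e.g.
#   WARNING: marvell88sx1: error on port 1:
#           ATA UDMA data parity error
# Assumption: other driver messages may be laid out differently.
WARNING_RE = re.compile(r"WARNING: (marvell88sx\d+): error on port (\d+):")
PARITY_TEXT = "ATA UDMA data parity error"


def count_parity_errors(log_text):
    """Count parity-error warnings per (driver instance, port number)."""
    counts = Counter()
    lines = log_text.splitlines()
    # Stop one line early: a warning header needs a following detail line.
    for i, line in enumerate(lines[:-1]):
        match = WARNING_RE.search(line)
        if match and PARITY_TEXT in lines[i + 1]:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Reads the saved console text from standard input.
    totals = count_parity_errors(sys.stdin.read())
    for (instance, port), n in sorted(totals.items()):
        print(f"{instance} port {port}: {n}")
```

If the counts move to a different driver instance and port whenever the
set of cards changes, as in the two failing runs below, that points away
from a single bad disk or cable and toward something shared, such as the
bus or the power supply.
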
 -- richard

Kent Watsen wrote:
>
>
> Below I create zpools isolating one card at a time
>   - when just card#1 - it works
>   - when just card #2 - it fails
>   - when just card #3 - it works
>
> And then again using the two cards that seem to work:
>   - when cards #1 and #3 - it fails
>
> So, at first I thought I narrowed it down to a card, but my last test 
> shows that it still fails when the zpool uses two cards that succeed 
> individually...
>
> The only thing I can think to point out here is that those two cards 
> are on different buses - one connected to a NECuPD720400 and the other 
> connected to an AIC-7902, which itself is then connected to the 
> NECuPD720400
>
> Any ideas?
>
> Thanks,
> Kent
>
>
>
>
>
> OK, doing it again using just card #1 (i.e. "c3") works!
>
>     # zpool destroy tank
>     # zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0
>     # cp -r /usr /tank/usr
>     cp: cycle detected: /usr/ccs/lib/link_audit/32
>     cp: cannot access /usr/lib/amd64/libdbus-1.so.2
>
>
> Doing it again using just card #2 (i.e. "c4") still fails:
>
>     # zpool destroy tank
>     # zpool create tank raidz2 c4t0d0 c4t4d0 c4t1d0 c4t5d0   
>     # cp -r /usr /tank/usr
>     cp: cycle detected: /usr/ccs/lib/link_audit/32
>     cp: cannot access /usr/lib/amd64/libdbus-1.so.2
>     WARNING: marvell88sx1: error on port 1:
>             ATA UDMA data parity error
>     WARNING: marvell88sx1: error on port 1:
>             ATA UDMA data parity error
>     WARNING: marvell88sx1: error on port 1:
>             ATA UDMA data parity error
>     WARNING: marvell88sx1: error on port 1:
>             ATA UDMA data parity error
>     WARNING: marvell88sx1: error on port 1:
>             ATA UDMA data parity error
>     WARNING: marvell88sx1: error on port 1:
>             ATA UDMA data parity error
>
>     SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
>     EVENT-TIME: 0x478f6148.0x376ebd4b (0xbf8f86652d)
>     PLATFORM: i86pc, CSN: -, HOSTNAME: san
>     SOURCE: SunOS, REV: 5.11 snv_78
>     DESC: Errors have been detected that require a reboot to ensure system
>     integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more
>     information.
>     AUTO-RESPONSE: Solaris will attempt to save and diagnose the error
>     telemetry
>     IMPACT: The system will sync files, save a crash dump if needed,
>     and reboot
>     REC-ACTION: Save the error summary below in case telemetry cannot
>     be saved
>
>
>     panic[cpu3]/thread=ffffff000f7bcc80: pcie_pci-0: PCI(-X) Express
>     Fatal Error
>
>     ffffff000f7bcbc0 pcie_pci:pepb_err_msi_intr+d2 ()
>     ffffff000f7bcc20 unix:av_dispatch_autovect+78 ()
>     ffffff000f7bcc60 unix:dispatch_hardint+2f ()
>     ffffff000f786ac0 unix:switch_sp_and_call+13 ()
>     ffffff000f786b10 unix:do_interrupt+a0 ()
>     ffffff000f786b20 unix:cmnint+ba ()
>     ffffff000f786c10 unix:mach_cpu_idle+b ()
>     ffffff000f786c40 unix:cpu_idle+c8 ()
>     ffffff000f786c60 unix:idle+10e ()
>     ffffff000f786c70 unix:thread_start+8 ()
>
>     syncing file systems... done
>     ereport.io.pciex.rc.fe-msg ena=bf8f828ea700c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     rc-status=800007c source-id=200 source-valid=1
>
>     ereport.io.pciex.rc.mue-msg ena=bf8f828ea700c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     rc-status=800007c
>
>     ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     pci-sec-status=6000 pci-bdg-ctrl=3
>
>     ereport.io.pci.sec-ma ena=bf8f828ea700c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     pci-sec-status=6000 pci-bdg-ctrl=3
>
>     ereport.io.pciex.bdg.sec-perr ena=bf8f828ea700c01 detector=[ version=0
>     scheme="dev"
>     device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
>     sue-status=1800 source-id=200 source-valid=1
>
>     ereport.io.pciex.bdg.sec-serr ena=bf8f828ea700c01 detector=[ version=0
>     scheme="dev"
>     device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
>     sue-status=1800
>
>     ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
>     scheme="dev"
>     device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
>     pci-sec-status=6420 pci-bdg-ctrl=7
>
>     dumping to /dev/dsk/c2t0d0s1, offset 215547904, content: kernel
>     NOTICE: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED]:
>      port 0: device reset
>
>     100% done:
>
>
> And doing it again using just card #3 (i.e. "c5") works!
>
>     # zpool destroy tank
>     cannot open 'tank': no such pool    (interesting)
>     # zpool create tank raidz2 c5t0d0 c5t4d0 c5t1d0 c5t5d0   
>     # cp -r /usr /tank/usr
>
>
>
>
> And doing it again using cards #1 and #3 (i.e. "c3" and "c5") fails!
>
>     # zpool destroy tank
>     # zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0 raidz2 c5t0d0 c5t4d0 c5t1d0 c5t5d0
>     # cp -r /usr /tank/usr
>     cp: cycle detected: /usr/ccs/lib/link_audit/32
>     cp: cannot access /usr/lib/amd64/libdbus-1.so.2
>     WARNING: marvell88sx2: error on port 4:
>             ATA UDMA data parity error
>     WARNING: marvell88sx2: error on port 4:
>             ATA UDMA data parity error
>     WARNING: marvell88sx2: error on port 4:
>             ATA UDMA data parity error
>     WARNING: marvell88sx2: error on port 4:
>             ATA UDMA data parity error
>     WARNING: marvell88sx2: error on port 4:
>             ATA UDMA data parity error
>     WARNING: marvell88sx2: error on port 4:
>
>     SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
>     EVENT-TIME: 0x478f6307.0x20c8668b (0x643e118fd4)
>     PLATFORM: i86pc, CSN: -, HOSTNAME: san
>     SOURCE: SunOS, REV: 5.11 snv_78
>     DESC: Errors have been detected that require a reboot to ensure system
>     integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more
>     information.
>     AUTO-RESPONSE: Solaris will attempt to save and diagnose the error
>     telemetry
>     IMPACT: The system will sync files, save a crash dump if needed,
>     and reboot
>     REC-ACTION: Save the error summary below in case telemetry cannot
>     be saved
>
>
>     panic[cpu3]/thread=ffffff000f7c2c80: pcie_pci-0: PCI(-X) Express
>     Fatal Error
>
>     ffffff000f7c2bc0 pcie_pci:pepb_err_msi_intr+d2 ()
>     ffffff000f7c2c20 unix:av_dispatch_autovect+78 ()
>     ffffff000f7c2c60 unix:dispatch_hardint+2f ()
>     ffffff000f78cac0 unix:switch_sp_and_call+13 ()
>     ffffff000f78cb10 unix:do_interrupt+a0 ()
>     ffffff000f78cb20 unix:cmnint+ba ()
>     ffffff000f78cc10 unix:mach_cpu_idle+b ()
>     ffffff000f78cc40 unix:cpu_idle+c8 ()
>     ffffff000f78cc60 unix:idle+10e ()
>     ffffff000f78cc70 unix:thread_start+8 ()
>
>     syncing file systems... done
>     ereport.io.pciex.rc.fe-msg ena=643e0d446400c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     rc-status=800007c source-id=201 source-valid=1
>
>     ereport.io.pciex.rc.mue-msg ena=643e0d446400c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     rc-status=800007c
>
>     ereport.io.pci.sec-rserr ena=643e0d446400c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     pci-sec-status=6000 pci-bdg-ctrl=3
>
>     ereport.io.pci.sec-ma ena=643e0d446400c01 detector=[ version=0
>     scheme="dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
>     pci-sec-status=6000 pci-bdg-ctrl=3
>
>     ereport.io.pciex.bdg.sec-perr ena=643e0d446400c01 detector=[ version=0
>     scheme="dev"
>     device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1" ]
>     sue-status=1800 source-id=201 source-valid=1
>
>     ereport.io.pciex.bdg.sec-serr ena=643e0d446400c01 detector=[ version=0
>     scheme="dev"
>     device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1" ]
>     sue-status=1800
>
>     ereport.io.pci.sec-rserr ena=643e0d446400c01 detector=[ version=0
>     scheme="dev"
>     device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1" ]
>     pci-sec-status=6420 pci-bdg-ctrl=7
>
>     dumping to /dev/dsk/c2t0d0s1, offset 215547904, content: kernel
>     NOTICE: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED]:
>      port 0: device reset
>
>     100% done: 178114 pages dumped, compression ratio 2.44, dump succeeded
>     rebooting...
>
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

