Looks like flaky or broken hardware to me. It could be a power supply issue; those tend to rear their ugly heads when workloads get heavy, and they are usually the easiest part to replace.
 -- richard
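For anyone repeating the one-card-at-a-time test from the quoted message, here is a minimal dry-run sketch. It only prints the commands and does not run them. The pool name "tank", the controller names c3/c4/c5 and the target numbers 0, 4, 1, 5 come from the quoted transcript; the function name pool_create_cmd is made up for illustration.

```shell
#!/bin/sh
# Dry run only: prints the per-card test commands, executes none of them.
# pool_create_cmd is a hypothetical helper, not part of any ZFS tooling.

pool_create_cmd() {
    # $1 = controller prefix (e.g. c3); builds the raidz2 create line
    # using the same four targets as the quoted transcript.
    c=$1
    disks=""
    for t in 0 4 1 5; do
        disks="$disks ${c}t${t}d0"
    done
    echo "zpool create tank raidz2$disks"
}

for card in c3 c4 c5; do
    echo "# card $card"
    echo "zpool destroy tank"
    pool_create_cmd "$card"
    echo "cp -r /usr /tank/usr"
done
```

Printing rather than executing keeps the sketch safe to run anywhere; the printed lines would still need to be reviewed and run by hand on the affected host, since zpool destroy is destructive.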
Kent Watsen wrote:
>
> Below I create zpools isolating one card at a time
> - when just card #1 - it works
> - when just card #2 - it fails
> - when just card #3 - it works
>
> And then again using the two cards that seem to work:
> - when cards #1 and #3 - it fails
>
> So, at first I thought I narrowed it down to a card, but my last test
> shows that it still fails when the zpool uses two cards that succeed
> individually...
>
> The only thing I can think to point out here is that those two cards
> are on different buses - one connected to a NECuPD720400 and the other
> connected to an AIC-7902, which itself is then connected to the
> NECuPD720400
>
> Any ideas?
>
> Thanks,
> Kent
>
>
> OK, doing it again using just card #1 (i.e. "c3") works!
>
> # zpool destroy tank
> # zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0
> # cp -r /usr /tank/usr
> cp: cycle detected: /usr/ccs/lib/link_audit/32
> cp: cannot access /usr/lib/amd64/libdbus-1.so.2
>
>
> Doing it again using just card #2 (i.e. "c4") still fails:
>
> # zpool destroy tank
> # zpool create tank raidz2 c4t0d0 c4t4d0 c4t1d0 c4t5d0
> # cp -r /usr /tank/usr
> cp: cycle detected: /usr/ccs/lib/link_audit/32
> cp: cannot access /usr/lib/amd64/libdbus-1.so.2
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
> WARNING: marvell88sx1: error on port 1:
> ATA UDMA data parity error
>
> SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> EVENT-TIME: 0x478f6148.0x376ebd4b (0xbf8f86652d)
> PLATFORM: i86pc, CSN: -, HOSTNAME: san
> SOURCE: SunOS, REV: 5.11 snv_78
> DESC: Errors have been detected that require a reboot to ensure system
> integrity.
> See http://www.sun.com/msg/SUNOS-8000-0G for more
> information.
> AUTO-RESPONSE: Solaris will attempt to save and diagnose the error
> telemetry
> IMPACT: The system will sync files, save a crash dump if needed,
> and reboot
> REC-ACTION: Save the error summary below in case telemetry cannot
> be saved
>
>
> panic[cpu3]/thread=ffffff000f7bcc80: pcie_pci-0: PCI(-X) Express
> Fatal Error
>
> ffffff000f7bcbc0 pcie_pci:pepb_err_msi_intr+d2 ()
> ffffff000f7bcc20 unix:av_dispatch_autovect+78 ()
> ffffff000f7bcc60 unix:dispatch_hardint+2f ()
> ffffff000f786ac0 unix:switch_sp_and_call+13 ()
> ffffff000f786b10 unix:do_interrupt+a0 ()
> ffffff000f786b20 unix:cmnint+ba ()
> ffffff000f786c10 unix:mach_cpu_idle+b ()
> ffffff000f786c40 unix:cpu_idle+c8 ()
> ffffff000f786c60 unix:idle+10e ()
> ffffff000f786c70 unix:thread_start+8 ()
>
> syncing file systems... done
> ereport.io.pciex.rc.fe-msg ena=bf8f828ea700c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> rc-status=800007c
> source-id=200
> source-valid=1
>
> ereport.io.pciex.rc.mue-msg ena=bf8f828ea700c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> rc-status=800007c
>
> ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
> scheme="dev"
> device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> pci-sec-status=6000
> pci-bdg-ctrl=3
>
> ereport.io.pci.sec-ma ena=bf8f828ea700c01 detector=[ version=0
> scheme="dev"
> device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> pci-sec-status=6000
> pci-bdg-ctrl=3
>
> ereport.io.pciex.bdg.sec-perr ena=bf8f828ea700c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL
> PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
> sue-status=1800
> source-id=200 source-valid=1
>
> ereport.io.pciex.bdg.sec-serr ena=bf8f828ea700c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL
> PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
> sue-status=1800
>
> ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0
> scheme="dev"
> device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL
> PROTECTED]/pci1033,[EMAIL PROTECTED]" ]
> pci-sec-status=6420
> pci-bdg-ctrl=7
>
> dumping to /dev/dsk/c2t0d0s1, offset 215547904, content: kernel
> NOTICE: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED]:
> port 0: device reset
>
> 100% done:
>
>
> And doing it again using just card #3 (i.e. "c5") works!
>
> # zpool destroy tank
> cannot open 'tank': no such pool
> (interesting)
> # zpool create tank raidz2 c5t0d0 c5t4d0 c5t1d0 c5t5d0
> # cp -r /usr /tank/usr
>
>
> And doing it again using cards #1 and #3 (i.e. "c3" and "c5") fails!
>
> # zpool destroy tank
> # zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0 raidz2 c5t0d0 c5t4d0 c5t1d0 c5t5d0
> # cp -r /usr /tank/usr
> cp: cycle detected: /usr/ccs/lib/link_audit/32
> cp: cannot access /usr/lib/amd64/libdbus-1.so.2
> WARNING: marvell88sx2: error on port 4:
> ATA UDMA data parity error
> WARNING: marvell88sx2: error on port 4:
> ATA UDMA data parity error
> WARNING: marvell88sx2: error on port 4:
> ATA UDMA data parity error
> WARNING: marvell88sx2: error on port 4:
> ATA UDMA data parity error
> WARNING: marvell88sx2: error on port 4:
> ATA UDMA data parity error
> WARNING: marvell88sx2: error on port 4:
>
> SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> EVENT-TIME: 0x478f6307.0x20c8668b (0x643e118fd4)
> PLATFORM: i86pc, CSN: -, HOSTNAME: san
> SOURCE: SunOS, REV: 5.11 snv_78
> DESC: Errors have been detected that require a reboot to ensure system
> integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more
> information.
> AUTO-RESPONSE: Solaris will attempt to save and diagnose the error
> telemetry
> IMPACT: The system will sync files, save a crash dump if needed,
> and reboot
> REC-ACTION: Save the error summary below in case telemetry cannot
> be saved
>
>
> panic[cpu3]/thread=ffffff000f7c2c80: pcie_pci-0: PCI(-X) Express
> Fatal Error
>
> ffffff000f7c2bc0 pcie_pci:pepb_err_msi_intr+d2 ()
> ffffff000f7c2c20 unix:av_dispatch_autovect+78 ()
> ffffff000f7c2c60 unix:dispatch_hardint+2f ()
> ffffff000f78cac0 unix:switch_sp_and_call+13 ()
> ffffff000f78cb10 unix:do_interrupt+a0 ()
> ffffff000f78cb20 unix:cmnint+ba ()
> ffffff000f78cc10 unix:mach_cpu_idle+b ()
> ffffff000f78cc40 unix:cpu_idle+c8 ()
> ffffff000f78cc60 unix:idle+10e ()
> ffffff000f78cc70 unix:thread_start+8 ()
>
> syncing file systems... done
> ereport.io.pciex.rc.fe-msg ena=643e0d446400c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> rc-status=800007c
> source-id=201
> source-valid=1
>
> ereport.io.pciex.rc.mue-msg ena=643e0d446400c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> rc-status=800007c
>
> ereport.io.pci.sec-rserr ena=643e0d446400c01 detector=[ version=0
> scheme="dev"
> device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> pci-sec-status=6000
> pci-bdg-ctrl=3
>
> ereport.io.pci.sec-ma ena=643e0d446400c01 detector=[ version=0
> scheme="dev"
> device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]" ]
> pci-sec-status=6000
> pci-bdg-ctrl=3
>
> ereport.io.pciex.bdg.sec-perr ena=643e0d446400c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL
> PROTECTED]/pci1033,[EMAIL PROTECTED],1" ]
> sue-status=1800
> source-id=201 source-valid=1
>
> ereport.io.pciex.bdg.sec-serr ena=643e0d446400c01 detector=[
> version=0 scheme=
> "dev" device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL
> PROTECTED]/pci1033,[EMAIL PROTECTED],1" ]
> sue-status=1800
>
> ereport.io.pci.sec-rserr ena=643e0d446400c01 detector=[ version=0
> scheme="dev"
> device-path="/[EMAIL PROTECTED],0/pci10de,[EMAIL
> PROTECTED]/pci1033,[EMAIL PROTECTED],1" ]
> pci-sec-status=6420
> pci-bdg-ctrl=7
>
> dumping to /dev/dsk/c2t0d0s1, offset 215547904, content: kernel
> NOTICE: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED]:
> port 0: device reset
>
> 100% done: 178114 pages dumped, compression ratio 2.44, dump succeeded
> rebooting...
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss