Well, ok, the msi=0 thing didn't help after all.  A few minutes after my
last message a few errors showed up in iostat, and a few minutes after
that the machine was locked up hard...  Maybe I will try just doing a
scrub instead of my rsync process and see how that does.

Chad


On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
> I don't think the hardware has any problems; it only started having
> errors when I upgraded OpenSolaris.  It's still working fine again now
> after a reboot.  Actually, I reread one of your earlier messages, and I
> didn't realize at first when you said "non-Sun JBOD" that this didn't
> apply to me (with regard to the msi=0 fix), because I didn't realize
> JBOD was shorthand for an external expander device.  Since I'm just
> using bare metal and passive backplanes, I think the msi=0 fix should
> apply to me based on what you wrote earlier.  Anyway, I've put
>       set mpt:mpt_enable_msi = 0
> in /etc/system and rebooted, as was suggested earlier.  I've resumed my
> rsync, and so far there have been no errors, but it's only been 20
> minutes or so.  I should have a good idea by tomorrow whether this
> definitely fixed the problem (since even when the machine was not
> crashing it was tallying up iostat errors fairly rapidly).
> 
> Thanks again for your help.  Sorry for wasting your time if the
> previously posted workaround fixes things.  I'll let you know tomorrow
> either way.
> 
> Chad
> 
> On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
> > Chad Cantwell wrote:
> > > >After another crash I checked the syslog and there were some
> > > >different errors than the ones I saw previously during operation:
> > ...
> > 
> > >Nov 30 20:59:13 the-vault       LSI PCI device (1000,ffff) not supported.
> > ...
> > >Nov 30 20:59:13 the-vault       mpt_config_space_init failed
> > ...
> > >Nov 30 20:59:15 the-vault       mpt_restart_ioc failed
> > ....
> > 
> > >Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
> > >PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
> > >Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
> > >Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: 
> > >System-Serial-Number, HOSTNAME: the-vault
> > >Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
> > >Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
> > >Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid 
> > >request.
> > >Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R for 
> > >more information.
> > >Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may 
> > >be disabled
> > >Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device 
> > >instances associated with this fault
> > > >Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and
> > > >patches are installed. Otherwise schedule a repair procedure to replace
> > > >the affected device(s).  Use fmadm faulty to identify the devices or
> > > >contact Sun for support.
> > 
> > 
> > Sorry to have to tell you, but that HBA is dead. Or at
> > least dying horribly. If you can't init the config space
> > (that's the pci bus config space), then you've got about
> > 1/2 the nails in the coffin hammered in. Then the failure
> > to restart the IOC (io controller unit) == the rest of
> > the lid hammered down.
> > 
> > 
> > best regards,
> > James C. McPherson
> > --
> > Senior Kernel Software Engineer, Solaris
> > Sun Microsystems
> > http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
