Well, OK, the msi=0 setting didn't help after all. A few minutes after my last message a few errors showed up in iostat, and a few minutes after that the machine locked up hard... Maybe I will try just doing a scrub instead of my rsync process and see how that does.
Chad

On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
> I don't think the hardware has any problems, it only started having errors
> when I upgraded OpenSolaris. It's still working fine again now after a
> reboot. Actually, I reread one of your earlier messages, and I didn't
> realize at first when you said "non-Sun JBOD" that this didn't apply to me
> (in regards to the msi=0 fix) because I didn't realize JBOD was shorthand
> for an external expander device. Since I'm just using baremetal, and
> passive backplanes, I think the msi=0 fix should apply to me based on what
> you wrote earlier, anyway I've put
>     set mpt:mpt_enable_msi = 0
> now in /etc/system and rebooted as it was suggested earlier. I've resumed
> my rsync, and so far there have been no errors, but it's only been 20
> minutes or so. I should have a good idea by tomorrow if this definitely
> fixed the problem (since even when the machine was not crashing it was
> tallying up iostat errors fairly rapidly)
>
> Thanks again for your help. Sorry for wasting your time if the previously
> posted workaround fixes things. I'll let you know tomorrow either way.
>
> Chad
>
> On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
> > Chad Cantwell wrote:
> > >After another crash I checked the syslog and there were some different
> > >errors than the ones I saw previously during operation:
> > ...
> >
> > >Nov 30 20:59:13 the-vault LSI PCI device (1000,ffff) not supported.
> > ...
> > >Nov 30 20:59:13 the-vault mpt_config_space_init failed
> > ...
> > >Nov 30 20:59:15 the-vault mpt_restart_ioc failed
> > ....
> >
> > >Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
> > >Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
> > >Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: the-vault
> > >Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
> > >Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63
> > >Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid request.
> > >Nov 30 21:33:02 the-vault Refer to http://sun.com/msg/PCIEX-8000-8R for more information.
> > >Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may be disabled
> > >Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device instances associated with this fault
> > >Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s). Use fmadm faulty to identify the devices or contact Sun for support.
> >
> > Sorry to have to tell you, but that HBA is dead. Or at
> > least dying horribly. If you can't init the config space
> > (that's the pci bus config space), then you've got about
> > 1/2 the nails in the coffin hammered in. Then the failure
> > to restart the IOC (io controller unit) == the rest of
> > the lid hammered down.
> >
> > best regards,
> > James C. McPherson
> > --
> > Senior Kernel Software Engineer, Solaris
> > Sun Microsystems
> > http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss