[sbruno_comment_blocks == 4] > > The identify function in 7.x has no such check: > > static void > ipmi_isa_identify(driver_t *driver, device_t parent) > { > struct ipmi_get_info info; > uint32_t devid; > > if (ipmi_smbios_identify(&info) && info.iface_type != SSIF_MODE && > device_find_child(parent, "ipmi", -1) == NULL) {
Ok then what is this ^^^^^^^^^ ? Doesn't this mean that if device_find_child() returns a child node that we should abort? Is that not the same as what I'm going on about? > > The only check in 7 was the one you just moved in ipmi_isa_attach(): > > static int > ipmi_isa_attach(device_t dev) > { > struct ipmi_softc *sc = device_get_softc(dev); > struct ipmi_get_info info; > const char *mode; > int count, error, i, type; > > /* > * Pull info out of the SMBIOS table. If that doesn't work, use > * hints to enumerate a device. > */ > if (!ipmi_smbios_identify(&info) && > !ipmi_hint_identify(dev, &info)) > return (ENXIO); > > /* > * Give other drivers precedence. Unfortunately, this doesn't > * work if we have an SMBIOS table that duplicates a PCI device > * that's later on the bus than the PCI-ISA bridge. > */ > if (ipmi_attached) > return (EBUSY); > ... > } > > > Am I just confused on the bus relationship here? > > > > We've gone over this a couple of times in different emails on different > > lists. I've just never sat down and walked through the code. If you > > see a better way to keep ipmi(4) from erroneously attaching to the ISA > > interface, let me know. > > I haven't seen any mention of this problem before. I've seen threads about > the watchdog issue (trying to set watchdog on shutdown which wants to use > threads, etc.), but not this. > http://lists.freebsd.org/pipermail/freebsd-stable/2012-March/067023.html The thread gets broken apart by the mail list software though. Somewhere in there I point at ipmi1 things. But hell if I can find it anymore. > Also, can you provide the console messages without this patch? The previous > check in ipmi_isa_attach() is long before we touch the BMC or ever get > around to creating /dev/ipmi1. (Just because you see ipmi1: in dmesg doesn't > mean it's created /dev/ipmi1.) > Definitely does *not* create /dev/ipmi1. > > > > Move the check for ipmi_attached out of the ipmi_isa_attach function > > > > and into > > > > the ipmi_isa_identify function. Remove the check of the device tree > > > > for > > > > ipmi devices attached. > > > > > > > > This probing appears to make Broadcom management firmware on Dell > > > > machines > > > > crash and emit NMI EISA warnings at various times requiring power > > > > cycles > > > > of the machines to restore. > > > > > > This makes no sense. All you are doing is skipping ipmi_smbios_identify() > > > which just looks at the SMBIOS table in RAM. It doesn't try to probe the > > > BMC at all (no register accesses, etc.). If just reading a table in > > > memory > > > causes side effects, then running dmidecode in userland should be hosing > > > your > > > machines as well. > > > > > > > Probably right. I'm not exactly sure what is making the Broadcom > > firmware fall over and die. But when I see the console emitting "NMI > > EISA" error messages, this ususally requires a full reboot as the > > network interface has crashed. Hopefully, I can dig up more "fact" soon > > as testing continues. > > I'd rather be sure this is the right fix, and if it isn't I'd prefer to > revert this as I don't think it is actually fixing anything. > It definitely does *not* have the effect that I advertised in my commit message. the commit DOES: -- remove any attempt to do anything in ipmi_isa_* functions. -- does not emit any errors on attach failure (which are noisy and distracting). -- make attaching to ipmi0 more "reliable" by blindly raising the timeout value to 6 seconds. (6 seconds is the totally empirical value I came up with in testing that never failed to attach across 100+ reboots). I disagree that it should be reverted. We can argue about it if you wish and I'm open to modifying back to the original code. I don't think I'd agree with removing the error messages on attachment failure though. I view the attachment failures as "sysadmin noise" but they should be there *if* there is a real attach failure. sean
signature.asc
Description: This is a digitally signed message part