ipmi

Sean Bruno Mon, 29 Jul 2013 15:51:45 -0700

[sbruno_comment_blocks == 4]

> 
> The identify function in 7.x has no such check:
> 
> static void
> ipmi_isa_identify(driver_t *driver, device_t parent)
> {
>       struct ipmi_get_info info;
>       uint32_t devid;
> 
>       if (ipmi_smbios_identify(&info) && info.iface_type != SSIF_MODE &&
>           device_find_child(parent, "ipmi", -1) == NULL) {


Ok then what is this ^^^^^^^^^ ?  Doesn't this mean that if
device_find_child() returns a child node that we should abort?  Is that
not the same as what I'm going on about?

> 
> The only check in 7 was the one you just moved in ipmi_isa_attach():
> 
> static int
> ipmi_isa_attach(device_t dev)
> {
>       struct ipmi_softc *sc = device_get_softc(dev);
>       struct ipmi_get_info info;
>       const char *mode;
>       int count, error, i, type;
> 
>       /*
>        * Pull info out of the SMBIOS table.  If that doesn't work, use
>        * hints to enumerate a device.
>        */
>       if (!ipmi_smbios_identify(&info) &&
>           !ipmi_hint_identify(dev, &info))
>               return (ENXIO);
> 
>       /*
>        * Give other drivers precedence.  Unfortunately, this doesn't
>        * work if we have an SMBIOS table that duplicates a PCI device
>        * that's later on the bus than the PCI-ISA bridge.
>        */
>       if (ipmi_attached)
>               return (EBUSY);
>       ...
> }
> 
> > Am I just confused on the bus relationship here?
> > 
> > We've gone over this a couple of times in different emails on different
> > lists.  I've just never sat down and walked through the code.  If you
> > see a better way to keep ipmi(4) from erroneously attaching to the ISA
> > interface, let me know.
> 
> I haven't seen any mention of this problem before.  I've seen threads about
> the watchdog issue (trying to set watchdog on shutdown which wants to use
> threads, etc.), but not this.
> 

http://lists.freebsd.org/pipermail/freebsd-stable/2012-March/067023.html

The thread gets broken apart by the mail list software though.
Somewhere in there I point at ipmi1 things.  But hell if I can find it
anymore.

> Also, can you provide the console messages without this patch?  The previous
> check in ipmi_isa_attach() is long before we touch the BMC or ever get
> around to creating /dev/ipmi1.  (Just because you see ipmi1: in dmesg doesn't
> mean it's created /dev/ipmi1.)
> 
Definitely does *not* create /dev/ipmi1.

> > > >   Move the check for ipmi_attached out of the ipmi_isa_attach function 
> > > > and into
> > > >   the ipmi_isa_identify function.  Remove the check of the device tree 
> > > > for
> > > >   ipmi devices attached.
> > > >   
> > > >   This probing appears to make Broadcom management firmware on Dell 
> > > > machines
> > > >   crash and emit NMI EISA warnings at various times requiring power 
> > > > cycles
> > > >   of the machines to restore.
> > > 
> > > This makes no sense.  All you are doing is skipping ipmi_smbios_identify()
> > > which just looks at the SMBIOS table in RAM.  It doesn't try to probe the
> > > BMC at all (no register accesses, etc.).  If just reading a table in 
> > > memory
> > > causes side effects, then running dmidecode in userland should be hosing 
> > > your
> > > machines as well.
> > > 
> > 
> > Probably right.  I'm not exactly sure what is making the Broadcom
> > firmware fall over and die.  But when I see the console emitting "NMI
> > EISA" error messages, this ususally requires a full reboot as the
> > network interface has crashed.  Hopefully, I can dig up more "fact" soon
> > as testing continues.
> 
> I'd rather be sure this is the right fix, and if it isn't I'd prefer to
> revert this as I don't think it is actually fixing anything.
> 
It definitely does *not* have the effect that I advertised in my commit
message.

the commit DOES:
-- remove any attempt to do anything in ipmi_isa_* functions.
-- does not emit any errors on attach failure (which are noisy and
   distracting).
-- make attaching to ipmi0 more "reliable" by blindly raising the
   timeout value to 6 seconds.  (6 seconds is the totally empirical
   value I came up with in testing that never failed to attach across
   100+ reboots).

I disagree that it should be reverted.  We can argue about it if you
wish and I'm open to modifying back to the original code.  I don't think
I'd agree with removing the error messages on attachment failure though.
I view the attachment failures as "sysadmin noise" but they should be
there *if* there is a real attach failure.

sean

signature.asc
Description: This is a digitally signed message part

Re: svn commit: r253708 - head/sys/dev/ipmi

Reply via email to