My understanding is that IPMB and KCS are different physical-layer interfaces 
to the BMC, and only one of them is active on any given motherboard. So if the 
kernel is using KCS, then so is userland. This can be grepped out of 
/var/run/dmesg.boot:

    ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
    ipmi0: KCS mode found at io 0xca2 on acpi
    ipmi0: IPMI device rev. 1, firmware rev. 1.73, version 2.0, device support 
mask 0xbf
    ipmi0: Number of channels 2
    ipmi0: Attached watchdog
    ipmi0: Establishing power cycle handler

-Ravi (rpokala@)

-----Original Message-----
From: Chuck Silvers <c...@chuq.com>
Date: 2022-11-01, Tuesday at 14:55
To: Ravi Pokala <rpok...@freebsd.org>
Cc: <src-committ...@freebsd.org>, <dev-commits-src-all@FreeBSD.org>, 
<dev-commits-src-m...@freebsd.org>
Subject: Re: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests 
when possible

    Hi Ravi,

    the usual path (the IPMI_SYSTEM_INTERFACE_ADDR_TYPE case) for an 
"application" request
    (ie. one from userland) is:

    ipmi_polled_enqueue_request()
    sc->ipmi_enqueue_request()
    ipmi_ioctl()

    which uses the regular (low priority) queue.

    in the non-IPMI_SYSTEM_INTERFACE_ADDR_TYPE case (aka. the IPMB case),
    ipmi_ioctl() does issue two driver requests to set up the IPMB stuff before
    then issuing the real application request with sc->ipmi_enqueue_request() 
as above.
    I don't know much about this IPMB thing, it looks like a kind of 
pass-through
    interface for communicating with another device that sits behind the main
    BMC IPMI controller.

    I suppose that it would make sense to change these two IPMB setup driver 
requests
    to use the regular (low priority) queue (since sleeping is ok in that path),
    but I don't think I have any way to test that since our systems don't use 
this path.

    -Chuck


    On Tue, Nov 01, 2022 at 11:53:57AM -0700, Ravi Pokala wrote:
    > Hi Chuck,
    > 
    > As best as I can tell, the stack for handling a request from userland 
(e.g. `ipmitool') looks like this:
    > 
    > ipmi_polled_enqueue_request_highpri()
    > kcs_driver_request_queue() /* assuming not panicked or dumping core */
    > kcs_driver_request()
    > sc->ipmi_driver_request()
    > ipmi_submit_driver_request()
    > ipmi_ioctl()
    > 
    > And the stack for handling kernel-internal requests (i.e. watchdog) is 
essentially the same, except for calling ipmi_submit_driver_request() directly 
rather than going through ipmi_ioctl(). Specifically, there doesn't seem to be 
anything that would put the two different types of requests on two different 
queues; it looks like everything goes onto the highpri queue.
    > 
    > What am I missing?
    > 
    > Thanks,
    > 
    > Ravi (rpokala@)
    > 
    > -----Original Message-----
    > From: <owner-src-committ...@freebsd.org> on behalf of Chuck Silvers 
<c...@freebsd.org>
    > Date: 2022-11-01, Tuesday at 10:55
    > To: <src-committ...@freebsd.org>, <dev-commits-src-all@FreeBSD.org>, 
<dev-commits-src-m...@freebsd.org>
    > Subject: git: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver 
requests when possible
    > 
    >     The branch main has been updated by chs:
    > 
    >     URL: 
https://cgit.FreeBSD.org/src/commit/?id=f0f3e3e961d311f1cd938f1319385e7f454525f1
    > 
    >     commit f0f3e3e961d311f1cd938f1319385e7f454525f1
    >     Author:     Chuck Silvers <c...@freebsd.org>
    >     AuthorDate: 2022-11-01 17:55:14 +0000
    >     Commit:     Chuck Silvers <c...@freebsd.org>
    >     CommitDate: 2022-11-01 17:55:14 +0000
    > 
    >         ipmi: use a queue for kcs driver requests when possible
    > 
    >         The ipmi watchdog pretimeout action can trigger unintentionally in
    >         certain rare, complicated situations.  What we have seen at 
Netflix
    >         is that the BMC can sometimes be sent a continuous stream of
    >         writes to port 0x80, and due to what is a bug or misconfiguration
    >         in the BMC software, this results in the BMC running out of 
memory,
    >         becoming very slow to respond to KCS requests, and eventually 
being
    >         rebooted by its own internal watchdog.  While that is going on in
    >         the BMC, back in the host OS, a number of requests are pending in
    >         the ipmi request queue, and the kcs_loop thread is working on
    >         processing these requests.  All of the KCS accesses to process
    >         those requests are timing out and eventually failing because the
    >         BMC is responding very slowly or not at all, and the kcs_loop 
thread
    >         is holding the IPMI_IO_LOCK the whole time that is going on.
    >         Meanwhile the watchdogd process in the host is trying to pat the
    >         BMC watchdog, and this process is sleeping waiting to get the
    >         IPMI_IO_LOCK.  It's not entirely clear why the watchdogd process
    >         is sleeping for this lock, because the intention is that a thread
    >         holding the IPMI_IO_LOCK should not sleep and thus any thread
    >         that wants the lock should just spin to wait for it.  My best 
guess
    >         is that the kcs_loop thread is spinning waiting for the BMC to
    >         respond for so long that it is eventually preempted, and during
    >         the brief interval when the kcs_loop thread is not running,
    >         the watchdogd thread notices that the lock holder is not running
    >         and sleeps.  When the kcs_loop thread eventually finishes 
processing
    >         one request, it drops the IPMI_IO_LOCK and then immediately takes 
the
    >         lock again so it can process the next request in the queue.
    >         Because the watchdogd thread is sleeping at this point, the 
kcs_loop
    >         always wins the race to acquire the IPMI_IO_LOCK, thus starving
    >         the watchdogd thread.  The callout for the watchdog pretimeout
    >         would be reset by the watchdogd thread after its request to the 
BMC
    >         watchdog completes, but since that request never processed, the
    >         pretimeout callout eventually fires, even though there is nothing
    >         actually wrong with the host.
    > 
    >         To prevent this saga from unfolding:
    > 
    >          - when kcs_driver_request() is called in a context where it can 
sleep,
    >            queue the request and let the worker thread process it rather 
than
    >            trying to process in the original thread.
    >          - add a new high-priority queue for driver requests, so that the
    >            watchdog patting requests will be processed as quickly as 
possible
    >            even if lots of application requests have already been queued.
    > 
    >         With these two changes, the watchdog pretimeout action does not 
trigger
    >         even if the BMC is completely out to lunch for long periods of 
time
    >         (as long as the watchdogd check command does not also get stuck).
    > 
    >         Sponsored by:   Netflix
    >         Reviewed by:    imp
    >         Differential Revision:  https://reviews.freebsd.org/D36555
    >     ---
    >      sys/dev/ipmi/ipmi.c     | 33 ++++++++++++++++++++++++++++++---
    >      sys/dev/ipmi/ipmi_kcs.c | 28 +++++++++++++++++++++++++++-
    >      sys/dev/ipmi/ipmivars.h |  2 ++
    >      3 files changed, 59 insertions(+), 4 deletions(-)
    > 
    >     diff --git a/sys/dev/ipmi/ipmi.c b/sys/dev/ipmi/ipmi.c
    >     index d79690d55c68..b8705a81627b 100644
    >     --- a/sys/dev/ipmi/ipmi.c
    >     +++ b/sys/dev/ipmi/ipmi.c
    >     @@ -208,6 +208,15 @@ ipmi_dtor(void *arg)
    >           IPMI_LOCK(sc);
    >           if (dev->ipmi_requests) {
    >                   /* Throw away any pending requests for this device. */
    >     +             TAILQ_FOREACH_SAFE(req, 
&sc->ipmi_pending_requests_highpri, ir_link,
    >     +                 nreq) {
    >     +                     if (req->ir_owner == dev) {
    >     +                             
TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req,
    >     +                                 ir_link);
    >     +                             dev->ipmi_requests--;
    >     +                             ipmi_free_request(req);
    >     +                     }
    >     +             }
    >                   TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests, 
ir_link,
    >                       nreq) {
    >                           if (req->ir_owner == dev) {
    >     @@ -579,13 +588,19 @@ ipmi_dequeue_request(struct ipmi_softc *sc)
    > 
    >           IPMI_LOCK_ASSERT(sc);
    > 
    >     -     while (!sc->ipmi_detaching && 
TAILQ_EMPTY(&sc->ipmi_pending_requests))
    >     +     while (!sc->ipmi_detaching && 
TAILQ_EMPTY(&sc->ipmi_pending_requests) &&
    >     +         TAILQ_EMPTY(&sc->ipmi_pending_requests_highpri))
    >                   cv_wait(&sc->ipmi_request_added, 
&sc->ipmi_requests_lock);
    >           if (sc->ipmi_detaching)
    >                   return (NULL);
    > 
    >     -     req = TAILQ_FIRST(&sc->ipmi_pending_requests);
    >     -     TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link);
    >     +     req = TAILQ_FIRST(&sc->ipmi_pending_requests_highpri);
    >     +     if (req != NULL)
    >     +             TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, 
ir_link);
    >     +     else {
    >     +             req = TAILQ_FIRST(&sc->ipmi_pending_requests);
    >     +             TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link);
    >     +     }
    >           return (req);
    >      }
    > 
    >     @@ -601,6 +616,17 @@ ipmi_polled_enqueue_request(struct ipmi_softc 
*sc, struct ipmi_request *req)
    >           return (0);
    >      }
    > 
    >     +int
    >     +ipmi_polled_enqueue_request_highpri(struct ipmi_softc *sc, struct 
ipmi_request *req)
    >     +{
    >     +
    >     +     IPMI_LOCK_ASSERT(sc);
    >     +
    >     +     TAILQ_INSERT_TAIL(&sc->ipmi_pending_requests_highpri, req, 
ir_link);
    >     +     cv_signal(&sc->ipmi_request_added);
    >     +     return (0);
    >     +}
    >     +
    >      /*
    >       * Watchdog event handler.
    >       */
    >     @@ -817,6 +843,7 @@ ipmi_startup(void *arg)
    >           mtx_init(&sc->ipmi_requests_lock, "ipmi requests", NULL, 
MTX_DEF);
    >           mtx_init(&sc->ipmi_io_lock, "ipmi io", NULL, MTX_DEF);
    >           cv_init(&sc->ipmi_request_added, "ipmireq");
    >     +     TAILQ_INIT(&sc->ipmi_pending_requests_highpri);
    >           TAILQ_INIT(&sc->ipmi_pending_requests);
    > 
    >           /* Initialize interface-dependent state. */
    >     diff --git a/sys/dev/ipmi/ipmi_kcs.c b/sys/dev/ipmi/ipmi_kcs.c
    >     index df3b37614eb7..5908ec88e039 100644
    >     --- a/sys/dev/ipmi/ipmi_kcs.c
    >     +++ b/sys/dev/ipmi/ipmi_kcs.c
    >     @@ -32,6 +32,7 @@ __FBSDID("$FreeBSD$");
    >      #include <sys/param.h>
    >      #include <sys/systm.h>
    >      #include <sys/bus.h>
    >     +#include <sys/conf.h>
    >      #include <sys/condvar.h>
    >      #include <sys/eventhandler.h>
    >      #include <sys/kernel.h>
    >     @@ -490,7 +491,21 @@ kcs_startup(struct ipmi_softc *sc)
    >      }
    > 
    >      static int
    >     -kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, 
int timo)
    >     +kcs_driver_request_queue(struct ipmi_softc *sc, struct ipmi_request 
*req, int timo)
    >     +{
    >     +     int error;
    >     +
    >     +     IPMI_LOCK(sc);
    >     +     ipmi_polled_enqueue_request_highpri(sc, req);
    >     +     error = msleep(req, &sc->ipmi_requests_lock, 0, "ipmireq", 
timo);
    >     +     if (error == 0)
    >     +             error = req->ir_error;
    >     +     IPMI_UNLOCK(sc);
    >     +     return (error);
    >     +}
    >     +
    >     +static int
    >     +kcs_driver_request_poll(struct ipmi_softc *sc, struct ipmi_request 
*req)
    >      {
    >           int i, ok;
    > 
    >     @@ -504,6 +519,17 @@ kcs_driver_request(struct ipmi_softc *sc, struct 
ipmi_request *req, int timo)
    >           return (req->ir_error);
    >      }
    > 
    >     +static int
    >     +kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, 
int timo)
    >     +{
    >     +
    >     +     if (KERNEL_PANICKED() || dumping)
    >     +             return (kcs_driver_request_poll(sc, req));
    >     +     else
    >     +             return (kcs_driver_request_queue(sc, req, timo));
    >     +}
    >     +
    >     +
    >      int
    >      ipmi_kcs_attach(struct ipmi_softc *sc)
    >      {
    >     diff --git a/sys/dev/ipmi/ipmivars.h b/sys/dev/ipmi/ipmivars.h
    >     index b0548ee3d7c3..1468e86be73b 100644
    >     --- a/sys/dev/ipmi/ipmivars.h
    >     +++ b/sys/dev/ipmi/ipmivars.h
    >     @@ -111,6 +111,7 @@ struct ipmi_softc {
    >           uint8_t                 ipmi_dev_support;       /* IPMI_ADS_* */
    >           struct cdev             *ipmi_cdev;
    >           TAILQ_HEAD(,ipmi_request) ipmi_pending_requests;
    >     +     TAILQ_HEAD(,ipmi_request) ipmi_pending_requests_highpri;
    >           int                     ipmi_driver_requests_polled;
    >           eventhandler_tag        ipmi_power_cycle_tag;
    >           eventhandler_tag        ipmi_watchdog_tag;
    >     @@ -237,6 +238,7 @@ void      ipmi_complete_request(struct ipmi_softc 
*, struct ipmi_request *);
    >      struct ipmi_request *ipmi_dequeue_request(struct ipmi_softc *);
    >      void ipmi_free_request(struct ipmi_request *);
    >      int  ipmi_polled_enqueue_request(struct ipmi_softc *, struct 
ipmi_request *);
    >     +int  ipmi_polled_enqueue_request_highpri(struct ipmi_softc *, struct 
ipmi_request *);
    >      int  ipmi_submit_driver_request(struct ipmi_softc *, struct 
ipmi_request *,
    >               int);
    > 
    > 



Reply via email to