On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
John Baldwin writes:
| On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
|> John Baldwin writes:
|> | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
|> |> Doug Ambrisko writes:
|> |> | John Baldwin writes:
|> |> | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
|> |> | |> Sean Bruno writes:
|> |> | |> | Noting a failure to attach to the onboard IPMI controller
|> |> | |> | with this Dell R815. Not sure what to start poking at and
|> |> | |> | thought I'd throw this over here for comment.
|> |> | |> |
|> |> | |> | -bash-4.2$ dmesg |grep ipmi
|> |> | |> | ipmi0: KCS mode found at io 0xca8 on acpi
|> |> | |> | ipmi1: <IPMI System Interface> on isa0
|> |> | |> | device_attach: ipmi1 attach returned 16
|> |> | |> | ipmi1: <IPMI System Interface> on isa0
|> |> | |> | device_attach: ipmi1 attach returned 16
|> |> | |> | ipmi0: Timed out waiting for GET_DEVICE_ID
|> |> | |>
|> |> | |> I've run into this recently. A quick hack to fix it is:
|> |> | |>
|> |> | |> Index: ipmi.c
|> |> | |> ===================================================================
|> |> | |> RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
|> |> | |> retrieving revision 1.14
|> |> | |> diff -u -p -r1.14 ipmi.c
|> |> | |> --- ipmi.c	14 Apr 2011 07:14:22 -0000	1.14
|> |> | |> +++ ipmi.c	31 Mar 2012 19:18:35 -0000
|> |> | |> @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
|> |> | |>  	if (error == EWOULDBLOCK) {
|> |> | |>  		device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
|> |> | |>  		ipmi_free_request(req);
|> |> | |> -		return;
|> |> | |>  	} else if (error) {
|> |> | |>  		device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
|> |> | |>  		ipmi_free_request(req);
|> |> | |>
|> |> | |> The issue is that the wakeup doesn't actually wake up the
|> |> | |> msleep in ipmi_submit_driver_request. The error being reported
|> |> | |> is that the msleep timed out. This doesn't seem to be a
|> |> | |> critical problem, since after this things seem to work. I saw
|> |> | |> this on 9.X. Haven't seen it on 8.2. Not sure about -current.
|> |> | |>
|> |> | |> It doesn't happen on all machines.
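|> |> | |>
|> |> | |> To be clear about the failure mode: with a non-zero timo,
|> |> | |> msleep(9) returns EWOULDBLOCK when the timeout expires before
|> |> | |> the wakeup, and a wakeup() that arrives after that point is a
|> |> | |> no-op since nobody is left sleeping on the channel. Roughly
|> |> | |> like this fragment (made-up names, not the actual ipmi.c code):
|> |> | |>
|> |> | |> 	mtx_lock(&sc_lock);
|> |> | |> 	/* sleep at most 3 seconds for the worker to finish req */
|> |> | |> 	error = msleep(&req, &sc_lock, 0, "ipmireq", 3 * hz);
|> |> | |> 	if (error == EWOULDBLOCK)
|> |> | |> 		/* timeout fired first; any later wakeup(&req) is lost */
|> |> | |> 		printf("request timed out\n");
|> |> | |> 	mtx_unlock(&sc_lock);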
|> |> | |
|> |> | | Hmm, are you seeing the KCS thread manage the request but the
|> |> | | wakeup() is lost?
|> |> |
|> |> | It was a couple of weeks ago that I played with it. I put printfs
|> |> | around the msleep and wakeup. I saw the wakeup called but the
|> |> | sleep not get it. I can try the test again later today; right now
|> |> | my main work machine is recovering from a power outage. This was
|> |> | with 9.0 when I first saw it. The issue seems to happen only at
|> |> | boot time. If I kldload the module after the system is booted,
|> |> | then it seems to work okay. The KCS part was working fine and got
|> |> | the data okay from the request. I haven't seen or heard of any
|> |> | issues with 8.2.
|> |>
|> |> With -current I patched ipmi.c with:
|> |> Index: ipmi.c
|> |> ===================================================================
|> |> --- ipmi.c	(revision 233806)
|> |> +++ ipmi.c	(working copy)
|> |> @@ -523,7 +523,11 @@
|> |>  	 * waiter that we awaken.
|> |>  	 */
|> |>  	if (req->ir_owner == NULL)
|> |> +{
|> |> +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
|> |>  		wakeup(req);
|> |> +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
|> |> +}
|> |>  	else {
|> |>  		dev = req->ir_owner;
|> |>  		TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
|> |> @@ -543,7 +547,11 @@
|> |>  	IPMI_LOCK(sc);
|> |>  	error = sc->ipmi_enqueue_request(sc, req);
|> |>  	if (error == 0)
|> |> +{
|> |> +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
|> |>  		error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
|> |> +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep %d\n",__FUNCTION__,__LINE__,ticks);
|> |> +}
|> |>  	if (error == 0)
|> |>  		error = req->ir_error;
|> |>  	IPMI_UNLOCK(sc);
|> |> @@ -695,8 +703,11 @@
|> |>  	error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
|> |>  	if (error == EWOULDBLOCK) {
|> |>  		device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
|> |> +		printf("DJA\n");
|> |> +/*
|> |>  		ipmi_free_request(req);
|> |>  		return;
|> |> +*/
|> |>  	} else if (error) {
|> |>  		device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
|> |>  		ipmi_free_request(req);
|> |> and get
|> |> # dmesg | grep ipmi
|> |> ipmi0: KCS mode found at io 0xca8 on acpi
|> |> ipmi1: <IPMI System Interface> on isa0
|> |> device_attach: ipmi1 attach returned 16
|> |> ipmi1: <IPMI System Interface> on isa0
|> |> device_attach: ipmi1 attach returned 16
|> |> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
|> |> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201
|> |> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263
|> |> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323
|> |
|> | Actually, can you compile with:
|> |
|> | options KTR
|> | options KTR_COMPILE=KTR_SCHED
|> | options KTR_MASK=KTR_SCHED
|> |
|> | and then add a temporary hack to ipmi.c to set ktr_mask to 0 after
|> | ipmi_submit_driver_request() returns in ipmi_startup()? You can
|> | then use 'ktrdump -ct' after boot to capture a log of what the
|> | scheduler did, including whether it timed out the sleep, etc. I
|> | think this would be
|> | useful for figuring out what went wrong. It does seem that it timed
|> | out after 3 seconds.
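|> |
|> | Something like this untested fragment is what I have in mind, right
|> | after the existing GET_DEVICE_ID call in ipmi_startup() (ktr_mask is
|> | the global trace mask declared in <sys/ktr.h>):
|> |
|> | 	error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
|> | #ifdef KTR
|> | 	/* XXX temporary hack: stop tracing so the log ends here */
|> | 	ktr_mask = 0;
|> | #endif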
|>
|> Assuming I didn't mess up, the log should be at:
|> http://people.freebsd.org/~ambrisko/ipmi_ktr_dump.txt
|> Again, I am using ipmi(4) as a module loaded via the loader.
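|> (i.e., ipmi_load="YES" in /boot/loader.conf rather than a kldload
|> after the system is up)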
|
| If you use "-ct" then you get a file you can feed into schedgraph.
| However, just reading the log, it seems that IRQ 20 keeps preempting
| the KCS worker thread, preventing it from getting anything done. Also,
| there seem to be a lot of threads on CPU 0's runqueue waiting for a
| chance to run (load average of 12 or 13 the entire time). You could
| try just bumping the max timeout up from 3 seconds to something
| higher. Not sure why IRQ 20 keeps firing, though. It might be related
| to USB, so you could try fiddling with the USB options in the BIOS,
| or disabling the USB drivers, to see if that fixes IPMI.
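|
| If memory serves, the knob is the MAX_TIMEOUT define near the top of
| ipmi.c; something like the following, where the new value is only a
| guess to test with:
|
| -#define	MAX_TIMEOUT	3 * hz
| +#define	MAX_TIMEOUT	10 * hz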
Tried without USB in kernel:
http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
Hmm, it's still just running constantly (note that the idle thread is
_never_ scheduled). The lion's share of the time seems to be spent in
"xpt_thrd". Note that there are several places where nothing happens except
that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks. I
would maybe start debugging that to see what in the world it is doing. Maybe
it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
single bus called down into a driver and it is just spinning using polling
instead of sleeping and waiting for an interrupt).
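
A quick, untested way to narrow that down would be a trace point at the
top of xpt_action() in sys/cam/cam_xpt.c and another ktrdump; reusing
KTR_SCHED as the mask is just for convenience since it is already
compiled in:

	void
	xpt_action(union ccb *start_ccb)
	{
		/* XXX debug: record every CAM action while tracing is on */
		CTR1(KTR_SCHED, "xpt_action: func_code %#x",
		    start_ccb->ccb_h.func_code);
		...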