On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote: > John Baldwin writes: > | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote: > | > John Baldwin writes: > | > | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote: > | > | > Doug Ambrisko writes: > | > | > | John Baldwin writes: > | > | > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote: > | > | > | | > Sean Bruno writes: > | > | > | | > | Noting a failure to attach to the onboard IPMI controller with > | this > | > | dell > | > | > | | > | R815. Not sure what to start poking at and thought I'd though > | this > | > | over > | > | > | | > | here for comment. > | > | > | | > | > | > | > | | > | -bash-4.2$ dmesg |grep ipmi > | > | > | | > | ipmi0: KCS mode found at io 0xca8 on acpi > | > | > | | > | ipmi1: <IPMI System Interface> on isa0 > | > | > | | > | device_attach: ipmi1 attach returned 16 > | > | > | | > | ipmi1: <IPMI System Interface> on isa0 > | > | > | | > | device_attach: ipmi1 attach returned 16 > | > | > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID > | > | > | | > > | > | > | | > I've run into this recently. A quick hack to fix it is: > | > | > | | > > | > | > | | > Index: ipmi.c > | > | > | | > > | =================================================================== > | > | > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v > | > | > | | > retrieving revision 1.14 > | > | > | | > diff -u -p -r1.14 ipmi.c > | > | > | | > --- ipmi.c 14 Apr 2011 07:14:22 -0000 1.14 > | > | > | | > +++ ipmi.c 31 Mar 2012 19:18:35 -0000 > | > | > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg) > | > | > | | > if (error == EWOULDBLOCK) { > | > | > | | > device_printf(dev, "Timed out waiting for > | GET_DEVICE_ID\n"); > | > | > | | > ipmi_free_request(req); > | > | > | | > - return; > | > | > | | > } else if (error) { > | > | > | | > device_printf(dev, "Failed GET_DEVICE_ID: > %d\n", error); > | > | > | | > ipmi_free_request(req); > | > | > | | > > | > | > | | > The issue is that the wakeup doesn't actually wake up the msleep > | > | > | | > in ipmi_submit_driver_request. The error being reported is that > | > | > | | > the msleep timed out. This doesn't seem to be critical problem > | > | > | | > since after this things seemed to work work. I saw this on 9.X. > | > | > | | > Haven't seen it on 8.2. Not sure about -current. > | > | > | | > > | > | > | | > It doesn't happen on all machines. > | > | > | | > | > | > | | Hmm, are you seeing the KCS thread manage the request but the > | wakeup() > | > | is > | > | > | | lost? > | > | > | > | > | > | It was a couple of weeks ago that I played with it. I put printf's > | > | > | around the msleep and wakeup. I saw the wakeup called but the sleep > | > | > | not get it. I can try the test again later today. Right now my main > | > | > | work machine is recovering from a power outage. This was with 9.0 > | > | > | when I first saw it. This issue seems to only happen at boot time. > | > | > | If I kldload the module after the system is booted then it seems to > | work > | > | > | okay. The KCS part was working fine and got the data okay from the > | > | > | request. I haven't seen or heard any issues with 8.2. > | > | > > | > | > With -current I patched ipmi.c with: > | > | > Index: ipmi.c > | > | > =================================================================== > | > | > --- ipmi.c (revision 233806) > | > | > +++ ipmi.c (working copy) > | > | > @@ -523,7 +523,11 @@ > | > | > * waiter that we awaken. > | > | > */ > | > | > if (req->ir_owner == NULL) > | > | > +{ > | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup > | > | %d\n",__FUNCTION__,__LINE__,ticks); > | > | > wakeup(req); > | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup > | > | %d\n",__FUNCTION__,__LINE__,ticks); > | > | > +} > | > | > else { > | > | > dev = req->ir_owner; > | > | > TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, > | > | ir_link); > | > | > @@ -543,7 +547,11 @@ > | > | > IPMI_LOCK(sc); > | > | > error = sc->ipmi_enqueue_request(sc, req); > | > | > if (error == 0) > | > | > +{ > | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep > | > | %d\n",__FUNCTION__,__LINE__,ticks); > | > | > error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo); > | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep > | > | %d\n",__FUNCTION__,__LINE__,ticks); > | > | > +} > | > | > if (error == 0) > | > | > error = req->ir_error; > | > | > IPMI_UNLOCK(sc); > | > | > @@ -695,8 +703,11 @@ > | > | > error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT); > | > | > if (error == EWOULDBLOCK) { > | > | > device_printf(dev, "Timed out waiting for > | GET_DEVICE_ID\n"); > | > | > + printf("DJA\n"); > | > | > +/* > | > | > ipmi_free_request(req); > | > | > return; > | > | > +*/ > | > | > } else if (error) { > | > | > device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error); > | > | > ipmi_free_request(req); > | > | > > | > | > and get > | > | > # dmesg | grep ipmi > | > | > ipmi0: KCS mode found at io 0xca8 on acpi > | > | > ipmi1: <IPMI System Interface> on isa0 > | > | > device_attach: ipmi1 attach returned 16 > | > | > ipmi1: <IPMI System Interface> on isa0 > | > | > device_attach: ipmi1 attach returned 16 > | > | > ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2 > | > | > ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201 > | > | > ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263 > | > | > ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323 > | > | > | > | Actually, can you compile with: > | > | > | > | options KTR > | > | options KTR_COMPILE=KTR_SCHED > | > | options KTR_MASK=KTR_SCHED > | > | > | > | and then add a temporary hack to ipmi.c to set ktr_mask to 0 after > | > | ipmi_submit_driver_request() returns in ipmi_startup()? You can > | > | then use 'ktrdump -ct' after boot to capture a log of what the scheduler > | > | did including if it timed out the sleep, etc. I think this would be > | > | useful for figuring out what went wrong. It does seem that it timed > | > | out after 3 seconds. > | > > | > Assuming I didn't mess up, the log should be at: > | > http://people.freebsd.org/~ambrisko/ipmi_ktr_dump.txt > | > again, I using ipmi(4) as module loaded via the loader. > | > | If you use "-ct" then you get a file you can feed into schedgraph. > | However, just reading the log, it seems that IRQ 20 keeps preempting > | the KCS worker thread preventing it from getting anything done. Also, > | there seem to be a lot of threads on CPU 0's runqueue waiting for a > | chance to run (load average of 12 or 13 the entire time). You can try > | just bumping up the max timeout from 3 seconds to higher perhaps. Not > | sure why IRQ 20 keeps firing though. It might be related to USB, so > | you could try fiddling with USB options in the BIOS perhaps, or disabling > | the USB drivers to see if that fixes IPMI. > > Tried without USB in kernel: > http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
Hmm, it's still just running constantly (note that the idle thread is _never_ scheduled). The lion's share of the time seems to be spent in "xpt_thrd". Note that there are several places where nothing happens except that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks. I would maybe start debugging that to see what in the world it is doing. Maybe it is polling some hardware down in xpt_action() (i.e., xpt_action() for a single bus called down into a driver and it is just spinning using polling instead of sleeping and waiting for an interrupt). -- John Baldwin _______________________________________________ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"