8.3-PRERELEASE and ATA_CAM

2012-04-06 Thread Daniel Braniss
With the latest svn, I can't compile a kernel with options ATA_CAM:

...
linking kernel.debug
ata-disk.o(.text+0x93): In function `ad_init':
/r+d/stable/8.3/sys/dev/ata/ata-disk.c:389: undefined reference to `ata_setmode'
ata-disk.o(.text+0xaa):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:397: undefined reference to `ata_wc'
ata-disk.o(.text+0xc5):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:398: undefined reference to `ata_controlcmd'
ata-disk.o(.text+0x113):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:400: undefined reference to `ata_controlcmd'
ata-disk.o(.text+0x133):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:393: undefined reference to `ata_controlcmd'
ata-disk.o(.text+0x16d):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:407: undefined reference to `ata_controlcmd'
ata-disk.o(.text+0x21a): In function `ad_shutdown':
/r+d/stable/8.3/sys/dev/ata/ata-disk.c:196: undefined reference to `ata_controlcmd'
ata-disk.o(.text+0x45c): In function `ad_detach':
/r+d/stable/8.3/sys/dev/ata/ata-disk.c:182: undefined reference to `ata_fail_requests'
...

danny


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-06 Thread Alexander Motin

On 04/04/12 21:47, John Baldwin wrote:

On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:

John Baldwin writes:
| On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
|>  John Baldwin writes:
|>  | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
|>  |>  Doug Ambrisko writes:
|>  |>  | John Baldwin writes:
|>  |>  | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
|>  |>  | |>  Sean Bruno writes:
|>  |>  | |>  | Noting a failure to attach to the onboard IPMI controller
|>  |>  | |>  | with this dell R815.  Not sure what to start poking at and
|>  |>  | |>  | thought I'd throw this over here for comment.
|>  |>  | |>  |
|>  |>  | |>  | -bash-4.2$ dmesg |grep ipmi
|>  |>  | |>  | ipmi0: KCS mode found at io 0xca8 on acpi
|>  |>  | |>  | ipmi1:  on isa0
|>  |>  | |>  | device_attach: ipmi1 attach returned 16
|>  |>  | |>  | ipmi1:  on isa0
|>  |>  | |>  | device_attach: ipmi1 attach returned 16
|>  |>  | |>  | ipmi0: Timed out waiting for GET_DEVICE_ID
|>  |>  | |>
|>  |>  | |>  I've run into this recently.  A quick hack to fix it is:
|>  |>  | |>
|>  |>  | |>  Index: ipmi.c
|>  |>  | |>  ===
|>  |>  | |>  RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
|>  |>  | |>  retrieving revision 1.14
|>  |>  | |>  diff -u -p -r1.14 ipmi.c
|>  |>  | |>  --- ipmi.c   14 Apr 2011 07:14:22 -  1.14
|>  |>  | |>  +++ ipmi.c   31 Mar 2012 19:18:35 -
|>  |>  | |>  @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
|>  |>  | |>   if (error == EWOULDBLOCK) {
|>  |>  | |>   device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
|>  |>  | |>   ipmi_free_request(req);
|>  |>  | |>  -return;
|>  |>  | |>   } else if (error) {
|>  |>  | |>   device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
|>  |>  | |>   ipmi_free_request(req);
|>  |>  | |>
|>  |>  | |>  The issue is that the wakeup doesn't actually wake up the msleep
|>  |>  | |>  in ipmi_submit_driver_request.  The error being reported is that
|>  |>  | |>  the msleep timed out.  This doesn't seem to be a critical problem
|>  |>  | |>  since after this things seemed to work.  I saw this on 9.X.
|>  |>  | |>  Haven't seen it on 8.2.  Not sure about -current.
|>  |>  | |>
|>  |>  | |>  It doesn't happen on all machines.
|>  |>  | |
|>  |>  | | Hmm, are you seeing the KCS thread manage the request but the
|>  |>  | | wakeup() is lost?
|>  |>  |
|>  |>  | It was a couple of weeks ago that I played with it.  I put printf's
|>  |>  | around the msleep and wakeup.  I saw the wakeup called but the sleep
|>  |>  | did not get it.  I can try the test again later today.  Right now my
|>  |>  | main work machine is recovering from a power outage.  This was with
|>  |>  | 9.0 when I first saw it.  This issue seems to only happen at boot
|>  |>  | time.  If I kldload the module after the system is booted then it
|>  |>  | seems to work okay.  The KCS part was working fine and got the data
|>  |>  | okay from the request.  I haven't seen or heard any issues with 8.2.
|>  |>
|>  |>  With -current I patched ipmi.c with:
|>  |>  Index: ipmi.c
|>  |>  ===
|>  |>  --- ipmi.c  (revision 233806)
|>  |>  +++ ipmi.c  (working copy)
|>  |>  @@ -523,7 +523,11 @@
|>  |>   * waiter that we awaken.
|>  |>   */
|>  |>  if (req->ir_owner == NULL)
|>  |>  +{
|>  |>  +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
|>  |>  wakeup(req);
|>  |>  +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
|>  |>  +}
|>  |>  else {
|>  |>  dev = req->ir_owner;
|>  |>  TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
|>  |>  @@ -543,7 +547,11 @@
|>  |>  IPMI_LOCK(sc);
|>  |>  error = sc->ipmi_enqueue_request(sc, req);
|>  |>  if (error == 0)
|>  |>  +{
|>  |>  +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
|>  |>  error = msleep(req,&sc->ipmi_lock, 0, "ipmireq", timo);
|>  |>  +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep %d\n",__FUNCTION__,__LINE__,ticks);
|>  |>  +}
|>  |>  if (error == 0)
|>  |>  error = req->ir_error;
|>  |>  IPMI_UNLOCK(sc);
|>  |>  @@ -695,8 +703,11 @@
|>  |>  error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
|>  |>  if (error == EWOULDBLOCK) {
|>  |>  device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
|>  |>  +   printf("DJA\n");
|>  |>  +/*
|>  |>  ipmi_free_request(req);
|>  |>  return;
|>  |>  +*/
|>  |>  } else if (error) {

Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-06 Thread Doug Ambrisko
Alexander Motin writes:
| On 04/04/12 21:47, John Baldwin wrote:
| > On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| >> John Baldwin writes:
| >> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| >> |>  John Baldwin writes:
| >> |>  | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| >> |>  |>  Doug Ambrisko writes:
| >> |>  |>  | John Baldwin writes:
| >> |>  |>  | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| >> |>  |>  | |>  Sean Bruno writes:
| >> |>  |>  | |>  | Noting a failure to attach to the onboard IPMI controller
| >> |>  |>  | |>  | with this dell R815.  Not sure what to start poking at and
| >> |>  |>  | |>  | thought I'd throw this over here for comment.
| >> |>  |>  | |>  |
| >> |>  |>  | |>  | -bash-4.2$ dmesg |grep ipmi
| >> |>  |>  | |>  | ipmi0: KCS mode found at io 0xca8 on acpi
| >> |>  |>  | |>  | ipmi1:  on isa0
| >> |>  |>  | |>  | device_attach: ipmi1 attach returned 16
| >> |>  |>  | |>  | ipmi1:  on isa0
| >> |>  |>  | |>  | device_attach: ipmi1 attach returned 16
| >> |>  |>  | |>  | ipmi0: Timed out waiting for GET_DEVICE_ID
| >> |>  |>  | |>
| >> |>  |>  | |>  I've run into this recently.  A quick hack to fix it is:
| >> |>  |>  | |>
| >> |>  |>  | |>  Index: ipmi.c
| >> |>  |>  | |>
[snip]
| >> | If you use "-ct" then you get a file you can feed into schedgraph.
| >> | However, just reading the log, it seems that IRQ 20 keeps preempting
| >> | the KCS worker thread preventing it from getting anything done.  Also,
| >> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
| >> | chance to run (load average of 12 or 13 the entire time).  You can try
| >> | just bumping up the max timeout from 3 seconds to higher perhaps.  Not
| >> | sure why IRQ 20 keeps firing though.  It might be related to USB, so
| >> | you could try fiddling with USB options in the BIOS perhaps, or disabling
| >> | the USB drivers to see if that fixes IPMI.
| >>
| >> Tried without USB in kernel:
| >> http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| >
| > Hmm, it's still just running constantly (note that the idle thread is
| > _never_ scheduled).  The lion's share of the time seems to be spent in
| > "xpt_thrd".  Note that there are several places where nothing happens
| > except that "xpt_thrd" runs constantly (spinning) during 10's of
| > statclock ticks.  I would maybe start debugging that to see what in the
| > world it is doing.  Maybe it is polling some hardware down in
| > xpt_action() (i.e., xpt_action() for a single bus called down into a
| > driver and it is just spinning using polling instead of sleeping and
| > waiting for an interrupt).
| 
| "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus 
| on attach and by the controller driver on hot-plug events. For some
| controllers it may be quite CPU-hungry, for example legacy ATA
| controllers, where bus reset may take many seconds of hardware polling
| while devices are just spinning up. For ahci(4) it was improved about a
| year ago to not use polling when possible, but it still may loop for some
| time if the controller is not responding on reset. What mfi(4), mentioned
| in the log, does during scanning, I am not sure.

I thought that mfi(4) could be an issue.  There are some ata controllers
with nothing attached.  I built a GENERIC with USB and mfi commented out
and then the timeout issue went away:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

Without mfi but with USB it had issues:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
  ipmi0: Timed out waiting for GET_DEVICE_ID
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
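
The tick counters in the DEBUG lines of the two logs above can be turned
into a rough delay figure.  The following is a minimal sketch, assuming
kern.hz is 1000 (the thread never states it) and taking the 3 second
timeout from John's remark; the function names are hypothetical and are
not part of ipmi(4):

```python
# Assumption: kern.hz is 1000, so one tick is one millisecond.  The thread
# never states hz; adjust HZ if the machine differs.
HZ = 1000


def completion_delay(tick_before_msleep, tick_before_wakeup, hz=HZ):
    """Seconds between entering msleep() and wakeup() being issued."""
    return (tick_before_wakeup - tick_before_msleep) / hz


def would_time_out(tick_before_msleep, tick_before_wakeup, timeout_s, hz=HZ):
    """True if the wakeup is issued only after the sleep timeout expired."""
    delay = completion_delay(tick_before_msleep, tick_before_wakeup, hz)
    return delay > timeout_s


if __name__ == "__main__":
    # Tick values copied from the "before msleep" / "before wakeup" lines.
    boots = (("no USB, no mfi", 1, 2211), ("USB, no mfi", 2, 3137))
    for label, slept, woke in boots:
        for timeout_s in (3, 6):
            print(label, timeout_s, completion_delay(slept, woke),
                  would_time_out(slept, woke, timeout_s))
```

Under that assumption the request completed after roughly 2.2 s in the
first boot and roughly 3.1 s in the second, which would be consistent with
a 3 s timeout firing only in the second case and a 6 s timeout passing.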

I can post more ktrdump traces if needed.  A 1U Dell machine without
mfi also has this problem.  As John mentioned it might be good to
bump up the timeout from 3s to 6s.  I did that with the USB no mfi
kernel and that passed:

  % dmesg | grep ipmi
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_subm

Re: 8.3-PRERELEASE and ATA_CAM

2012-04-06 Thread Marius Strobl
On Fri, Apr 06, 2012 at 10:48:13AM +0300, Daniel Braniss wrote:
> with the latest svn, I can't compile kernel with  options ATA_CAM:
> 
> ...
> linking kernel.debug
> ata-disk.o(.text+0x93): In function `ad_init':
> /r+d/stable/8.3/sys/dev/ata/ata-disk.c:389: undefined reference to `ata_setmode'
> ata-disk.o(.text+0xaa):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:397: undefined reference to `ata_wc'
> ata-disk.o(.text+0xc5):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:398: undefined reference to `ata_controlcmd'
> ata-disk.o(.text+0x113):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:400: undefined reference to `ata_controlcmd'
> ata-disk.o(.text+0x133):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:393: undefined reference to `ata_controlcmd'
> ata-disk.o(.text+0x16d):/r+d/stable/8.3/sys/dev/ata/ata-disk.c:407: undefined reference to `ata_controlcmd'
> ata-disk.o(.text+0x21a): In function `ad_shutdown':
> /r+d/stable/8.3/sys/dev/ata/ata-disk.c:196: undefined reference to `ata_controlcmd'
> ata-disk.o(.text+0x45c): In function `ad_detach':
> /r+d/stable/8.3/sys/dev/ata/ata-disk.c:182: undefined reference to `ata_fail_requests'
> ...
> 

You seem to be using a mutually exclusive set of ata(4) options and
devices (previously, this combination erroneously did not produce an
error). When including options ATA_CAM you do _not_ want to also include
any of the following devices:
device  atapicam
device  atadisk
device  ataraid
device  atapicd
device  atapifd
device  atapist

Instead you need the corresponding drivers from the following set:
device  scbus
device  ch
device  da
device  sa
device  cd
device  pass
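
A minimal sketch of how one might check a kernel config file for this
conflict before building.  It assumes the config uses plain "options" and
"device" lines with "#" comments; the function name
find_ata_cam_conflicts is hypothetical and this is not part of config(8)
or any other FreeBSD tool:

```python
# Legacy ata(4) devices listed above as incompatible with options ATA_CAM.
LEGACY_ATA_DEVICES = {
    "atapicam", "atadisk", "ataraid", "atapicd", "atapifd", "atapist",
}


def find_ata_cam_conflicts(config_text):
    """Return the sorted legacy devices that clash with options ATA_CAM.

    Returns an empty list when ATA_CAM is not enabled at all.
    """
    has_ata_cam = False
    devices = set()
    for raw in config_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated words.
        words = raw.split("#", 1)[0].split()
        if len(words) < 2:
            continue
        if words[0] == "options" and words[1] == "ATA_CAM":
            has_ata_cam = True
        elif words[0] == "device":
            devices.add(words[1])
    if not has_ata_cam:
        return []
    return sorted(devices & LEGACY_ATA_DEVICES)


if __name__ == "__main__":
    import sys
    # Usage: python check_ata_cam.py /path/to/KERNCONF
    with open(sys.argv[1]) as conf:
        for name in find_ata_cam_conflicts(conf.read()):
            print("remove: device", name)
```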

Marius



Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-06 Thread Alexander Motin

On 04/06/12 20:12, Doug Ambrisko wrote:

Alexander Motin writes:
| On 04/04/12 21:47, John Baldwin wrote:
|>  On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
|>>  John Baldwin writes:
|>>  | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
|>>  |>   John Baldwin writes:
|>>  |>   | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
|>>  |>   |>   Doug Ambrisko writes:
|>>  |>   |>   | John Baldwin writes:
|>>  |>   |>   | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
|>>  |>   |>   | |>   Sean Bruno writes:
|>>  |>   |>   | |>   | Noting a failure to attach to the onboard IPMI
|>>  |>   |>   | |>   | controller with this dell R815.  Not sure what to start
|>>  |>   |>   | |>   | poking at and thought I'd throw this over here for comment.
|>>  |>   |>   | |>   |
|>>  |>   |>   | |>   | -bash-4.2$ dmesg |grep ipmi
|>>  |>   |>   | |>   | ipmi0: KCS mode found at io 0xca8 on acpi
|>>  |>   |>   | |>   | ipmi1:   on isa0
|>>  |>   |>   | |>   | device_attach: ipmi1 attach returned 16
|>>  |>   |>   | |>   | ipmi1:   on isa0
|>>  |>   |>   | |>   | device_attach: ipmi1 attach returned 16
|>>  |>   |>   | |>   | ipmi0: Timed out waiting for GET_DEVICE_ID
|>>  |>   |>   | |>
|>>  |>   |>   | |>   I've run into this recently.  A quick hack to fix it is:
|>>  |>   |>   | |>
|>>  |>   |>   | |>   Index: ipmi.c
|>>  |>   |>   | |>
[snip]
|>>  | If you use "-ct" then you get a file you can feed into schedgraph.
|>>  | However, just reading the log, it seems that IRQ 20 keeps preempting
|>>  | the KCS worker thread preventing it from getting anything done.  Also,
|>>  | there seem to be a lot of threads on CPU 0's runqueue waiting for a
|>>  | chance to run (load average of 12 or 13 the entire time).  You can try
|>>  | just bumping up the max timeout from 3 seconds to higher perhaps.  Not
|>>  | sure why IRQ 20 keeps firing though.  It might be related to USB, so
|>>  | you could try fiddling with USB options in the BIOS perhaps, or disabling
|>>  | the USB drivers to see if that fixes IPMI.
|>>
|>>  Tried without USB in kernel:
|>>   http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
|>
|>  Hmm, it's still just running constantly (note that the idle thread is
|>  _never_ scheduled).  The lion's share of the time seems to be spent in
|>  "xpt_thrd".  Note that there are several places where nothing happens
|>  except that "xpt_thrd" runs constantly (spinning) during 10's of
|>  statclock ticks.  I would maybe start debugging that to see what in the
|>  world it is doing.  Maybe it is polling some hardware down in
|>  xpt_action() (i.e., xpt_action() for a single bus called down into a
|>  driver and it is just spinning using polling instead of sleeping and
|>  waiting for an interrupt).
|
| "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
| on attach and by the controller driver on hot-plug events. For some
| controllers it may be quite CPU-hungry, for example legacy ATA
| controllers, where bus reset may take many seconds of hardware polling
| while devices are just spinning up. For ahci(4) it was improved about a
| year ago to not use polling when possible, but it still may loop for some
| time if the controller is not responding on reset. What mfi(4), mentioned
| in the log, does during scanning, I am not sure.

I thought that mfi(4) could be an issue.  There are some ata controllers
with nothing attached.  I built a GENERIC with USB and mfi commented out
and then the timeout issue went away:
   ipmi0: KCS mode found at io 0xca8 on acpi
   ipmi1:  on isa0
   device_attach: ipmi1 attach returned 16
   ipmi1:  on isa0
   device_attach: ipmi1 attach returned 16
   ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
   ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
   ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
   ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
   ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

Without mfi but with USB it had issues:
   ipmi0: KCS mode found at io 0xca8 on acpi
   ipmi1:  on isa0
   device_attach: ipmi1 attach returned 16
   ipmi1:  on isa0
   device_attach: ipmi1 attach returned 16
   ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
   ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
   ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
   ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
   ipmi0: Timed out waiting for GET_DEVICE_ID
   ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

I can post more ktrdump traces if needed.  A 1U Dell machine without
mfi also has this problem.  As John mentioned it might be good to
bump up the timeout from 3s to 6s.  I did that with the USB no mfi
kernel and that passed:

   % dmesg | grep ipmi
   ipmi0: KCS mode found at io 0xca8 on acpi
   ipmi1:  on isa0
   devi

Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-06 Thread Doug Ambrisko
Alexander Motin writes:
| On 04/06/12 20:12, Doug Ambrisko wrote:
| > Alexander Motin writes:
| > | On 04/04/12 21:47, John Baldwin wrote:
| > |>  On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| > |>>  John Baldwin writes:
| > |>>  | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| > |>>  |>   John Baldwin writes:
| > |>>  |>   | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > |>>  |>   |>   Doug Ambrisko writes:
| > |>>  |>   |>   | John Baldwin writes:
| > |>>  |>   |>   | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > |>>  |>   |>   | |>   Sean Bruno writes:
| > |>>  |>   |>   | |>   | Noting a failure to attach to the onboard IPMI
| > |>>  |>   |>   | |>   | controller with this dell R815.  Not sure what to
| > |>>  |>   |>   | |>   | start poking at and thought I'd throw this over here
| > |>>  |>   |>   | |>   | for comment.
| > |>>  |>   |>   | |>   |
| > |>>  |>   |>   | |>   | -bash-4.2$ dmesg |grep ipmi
| > |>>  |>   |>   | |>   | ipmi0: KCS mode found at io 0xca8 on acpi
| > |>>  |>   |>   | |>   | ipmi1:   on isa0
| > |>>  |>   |>   | |>   | device_attach: ipmi1 attach returned 16
| > |>>  |>   |>   | |>   | ipmi1:   on isa0
| > |>>  |>   |>   | |>   | device_attach: ipmi1 attach returned 16
| > |>>  |>   |>   | |>   | ipmi0: Timed out waiting for GET_DEVICE_ID
| > |>>  |>   |>   | |>
| > |>>  |>   |>   | |>   I've run into this recently.  A quick hack to fix it is:
| > |>>  |>   |>   | |>
| > |>>  |>   |>   | |>   Index: ipmi.c
| > |>>  |>   |>   | |>
| > [snip]
| > |>>  | If you use "-ct" then you get a file you can feed into schedgraph.
| > |>>  | However, just reading the log, it seems that IRQ 20 keeps preempting
| > |>>  | the KCS worker thread preventing it from getting anything done.
| > |>>  | Also, there seem to be a lot of threads on CPU 0's runqueue waiting
| > |>>  | for a chance to run (load average of 12 or 13 the entire time).  You
| > |>>  | can try just bumping up the max timeout from 3 seconds to higher
| > |>>  | perhaps.  Not sure why IRQ 20 keeps firing though.  It might be
| > |>>  | related to USB, so you could try fiddling with USB options in the
| > |>>  | BIOS perhaps, or disabling the USB drivers to see if that fixes IPMI.
| > |>>
| > |>>  Tried without USB in kernel:
| > |>> http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| > |>
| > |>  Hmm, it's still just running constantly (note that the idle thread is
| > |>  _never_ scheduled).  The lion's share of the time seems to be spent in
| > |>  "xpt_thrd".  Note that there are several places where nothing happens
| > |>  except that "xpt_thrd" runs constantly (spinning) during 10's of
| > |>  statclock ticks.  I would maybe start debugging that to see what in
| > |>  the world it is doing.  Maybe it is polling some hardware down in
| > |>  xpt_action() (i.e., xpt_action() for a single bus called down into a
| > |>  driver and it is just spinning using polling instead of sleeping and
| > |>  waiting for an interrupt).
| > |
| > | "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
| > | on attach and by the controller driver on hot-plug events. For some
| > | controllers it may be quite CPU-hungry, for example legacy ATA
| > | controllers, where bus reset may take many seconds of hardware polling
| > | while devices are just spinning up. For ahci(4) it was improved about a
| > | year ago to not use polling when possible, but it still may loop for
| > | some time if the controller is not responding on reset. What mfi(4),
| > | mentioned in the log, does during scanning, I am not sure.
| >
| > I thought that mfi(4) could be an issue.  There are some ata controllers
| > with nothing attached.  I built a GENERIC with USB and mfi commented out
| > and then the timeout issue went away:
| >    ipmi0: KCS mode found at io 0xca8 on acpi
| >    ipmi1:  on isa0
| >    device_attach: ipmi1 attach returned 16
| >    ipmi1:  on isa0
| >    device_attach: ipmi1 attach returned 16
| >    ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
| >    ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
| >    ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
| >    ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
| >    ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
| >
| > Without mfi but with USB it had issues:
| >    ipmi0: KCS mode found at io 0xca8 on acpi
| >    ipmi1:  on isa0
| >    device_attach: ipmi1 attach returned 16
| >    ipmi1:  on isa0
| >    device_attach: ipmi1 attach returned 16
| >    ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
| >    ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
| >    ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
| >    ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
| >    ipmi0: Timed out waiting for GET_DEVICE_ID
| >    ipmi0: IPMI device rev. 0, firmware r

RE: 8.3-PRERELEASE and ATA_CAM

2012-04-06 Thread Dewayne Geraghty
Marius, 
Perhaps this mutual exclusivity between ATA_CAM and atapicam and friends
should be mentioned in UPDATING, as I'm sure the same question will
recur.

Thank you for your guidance in resolving the same issue that I had in 9.0
Stable.
Regards, Dewayne.




Re: 157k interrupts per second causing 60% CPU load on idle system

2012-04-06 Thread Matt Thyer
On 5 April 2012 01:18, Freddie Cash  wrote:

> On Wed, Apr 4, 2012 at 5:19 AM, Matt Thyer  wrote:
> > So it seems that both the old and new mps driver have a problem with the
> > Western Digital WD20EARX SATA 3 drive on a SuperMicro AOC-USAS2-L8i (SAS
> > 6G) controller (flashed with -IT firmware).
>
> I wouldn't say the driver has a problem with that specific drive.
> More that it might have a problem with a mixed SATA2/SATA3 setup.
>

Sorry, that's what I meant to say, but it now seems that the 157K
interrupts per second are probably not due to the SuperMicro AOC-USAS2-L8i.

Since moving the SATA 3 disk to the onboard Intel SATA 2 controller, I'm no
longer having that disk evicted from the raidz2 pool with write errors.
I thought that the high interrupt rate issue had also been solved, but it's
back again.

This is on 8-STABLE at revision 230921 (before the new driver hit 8-STABLE).

So now I need to go back to trying to determine what the cause is.

I'll stop posting in this thread as I don't think it's anything to do with
either the old or new version of this driver.


Re: 157k interrupts per second causing 60% CPU load on idle system

2012-04-06 Thread Matt Thyer
On 7 April 2012 14:31, Matt Thyer  wrote:

> On 5 April 2012 01:18, Freddie Cash  wrote:
>
>> On Wed, Apr 4, 2012 at 5:19 AM, Matt Thyer  wrote:
>> > So it seems that both the old and new mps driver have a problem with the
>> > Western Digital WD20EARX SATA 3 drive on a SuperMicro AOC-USAS2-L8i (SAS
>> > 6G) controller (flashed with -IT firmware).
>>
>> I wouldn't say the driver has a problem with that specific drive.
>> More that it might have a problem with a mixed SATA2/SATA3 setup.
>>
> Sorry, that's what I meant to say, but it now seems that the 157K
> interrupts per second are probably not due to the SuperMicro AOC-USAS2-L8i.
>
> Since moving the SATA 3 disk to the onboard Intel SATA 2 controller I'm no
> longer having that disk evicted from the raidz2 pool with write errors and
> I thought that the high interrupt rate issue had also been solved but it's
> back again.
>
> This is on 8-STABLE at revision 230921 (before the new driver hit
> 8-STABLE).
>
> So now I need to go back to trying to determine what the cause is.
>
> I'll stop posting in this thread as I don't think it's anything to do with
> either the old or new version of this driver.
>

Oops... wrong thread; I thought I was replying in -CURRENT.

So on to the root cause.

vmstat -i has shown that the issue was on irq 16.

Unfortunately there seem to be a lot of things on irq 16:

$  dmesg | grep "irq 16"
pcib1:  irq 16 at device 1.0 on pci0
mps0:  port 0xee00-0xeeff mem 0xfbdfc000-0xfbdf,0xfbd8-0xfbdb irq 16 at device 0.0 on pci1
vgapci0:  port 0xff00-0xff07 mem 0xfb40-0xfb7f,0xe000-0xefff irq 16 at device 2.0 on pci0
uhci0:  port 0xfe00-0xfe1f irq 16 at device 26.0 on pci0
pcib2:  irq 16 at device 28.0 on pci0
pcib3:  irq 16 at device 28.4 on pci0
atapci0:  port 0xdf00-0xdf07,0xde00-0xde03,0xdd00-0xdd07,0xdc00-0xdc03,0xdb00-0xdb0f irq 16 at device 0.0 on pci3
pcib1:  irq 16 at device 1.0 on pci0
mps0:  port 0xee00-0xeeff mem 0xfbdfc000-0xfbdf,0xfbd8-0xfbdb irq 16 at device 0.0 on pci1
vgapci0:  port 0xff00-0xff07 mem 0xfb40-0xfb7f,0xe000-0xefff irq 16 at device 2.0 on pci0
uhci0:  port 0xfe00-0xfe1f irq 16 at device 26.0 on pci0
pcib2:  irq 16 at device 28.0 on pci0
pcib3:  irq 16 at device 28.4 on pci0
atapci0:  port 0xdf00-0xdf07,0xde00-0xde03,0xdd00-0xdd07,0xdc00-0xdc03,0xdb00-0xdb0f irq 16 at device 0.0 on pci3
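
Since the dmesg buffer above lists every device twice, a small script can
collapse it into one list of devices per IRQ.  This is only a sketch: it
assumes each device sits on a single unwrapped line as dmesg prints it,
the name devices_by_irq is hypothetical, and it shows which devices share
the line, not which one is actually raising the interrupts:

```python
import re

# Matches lines such as "uhci0:  port 0xfe00-0xfe1f irq 16 at device ...".
IRQ_LINE = re.compile(r"^(?P<dev>[a-z_]+\d+):.*\birq (?P<irq>\d+)\b")


def devices_by_irq(dmesg_text):
    """Map each IRQ number to the device names attached to it.

    Names are kept in first-seen order and repeated boots are de-duplicated.
    """
    shared = {}
    for line in dmesg_text.splitlines():
        match = IRQ_LINE.match(line)
        if match is None:
            continue
        names = shared.setdefault(int(match.group("irq")), [])
        if match.group("dev") not in names:
            names.append(match.group("dev"))
    return shared


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python irq_share.py
    for irq, names in sorted(devices_by_irq(sys.stdin.read()).items()):
        print("irq", irq, ":", " ".join(names))
```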

Any idea how to isolate which bit of hardware could be triggering the
interrupts?

Unfortunately the only device I could remove would be the SuperMicro
AOC-USAS2-L8i (so yes I could eliminate that).

My biggest problem right now is not knowing how to trigger the issue.

At this stage I'm going to upgrade to 9-STABLE and see if it returns.