> I have seen the situation you've described in other operating systems
> and it's often been H/W related and not due to the OS itself. In the
> situations I've seen, such problems were caused by the way the BIOS
> assigns IRQs. Though it seems unnecessary, I have solved similar problems
> by simply moving a PCI card to another slot.

russ set me on the right track with this suggestion.

        suppose you had a driver that, when it got a
        spurious interrupt, would trigger a real interrupt
        for itself next.  that would work fine in isolation
        and even with other, correct drivers: every time
        one of those drivers got an interrupt, the buggy
        one would see it as spurious and trigger a real
        one, but then the system would calm down.
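to see why such a bug can hide, here is a toy model of russ's thought experiment, written as a sketch under my own assumptions: the names Line, post, run, Good and Buggy are all hypothetical, the irq line is modelled as a simple fifo of interrupt sources, and every queued interrupt is shown to every handler on the line.  this is not driver code, just the counting argument.

```c
#include <assert.h>

/*
 * hypothetical toy model: interrupt sources queued on one shared
 * irq line.  every queued interrupt is seen by every handler.
 */
enum { Good, Buggy };		/* which driver raised an interrupt */
enum { Maxq = 64 };

typedef struct Line Line;
struct Line {
	int q[Maxq];		/* fifo of interrupt sources */
	int head, tail;
};

/* queue a real interrupt from device dev */
static void
post(Line *l, int dev)
{
	assert(l->tail < Maxq);
	l->q[l->tail++] = dev;
}

/* dispatch until the line is quiet; return interrupts taken */
static int
run(Line *l)
{
	int n, src;

	n = 0;
	while(l->head < l->tail){
		src = l->q[l->head++];
		n++;
		/* the good driver ignores interrupts that are not its own. */
		/* the buggy driver treats any foreign interrupt as spurious
		   and triggers a real one for itself next. */
		if(src != Buggy)
			post(l, Buggy);
	}
	return n;
}
```

three interrupts from the good driver cost six dispatches in total, and then the line goes quiet; an interrupt the buggy driver raised itself costs just one.  the bug only doubles the load, so nothing obviously screams.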

it turned out that there weren't any spurious interrupts,
but the mv50xx had unhandled interrupts.  the story
starts with sd(3).

          Units are not accessed before the first attach.  Units may
          be individually attached using the attach specifier, [...]

the way the mv50xx driver had been interpreting this
was to not fully configure ports before sdev->verify
is called.  however, sdev->enable is called at boot time,
so until (all) the ports were first accessed, there was
a window where the irq could scream because it could not
be serviced; accessing the drives calmed the interrupt.
the solution is to fully configure the drives in the pnp
function.
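a toy model of that window may help.  it is a sketch under assumptions that are mine, not the driver's: the irq is level-triggered and stays asserted while any port has an unacknowledged interrupt, and the handler can only acknowledge ports it has configured.  Port, interrupt, asserted, afterenable and Stormlimit are hypothetical names; Stormlimit just stands in for "screaming".

```c
#include <assert.h>

/*
 * hypothetical model of a level-triggered irq shared by the ports
 * of one controller.  the line stays asserted while any port has
 * an unacknowledged interrupt.
 */
enum { Nport = 8, Stormlimit = 1000 };

typedef struct Port Port;
struct Port {
	int configured;	/* set in pnp (fixed) or on first access (old) */
	int pending;	/* port has raised an interrupt */
};

/* the handler can only acknowledge ports it has configured */
static void
interrupt(Port *p, int n)
{
	int i;

	for(i = 0; i < n; i++)
		if(p[i].configured)
			p[i].pending = 0;
}

/* is the irq line still asserted? */
static int
asserted(Port *p, int n)
{
	int i;

	for(i = 0; i < n; i++)
		if(p[i].pending)
			return 1;
	return 0;
}

/* count interrupts taken after enable; Stormlimit means "screaming" */
static int
afterenable(Port *p, int n)
{
	int taken;

	assert(n <= Nport);
	for(taken = 0; asserted(p, n) && taken < Stormlimit; taken++)
		interrupt(p, n);
	return taken;
}
```

with one port not yet accessed, the line never drops and afterenable hits Stormlimit.  once that port is configured (either by accessing the drive or, as in the fix, in pnp), a single interrupt clears everything.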

so the remaining question is: where did the red herring of
multiple sd devices on the same irq come from?  that is actually
easy to explain.  on a cpu server, readnvram(2) scans sd devices
until it finds a proper nvram.  on my machine, sda0 was
scanned first and has an acceptable nvram.  thus sdF
(the mv50xx controller) was not scanned.  if i disable the
orion controller, then there is no nvram and all the sd
devices including sdF are scanned.
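the scan order is easy to model as a first-match search.  this is a sketch, not the real readnvram code: Sd and findnvram are hypothetical, sdC is a made-up extra device, and i am assuming that being scanned counts as the first access that configures a controller's ports.

```c
#include <assert.h>
#include <stddef.h>

/*
 * hypothetical model of a first-match nvram scan over the sd
 * devices, in the order they are found.
 */
typedef struct Sd Sd;
struct Sd {
	const char *name;
	int goodnvram;	/* device holds an acceptable nvram */
	int scanned;	/* set when the scan touches (accesses) it */
};

/* return the first device with an acceptable nvram, or NULL */
static Sd*
findnvram(Sd *sd, int n)
{
	int i;

	assert(n == 0 || sd != NULL);
	for(i = 0; i < n; i++){
		sd[i].scanned = 1;
		if(sd[i].goodnvram)
			return &sd[i];
	}
	return NULL;
}
```

with the orion controller present, the scan stops at sda0 and sdF is never touched, so its irq keeps screaming.  with orion disabled, nothing qualifies, every device is scanned, and sdF gets accessed, which hides the problem.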

- erik
