Guarded load and bus error

Micha Nelissen Thu, 22 Oct 2009 14:05:28 -0700

Hi,

I'm working on an MPC8548 processor, using its RapidIO bus. I have two kernel trees ported for a board, a linux 2.6.24-ppc, and a linux-2.6.31 (powerpc) kernel. I don't think this bus behaviour is RapidIO-specific though, as the PCI bus and local bus must also handle malfunctioning devices. The HID1[RFXE] bit is enabled.

To test bus error behaviour, I'm doing reads from mapped (RapidIO) I/O memory (mapped cache-inhibited, guarded). 32-bit aligned accesses are working fine, so the setup is good. A RapidIO error handler is installed (error/port-write interrupt) which printks some error bits from the RapidIO error registers and resets them. Now I'm provoking bus errors by:


1) reading from a RapidIO device that does not exist: a timeout is asserted
2) reading from an unaligned address
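
For reference, the user-space side of such a test could look roughly like the sketch below. It is not the actual test program: the function name probe_read32 and the SIGBUS recovery via sigsetjmp/siglongjmp are assumptions made for illustration, and it presumes the I/O window has already been mapped into the process (for example with mmap). It performs one 32-bit volatile load at a byte offset from the mapped base and reports whether the load completed or was cut short by a SIGBUS. Whether a bus error on this platform reaches the process as a clean SIGBUS at all is exactly what is in question here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Jump target used to escape from the SIGBUS handler (hypothetical helper state). */
static sigjmp_buf probe_env;

static void on_sigbus(int sig)
{
    (void)sig;
    /* Unwind back to the sigsetjmp() call in probe_read32(). */
    siglongjmp(probe_env, 1);
}

/*
 * Perform one 32-bit load at base + offset (offset in bytes).
 * Returns 0 and stores the value in *out if the load completed,
 * or -1 if the handler could not be installed or SIGBUS was raised.
 */
int probe_read32(const volatile void *base, size_t offset, uint32_t *out)
{
    struct sigaction sa, old;
    int rc;

    assert(base != NULL && out != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so that SIGBUS is unblocked again after siglongjmp(). */
    if (sigsetjmp(probe_env, 1) == 0) {
        const volatile uint32_t *p =
            (const volatile uint32_t *)((const volatile uint8_t *)base + offset);
        *out = *p;   /* the load under test */
        rc = 0;
    } else {
        rc = -1;     /* arrived here from the SIGBUS handler */
    }

    /* Restore whatever SIGBUS disposition was in place before. */
    sigaction(SIGBUS, &old, NULL);
    return rc;
}
```

Passing an offset that is not a multiple of 4 gives the unaligned access of case 2). In portable C such an access is undefined behaviour, so it only makes sense as a deliberate hardware test.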

The MPC8548ERM mentions that interrupt latency is indeterminate for guarded loads. From this I conclude that the processor stalls until it receives data from the bus: it is not interruptible (by machine checks, interrupts or critical interrupts). However, the following behaviour is seen:


Linux 2.6.24 ppc:

For 1) my application gets a SIGBUS; after this, the error interrupt is run, reporting a packet timeout: good.

For 2) the kernel OOPSes while running do_IRQ, getting the irq number. The kernel is not in interrupt mode though: my application is killed and I may continue.


Linux 2.6.31 powerpc:

For 1) first some interrupt runs (apparently): the machine check handler prints a stack trace showing do_IRQ and the retrieval of the irq number. The kernel in this instance detects it's running an interrupt, panics and resets immediately.

For 2) things are even worse ;-).

Case 1) may be "solved" by disabling my own RapidIO error interrupt handling (I think that's the IRQ about to be executed, but the kernel hasn't gotten far enough to read the proper registers to tell me). If the error interrupt is disabled, then the application is killed. Behaviour seems proper, except I can't print my (diagnostic) errors.

With this "fix" though, case 2) proceeds as follows: the kernel OOPSes in the machine check handler with the stack trace showing it's executing instructions in the softirq handler. The softirq process is killed (I assume). After this my application may continue, and I think it retries the I/O read because (after the timeout) the machine check OOPSes again, this time showing a timer interrupt in progress (which is trying to wake the softirq process), thereby panicking and resetting the board.

If I "mangle" the machine check handler to print the RapidIO error registers and always return immediately, then I keep getting machine checks printing 'packet timeout' and/or 'illegal field in packet' ... apparently the I/O operation is retried again and again. Not particularly nice for a so-called "guarded load".

To verify that the "guarded load" is really guarded, I set the timeout to maximum (~5 seconds), and tried to read from a non-existent device. Under these circumstances, the board is no longer pingable, and telnet sessions to it are dead. These come back to life when the timeout has passed and a SIGBUS has killed the test application.

So, the guarded load does really seem to block external interrupts (at least timer and ethernet), but on the other hand I'm seeing inconsistent stack traces during the machine check handling (as the last instruction was in user space, I shouldn't be seeing stack traces down in the kernel, softirq or wherever else).

The HID0 and HID1 registers are equal in the two kernels (except the 2.6.31 sets DOZE mode, but disabling that had no effect).


How is it possible that behaviour differs between these two kernels?

How can I get my desired behaviour: that my application is killed with a SIGBUS, and the rest of the kernel keeps running properly?


Thanks in advance for any insight,

Micha Nelissen
