On 02/20/2020 at 3:38 AM Christophe Leroy <christophe.le...@c-s.fr> wrote:
> On 02/19/2020 10:39 PM, Radu Rendec wrote:
> > On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.le...@c-s.fr> wrote:
> >>> Interesting.
> >>>
> >>> 0x900 is the address of the timer interrupt.
> >>>
> >>> Would the MCE occur just after the timer interrupt ?
> >
> > I doubt that. I'm using a small test module to artificially trigger the
> > MCE. Basically it's just this (the full code is in my original post):
> >
> >     bad_addr_base = ioremap(0xf0000000, 0x100);
> >     x = ioread32(bad_addr_base);
> >
> > I find it hard to believe that every time I load the module the lwbrx
> > instruction that triggers the MCE is executed exactly after the timer
> > interrupt (or that the timer interrupt always occurs close to the lwbrx
> > instruction).
>
> Can you try to see how much time there is between your read and the MCE ?
> The below should allow it, you'll see first value in r13 and the other
> in r14 (mce.c is your test code)
>
> Also provide the timebase frequency as reported in /proc/cpuinfo
I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.

    # cat /proc/cpuinfo
    processor       : 0
    cpu             : e300c4
    clock           : 800.000004MHz
    revision        : 1.1 (pvr 8086 1011)
    bogomips        : 200.00
    timebase        : 100000000

The difference between r14 and r13 is 0x20000b (2097163 ticks). Assuming
TB is incremented at the 'timebase' frequency (100 MHz), that means 20.97
milliseconds (although the e300 manual says TB is "incremented once every
four core input clock cycles").

I repeated the test twice more. The absolute values were of course very
different, but r14-r13 was 0x20000c and 0x200011, so the delay seems to be
quite consistent (within just a few timebase ticks).

Just for the fun of it, I repeated the test once more, but with interrupts
disabled. The difference was 0x200014. FWIW, I disabled interrupts before
sampling TB in r13.

> And what's the reason given in the Oops message for the machine check ?
> Is that "Caused by (from SRR1=49030): Transfer error ack signal" or
> something else ?

When interrupts are enabled:
Caused by (from SRR1=41000): Transfer error ack signal

When interrupts are disabled:
Caused by (from SRR1=41030): Transfer error ack signal

> >
> >> Do you use the local bus monitoring driver ?
> >
> > I don't. In fact, I'm not even aware of it. What driver is that?
>
> CONFIG_FSL_LBC

OK, it seems I'm actually using it. I haven't enabled it explicitly, but
it's automatically pulled in by CONFIG_MTD_NAND_FSL_ELBC as a prerequisite.

I looked at the code in arch/powerpc/sysdev/fsl_lbc.c and it's quite small.
Most of the code is in fsl_lbc_ctrl_irq(), which I guess is supposed to
print a message if/when the LBC catches an error. I've never seen any of
those messages being printed.

Best regards,
Radu