Ben,

After your hints, I suspected that the read of a real-world I/O variable,
*piom (which came from ioremap_nocache), in the three-line critical
interrupt handler was what caused the problem:

    void critintr_handler(void *dev)
    {
        critintrcount++;           // increment a variable
        iodata = *piom;            // read an I/O location
        mtdcr(0x0c0, 0x00002000);  // clear critical interrupt
    }

Commenting that read out seems to make the system stable. This led us to
disable the critical interrupt while in the DataTLBError44x and
InstructionTLBError44x exceptions. With that change, things now seem more
stable while the critical interrupt handler reads real-world I/O for our
application:

      /* Data TLB Error Interrupt */
      START_EXCEPTION(DataTLBError44x)
      mtspr   SPRN_SPRG_WSCRATCH0, r10    /* Save some working */
    + mfmsr   r10                         /* Disable the */
    + rlwinm  r10,r10,0,15,13             /* MSR's CE bit */
    + mtmsr   r10

Do you see any potential problems with this approach? If so, can you
advise us on a better way to handle this?

On Tue, 2013-08-20 at 06:56 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-08-19 at 12:00 -0700, Henry Bausley wrote:
> > 
> > Support does appear to be present but there is a problem returning
> > back to user space I suspect.
> 
> Probably a problem with TLB misses vs. crit interrupts.
> 
> A critical interrupt can re-enter a TLB miss.
> 
> I can see two potential issues there:
> 
> - A bug where we don't properly restore "something" (I thought we did
>   save and restore MMUCR though, but that's worth dbl checking if it
>   works properly) across the crit entry/exit
> 
> - Something in your crit code causing a TLB miss (the
>   kernel .text/.data/.bss should be bolted but anything else can). We
>   don't currently support re-entering the TLB miss that way.
> 
> If we were to support the latter, we'd need to detect on entering a crit
> that the PC is within the TLB miss handler, and setup a return context
> to the original instruction (replay the miss) rather than trying to
> resume it.
> 
> Cheers,
> Ben.
> 
> > 
> > What fails is that Linux user space programs get segmentation
> > faults. Issuing a simple ls sometimes causes a segmentation fault.
> > The shell gets terminated and you cannot log back in. The message
> > INIT: Id "T0" respawning too fast: disabled for 5 minutes pops up.
> > 
> > However, the critical interrupt handler keeps running. I know this
> > because I added a read of a physical I/O location to the handler and
> > can see it being read on the scope.
> > 
> > The only code in the handler is below.
> > 
> >     void critintr_handler(void *dev)
> >     {
> >         critintrcount++;           // increment a variable
> >         iodata = *piom;            // read an I/O location
> >         mtdcr(0x0c0, 0x00002000);  // clear critical interrupt
> >     }
> > 
> > Below is a log of the type of crashes that occur:
> > 
> > root@10.34.9.213:/opt/ppmac/ktest# ls
> > Segmentation fault
> > root@10.34.9.213:/opt/ppmac/ktest# ls
> > Segmentation fault
> > root@10.34.9.213:/opt/ppmac/ktest# ls
> > Makefile ktest.c ktest.ko ktest.mod.o modules.order
> > Module.symvers ktest.cbp ktest.mod.c ktest.o
> > root@10.34.9.213:/opt/ppmac/ktest# ls
> > 
> > Debian GNU/Linux 7 powerpmac ttyS0
> > 
> > powerpmac login: root
> > 
> > Debian GNU/Linux 7 powerpmac ttyS0
> > 
> > powerpmac login: root
> > 
> > Debian GNU/Linux 7 powerpmac ttyS0
> > 
> > powerpmac login: root
> > 
> > Debian GNU/Linux 7 powerpmac ttyS0
> > 
> > powerpmac login: root
> > Password:
> > Last login: Thu Nov 30 20:42:16 UTC 1933 on ttyS0
> > Linux powerpmac 3.2.21-aspen_2.01.09 #10 Mon Aug 19 08:49:12 PDT 2013 ppc
> > 
> > The programs included with the Debian GNU/Linux system are free software;
> > the exact distribution terms for each program are described in the
> > individual files in /usr/share/doc/*/copyright.
> > 
> > Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> > permitted by applicable law.
> > INIT: Id "T0" respawning too fast: disabled for 5 minutes
> > 
> > 
> > ______________________________________________________________________
> > From: "Benjamin Herrenschmidt" <b...@kernel.crashing.org>
> > Sent: Saturday, August 17, 2013 3:05 PM
> > To: "Kumar Gala" <ga...@kernel.crashing.org>
> > Cc: linuxppc-dev@lists.ozlabs.org, hbaus...@deltatau.com
> > Subject: Re: Critical Interrupt Input
> > 
> > On Fri, 2013-08-16 at 06:04 -0500, Kumar Gala wrote:
> > > The 44x low level code needs to handle exception stacks properly for
> > > this to work. Since it's possible to have a critical exception occur
> > > while in a normal exception level, you have to have proper saving of
> > > additional register state and a stack frame for the critical
> > > exception, etc. I'm not sure if that was ever done for 44x.
> > 
> > Don't 44x and FSL BookE share the same macros? I would think 44x does
> > indeed implement the same crit support as e500...
> > 
> > What does the crash look like?
> > 
> > Ben.
> > 
> > 
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev