On Wed, Jun 02, 2010 at 07:45:27AM -0500, Kumar Gala wrote:
> Why do we need to have emu support for all of these instructions?

Fair question. This arose in the context of the support for data breakpoint events in perf_events. Since the data breakpoint facility on our processors (DABR on server, DAC/DVC on Book 3E) interrupts before the access is performed, we have to execute the instruction that caused the breakpoint with the data breakpoint disabled, then re-arm the data breakpoint and carry on.

The interesting case comes when the interrupt occurs on a lwarx/ldarx. If we just single-step it, we'll lose the reservation and most likely get into an infinite loop, making no progress.

So we have two alternatives: either try to arrange that we can single-step the lwarx and get to the stwcx without losing the reservation, or emulate the lwarx and all the instructions up to and including the stwcx.

The first alternative seemed pretty fragile to me, since it means that we have to arrange that we can single-step and take data breakpoints without using any spinlocks, mutexes or atomic ops (including bitops). Also, the architecture says that some embedded implementations might clear the reservation on taking an interrupt (which presumably could include debug interrupts).

The second alternative, emulating the lwarx/stwcx and all the instructions in between, sounds complicated but in fact turns out to be fairly straightforward: the code for each instruction is small, easy to verify, and has little interaction with other code.

Note that we have to do this emulation for both kernel and user code, since a data breakpoint event could occur in the kernel or in usermode. While we can constrain pretty tightly what occurs between lwarx and stwcx in the kernel, userspace is not so well constrained, so I thought it best to emulate all the integer ops that can be done reasonably easily and that can occur in compiled C code.

The other thing I want to do is use this to replace the alignment fixup code, since the two are doing very similar things now.

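To make the reservation argument concrete, here is a minimal, hypothetical sketch in C. It is a toy model, not the kernel's actual emulation code: struct toy_cpu, toy_init(), toy_emulate_step() and toy_run() are invented for illustration, the CPU is 32-bit big-endian, only lwarx, addi and stwcx. are decoded, and no other CPU can break the reservation. Because the emulator records the reservation in its own state instead of taking an interrupt between the lwarx and the stwcx., the stwcx. at the end of a lwarx/addi/stwcx. sequence succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy CPU state (not the kernel's pt_regs). */
struct toy_cpu {
    uint32_t gpr[32];
    unsigned cr0;        /* CR0 field: LT=8, GT=4, EQ=2, SO=1 */
    int reserved;        /* is a reservation held? */
    uint32_t resv_addr;  /* address the reservation covers */
    uint8_t mem[64];     /* tiny big-endian memory */
};

static void toy_init(struct toy_cpu *c)
{
    memset(c, 0, sizeof *c);
}

static uint32_t load32(const struct toy_cpu *c, uint32_t ea)
{
    assert(ea <= sizeof c->mem - 4);
    const uint8_t *p = &c->mem[ea];
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store32(struct toy_cpu *c, uint32_t ea, uint32_t v)
{
    assert(ea <= sizeof c->mem - 4);
    c->mem[ea] = (uint8_t)(v >> 24);
    c->mem[ea + 1] = (uint8_t)(v >> 16);
    c->mem[ea + 2] = (uint8_t)(v >> 8);
    c->mem[ea + 3] = (uint8_t)v;
}

/* Emulate one instruction. Returns 0 if emulated, -1 if the toy does
 * not handle it (a real implementation might fall back to single-step). */
static int toy_emulate_step(struct toy_cpu *c, uint32_t insn)
{
    unsigned op = insn >> 26;
    unsigned rt = (insn >> 21) & 31;
    unsigned ra = (insn >> 16) & 31;
    unsigned rb = (insn >> 11) & 31;
    uint32_t base = ra ? c->gpr[ra] : 0;   /* (rA|0) */

    if (op == 14) {                        /* addi rT,rA,SI */
        int32_t si = (int32_t)(insn & 0xffff);
        if (si & 0x8000)
            si -= 0x10000;                 /* sign-extend SI */
        c->gpr[rt] = base + (uint32_t)si;
        return 0;
    }
    if (op != 31)
        return -1;

    uint32_t ea = base + c->gpr[rb];
    unsigned xo = (insn >> 1) & 0x3ff;
    if (ea % 4 != 0 || ea > sizeof c->mem - 4)
        return -1;                         /* unaligned or out of range */

    switch (xo) {
    case 20:                               /* lwarx rT,rA,rB */
        c->gpr[rt] = load32(c, ea);
        c->reserved = 1;                   /* reservation lives in emulator state */
        c->resv_addr = ea;
        return 0;
    case 150:                              /* stwcx. rS,rA,rB */
        if (!(insn & 1))
            return -1;                     /* Rc must be 1 */
        if (c->reserved && c->resv_addr == ea) {
            store32(c, ea, c->gpr[rt]);
            c->cr0 = 2;                    /* store performed: CR0[EQ]=1 */
        } else {
            c->cr0 = 0;                    /* toy model: fail, no store (SO ignored) */
        }
        c->reserved = 0;                   /* stwcx. always clears the reservation */
        return 0;
    default:
        return -1;
    }
}

/* Emulate insns in order; returns how many were emulated. */
static size_t toy_run(struct toy_cpu *c, const uint32_t *insns, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (toy_emulate_step(c, insns[i]) != 0)
            break;
    return i;
}
```

For example, emulating lwarx r3,0,r4 (0x7C602028), addi r3,r3,1 (0x38630001) and stwcx. r3,0,r4 (0x7C60212D) against a word holding 41 leaves 42 in memory with CR0[EQ] set, whereas a stwcx. issued without a preceding lwarx fails and leaves memory untouched. Anything the toy does not decode returns -1, which is roughly where a real implementation would have to fall back to single-stepping.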
Replacing the alignment fixups will need little-endian support, plus implementing the rest of the Altivec and VSX loads and stores, along with dcbz, l/stswi, l/stswx, etc.

Finally, emulating should be faster than single-stepping, so extending the set of emulated instructions should also improve the performance of kprobes and uprobes.

Paul.
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev