On Mon, 2012-12-10 at 10:04 +0000, Will Deacon wrote:
> Hi Jon,
>
> Back-pedalling a bit here, but I'm confused by one of your points below:
>
> On Fri, Dec 07, 2012 at 05:45:47PM +0000, Jon Medhurst (Tixy) wrote:
> > On Fri, 2012-12-07 at 12:13 -0500, Steven Rostedt wrote:
> > > I'll make my question more general:
> > >
> > > If I have a nop, that is a size of a call (branch and link), which is
> > > near the beginning of a function and not part of any conditional, and I
> > > want to convert it into a call (branch and link), would adding a
> > > breakpoint to it, modifying it to the call, and then removing the
> > > breakpoint be possible? Of course it would require syncing in between
> > > steps, but my question is, if the above is possible on a thumb2 ARM
> > > processor?
> >
> > I believe so. The details are (repeating your earlier explanation) ...
> >
> > 1. Replace first half of nop with 16bit 'breakpoint' instruction.
>
> Sort of -- you'd actually need 2x16-bit nops to make this work.
Why?

>
> > 2. Sync (cache flush to PoU + IPIs to make other cores invalidate the
> > icache for changed part of the nop instruction).
>
> Why do you need to use IPIs for I-cache invalidation on other cores? For
> ARMv7 SMP (i.e. the multi-processing extensions) doing I-cache invalidation
> by MVA to PoU will be broadcast to the applicable domain for the
> shareability attributes of the address. So if you do icimvau with an
> inner-shareable virtual address, it will be broadcast by the hardware.
>
> > However, we wouldn't need any of this breakpoint malarkey; why not just
> > use a 16-bit branch instruction which branches over the second half
> > of the nop? :-)
>
> Yes, and I think if you do use two 16-bit nops, you can even get rid of all
> the intermediate `sync' operations (I guess you might want one at the end if
> you want the call to become visible at a particular point).

Won't work. We are replacing a 32-bit call with a nop. That nop must also be
32 bits, because we could eventually replace the nop(s) with a 32-bit call
again. Basically, we can never allow the second 16-bit half to ever be the
next instruction. Suppose the first 16-bit nop is executed, and then the task
gets preempted. The nops get converted to a 32-bit call. The task gets
scheduled again, is now executing the second 16 bits of the 32-bit call, and
we get unexpected (probably crashing) results.

By having either a 16-bit breakpoint whose handler returns after the second
16-bit half, or a 16-bit jump that simply jumps over the second half, all of
this should work. When the CPU processes a 32-bit instruction, it either
processes all of it or none of it, correct?

-- Steve