Naveen N. Rao's on May 14, 2019 6:32 pm:
> Michael Ellerman wrote:
>> "Naveen N. Rao" <naveen.n....@linux.ibm.com> writes:
>>> Michael Ellerman wrote:
>>>> Nicholas Piggin <npig...@gmail.com> writes:
>>>>> The new mprofile-kernel mcount sequence is
>>>>>
>>>>>   mflr r0
>>>>>   bl _mcount
>>>>>
>>>>> Dynamic ftrace patches the branch instruction with a noop, but leaves
>>>>> the mflr. mflr is executed by the branch unit that can only execute one
>>>>> per cycle on POWER9 and shared with branches, so it would be nice to
>>>>> avoid it where possible.
>>>>>
>>>>> This patch is a hacky proof of concept to nop out the mflr. Can we do
>>>>> this or are there races or other issues with it?
>>>>
>>>> There's a race, isn't there?
>>>>
>>>> We have a function foo which currently has tracing disabled, so the mflr
>>>> and bl are nop'ed out.
>>>>
>>>>   CPU 0                 CPU 1
>>>>   ==================================
>>>>   bl foo
>>>>   nop (ie. not mflr)
>>>>   -> interrupt
>>>>   something else        enable tracing for foo
>>>>   ...                   patch mflr and branch
>>>>   <- rfi
>>>>   bl _mcount
>>>>
>>>> So we end up in _mcount() but with r0 not populated.
>>>
>>> Good catch! Looks like we need to patch the mflr with a "b +8" similar
>>> to what we do in __ftrace_make_nop().
>>
>> Would that actually make it any faster though? Nick?
>
> Ok, how about doing this as a 2-step process?
> 1. patch 'mflr r0' with a 'b +8'
>    synchronize_rcu_tasks()
> 2. convert 'b +8' to a 'nop'
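[A minimal sketch of the 2-step sequence proposed above, assuming the (then-current) patch_instruction(unsigned int *, unsigned int) prototype, PPC_INST_NOP from ppc-opcode.h, and a raw 0x48000008 encoding for 'b +8'; the helper name and its caller are hypothetical, not from the actual patch.]

#include <linux/rcupdate.h>	/* synchronize_rcu_tasks() */
#include <asm/code-patching.h>	/* patch_instruction() */
#include <asm/ppc-opcode.h>	/* PPC_INST_NOP */

/* 'b +8': branch over the 'bl _mcount' slot that follows the mflr */
#define PPC_INST_BRANCH_PLUS_8	0x48000008

/*
 * Hypothetical helper: 'mflr' points at the 'mflr r0' of an
 * mprofile-kernel mcount site.
 */
static int nop_out_mflr_two_step(unsigned int *mflr)
{
	int err;

	/* Step 1: replace 'mflr r0' with 'b +8', skipping the bl slot. */
	err = patch_instruction(mflr, PPC_INST_BRANCH_PLUS_8);
	if (err)
		return err;

	/*
	 * Wait for every task to pass through a voluntary context switch
	 * (or userspace), so none is still preempted inside the old sequence.
	 */
	synchronize_rcu_tasks();

	/* Step 2: relax the 'b +8' to a plain nop. */
	return patch_instruction(mflr, PPC_INST_NOP);
}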
Good idea. Well the mflr r0 is harmless, so you can leave that in. You just need to ensure it's not removed before the bl is. So nop the bl _mcount, then synchronize_rcu_tasks(), then nop the mflr?

Thanks,
Nick
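[And a sketch of the ordering suggested in this reply, with the same assumptions as above; the helper and its two parameters are hypothetical, and whether the synchronize_rcu_tasks() in the middle is strictly needed is exactly the question being posed here.]

#include <linux/rcupdate.h>	/* synchronize_rcu_tasks() */
#include <asm/code-patching.h>	/* patch_instruction() */
#include <asm/ppc-opcode.h>	/* PPC_INST_NOP */

/*
 * Hypothetical helper: 'mflr' points at 'mflr r0' and 'bl' at the
 * 'bl _mcount' immediately after it. The mflr on its own is harmless,
 * so it is only removed once the bl is gone.
 */
static int nop_out_mcount_site(unsigned int *mflr, unsigned int *bl)
{
	int err;

	/* 1. Nop the bl first, so no new calls into _mcount can start. */
	err = patch_instruction(bl, PPC_INST_NOP);
	if (err)
		return err;

	/* 2. Let any task preempted between the two instructions move on. */
	synchronize_rcu_tasks();

	/* 3. Now the mflr can be nop'ed as well. */
	return patch_instruction(mflr, PPC_INST_NOP);
}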