Naveen N. Rao's on May 17, 2019 4:22 am: > Nicholas Piggin wrote: >> Naveen N. Rao's on May 14, 2019 6:32 pm: >>> Michael Ellerman wrote: >>>> "Naveen N. Rao" <naveen.n....@linux.ibm.com> writes: >>>>> Michael Ellerman wrote: >>>>>> Nicholas Piggin <npig...@gmail.com> writes: >>>>>>> The new mprofile-kernel mcount sequence is >>>>>>> >>>>>>> mflr r0 >>>>>>> bl _mcount >>>>>>> >>>>>>> Dynamic ftrace patches the branch instruction with a noop, but leaves >>>>>>> the mflr. mflr is executed by the branch unit that can only execute one >>>>>>> per cycle on POWER9 and shared with branches, so it would be nice to >>>>>>> avoid it where possible. >>>>>>> >>>>>>> This patch is a hacky proof of concept to nop out the mflr. Can we do >>>>>>> this or are there races or other issues with it? >>>>>> >>>>>> There's a race, isn't there? >>>>>> >>>>>> We have a function foo which currently has tracing disabled, so the mflr >>>>>> and bl are nop'ed out. >>>>>> >>>>>> CPU 0 CPU 1 >>>>>> ================================== >>>>>> bl foo >>>>>> nop (ie. not mflr) >>>>>> -> interrupt >>>>>> something else enable tracing for foo >>>>>> ... patch mflr and branch >>>>>> <- rfi >>>>>> bl _mcount >>>>>> >>>>>> So we end up in _mcount() but with r0 not populated. >>>>> >>>>> Good catch! Looks like we need to patch the mflr with a "b +8" similar >>>>> to what we do in __ftrace_make_nop(). >>>> >>>> Would that actually make it any faster though? Nick? >>> >>> Ok, how about doing this as a 2-step process? >>> 1. patch 'mflr r0' with a 'b +8' >>> synchronize_rcu_tasks() >>> 2. convert 'b +8' to a 'nop' >> >> Good idea. Well the mflr r0 is harmless, so you can leave that in. >> You just need to ensure it's not removed before the bl is. So nop >> the bl _mcount, then synchronize_rcu_tasks(), then nop the mflr? > > The problem actually seems to be when we try to patch in the branch to > _mcount(), rather than when we are patching in the nop instructions > (i.e., the race is when we try to enable the function tracer, rather > than while disabling it). > > When we disable ftrace, we only need to ensure we patch out the branch > to _mcount() before patching out the preceding 'mflr r0'. I don't think > we need a synchronize_rcu_tasks() in that case.
That's probably right. > While enabling ftrace, we will first need to patch the preceding 'mflr > r0' (which would now be a 'nop') with 'b +8', then use > synchronize_rcu_tasks() and finally patch in 'bl _mcount()' followed by > 'mflr r0'. > > I think that's what you meant, just that my reference to patching 'mflr > r0' with a 'b +8' should have called out that the mflr would have been > nop'ed out. I meant that we don't need the b +8 anywhere, because the mflr r0 is harmless. Enabling ftrace just needs to patch in 'mflr r0', and then synchronize_rcu_tasks(), and then patch in 'bl _mcount'. I think? Thanks, Nick