Michael Ellerman wrote:
"Naveen N. Rao" <naveen.n....@linux.ibm.com> writes:
Michael Ellerman wrote:
Nicholas Piggin <npig...@gmail.com> writes:
The new mprofile-kernel mcount sequence is:

  mflr	r0
  bl	_mcount
Dynamic ftrace patches the branch instruction with a nop, but leaves the
mflr. The mflr is executed by the branch unit, which can execute only one
such instruction per cycle on POWER9 and is shared with branches, so it
would be nice to avoid it where possible.
This patch is a hacky proof of concept to nop out the mflr. Can we do
this or are there races or other issues with it?
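[A minimal sketch of what the proof of concept amounts to, assuming the
powerpc code-patching helpers as they existed around the time of this
thread (patch_instruction(), probe_kernel_read(), PPC_INST_MFLR,
PPC_INST_NOP); the function name and the way the mflr address is derived
are illustrative, not taken from the actual patch:]

  #include <linux/uaccess.h>	/* probe_kernel_read() */
  #include <asm/code-patching.h>	/* patch_instruction() */
  #include <asm/ppc-opcode.h>	/* PPC_INST_MFLR, PPC_INST_NOP */

  /*
   * Illustrative sketch only: 'ip' is the (former) 'bl _mcount' call site;
   * with -mprofile-kernel the 'mflr r0' sits in the instruction slot
   * immediately before it.  Replace it with a nop so the branch unit no
   * longer has to execute an mflr on every call into the function.
   */
  static int nop_out_mflr(unsigned long ip)
  {
  	unsigned int *mflr = (unsigned int *)(ip - 4);	/* one instruction back */
  	unsigned int op;

  	if (probe_kernel_read(&op, mflr, sizeof(op)))
  		return -EFAULT;

  	if (op != PPC_INST_MFLR)	/* make sure it really is 'mflr r0' */
  		return -EINVAL;

  	return patch_instruction(mflr, PPC_INST_NOP);
  }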
There's a race, isn't there?
We have a function foo which currently has tracing disabled, so the mflr
and bl are nop'ed out.
  CPU 0                       CPU 1
  ==============================================
  bl foo
  nop (ie. not mflr)
  -> interrupt
  something else              enable tracing for foo
  ...                         patch mflr and branch
  <- rfi
  bl _mcount
So we end up in _mcount() but with r0 not populated.
Good catch! Looks like we need to patch the mflr with a "b +8" similar
to what we do in __ftrace_make_nop().
Would that actually make it any faster though? Nick?
Ok, how about doing this as a 2-step process?
1. patch 'mflr r0' with a 'b +8'
   synchronize_rcu_tasks()
2. convert 'b +8' to a 'nop'
- Naveen
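[To make that ordering concrete, here is a hedged sketch of the two-step
conversion, again assuming create_branch(), patch_instruction() and
synchronize_rcu_tasks() with the signatures they had around the time of
this thread; the function name is made up, and the handling of the
adjacent 'bl _mcount' slot is left out:]

  #include <linux/rcupdate.h>	/* synchronize_rcu_tasks() */
  #include <asm/code-patching.h>	/* create_branch(), patch_instruction() */
  #include <asm/ppc-opcode.h>	/* PPC_INST_NOP */

  /*
   * Illustrative sketch only: 'mflr' is the address of the 'mflr r0' that
   * precedes the 'bl _mcount' slot.
   */
  static int nop_out_mflr_two_step(unsigned int *mflr)
  {
  	unsigned int op;

  	/* Step 1: replace 'mflr r0' with 'b +8', jumping over the bl/nop slot */
  	op = create_branch(mflr, (unsigned long)mflr + 8, 0);
  	if (!op || patch_instruction(mflr, op))
  		return -EINVAL;

  	/*
  	 * Wait for every task to pass through a voluntary context switch,
  	 * so none can still be stopped between the two instructions with
  	 * the pre-patching sequence only partially executed.
  	 */
  	synchronize_rcu_tasks();

  	/* Step 2: relax the 'b +8' into a plain nop */
  	return patch_instruction(mflr, PPC_INST_NOP);
  }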