Nicholas Piggin wrote:
Naveen N. Rao's on May 14, 2019 6:32 pm:
Michael Ellerman wrote:
"Naveen N. Rao" <naveen.n....@linux.ibm.com> writes:
Michael Ellerman wrote:
Nicholas Piggin <npig...@gmail.com> writes:
The new mprofile-kernel mcount sequence is

  mflr  r0
  bl    _mcount

Dynamic ftrace patches the branch instruction with a noop, but leaves
the mflr. mflr is executed by the branch unit that can only execute one
per cycle on POWER9 and shared with branches, so it would be nice to
avoid it where possible.

This patch is a hacky proof of concept to nop out the mflr. Can we do
this or are there races or other issues with it?

There's a race, isn't there?

We have a function foo which currently has tracing disabled, so the mflr
and bl are nop'ed out.

  CPU 0                 CPU 1
  ==================================
  bl foo
  nop (ie. not mflr)
  -> interrupt
  something else        enable tracing for foo
  ...                   patch mflr and branch
  <- rfi
  bl _mcount

So we end up in _mcount() but with r0 not populated.

Good catch! Looks like we need to patch the mflr with a "b +8" similar to what we do in __ftrace_make_nop().

Would that actually make it any faster though? Nick?

Ok, how about doing this as a 2-step process?
1. patch 'mflr r0' with a 'b +8'
   synchronize_rcu_tasks()
2. convert 'b +8' to a 'nop'

Good idea. Well the mflr r0 is harmless, so you can leave that in.
You just need to ensure it's not removed before the bl is. So nop
the bl _mcount, then synchronize_rcu_tasks(), then nop the mflr?

The problem actually seems to be when we try to patch in the branch to _mcount(), rather than when we are patching in the nop instructions (i.e., the race is when we try to enable the function tracer, rather than while disabling it).

When we disable ftrace, we only need to ensure we patch out the branch to _mcount() before patching out the preceding 'mflr r0'. I don't think we need a synchronize_rcu_tasks() in that case.

While enabling ftrace, we will first need to patch the preceding 'mflr r0' (which would now be a 'nop') with 'b +8', then use synchronize_rcu_tasks() and finally patch in 'bl _mcount()' followed by 'mflr r0'.

I think that's what you meant, just that my reference to patching 'mflr r0' with a 'b +8' should have called out that the mflr would have been nop'ed out.

- Naveen

Reply via email to