Michael Ellerman wrote:
Balbir Singh <bsinghar...@gmail.com> writes:
On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
<mah...@linux.vnet.ibm.com> wrote:
From: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com>
Rebooting into a new kernel with kexec fails in trace_tlbie() which is
called from native_hpte_clear(). This happens if the running kernel has
CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
execute a few RCU checks regardless of whether tracing is on or off.
We are already in the last phase of the kexec sequence in real mode with
HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
causes kexec to fail.
Effectively we can't enter the tracepoint code after we've set HILE_BE.
Do we need a Fixes tag? Or is this a side-effect of a new generic change?
Yes, I added:
Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
Cc: sta...@vger.kernel.org # v4.13+
I think the right thing in the longer run might be to use
TRACE_EVENT_CONDITION and have the condition do the right thing, but what
you have for now is good.
No, I think the right thing is to not call tracepoints from kexec code,
it's too fragile. TRACE_EVENT_CONDITION wouldn't have saved us from this
RCU breakage.
I agree on the fragile part, though it appears to me that a
TRACE_EVENT_CONDITION() with a check for is_kexec (that needs to be
added) will prevent breakage since both the LOCKDEP block as well as the
tracepoint itself are guarded by the condition. So, none of the rcu code
should be executed as long as we set is_kexec at the right time.
However, since there is all of one tracepoint affecting kexec, it is
probably not worth the effort at the moment.
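
To illustrate the argument about the condition guarding both paths, here is a
minimal userspace model in C. It is only a sketch under stated assumptions,
not the kernel's actual tracepoint macro: is_kexec is the hypothetical flag
mentioned above (it does not exist today), and tracing_enabled,
MODEL_LOCKDEP, and the model_* helpers are illustrative stand-ins for the
static key, CONFIG_LOCKDEP, and the RCU lockdep check.

```c
#include <stdbool.h>

/* Hypothetical flag from the discussion; it would have to be added. */
static bool is_kexec;

/* Stand-in for the tracepoint's static key (tracing on/off). */
static bool tracing_enabled;

/* Stand-in for CONFIG_LOCKDEP being enabled in the running kernel. */
#define MODEL_LOCKDEP 1

/* Counters that record which paths were actually executed. */
static int rcu_checks_run;
static int events_emitted;

/* Stand-in for the RCU lockdep check that breaks late in kexec. */
static void model_rcu_lockdep_check(void)
{
    rcu_checks_run++;
}

/*
 * Simplified model of a generated trace_<name>() function: the same
 * condition guards both the tracepoint body and the lockdep-only block.
 */
static void model_trace_tlbie(bool cond)
{
    if (tracing_enabled && cond)
        events_emitted++;

    if (MODEL_LOCKDEP && cond)
        model_rcu_lockdep_check();
}

/* Plain tracepoint: the condition is always true. */
static void trace_tlbie_unconditional(void)
{
    model_trace_tlbie(true);
}

/* Conditional tracepoint: skipped entirely once is_kexec is set. */
static void trace_tlbie_conditional(void)
{
    model_trace_tlbie(!is_kexec);
}
```

In this model the unconditional variant still runs the RCU check with tracing
off, while the conditional variant runs nothing once is_kexec is set. It
relies on is_kexec being set before the first call in the kexec path, which is
the fragile part.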
- Naveen