On Thu, 20 Feb 2025 at 16:57, Sebastian Andrzej Siewior
<bige...@linutronix.de> wrote:
[...]
> Now. Based on this:
> The RCU read section increased the runtime (on my hardware) for the
> test from 30 to 43 seconds, which is an increase of roughly 43%.
> This is due to the lockdep annotation within rcu_read_lock() and
> unlock(), which does not exist in preempt_disable(). After disabling
> UBSAN + KASAN, the lockdep annotation has no effect. My guess is that
> UBSAN/KASAN is responsible for countless backtraces while enabled.
> Those backtraces seem to be limited to the core kernel.
>
> How much do we care here? Is this something that makes UBSAN + KASAN
> folks uncomfortable? Or is lockdep slowing things down anyway?
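
For context, the cost difference boils down to this (simplified from
include/linux/rcupdate.h and include/linux/preempt.h, details trimmed,
so treat it as a sketch rather than the exact current code): with
CONFIG_DEBUG_LOCK_ALLOC=y every rcu_read_lock()/rcu_read_unlock() pair
goes through lockdep's lock_acquire()/lock_release() on rcu_lock_map,
while preempt_disable() is only a preempt-count increment:

        /* Roughly what rcu_read_lock() does with lockdep enabled: */
        static __always_inline void rcu_read_lock(void)
        {
                __rcu_read_lock();               /* nesting/preempt bookkeeping */
                rcu_lock_acquire(&rcu_lock_map); /* lockdep: lock_acquire()     */
        }

        /* versus preempt_disable(), which is just: */
        #define preempt_disable() \
        do { \
                preempt_count_inc(); \
                barrier(); \
        } while (0)

Taking and releasing that lockdep-annotated section once per unwound
frame, with UBSAN/KASAN instrumenting the lockdep code on top, adds up
quickly.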
Does this series from Waiman help?
https://lore.kernel.org/all/20250213200228.1993588-4-long...@redhat.com/

> If so, we could either move the RCU section down (as in #5) so it is
> not used that often, or go the other direction and move it up. I got
> this:
>
> | ~# time ./hardware_disable_test
> | Random seed: 0x6b8b4567
> |
> | real    0m32.618s
> | user    0m0.537s
> | sys     0m13.942s
>
> which is almost back to the previous level, using the hunk below,
> after figuring out that most callers come from arch_stack_walk().
>
> diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
> index 7cede4dc21f0..f20e3613942f 100644
> --- a/arch/x86/include/asm/unwind.h
> +++ b/arch/x86/include/asm/unwind.h
> @@ -42,6 +42,7 @@ struct unwind_state {
>  void __unwind_start(struct unwind_state *state, struct task_struct *task,
>                      struct pt_regs *regs, unsigned long *first_frame);
>  bool unwind_next_frame(struct unwind_state *state);
> +bool unwind_next_frame_unlocked(struct unwind_state *state);
>  unsigned long unwind_get_return_address(struct unwind_state *state);
>  unsigned long *unwind_get_return_address_ptr(struct unwind_state *state);
>
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index ee117fcf46ed..4df346b11f1e 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -21,8 +21,9 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>          if (regs && !consume_entry(cookie, regs->ip))
>                  return;
>
> +        guard(rcu)();
>          for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
> -             unwind_next_frame(&state)) {
> +             unwind_next_frame_unlocked(&state)) {
>                  addr = unwind_get_return_address(&state);
>                  if (!addr || !consume_entry(cookie, addr))
>                          break;
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 977ee75e047c..402779b3e90a 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -465,7 +465,7 @@ static bool get_reg(struct unwind_state *state, unsigned int reg_off,
>          return false;
>  }
>
> -bool unwind_next_frame(struct unwind_state *state)
> +bool unwind_next_frame_unlocked(struct unwind_state *state)
>  {
>          unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
>          enum stack_type prev_type = state->stack_info.type;
> @@ -475,9 +475,6 @@ bool unwind_next_frame(struct unwind_state *state)
>          if (unwind_done(state))
>                  return false;
>
> -        /* Don't let modules unload while we're reading their ORC data. */
> -        guard(rcu)();
> -
>          /* End-of-stack check for user tasks: */
>          if (state->regs && user_mode(state->regs))
>                  goto the_end;
> @@ -678,6 +675,13 @@ bool unwind_next_frame(struct unwind_state *state)
>          state->stack_info.type = STACK_TYPE_UNKNOWN;
>          return false;
>  }
> +
> +bool unwind_next_frame(struct unwind_state *state)
> +{
> +        /* Don't let modules unload while we're reading their ORC data. */
> +        guard(rcu)();
> +        return unwind_next_frame_unlocked(state);
> +}
>  EXPORT_SYMBOL_GPL(unwind_next_frame);
>
>  void __unwind_start(struct unwind_state *state, struct task_struct *task,
>
> Sebastian
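
To spell out what the hunk buys (my reading of it, so a sketch rather
than a statement about the final patch): guard(rcu)() is the scope-based
RCU guard built on the <linux/cleanup.h> infrastructure, i.e. it takes
rcu_read_lock() where it is declared and drops it via rcu_read_unlock()
when arch_stack_walk() returns. The patched loop is therefore roughly
equivalent to:

        rcu_read_lock();        /* one lockdep-annotated section per walk...   */
        for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
             unwind_next_frame_unlocked(&state)) {  /* ...instead of one per frame */
                addr = unwind_get_return_address(&state);
                if (!addr || !consume_entry(cookie, addr))
                        break;
        }
        rcu_read_unlock();

The exported unwind_next_frame() wrapper keeps the per-frame guard, so
any other caller sees no behavioural change.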