On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: > This allows to get rid of unneeded invocations. > > Function debug_lockdep_rcu_enabled() becomes really hot if several > debug options are enabled together with CONFIG_PROVE_RCU. > > Hottest path ends with: > debug_lockdep_rcu_enabled > is_ftrace_trampoline > __kernel_text_address > > Here debug_lockdep_rcu_enabled() is called from condition > (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro > do_for_each_ftrace_op(), where "c" is false. > > With this patch "netperf -H localhost" shows boost from 2400 to 2500. > > Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>
Nice performance increase! The gcc documentation says that __attribute__((pure)) functions are supposed to have return values that depend only at the function's arguments and the values of global variables. However, it also says: Interesting non-pure functions are functions with infinite loops or those depending on volatile memory or other system resource, that may change between two consecutive calls (such as feof in a multithreading environment). This is OK for current->lockdep_recursion because this variable is changed only by the current task (I think so, anyway). It is sort of OK for debug_locks. This could be set to zero at any time by any other task, but if we have a race condition that very rarely causes two lockdep splats instead of just one, so what? (But I am sure that some of the people on CC will correct me if I am wrong here.) It should be OK for rcu_scheduler_active because the transition from RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first context switch, and the various barrier() calls, implied as well as explicit, should keep things straight. But I don't totally trust my analysis. Could you please get someone more gcc-savvy to review this and give their ack/review? Given that, I will queue it. Thanx, Paul > --- > include/linux/rcupdate.h | 2 +- > kernel/rcu/update.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index e1e5d002fdb9..9ecb3cb715bd 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; > extern struct lockdep_map rcu_bh_lock_map; > extern struct lockdep_map rcu_sched_lock_map; > extern struct lockdep_map rcu_callback_map; > -int debug_lockdep_rcu_enabled(void); > +int __pure debug_lockdep_rcu_enabled(void); > > int rcu_read_lock_held(void); > int rcu_read_lock_bh_held(void); > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > index 273e869ca21d..a0c30abefdcd 100644 > --- a/kernel/rcu/update.c > +++ b/kernel/rcu/update.c > @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = > STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key); > EXPORT_SYMBOL_GPL(rcu_callback_map); > > -int notrace debug_lockdep_rcu_enabled(void) > +int __pure notrace debug_lockdep_rcu_enabled(void) > { > return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > current->lockdep_recursion == 0; >