On Thu, Jan 16, 2025 at 1:55 PM Paul E. McKenney <paul...@kernel.org> wrote: > > On Thu, Jan 16, 2025 at 01:00:24PM -0800, Alexei Starovoitov wrote: > > On Thu, Jan 16, 2025 at 12:21 PM Paul E. McKenney <paul...@kernel.org> > > wrote: > > > > > > +/* > > > + * Counts the new reader in the appropriate per-CPU element of the > > > + * srcu_struct. Returns a pointer that must be passed to the matching > > > + * srcu_read_unlock_fast(). > > > + * > > > + * Note that this_cpu_inc() is an RCU read-side critical section either > > > + * because it disables interrupts, because it is a single instruction, > > > + * or because it is a read-modify-write atomic operation, depending on > > > + * the whims of the architecture. > > > + */ > > > +static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct > > > srcu_struct *ssp) > > > +{ > > > + struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp); > > > + > > > + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching > > > srcu_read_lock_fast()."); > > > + this_cpu_inc(scp->srcu_locks.counter); /* Y */ > > > + barrier(); /* Avoid leaking the critical section. */ > > > + return scp; > > > +} > > > > This doesn't look fast. > > If I'm reading this correctly, > > even with debugs off RCU_LOCKDEP_WARN() will still call > > rcu_is_watching() and this doesn't look cheap or fast. > > Here is the CONFIG_PROVE_RCU=n definition: > > #define RCU_LOCKDEP_WARN(c, s) do { } while (0 && (c)) > > The "0" in the "0 && (c)" should prevent that call to rcu_is_watching(). > > But why not see what the compiler thinks? I added the following function > to kernel/rcu/srcutree.c: > > struct srcu_ctr __percpu *test_srcu_read_lock_fast(struct srcu_struct *ssp) > { > struct srcu_ctr __percpu *p; > > p = srcu_read_lock_fast(ssp); > return p; > } > > This function compiles to the following code: > > Dump of assembler code for function test_srcu_read_lock_fast: > 0xffffffff811220c0 <+0>: endbr64 > 0xffffffff811220c4 <+4>: sub $0x8,%rsp > 0xffffffff811220c8 <+8>: mov 0x8(%rdi),%rax > 0xffffffff811220cc <+12>: add %gs:0x7eef3944(%rip),%rax # > 0x15a18 <this_cpu_off> > 0xffffffff811220d4 <+20>: mov 0x20(%rax),%eax > 0xffffffff811220d7 <+23>: test $0x8,%al > 0xffffffff811220d9 <+25>: je 0xffffffff811220eb > <test_srcu_read_lock_fast+43> > 0xffffffff811220db <+27>: mov (%rdi),%rax > 0xffffffff811220de <+30>: incq %gs:(%rax) > 0xffffffff811220e2 <+34>: add $0x8,%rsp > 0xffffffff811220e6 <+38>: jmp 0xffffffff81f5fe60 <__x86_return_thunk> > 0xffffffff811220eb <+43>: mov $0x8,%esi > 0xffffffff811220f0 <+48>: mov %rdi,(%rsp) > 0xffffffff811220f4 <+52>: call 0xffffffff8111fb90 > <__srcu_check_read_flavor> > 0xffffffff811220f9 <+57>: mov (%rsp),%rdi > 0xffffffff811220fd <+61>: jmp 0xffffffff811220db > <test_srcu_read_lock_fast+27> > > The first call to srcu_read_lock_fast() invokes __srcu_check_read_flavor(), > but after that the "je" instruction will fall through. So the common-case > code path executes only the part of this function up to and including the > "jmp 0xffffffff81f5fe60 <__x86_return_thunk>". > > Does that serve?
Thanks for checking. I was worried that in case of: while (0 && (c)) the compiler might need to still emit (c) since it cannot prove that there are no side effects there. I vaguely recall now that C standard allows to ignore 2nd expression when the first expression satisfies the boolean condition.