On 05/21/2016 01:42 PM, Emilio G. Cota wrote:
> In the process, the atomic_rcu_read/set were converted to implement
> consume/release semantics, respectively. This is inefficient; for
> correctness and maximum performance we only need an smp_barrier_depends
> for reads, and an smp_wmb for writes. Fix it by using the original
> definition of these two primitives for all compilers.
For what host do you think this is inefficient?
In particular, what you've done is going to be less efficient for, e.g., armv8,
where the __atomic formulation is going to produce load-acquire and
store-release instructions, whereas the separate barriers are going to produce
two instructions.
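To make the comparison concrete, here is a minimal C sketch of the two formulations, assuming GCC/Clang `__atomic` builtins. All macro and function names (`rcu_read_consume`, `rcu_set_release`, `rcu_read_barrier`, `rcu_set_barrier`, `sketch_wmb`, `sketch_barrier_depends` and the publish/read helpers) are hypothetical. They are not QEMU's actual `atomic_rcu_read`/`atomic_rcu_set` definitions, and the barrier macros are portable stand-ins rather than QEMU's `smp_wmb`/dependency barrier.

```c
#include <stddef.h>

/* Hypothetical payload published through a shared pointer. */
struct node {
    int value;
};

/* Formulation 1: __atomic builtins with consume/release ordering.
 * GCC promotes consume to acquire, so on AArch64 one would typically
 * expect a load-acquire (LDAR) and a store-release (STLR). */
#define rcu_read_consume(p)    __atomic_load_n((p), __ATOMIC_CONSUME)
#define rcu_set_release(p, v)  __atomic_store_n((p), (v), __ATOMIC_RELEASE)

/* Formulation 2: relaxed (plain) accesses plus explicit barriers.
 * Assumed stand-ins, for illustration only:
 *  - the dependency barrier is just a compiler barrier here (real
 *    dependency barriers are no-ops on nearly every host except Alpha);
 *  - the write barrier is a release fence, a portable substitute for
 *    smp_wmb() that is somewhat stronger (it also orders prior loads). */
#define sketch_barrier_depends() __asm__ __volatile__("" ::: "memory")
#define sketch_wmb()             __atomic_thread_fence(__ATOMIC_RELEASE)

#define rcu_read_barrier(p) ({                                        \
        __typeof__(*(p)) v_ = __atomic_load_n((p), __ATOMIC_RELAXED); \
        sketch_barrier_depends();                                     \
        v_;                                                           \
    })

/* Barrier first, then the plain store: two instructions on weakly
 * ordered hosts, versus a single store-release in formulation 1. */
#define rcu_set_barrier(p, v) do {                                    \
        sketch_wmb();                                                 \
        __atomic_store_n((p), (v), __ATOMIC_RELAXED);                 \
    } while (0)

/* Shared pointer: written by a publisher, read by readers. */
static struct node *shared = NULL;

void publish_consume_release(struct node *n)
{
    rcu_set_release(&shared, n);
}

struct node *read_consume_release(void)
{
    return rcu_read_consume(&shared);
}

void publish_barriers(struct node *n)
{
    rcu_set_barrier(&shared, n);
}

struct node *read_barriers(void)
{
    return rcu_read_barrier(&shared);
}
```

Compiling this with an AArch64 cross compiler at -O2 with -S and comparing the generated code for the two publish and read functions is one way to check the instruction counts for a given compiler version.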
As for the common case of x86_64, what you're doing is going to make no
difference at all.
So what are you trying to improve?
r~