On Fri, Mar 09, 2018 at 14:29:22 +0100, Paolo Bonzini wrote:
> Actually enable the global memory barriers if supported by the OS.
> Because only recent versions of Linux include the support, they
> are disabled by default.  Note that it also has to be disabled
> for QEMU to run under Wine.
>
> Before this patch, rcutorture reports 85 ns/read for my machine,
> after the patch it reports 12.5 ns/read.  On the other hand updates
> go from 50 *micro*seconds to 20 *milli*seconds.

It is indeed hard to see a large impact on performance given the large
size of our critical sections. But hey, rcu_read_unlock goes down from
0.24% to 0.08% of execution time when booting aarch64 Linux!

As we remove bottlenecks, though, we should be able to gain more
benefits from this, at least in MTTCG where vcpu threads exit the
execution loop quite often.

I did some tests on qht-bench, moving the rcu_read_lock/unlock pair to
wrap each lookup instead of wrapping the entire test. The results are
great: without membarrier, lookup throughput goes down by half; with
it, throughput only goes down by 5%.

(snip)
> +##########################################
> +# check for usable membarrier system call
> +if test "$membarrier" = "yes"; then
> +    have_membarrier=no
> +    if test "$mingw32" = "yes" ; then
> +        have_membarrier=yes
> +    elif test "$linux" = "yes" ; then
> +        cat > $TMPC << EOF
> +    #include <linux/membarrier.h>
> +    #include <sys/syscall.h>
> +    #include <unistd.h>
> +    int main(void) {
> +        syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
> +        syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
> +    }

I think we should also check here that MEMBARRIER_CMD_SHARED is
actually supported; it is possible for a kernel to have the system call
yet not support it (e.g. when the kernel is compiled with nohz_full).
Instead of failing only at run-time (in smp_mb_global_init), we should
perhaps bark at configure time as well.

Other than that the patches look good. Thanks for doing this!

		Emilio
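
P.S. To make the MEMBARRIER_CMD_SHARED suggestion concrete, here is a
rough sketch of the kind of check I have in mind. It issues
MEMBARRIER_CMD_QUERY and looks for the MEMBARRIER_CMD_SHARED bit in the
returned mask. The helper names (mask_has_shared,
membarrier_shared_usable) are made up for illustration and are not part
of the patch; a configure probe could call the second one from main()
and exit non-zero when it returns 0.

```c
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: given the return value of MEMBARRIER_CMD_QUERY,
 * report whether MEMBARRIER_CMD_SHARED is advertised.  A negative value
 * means the query itself failed (e.g. ENOSYS on older kernels).
 */
int mask_has_shared(long mask)
{
    if (mask < 0) {
        return 0;
    }
    return (mask & MEMBARRIER_CMD_SHARED) != 0;
}

/*
 * Hypothetical helper: returns 1 if the running kernel implements the
 * membarrier system call and advertises MEMBARRIER_CMD_SHARED,
 * 0 otherwise.
 */
int membarrier_shared_usable(void)
{
    long mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);

    return mask_has_shared(mask);
}
```

Note that configure runs on the build host, so a probe like this only
says something about the build machine's kernel; the run-time check in
smp_mb_global_init would still be needed.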