On Fri, Apr 20, 2018 at 12:42:10 +0800, Peter Xu wrote: > Put all the shared qemu-thread implementations into these files. The > header should be internal to qemu-thread but not for qemu-thread users. > > Introduce some hooks correspondingly for the shared part. Note that in > qemu_mutex_unlock_impl() we moved the call before unlock operation which > should make more sense. And we don't need qemu_mutex_post_unlock() hook. > > Currently the hooks only calls the tracepoints. > > Signed-off-by: Peter Xu <pet...@redhat.com> (snip) > - trace_qemu_mutex_lock(mutex, file, line); > - > + qemu_mutex_pre_lock(mutex, file, line); > err = pthread_mutex_lock(&mutex->lock); > if (err) > error_exit(err, __func__); > - > - trace_qemu_mutex_locked(mutex, file, line); > + qemu_mutex_post_lock(mutex, file, line); > }
I see the value in consolidating these calls. However, having a separate object means that this adds two function calls to mutex_lock. This significantly reduces performance, even without --enable-debug-mutex: - Before: $ taskset -c 0 tests/atomic_add-bench -n 1 -m Parameters: # of threads: 1 duration: 1 ops' range: 1024 Results: Duration: 1 s Throughput: 57.24 Mops/s Throughput/thread: 57.24 Mops/s/thread - After: $ taskset -c 0 tests/atomic_add-bench -n 1 -m Parameters: # of threads: 1 duration: 1 ops' range: 1024 Results: Duration: 1 s Throughput: 49.22 Mops/s Throughput/thread: 49.22 Mops/s/thread So either inlines/macros should be used instead -- I'd prefer inlines but I'm not sure they'll work with the tracing calls. I think you should cherry-pick this patch[1] and add it to the series -- it'll let you make sure the series does not affect performance. Cheers, Emilio [1] https://github.com/cota/qemu/commit/f04f34df