On Thu, 13 Dec 2018 11:37:43 +0800 Joyce Kong <joyce.k...@arm.com> wrote:
> v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test
> on all available cores to benchmark the improvement.
>
> We tested the patches on three arm64 platforms, ThunderX2 gained 20%
> performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell
> MACCHIATObin gained 19.6%.
> Below is the detailed test result on ThunderX2:
>
> *** rwlock_autotest without __atomic builtins ***
> Rwlock Perf Test on 128 cores...
> Core [0] count = 281
> Core [1] count = 252
> Core [2] count = 290
> Core [3] count = 259
> Core [4] count = 287
> ...
> Core [209] count = 3
> Core [210] count = 31
> Core [211] count = 120
> Total count = 18537
>
> *** rwlock_autotest with __atomic builtins ***
> Rwlock Perf Test on 128 cores...
> Core [0] count = 346
> Core [1] count = 355
> Core [2] count = 259
> Core [3] count = 285
> Core [4] count = 320
> ...
> Core [209] count = 2
> Core [210] count = 23
> Core [211] count = 63
> Total count = 22194
>
> Gavin Hu (1):
>   rwlock: reimplement with __atomic builtins
>
> Joyce Kong (1):
>   test/rwlock: add perf test case
>
>  lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++---
>  test/test/test_rwlock.c                            | 71 ++++++++++++++++++++++
>  2 files changed, 79 insertions(+), 8 deletions(-)
>

Did you consider using a better algorithm, not just better primitives?
See https://locklessinc.com/articles/locks/ for a more complete discussion
of alternatives like ticket locks.
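
To make the ticket-lock idea concrete, here is a minimal sketch built on the
same __atomic builtins. It is only an illustration under assumptions: the
names (ticket_lock_t, ticket_lock, ticket_unlock) are hypothetical and not
part of rte_rwlock.h, and it shows a plain mutual-exclusion ticket lock, not
a reader-writer variant. The point is the algorithm: waiters are served in
FIFO order and spin on a read instead of repeatedly attempting a
compare-and-swap.

```c
#include <stdint.h>

/* Hypothetical ticket spinlock for illustration; not a DPDK API. */
typedef struct {
    uint16_t next;    /* next ticket number to hand out */
    uint16_t serving; /* ticket number currently allowed to hold the lock */
} ticket_lock_t;

static inline void ticket_lock_init(ticket_lock_t *tl)
{
    __atomic_store_n(&tl->next, 0, __ATOMIC_RELAXED);
    __atomic_store_n(&tl->serving, 0, __ATOMIC_RELAXED);
}

static inline void ticket_lock(ticket_lock_t *tl)
{
    /* Take a ticket. Relaxed ordering is enough here because the
     * acquire load below orders the critical section. */
    uint16_t me = __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);

    /* Spin until our ticket comes up. Waiters only read the shared
     * line, and they are admitted strictly in arrival order. */
    while (__atomic_load_n(&tl->serving, __ATOMIC_ACQUIRE) != me)
        ; /* a production lock would add a CPU pause/yield hint here */
}

static inline void ticket_unlock(ticket_lock_t *tl)
{
    /* Only the current holder writes `serving`, so load-then-store is
     * safe; the release store publishes the critical section. */
    uint16_t s = __atomic_load_n(&tl->serving, __ATOMIC_RELAXED);
    __atomic_store_n(&tl->serving, (uint16_t)(s + 1), __ATOMIC_RELEASE);
}
```

The 16-bit counters wrap around harmlessly as long as fewer than 65536
threads wait at once. Whether the fairness helps or hurts a perf test like
the one above would have to be measured.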