On Fri, Sep 25, 2020 at 7:44 PM Steven Lariau <steven.lar...@arm.com> wrote:
>
> One implementation of the DPDK stack library is lockfree,
> based on the C11 memory model for atomics.
> Some of these atomic operations use unnecessarily strong memory
> orders that can be relaxed.
> This patch relaxes some of these operations in order to improve
> the performance of the stack library.
>
> The patch was tested on several architectures, to ensure that
> the implementation is correct, and to measure performance.
> Below are the results for a few architectures on the multithread
> stack lockfree test.
> The cycles count is the average number of cycles per item to perform
> a bulk push / pop.
>
> $sudo ./builddir/app/dpdk-test
> RTE>>stack_lf_perf_autotest
>
> difference compared to main
> Cycles count on ThunderX2
> 2 cores, bulk size = 8: -15.85%
> 2 cores, bulk size = 32: -04.56%
> 4 cores, bulk size = 8: -05.00%
> 4 cores, bulk size = 32: -04.35%
> 16 cores, bulk size = 8: -02.38%
> 16 cores, bulk size = 32: -01.88%
>
> difference compared to main
> Cycles count on N1SDP
> 2 cores, batch size = 8: +00.77%
> 2 cores, batch size = 32: -16.00%
>
> difference compared to main
> Cycles count on Skylake
> 2 cores, bulk size = 8: -00.18%
> 2 cores, bulk size = 32: -00.95%
> 4 cores, bulk size = 8: -01.19%
> 4 cores, bulk size = 32: +00.64%
> 16 cores, bulk size = 8: +01.20%
> 16 cores, bulk size = 32: +00.48%
>
> v2: add comment to explain why pop head CAS relaxed is valid
>     added Fixes information
>
> Steven Lariau (5):
>   lib/stack: fix inconsistent weak / strong cas
>   lib/stack: remove push acquire fence
>   lib/stack: remove redundant orderings for list->len
>   lib/stack: reload head when pop fails
>   lib/stack: remove pop cas release ordering
>
>  lib/librte_stack/rte_stack_lf_c11.h | 32 +++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 11 deletions(-)
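
For readers following the thread, here is a rough sketch of the kind of
ordering choices the cover letter describes: a weak CAS inside a retry loop,
release ordering only on the successful publish, and a relaxed update of a
length counter. It is a generic single-word push written for illustration
only. The names demo_elem, demo_list, demo_push and demo_len are hypothetical,
and none of this is taken from rte_stack_lf_c11.h or from the patches.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the rte_stack structures. */
struct demo_elem {
	struct demo_elem *next;
	int data;
};

struct demo_list {
	_Atomic(struct demo_elem *) head;
	_Atomic uint64_t len;
};

/* Push one element onto the list. */
static inline void
demo_push(struct demo_list *l, struct demo_elem *e)
{
	/* A relaxed load is enough here: the CAS below re-validates head. */
	struct demo_elem *old =
		atomic_load_explicit(&l->head, memory_order_relaxed);

	do {
		e->next = old;
		/*
		 * Weak CAS may fail spuriously, which is fine inside a loop.
		 * On failure, 'old' is refreshed with the current head.
		 * Release on success publishes e->next and e->data;
		 * relaxed on failure, since nothing was published.
		 */
	} while (!atomic_compare_exchange_weak_explicit(&l->head, &old, e,
			memory_order_release, memory_order_relaxed));

	/* The counter orders nothing else in this sketch, so relaxed suffices. */
	atomic_fetch_add_explicit(&l->len, 1, memory_order_relaxed);
}

/* Read the length; the value is only a snapshot under concurrency. */
static inline uint64_t
demo_len(struct demo_list *l)
{
	return atomic_load_explicit(&l->len, memory_order_relaxed);
}
```

A pop is deliberately left out: a single-word pop like this would be exposed
to the ABA problem, so the sketch only covers the push side and the counter.
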

Series applied, thanks for those optimisations.

--
David Marchand