I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as
follows. The numbers in brackets are with the code on master.
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 5
MP/MC single enq/dequeue: 40 (35)
SP/SC burst enq/dequeue (size: 8): 2
MP/MC burst enq/dequeue (size: 8): 6
SP/SC burst enq/dequeue (size: 32): 1 (2)
MP/MC burst enq/dequeue (size: 32): 2
### Testing empty dequeue ###
SC empty dequeue: 2.11
MC empty dequeue: 1.41 (2.11)
### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
On one of the Arm platforms:
MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are OK)
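For reference, the "~12% hit" figure can be reproduced from the two numbers on that line. The snippet below is only a sketch of that arithmetic, assuming the reported values are costs where lower is better and that the bracketed value is the master baseline, as stated at the top of this mail. The helper name percent_change is made up for illustration and is not part of the test suite.

```python
def percent_change(patched: float, master: float) -> float:
    """Relative change of the patched result versus master, in percent.

    A positive value means the patched code reports a higher number
    than master (a regression, if lower is better).
    """
    return (patched - master) / master * 100.0


# (patched, master) pairs copied from the results quoted above.
results = {
    "Arm MP/MC bulk (size: 32)": (0.37, 0.33),
    "x86 two physical cores, SP/SC bulk (size: 8)": (73.81, 15.33),
    "x86 two NUMA nodes, SP/SC bulk (size: 8)": (164.32, 50.66),
}

for name, (patched, master) in results.items():
    print(f"{name}: {percent_change(patched, master):+.1f}%")
```

The same calculation applied to the x86 SP/SC rows shows that the cross-core and cross-NUMA cases are where the patched code differs most from master.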
I tried this on a Power9 platform (3.6GHz) with two NUMA nodes and 16
cores/node (SMT=4). I applied all 3 patches in v5; the test results are as
follows:
RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 42
MP/MC single enq/dequeue: 59
SP/SC burst enq/dequeue (size: 8): 5
MP/MC burst enq/dequeue (size: 8): 7
SP/SC burst enq/dequeue (size: 32): 2
MP/MC burst enq/dequeue (size: 32): 2
### Testing empty dequeue ###
SC empty dequeue: 7.81
MC empty dequeue: 7.81
### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 5.76
MP/MC bulk enq/dequeue (size: 8): 7.66
SP/SC bulk enq/dequeue (size: 32): 2.10
MP/MC bulk enq/dequeue (size: 32): 2.57
### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 13.13
MP/MC bulk enq/dequeue (size: 8): 13.98
SP/SC bulk enq/dequeue (size: 32): 3.41
MP/MC bulk enq/dequeue (size: 32): 4.45
### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 11.00
MP/MC bulk enq/dequeue (size: 8): 10.95
SP/SC bulk enq/dequeue (size: 32): 3.08
MP/MC bulk enq/dequeue (size: 32): 3.40
### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 63.41
MP/MC bulk enq/dequeue (size: 8): 62.70
SP/SC bulk enq/dequeue (size: 32): 15.39
MP/MC bulk enq/dequeue (size: 32): 22.96
Dave