Tried this on a Power9 platform (3.6 GHz) with two NUMA nodes and 16
cores/node (SMT=4).  I applied all 3 patches in v5; the test results are
as follows:

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 42
MP/MC single enq/dequeue: 59
SP/SC burst enq/dequeue (size: 8): 5
MP/MC burst enq/dequeue (size: 8): 7
SP/SC burst enq/dequeue (size: 32): 2
MP/MC burst enq/dequeue (size: 32): 2

### Testing empty dequeue ###
SC empty dequeue: 7.81
MC empty dequeue: 7.81

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 5.76
MP/MC bulk enq/dequeue (size: 8): 7.66
SP/SC bulk enq/dequeue (size: 32): 2.10
MP/MC bulk enq/dequeue (size: 32): 2.57

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 13.13
MP/MC bulk enq/dequeue (size: 8): 13.98
SP/SC bulk enq/dequeue (size: 32): 3.41
MP/MC bulk enq/dequeue (size: 32): 4.45

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 11.00
MP/MC bulk enq/dequeue (size: 8): 10.95
SP/SC bulk enq/dequeue (size: 32): 3.08
MP/MC bulk enq/dequeue (size: 32): 3.40

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 63.41
MP/MC bulk enq/dequeue (size: 8): 62.70
SP/SC bulk enq/dequeue (size: 32): 15.39
MP/MC bulk enq/dequeue (size: 32): 22.96

Thanks for running this. There is another test 'ring_perf_autotest' which 
provides the numbers with the original implementation. The goal is to make sure 
the numbers with the original implementation are the same as these. Can you 
please run that as well?

RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 42
MP/MC single enq/dequeue: 59
SP/SC burst enq/dequeue (size: 8): 6
MP/MC burst enq/dequeue (size: 8): 8
SP/SC burst enq/dequeue (size: 32): 2
MP/MC burst enq/dequeue (size: 32): 3

### Testing empty dequeue ###
SC empty dequeue: 7.81
MC empty dequeue: 7.81

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 6.91
MP/MC bulk enq/dequeue (size: 8): 8.87
SP/SC bulk enq/dequeue (size: 32): 2.55
MP/MC bulk enq/dequeue (size: 32): 3.04

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 11.70
MP/MC bulk enq/dequeue (size: 8): 13.56
SP/SC bulk enq/dequeue (size: 32): 3.48
MP/MC bulk enq/dequeue (size: 32): 3.95

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 10.86
MP/MC bulk enq/dequeue (size: 8): 11.11
SP/SC bulk enq/dequeue (size: 32): 2.97
MP/MC bulk enq/dequeue (size: 32): 3.43

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 48.07
MP/MC bulk enq/dequeue (size: 8): 67.38
SP/SC bulk enq/dequeue (size: 32): 13.04
MP/MC bulk enq/dequeue (size: 32): 27.10
Test OK
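
For a quick side-by-side of the two runs, here is a minimal Python sketch
that computes the relative difference between ring_perf_elem_autotest and
ring_perf_autotest for the "two NUMA nodes" section. The numbers are copied
from the output above; the dictionary keys and the percent_delta() helper
are names made up for illustration and are not part of the test suite. A
single run per test is noisy, so treat the percentages as a rough guide
only.

```python
# Numbers copied from the "Testing using two NUMA nodes" sections above.
# Keys are shorthand labels chosen for this sketch (hypothetical names).
elem_run = {  # ring_perf_elem_autotest (with the v5 patches)
    "SP/SC bulk 8": 63.41,
    "MP/MC bulk 8": 62.70,
    "SP/SC bulk 32": 15.39,
    "MP/MC bulk 32": 22.96,
}
orig_run = {  # ring_perf_autotest (original implementation)
    "SP/SC bulk 8": 48.07,
    "MP/MC bulk 8": 67.38,
    "SP/SC bulk 32": 13.04,
    "MP/MC bulk 32": 27.10,
}


def percent_delta(new, old):
    """Return {label: percent change of new relative to old}."""
    return {label: (new[label] - old[label]) / old[label] * 100.0
            for label in old}


deltas = percent_delta(elem_run, orig_run)
for label, delta in deltas.items():
    # Positive means the elem test reported a larger number than the original.
    print(f"{label}: {delta:+.1f}%")
```

The same helper can be pointed at any other section by swapping in that
section's numbers.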

Dave
