<snip>

> > > I have applied your suggestion in 6/6 in v6 along with my corrections.
> > > The rte_ring_elem test cases are added in 3/6. I have verified that they
> > > are running fine (they are done for 64b alone, will add more). Hopefully,
> > > there are no more errors.
> >
> > Applied v6 and re-ran the tests.
> > Functional test passes OK on my boxes.
> > Perf-test numbers are below.
> > As far as I can see, pretty much the same pattern as in v5 remains:
> > MP/MC on 2 different cores
>
> Forgot to add: for 8 elems; for 32, the new ones are always better.
>
> > and SP/SC single enq/deq
> > show lower numbers for _elem_.
> > For others, _elem_ numbers are about the same or higher.
> > Personally, I am OK to go ahead with these changes.
> > Konstantin
> >
> > A - ring_perf_autotest
> > B - ring_perf_elem_autotest
> >
> >                                          A       B
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue:             8.27   10.94
> > MP/MC single enq/dequeue:            56.11   47.43
> > SP/SC burst enq/dequeue (size: 8):    4.20    3.50
> > MP/MC burst enq/dequeue (size: 8):    9.93    9.29
> > SP/SC burst enq/dequeue (size: 32):   2.93    1.94
> > MP/MC burst enq/dequeue (size: 32):   4.10    3.35
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue:                     2.00    3.00
> > MC empty dequeue:                     3.00    2.00
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8):     4.06    3.30
> > MP/MC bulk enq/dequeue (size: 8):     9.84    9.28
> > SP/SC bulk enq/dequeue (size: 32):    2.93    1.88
> > MP/MC bulk enq/dequeue (size: 32):    4.10    3.32
> >
> > ### Testing using two hyperthreads ###
> > SP/SC bulk enq/dequeue (size: 8):     9.22    8.83
> > MP/MC bulk enq/dequeue (size: 8):    15.73   15.86
> > SP/SC bulk enq/dequeue (size: 32):    5.78    3.83
> > MP/MC bulk enq/dequeue (size: 32):    6.33    4.53
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8):    23.78   19.32
> > MP/MC bulk enq/dequeue (size: 8):    68.54   71.97
> > SP/SC bulk enq/dequeue (size: 32):   11.99   10.77
> > MP/MC bulk enq/dequeue (size: 32):   21.96   18.66
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8):    50.13   33.92
> > MP/MC bulk enq/dequeue (size: 8):   177.98  195.87
> > SP/SC bulk enq/dequeue (size: 32):   32.98   23.12
> > MP/MC bulk enq/dequeue (size: 32):   55.86   48.76
Thanks Konstantin. The performance of 5/6 is mostly worse than that of 6/6, so we should not consider 5/6 (it will not be included in future versions).

A - ring_perf_autotest (existing code)
B - ring_perf_elem_autotest (6/6)

Numbers from my side:

On one Arm platform:
                                         A       B  Diff (%)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue:             1.04    1.06  (1.92)
MP/MC single enq/dequeue:             1.46    1.51  (3.42)
SP/SC burst enq/dequeue (size: 8):    0.18    0.17  (-5.55)
MP/MC burst enq/dequeue (size: 8):    0.23    0.22  (-4.34)
SP/SC burst enq/dequeue (size: 32):   0.05    0.05  (0)
MP/MC burst enq/dequeue (size: 32):   0.07    0.06  (-14.28)

### Testing empty dequeue ###
SC empty dequeue:                     0.27    0.27  (0)
MC empty dequeue:                     0.27    0.27  (0)

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8):     0.18    0.17  (-5.55)
MP/MC bulk enq/dequeue (size: 8):     0.23    0.23  (0)
SP/SC bulk enq/dequeue (size: 32):    0.05    0.05  (0)
MP/MC bulk enq/dequeue (size: 32):    0.07    0.06  (0)

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8):     0.79    0.79  (0)
MP/MC bulk enq/dequeue (size: 8):     1.42    1.37  (-3.52)
SP/SC bulk enq/dequeue (size: 32):    0.20    0.20  (0)
MP/MC bulk enq/dequeue (size: 32):    0.33    0.35  (6.06)

On another Arm platform:
                                         A       B  Diff (%)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue:            11.54   11.79  (2.16)
MP/MC single enq/dequeue:            11.84   12.54  (5.91)
SP/SC burst enq/dequeue (size: 8):    1.51    1.33  (-11.92)
MP/MC burst enq/dequeue (size: 8):    1.91    1.73  (-9.42)
SP/SC burst enq/dequeue (size: 32):   0.62    0.42  (-32.25)
MP/MC burst enq/dequeue (size: 32):   0.72    0.52  (-27.77)

### Testing empty dequeue ###
SC empty dequeue:                     2.48    2.48  (0)
MC empty dequeue:                     2.48    2.48  (0)

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8):     1.52    1.33  (-12.5)
MP/MC bulk enq/dequeue (size: 8):     1.92    1.73  (-9.89)
SP/SC bulk enq/dequeue (size: 32):    0.62    0.42  (-32.25)
MP/MC bulk enq/dequeue (size: 32):    0.72    0.52  (-27.77)

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8):     6.30    6.57  (4.28)
MP/MC bulk enq/dequeue (size: 8):    10.59   10.45  (-1.32)
SP/SC bulk enq/dequeue (size: 32):    1.92    1.58  (-17.70)
MP/MC bulk enq/dequeue (size: 32):    2.51    2.47  (-1.59)

From my side, I would say let us just go with patch 2/6.
Jerin/David, any opinion on your side?