<snip>
> >
> > > > I have applied your suggestion in 6/6 in v6 along with my corrections. The
> > > > rte_ring_elem test cases are added in 3/6. I have verified that they are
> > > > running fine (they are done for 64b alone, will add more). Hopefully, there
> > > > are no more errors.
> >
> > Applied v6 and re-ran the tests.
> > Functional test passes ok on my boxes.
> > Perf-test numbers below.
> > As far as I can see, pretty much the same pattern as in v5 remains:
> > MP/MC on 2 different cores
> 
> Forgot to add: for 8 elems, for 32 - new ones always better.
> 
> > and SP/SC single enq/deq
> > show lower numbers for _elem_.
> > For others _elem_ numbers are about the same or higher.
> > Personally, I am ok to go ahead with these changes.
> > Konstantin
> >
> > A - ring_perf_autotest
> > B - ring_perf_elem_autotest
> >
> >  ### Testing single element and burst enq/deq ###   A       B
> > SP/SC single enq/dequeue:                           8.27    10.94
> > MP/MC single enq/dequeue:                           56.11   47.43
> > SP/SC burst enq/dequeue (size: 8):                  4.20    3.50
> > MP/MC burst enq/dequeue (size: 8):                  9.93    9.29
> > SP/SC burst enq/dequeue (size: 32):                 2.93    1.94
> > MP/MC burst enq/dequeue (size: 32):                 4.10    3.35
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue:                                   2.00    3.00
> > MC empty dequeue:                                   3.00    2.00
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8):                   4.06    3.30
> > MP/MC bulk enq/dequeue (size: 8):                   9.84    9.28
> > SP/SC bulk enq/dequeue (size: 32):                  2.93    1.88
> > MP/MC bulk enq/dequeue (size: 32):                  4.10    3.32
> >
> > ### Testing using two hyperthreads ###
> > SP/SC bulk enq/dequeue (size: 8):                   9.22    8.83
> > MP/MC bulk enq/dequeue (size: 8):                   15.73   15.86
> > SP/SC bulk enq/dequeue (size: 32):                  5.78    3.83
> > MP/MC bulk enq/dequeue (size: 32):                  6.33    4.53
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8):                   23.78   19.32
> > MP/MC bulk enq/dequeue (size: 8):                   68.54   71.97
> > SP/SC bulk enq/dequeue (size: 32):                  11.99   10.77
> > MP/MC bulk enq/dequeue (size: 32):                  21.96   18.66
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8):                   50.13   33.92
> > MP/MC bulk enq/dequeue (size: 8):                   177.98  195.87
> > SP/SC bulk enq/dequeue (size: 32):                  32.98   23.12
> > MP/MC bulk enq/dequeue (size: 32):                  55.86   48.76

Thanks Konstantin. The performance of 5/6 is mostly worse than that of 6/6, so we
should not consider 5/6 (it will not be included in the future).
A - ring_perf_autotest (existing code)
B - ring_perf_elem_autotest (6/6)

Numbers from my side:
On one Arm platform:
### Testing single element and burst enq/deq ###        A       B
SP/SC single enq/dequeue:                               1.04    1.06 (1.92)
MP/MC single enq/dequeue:                               1.46    1.51 (3.42)
SP/SC burst enq/dequeue (size: 8):                      0.18    0.17 (-5.55)
MP/MC burst enq/dequeue (size: 8):                      0.23    0.22 (-4.34)
SP/SC burst enq/dequeue (size: 32):                     0.05    0.05 (0)
MP/MC burst enq/dequeue (size: 32):                     0.07    0.06 (-14.28)
        
### Testing empty dequeue ###   
SC empty dequeue:                                       0.27    0.27 (0)
MC empty dequeue:                                       0.27    0.27 (0)
        
### Testing using a single lcore ###    
SP/SC bulk enq/dequeue (size: 8):                       0.18    0.17 (-5.55)
MP/MC bulk enq/dequeue (size: 8):                       0.23    0.23 (0)
SP/SC bulk enq/dequeue (size: 32):                      0.05    0.05 (0)
MP/MC bulk enq/dequeue (size: 32):                      0.07    0.06 (0)
        
### Testing using two physical cores ###        
SP/SC bulk enq/dequeue (size: 8):                       0.79    0.79 (0)
MP/MC bulk enq/dequeue (size: 8):                       1.42    1.37 (-3.52)
SP/SC bulk enq/dequeue (size: 32):                      0.20    0.20 (0)
MP/MC bulk enq/dequeue (size: 32):                      0.33    0.35 (6.06)

On another Arm platform:

### Testing single element and burst enq/deq ###        A       B       
SP/SC single enq/dequeue:                               11.54   11.79 (2.16)
MP/MC single enq/dequeue:                               11.84   12.54 (5.91)
SP/SC burst enq/dequeue (size: 8):                      1.51    1.33 (-11.92)
MP/MC burst enq/dequeue (size: 8):                      1.91    1.73 (-9.42)
SP/SC burst enq/dequeue (size: 32):                     0.62    0.42 (-32.25)
MP/MC burst enq/dequeue (size: 32):                     0.72    0.52 (-27.77)
        
### Testing empty dequeue ###   
SC empty dequeue:                                       2.48    2.48 (0)
MC empty dequeue:                                       2.48    2.48 (0)
        
### Testing using a single lcore ###    
SP/SC bulk enq/dequeue (size: 8):                       1.52    1.33 (-12.5)
MP/MC bulk enq/dequeue (size: 8):                       1.92    1.73 (-9.89)
SP/SC bulk enq/dequeue (size: 32):                      0.62    0.42 (-32.25)
MP/MC bulk enq/dequeue (size: 32):                      0.72    0.52 (-27.77)
        
### Testing using two physical cores ###        
SP/SC bulk enq/dequeue (size: 8):                       6.30    6.57 (4.28)
MP/MC bulk enq/dequeue (size: 8):                       10.59   10.45 (-1.32)
SP/SC bulk enq/dequeue (size: 32):                      1.92    1.58 (-17.70)
MP/MC bulk enq/dequeue (size: 32):                      2.51    2.47 (-1.59)
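
For anyone re-checking the figures in parentheses in the two Arm tables: they look like the relative change of B over A in percent, (B - A) / A * 100, cut off after two decimals rather than rounded. That is my reading of the numbers, not something the test prints. A negative value means B is lower than A, assuming lower is better since these are per-operation costs. A minimal sketch, with pct_change being a hypothetical helper name:

```python
from decimal import Decimal, ROUND_DOWN

def pct_change(a: str, b: str) -> Decimal:
    """Relative change of B over A in percent, truncated to two decimals.

    Assumption: the tables truncate toward zero instead of rounding,
    which matches e.g. 0.07 -> 0.06 being shown as -14.28.
    """
    base, new = Decimal(a), Decimal(b)
    change = (new - base) / base * 100
    # ROUND_DOWN truncates toward zero (14.2857... becomes 14.28).
    return change.quantize(Decimal("0.01"), rounding=ROUND_DOWN)

# Rows copied from the tables above (A, B as printed).
print(pct_change("1.04", "1.06"))   # 1.92
print(pct_change("0.07", "0.06"))   # -14.28
print(pct_change("1.52", "1.33"))   # -12.50
```

Decimal is used on the printed strings so that exact cases such as 1.52 -> 1.33 (-12.5) do not drift because of binary floating point.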

From my side, I would say let us just go with patch 2/6.

Jerin/David, any opinion on your side?
