Hi Jerin,

I followed the guide to use the PMU counters (KO inserted and DPDK
recompiled), and the numbers increased more than 10-fold. Is this valid and
expected? Do bigger numbers here mean more precise results?
Even so, no significant difference was seen w/o and w/ the patch.
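
As a rough sanity check on the size of that increase, here is a minimal
sketch of how the same interval scales between two counters. The 100 MHz
rate is the figure from your reply; the 2 GHz PMU cycle rate in the usage
note is only an assumed example, not a measured value for this machine, and
the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to nanoseconds for a counter ticking at hz per second. */
static double ticks_to_ns(double ticks, uint64_t hz)
{
    assert(hz != 0);
    return ticks * 1e9 / (double)hz;
}

/* Factor by which a count should grow when the same interval is measured
 * with a faster counter (fast_hz) instead of a slower one (slow_hz). */
static double expected_scale(uint64_t slow_hz, uint64_t fast_hz)
{
    assert(slow_hz != 0);
    return (double)fast_hz / (double)slow_hz;
}
```

With an assumed 2 GHz cycle counter, expected_scale(100000000, 2000000000)
gives 20, so a 10+ fold jump in the raw counts would be in the plausible
range: one tick of the 100 MHz counter is 10 ns, while one cycle at 2 GHz
is 0.5 ns.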

gavin@net-arm-thunderx2:~/community/dpdk$ sudo ./test/test/test -l 
16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024  -- -i
RTE>>ring_perf_autotest (#1 run w/o the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 130
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 21
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 3.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.48
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.39
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.32
MP/MC bulk enq/dequeue (size: 8): 38.52
SP/SC bulk enq/dequeue (size: 32): 13.39
MP/MC bulk enq/dequeue (size: 32): 14.15

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.00
MP/MC bulk enq/dequeue (size: 8): 141.97
SP/SC bulk enq/dequeue (size: 32): 23.85
MP/MC bulk enq/dequeue (size: 32): 36.13
Test OK
RTE>>ring_perf_autotest (#2 run w/o the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 130
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 21
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 3.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.48
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.38
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.31
MP/MC bulk enq/dequeue (size: 8): 38.52
SP/SC bulk enq/dequeue (size: 32): 13.33
MP/MC bulk enq/dequeue (size: 32): 14.16

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.74
MP/MC bulk enq/dequeue (size: 8): 147.33
SP/SC bulk enq/dequeue (size: 32): 24.79
MP/MC bulk enq/dequeue (size: 32): 40.09
Test OK

RTE>>ring_perf_autotest (#1 run w/ the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 129
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 22
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 4.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.89
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.50
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.24
MP/MC bulk enq/dequeue (size: 8): 38.14
SP/SC bulk enq/dequeue (size: 32): 13.24
MP/MC bulk enq/dequeue (size: 32): 14.69

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 74.63
MP/MC bulk enq/dequeue (size: 8): 137.61
SP/SC bulk enq/dequeue (size: 32): 24.82
MP/MC bulk enq/dequeue (size: 32): 36.64
Test OK
RTE>>ring_perf_autotest (#2 run w/ the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 129
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 22
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 4.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.89
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.50
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.53
MP/MC bulk enq/dequeue (size: 8): 38.59
SP/SC bulk enq/dequeue (size: 32): 13.24
MP/MC bulk enq/dequeue (size: 32): 14.69

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.60
MP/MC bulk enq/dequeue (size: 8): 149.14
SP/SC bulk enq/dequeue (size: 32): 25.13
MP/MC bulk enq/dequeue (size: 32): 40.60
Test OK
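
To put a number on the comparison, a small sketch like the following
averages the runs of one test case and reports the relative change. The
helper names are hypothetical and this is only one simple way to compare
the logs, not part of the test suite.

```c
#include <stddef.h>

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return n ? sum / (double)n : 0.0;
}

/* Relative change of patched vs. base, in percent. */
static double pct_change(double base, double patched)
{
    return (patched - base) / base * 100.0;
}
```

For example, for "two physical cores, MP/MC bulk (size: 8)" the runs above
are 141.97 and 147.33 w/o the patch, and 137.61 and 149.14 w/ the patch.
The means are 144.65 and 143.375, a change of about -0.9%, which is smaller
than the spread between two runs of the same build.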


> -----Original Message-----
> From: Jerin Jacob <[email protected]>
> Sent: Monday, October 8, 2018 6:50 PM
> To: Gavin Hu (Arm Technology China) <[email protected]>
> Cc: Ola Liljedahl <[email protected]>; [email protected]; Honnappa
> Nagarahalli <[email protected]>; Ananyev, Konstantin
> <[email protected]>; Steve Capper <[email protected]>;
> nd <[email protected]>; [email protected]
> Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> 
> -----Original Message-----
> > Date: Mon, 8 Oct 2018 10:33:43 +0000
> > From: "Gavin Hu (Arm Technology China)" <[email protected]>
> > To: Ola Liljedahl <[email protected]>, Jerin Jacob
> > <[email protected]>
> > CC: "[email protected]" <[email protected]>, Honnappa Nagarahalli
> > <[email protected]>, "Ananyev, Konstantin"
> >  <[email protected]>, Steve Capper
> <[email protected]>,
> > nd  <[email protected]>, "[email protected]" <[email protected]>
> > Subject: RE: [PATCH v3 1/3] ring: read tail using atomic load
> >
> >
> > I did benchmarking w/o and w/ the patch, it did not show any noticeable
> differences in terms of latency.
> > Here is the full log( 3 runs w/o the patch and 2 runs w/ the patch).
> >
> > sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4
> > --socket-mem=1024  -- -i
> 
> These counters are running at 100MHz. Use PMU counters to get more
> accurate results.
> 
> https://doc.dpdk.org/guides/prog_guide/profile_app.html
> See: 55.2. Profiling on ARM64
> 
