> Hi Jerin,
>
> Following the guide to use the PMU counters (KO inserted and DPDK
> recompiled), the numbers increased 10+ fold (do bigger numbers here mean
> more precise?). Is this valid and expected?

This is correct: bigger numbers mean more precise, more granular results.
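
To illustrate the granularity point, here is a minimal sketch of how one tick of a fixed-rate counter maps to CPU cycles. The 100 MHz rate is the figure quoted elsewhere in this thread; the 2.0 GHz core clock and the helper name `ticks_to_cpu_cycles` are assumptions made purely for illustration, not measured values for this machine.

```python
def ticks_to_cpu_cycles(ticks, counter_hz, cpu_hz):
    """Convert a tick count from a fixed-rate counter into CPU cycles."""
    return ticks * cpu_hz / counter_hz


COUNTER_HZ = 100_000_000    # 100 MHz fixed-rate counter, as quoted in the thread
CPU_HZ = 2_000_000_000      # hypothetical 2.0 GHz core clock (assumption)

# One tick of the slow counter covers this many CPU cycles, so shorter
# events cannot be resolved individually by that counter.
cycles_per_tick = ticks_to_cpu_cycles(1, COUNTER_HZ, CPU_HZ)
print(f"CPU cycles per counter tick: {cycles_per_tick}")
```

Under these assumed clocks, one counter tick spans 20 CPU cycles. A counter that increments once per CPU cycle would therefore report values roughly 20 times larger for the same work, and with correspondingly finer resolution.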
> No significant difference was seen.

This is what we are interested in. Do you have any before and after this change numbers?

>
> gavin@net-arm-thunderx2:~/community/dpdk$ sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024 -- -i
> RTE>>ring_perf_autotest (#1 run w/o the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 130
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 21
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
>
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 3.00
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.48
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.39
> MP/MC bulk enq/dequeue (size: 32): 8.52
>
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.32
> MP/MC bulk enq/dequeue (size: 8): 38.52
> SP/SC bulk enq/dequeue (size: 32): 13.39
> MP/MC bulk enq/dequeue (size: 32): 14.15
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.00
> MP/MC bulk enq/dequeue (size: 8): 141.97
> SP/SC bulk enq/dequeue (size: 32): 23.85
> MP/MC bulk enq/dequeue (size: 32): 36.13
> Test OK
> RTE>>ring_perf_autotest (#2 run w/o the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 130
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 21
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
>
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 3.00
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.48
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.38
> MP/MC bulk enq/dequeue (size: 32): 8.52
>
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.31
> MP/MC bulk enq/dequeue (size: 8): 38.52
> SP/SC bulk enq/dequeue (size: 32): 13.33
> MP/MC bulk enq/dequeue (size: 32): 14.16
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.74
> MP/MC bulk enq/dequeue (size: 8): 147.33
> SP/SC bulk enq/dequeue (size: 32): 24.79
> MP/MC bulk enq/dequeue (size: 32): 40.09
> Test OK
>
> RTE>>ring_perf_autotest (#1 run w/ the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 129
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 22
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
>
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 4.00
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.89
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.50
> MP/MC bulk enq/dequeue (size: 32): 8.52
>
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.24
> MP/MC bulk enq/dequeue (size: 8): 38.14
> SP/SC bulk enq/dequeue (size: 32): 13.24
> MP/MC bulk enq/dequeue (size: 32): 14.69
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 74.63
> MP/MC bulk enq/dequeue (size: 8): 137.61
> SP/SC bulk enq/dequeue (size: 32): 24.82
> MP/MC bulk enq/dequeue (size: 32): 36.64
> Test OK
> RTE>>ring_perf_autotest (#1 run w/ the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 129
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 22
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
>
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 4.00
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.89
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.50
> MP/MC bulk enq/dequeue (size: 32): 8.52
>
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.53
> MP/MC bulk enq/dequeue (size: 8): 38.59
> SP/SC bulk enq/dequeue (size: 32): 13.24
> MP/MC bulk enq/dequeue (size: 32): 14.69
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.60
> MP/MC bulk enq/dequeue (size: 8): 149.14
> SP/SC bulk enq/dequeue (size: 32): 25.13
> MP/MC bulk enq/dequeue (size: 32): 40.60
> Test OK
>
> > -----Original Message-----
> > From: Jerin Jacob <jerin.ja...@caviumnetworks.com>
> > Sent: Monday, October 8, 2018 6:50 PM
> > To: Gavin Hu (Arm Technology China) <gavin...@arm.com>
> > Cc: Ola Liljedahl <ola.liljed...@arm.com>; dev@dpdk.org; Honnappa
> > Nagarahalli <honnappa.nagaraha...@arm.com>; Ananyev, Konstantin
> > <konstantin.anan...@intel.com>; Steve Capper <steve.cap...@arm.com>;
> > nd <n...@arm.com>; sta...@dpdk.org
> > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> >
> > -----Original Message-----
> > > Date: Mon, 8 Oct 2018 10:33:43 +0000
> > > From: "Gavin Hu (Arm Technology China)" <gavin...@arm.com>
> > > To: Ola Liljedahl <ola.liljed...@arm.com>, Jerin Jacob
> > > <jerin.ja...@caviumnetworks.com>
> > > CC: "dev@dpdk.org" <dev@dpdk.org>, Honnappa Nagarahalli
> > > <honnappa.nagaraha...@arm.com>, "Ananyev, Konstantin"
> > > <konstantin.anan...@intel.com>, Steve Capper <steve.cap...@arm.com>,
> > > nd <n...@arm.com>, "sta...@dpdk.org" <sta...@dpdk.org>
> > > Subject: RE: [PATCH v3 1/3] ring: read tail using atomic load
> > >
> > >
> > > I did benchmarking w/o and w/ the patch, it did not show any noticeable
> > > differences in terms of latency.
> > > Here is the full log (3 runs w/o the patch and 2 runs w/ the patch).
> > >
> > > sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4
> > > --socket-mem=1024 -- -i
> >
> > These counters are running at 100MHz. Use PMU counters to get more
> > accurate results.
> >
> > https://doc.dpdk.org/guides/prog_guide/profile_app.html
> > See: 55.2. Profiling on ARM64
> >