> When testing ring performance in the case that multiple lcores are mapped to
> the same physical core, e.g. --lcores '(0-3)@10', it takes a very long time
> for "enqueue_dequeue_bulk_helper" to finish. This is because the number of
> iterations is too high and enqueue/dequeue efficiency is extremely low with
> this kind of core mapping. The following test results show this phenomenon:
>
> x86-Intel(R) Xeon(R) Gold 6240:
> $sudo ./app/test/dpdk-test --lcores '(0-1)@25'
> Testing using two hyperthreads (bulk (size: 8)):
> iter_shift:          3      5      7      9      11     13     *15    17     19     21     23
> run time:            7s     7s     7s     8s     9s     16s    47s    170s   660s   >0.5h  >1h
> legacy APIs: SP/SC:  37     11     6      40525  40525  40209  40367  40407  40541  NoData NoData
> legacy APIs: MP/MC:  56     14     11     50657  40526  40526  40526  40625  40585  NoData NoData
>
> aarch64-n1sdp:
> $sudo ./app/test/dpdk-test --lcores '(0-1)@1'
> Testing using two hyperthreads (bulk (size: 8)):
> iter_shift:          3      5      7      9      11     13     *15    17     19     21     23
> run time:            8s     8s     8s     9s     9s     14s    34s    111s   418s   25min  >1h
> legacy APIs: SP/SC:  0.4    0.2    0.1    488    488    488    488    488    489    489    NoData
> legacy APIs: MP/MC:  0.4    0.3    0.2    488    488    488    488    490    489    489    NoData
>
> As the number of iterations increases, so does the time required to run the
> test. Currently (iter_shift = 23), it takes more than 1 hour for the test to
> finish. To fix this, "iter_shift" should be decreased while still running
> enough iterations to keep the test data stable. To achieve this, we also test
> with the "-l" EAL argument:
>
> x86-Intel(R) Xeon(R) Gold 6240:
> $sudo ./app/test/dpdk-test -l 25-26
> Testing using two NUMA nodes (bulk (size: 8)):
> iter_shift:          3      5      7      9      11     13     *15    17     19     21     23
> run time:            6s     6s     6s     6s     6s     6s     6s     7s     8s     11s    27s
> legacy APIs: SP/SC:  47     20     13     22     54     83     91     73     81     75     95
> legacy APIs: MP/MC:  44     18     18     240    245    270    250    249    252    250    253
>
> aarch64-n1sdp:
> $sudo ./app/test/dpdk-test -l 1-2
> Testing using two physical cores (bulk (size: 8)):
> iter_shift:          3      5      7      9      11     13     *15    17     19     21     23
> run time:            8s     8s     8s     8s     8s     8s     8s     9s     9s     11s    23s
> legacy APIs: SP/SC:  0.7    0.4    1.2    1.8    2.0    2.0    2.0    2.0    2.0    2.0    2.0
> legacy APIs: MP/MC:  0.3    0.4    1.3    1.9    2.9    2.9    2.9    2.9    2.9    2.9    2.9
>
> According to the above test data, when "iter_shift" is set to 15, the test
> run time is reduced to less than 1 minute and the test results stay stable
> on both x86 and aarch64 servers.
>
> Fixes: 1fa5d0099efc ("test/ring: add custom element size performance tests")
> Cc: honnappa.nagaraha...@arm.com
> Cc: sta...@dpdk.org
>
> Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> ---
>  app/test/test_ring_perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index e63e25a86..fd82e2041 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -178,7 +178,7 @@ enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
>  		struct thread_params *p)
>  {
>  	int ret;
> -	const unsigned int iter_shift = 23;
> +	const unsigned int iter_shift = 15;
>  	const unsigned int iterations = 1 << iter_shift;
>  	struct rte_ring *r = p->r;
>  	unsigned int bsize = p->size;
> --
> 2.17.1
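
For context, the helper computes iterations = 1 << iter_shift, so the change
reduces the per-test loop count from 1 << 23 = 8388608 to 1 << 15 = 32768,
i.e. 256 times fewer iterations.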
I think it would be better to rework the test(s) to terminate after some
timeout (30s or so) and report the number of ops per timeout.
Anyway, as a short-term fix, I am ok with it.

Acked-by: Konstantin Ananyev <konstantin.anan...@intel.com>
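
A minimal sketch of what such a timeout-based measurement could look like,
assuming the rte_cycles.h timer API and the legacy SP/SC bulk ring calls;
TEST_TIMEOUT_SEC and run_timed_bulk() are hypothetical names used for
illustration, not existing test_ring_perf.c code:

#include <stdint.h>

#include <rte_cycles.h>
#include <rte_pause.h>
#include <rte_ring.h>

/* Hypothetical per-measurement time budget, per the suggestion above. */
#define TEST_TIMEOUT_SEC 30

/*
 * Hypothetical helper: run enqueue/dequeue bulk pairs until the time
 * budget expires and return the measured ops per second.
 * "burst" points to bsize pre-allocated object pointers, as in the
 * existing test.
 */
static double
run_timed_bulk(struct rte_ring *r, void **burst, unsigned int bsize)
{
        const uint64_t hz = rte_get_timer_hz();
        const uint64_t budget = (uint64_t)TEST_TIMEOUT_SEC * hz;
        const uint64_t start = rte_get_timer_cycles();
        uint64_t ops = 0;
        uint64_t now;

        do {
                /* One enqueue/dequeue pair, as in enqueue_dequeue_bulk_helper(). */
                while (rte_ring_sp_enqueue_bulk(r, burst, bsize, NULL) == 0)
                        rte_pause();
                while (rte_ring_sc_dequeue_bulk(r, burst, bsize, NULL) == 0)
                        rte_pause();
                ops += bsize;
                now = rte_get_timer_cycles();
        } while (now - start < budget);

        /* Report throughput rather than cycles at a fixed iteration count. */
        return (double)ops * hz / (double)(now - start);
}

Bounding the measurement by wall-clock time instead of a fixed iteration
count would keep the run time predictable regardless of how the lcores are
mapped, which is the point of the suggestion above.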