Hi Pavan,

> -----Original Message-----
> From: pbhagavat...@marvell.com <pbhagavat...@marvell.com>
> Sent: Friday, February 14, 2020 2:45 PM
> To: jer...@marvell.com; Pavan Nikhilesh <pbhagavat...@marvell.com>
> Cc: Gavin Hu <gavin...@arm.com>; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] event/octeontx2: remove WFE from dualslot
> dequeue
>
> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
>
> Each workslot is always bound to a specific lcore, so there is no multi-core
> contention to cause cache thrashing; as a result it is safe to remove the
> WFE. Also, in dual workslot dequeue, work will most likely be available on
> the pair workslot, making WFE impractical.
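For anyone following the thread without the driver open, here is a minimal sketch of the plain polling shape the patch leaves behind, written in portable C rather than arm64 assembly. It only assumes what the quoted asm shows: the loop reloads the tag and wqp and retries while bit 63 of the tag is set. The names `fake_workslot`, `TAG_PENDING`, `poll_get_work` and the `max_spins` bound are hypothetical and exist only for illustration; they are not part of the octeontx2 driver, and the pre-patch sevl/wfe path is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* The quoted asm retries while bit 63 of the tag is set (tbnz ..., 63). */
#define TAG_PENDING (UINT64_C(1) << 63)

/* Hypothetical stand-in for the workslot's tag and wqp locations. */
struct fake_workslot {
	_Atomic uint64_t tag;
	_Atomic uint64_t wqp;
};

/*
 * Plain polling: reload tag and wqp until bit 63 of the tag clears.
 * No sevl/wfe is modelled. max_spins is an illustrative bound so the
 * sketch always terminates; the loop in the patch has no such bound.
 */
static bool
poll_get_work(struct fake_workslot *ws, uint64_t max_spins,
	      uint64_t *tag, uint64_t *wqp)
{
	for (uint64_t i = 0; i < max_spins; i++) {
		*tag = atomic_load(&ws->tag);
		*wqp = atomic_load(&ws->wqp);
		if (!(*tag & TAG_PENDING))
			return true;	/* bit 63 clear: stop retrying */
	}
	return false;	/* bit 63 still set after max_spins reloads */
}
```

Each failed iteration here costs one more pair of loads and one branch, whereas the removed code parked the core in WFE between reloads. That is the trade-off the commit message argues is acceptable when a workslot is bound to a single lcore.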
Does the SSO still signal EVENTI to exit from WFE? If so, does the core just ignore it? Can this be disabled now that the WFE is removed?

>
> Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> ---
>
> Also, this in turn reduces the branch misses
>
> Before:
>     0
> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,
> min_latency=0/
>     0 dummy:u
>     0 llc-miss
>     0 tlb-miss
>   853 branch-miss
>     0 remote-access
>     0 l1d-miss
>
> After:
>     0
> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,
> min_latency=0/
>     0 dummy:u
>     0 llc-miss
>     0 tlb-miss
>   250 branch-miss
>     0 remote-access
>     0 l1d-miss
>
> WFE Data:
>
> 0x4C40 - WFI_WFE_WAIT_CYCLES - Number of cycles waiting at a WFI or
> WFE instruction.
>
> - WFE Cycles before the patch for Dual workslot
> #perf stat -C 20 -e r4C40 sleep 1
> Performance counter stats for 'CPU(s) 20':
>
>             264      r4C40
>     1.002494168 seconds time elapsed
>
> - WFE Cycles for single workslot
> #perf stat -C 20 -e r4C40 sleep 1
> Performance counter stats for 'CPU(s) 20':
>
>     908,778,351      r4C40
>     1.002598253 seconds time elapsed
>
>  drivers/event/octeontx2/otx2_worker_dual.h | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/event/octeontx2/otx2_worker_dual.h
> b/drivers/event/octeontx2/otx2_worker_dual.h
> index 5134e3d52..c88420eb4 100644
> --- a/drivers/event/octeontx2/otx2_worker_dual.h
> +++ b/drivers/event/octeontx2/otx2_worker_dual.h
> @@ -29,11 +29,7 @@ otx2_ssogws_dual_get_work(struct
> otx2_ssogws_state *ws,
>       rte_prefetch_non_temporal(lookup_mem);
>  #ifdef RTE_ARCH_ARM64
>       asm volatile(
> -                     "        ldr %[tag], [%[tag_loc]]    \n"
> -                     "        ldr %[wqp], [%[wqp_loc]]    \n"
> -                     "        tbz %[tag], 63, done%=      \n"
> -                     "        sevl                        \n"
> -                     "rty%=:  wfe                         \n"
> +                     "rty%=:                              \n"
>                       "        ldr %[tag], [%[tag_loc]]    \n"
>                       "        ldr %[wqp], [%[wqp_loc]]    \n"
>                       "        tbnz %[tag], 63, rty%=      \n"
> --
> 2.17.1