Hi Gavin,

>Hi Pavan,
>
>> -----Original Message-----
>> From: pbhagavat...@marvell.com <pbhagavat...@marvell.com>
>> Sent: Friday, February 14, 2020 2:45 PM
>> To: jer...@marvell.com; Pavan Nikhilesh <pbhagavat...@marvell.com>
>> Cc: Gavin Hu <gavin...@arm.com>; dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH] event/octeontx2: remove WFE from dualslot
>> dequeue
>>
>> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
>>
>> Each workslot is always bound to a specific lcore, so there is no
>> multi-core contention to cause cache thrashing; as a result it is safe
>> to remove the WFE. Also, in dual workslot dequeue, work will most likely
>> be available on the pair workslot, making WFE impractical.
>
>Does the SSO still signal EVENTI to exit from WFE? Does the core then ignore it?

All transactions on SSO bus take the core out of WFE.

>Can this be disabled as WFE is removed?

This can't be disabled.

>
>>
>> Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
>> ---
>>
>> Also, this in-turn reduces the branch misses
>>
>> Before:
>>     0 arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,min_latency=0/
>>     0 dummy:u
>>     0 llc-miss
>>     0 tlb-miss
>>   853 branch-miss
>>     0 remote-access
>>     0 l1d-miss
>>
>> After:
>>     0 arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,min_latency=0/
>>     0 dummy:u
>>     0 llc-miss
>>     0 tlb-miss
>>   250 branch-miss
>>     0 remote-access
>>     0 l1d-miss
>>
>> WFE Data:
>>
>> 0x4C40 - WFI_WFE_WAIT_CYCLES - Number of cycles waiting at a WFI or
>> WFE instruction.
>>
>> - WFE Cycles before the patch for Dual workslot
>> #perf stat -C 20 -e r4C40 sleep 1
>> Performance counter stats for 'CPU(s) 20':
>>
>>             264 r4C40
>>     1.002494168 seconds time elapsed
>>
>> - WFE Cycles for single workslot
>> #perf stat -C 20 -e r4C40 sleep 1
>> Performance counter stats for 'CPU(s) 20':
>>
>>     908,778,351 r4C40
>>     1.002598253 seconds time elapsed
>>
>>  drivers/event/octeontx2/otx2_worker_dual.h | 6 +-----
>>  1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/drivers/event/octeontx2/otx2_worker_dual.h b/drivers/event/octeontx2/otx2_worker_dual.h
>> index 5134e3d52..c88420eb4 100644
>> --- a/drivers/event/octeontx2/otx2_worker_dual.h
>> +++ b/drivers/event/octeontx2/otx2_worker_dual.h
>> @@ -29,11 +29,7 @@ otx2_ssogws_dual_get_work(struct otx2_ssogws_state *ws,
>>  	rte_prefetch_non_temporal(lookup_mem);
>>  #ifdef RTE_ARCH_ARM64
>>  	asm volatile(
>> -			"        ldr %[tag], [%[tag_loc]]    \n"
>> -			"        ldr %[wqp], [%[wqp_loc]]    \n"
>> -			"        tbz %[tag], 63, done%=      \n"
>> -			"        sevl                        \n"
>> -			"rty%=:  wfe                         \n"
>> +			"rty%=:                              \n"
>>  			"        ldr %[tag], [%[tag_loc]]    \n"
>>  			"        ldr %[wqp], [%[wqp_loc]]    \n"
>>  			"        tbnz %[tag], 63, rty%=      \n"
>> --
>> 2.17.1
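
For anyone following the diff without an ARM64 box at hand, here is a rough portable C model of the loop that remains after the patch: read the tag, read the wqp, retry while bit 63 of the tag is set, with no sevl/wfe in between. This is only a sketch, not the driver code. struct fake_workslot, fake_read_tag() and poll_get_work() are hypothetical names made up for illustration, and the "register" is simulated so that it reports pending for a fixed number of reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 63 of the tag word is the flag tested by tbz/tbnz in the patch. */
#define PENDING_BIT (UINT64_C(1) << 63)

/* Hypothetical stand-in for the workslot registers (not the driver layout). */
struct fake_workslot {
	uint64_t tag;        /* tag value delivered once work is ready */
	uint64_t wqp;        /* work queue pointer delivered with the tag */
	unsigned polls_left; /* reads that will still report "pending" */
	unsigned reads;      /* number of tag reads issued so far */
};

/* Simulated tag read: bit 63 stays set until polls_left reaches zero. */
static uint64_t fake_read_tag(struct fake_workslot *ws)
{
	ws->reads++;
	if (ws->polls_left > 0) {
		ws->polls_left--;
		return ws->tag | PENDING_BIT;
	}
	return ws->tag & ~PENDING_BIT;
}

/*
 * Post-patch shape of the loop:
 *   rty: ldr tag; ldr wqp; tbnz tag, 63, rty
 * The core spins on the loads instead of parking in WFE.
 */
static uint64_t poll_get_work(struct fake_workslot *ws, uint64_t *wqp)
{
	uint64_t tag;

	assert(ws != NULL && wqp != NULL);
	do {
		tag = fake_read_tag(ws);
		*wqp = ws->wqp;
	} while (tag & PENDING_BIT);

	return tag;
}
```

With polls_left set to 3, poll_get_work() issues four tag reads before returning. The pre-patch code did one read up front and then waited in WFE before every retry, which is the part the patch drops.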