Hi Pavan,

> -----Original Message-----
> From: pbhagavat...@marvell.com <pbhagavat...@marvell.com>
> Sent: Friday, February 14, 2020 2:45 PM
> To: jer...@marvell.com; Pavan Nikhilesh <pbhagavat...@marvell.com>
> Cc: Gavin Hu <gavin...@arm.com>; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] event/octeontx2: remove WFE from dualslot
> dequeue
> 
> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
> 
> Each workslot is always bound to a specific lcore, so there is no
> multi-core contention to cause cache thrashing; as a result it is safe
> to remove the WFE. Also, in dual workslot dequeue, work will most likely
> be available on the paired workslot, making WFE impractical.

Does the SSO still signal EVENTI to exit from WFE, with the core now
ignoring it? If so, can this be disabled now that the WFE is removed?

> 
> Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> ---
> 
> Also, this in turn reduces branch misses.
> 
> Before:
>       0
> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,
> min_latency=0/
>       0 dummy:u
>       0 llc-miss
>       0 tlb-miss
>       853 branch-miss
>       0 remote-access
>       0 l1d-miss
> 
> After:
>       0
> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,
> min_latency=0/
>       0 dummy:u
>       0 llc-miss
>       0 tlb-miss
>       250 branch-miss
>       0 remote-access
>       0 l1d-miss
> 
> WFE Data:
> 
> 0x4C40 - WFI_WFE_WAIT_CYCLES - Number of cycles waiting at a WFI or
> WFE instruction.
> 
> - WFE Cycles before the patch for Dual workslot
> #perf stat -C 20 -e r4C40 sleep 1
> Performance counter stats for 'CPU(s) 20':
> 
>                264      r4C40
>        1.002494168 seconds time elapsed
> 
> - WFE Cycles for single workslot
> #perf stat -C 20 -e r4C40 sleep 1
> Performance counter stats for 'CPU(s) 20':
> 
>        908,778,351      r4C40
>        1.002598253 seconds time elapsed
> 
>  drivers/event/octeontx2/otx2_worker_dual.h | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/event/octeontx2/otx2_worker_dual.h
> b/drivers/event/octeontx2/otx2_worker_dual.h
> index 5134e3d52..c88420eb4 100644
> --- a/drivers/event/octeontx2/otx2_worker_dual.h
> +++ b/drivers/event/octeontx2/otx2_worker_dual.h
> @@ -29,11 +29,7 @@ otx2_ssogws_dual_get_work(struct
> otx2_ssogws_state *ws,
>               rte_prefetch_non_temporal(lookup_mem);
>  #ifdef RTE_ARCH_ARM64
>       asm volatile(
> -                     "        ldr %[tag], [%[tag_loc]]    \n"
> -                     "        ldr %[wqp], [%[wqp_loc]]    \n"
> -                     "        tbz %[tag], 63, done%=      \n"
> -                     "        sevl                        \n"
> -                     "rty%=:  wfe                         \n"
> +                     "rty%=:                              \n"
>                       "        ldr %[tag], [%[tag_loc]]    \n"
>                       "        ldr %[wqp], [%[wqp_loc]]    \n"
>                       "        tbnz %[tag], 63, rty%=      \n"
> --
> 2.17.1
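
For reference, the loop left after this patch amounts to a plain busy-poll:
reload the tag and WQP words until bit 63 of the tag is clear. Below is a
minimal C sketch of that behaviour, not the driver code. The names
fake_ws, TAG_BIT63 and poll_get_work are hypothetical, and the struct only
stands in for the tag_loc/wqp_loc locations used by the quoted assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit tested by "tbnz %[tag], 63" in the quoted assembly. */
#define TAG_BIT63 (UINT64_C(1) << 63)

/* Hypothetical stand-in for the two locations read by the loop. */
struct fake_ws {
	volatile uint64_t tag; /* plays the role of [tag_loc] */
	volatile uint64_t wqp; /* plays the role of [wqp_loc] */
};

/*
 * Spin without WFE: reload both words on every iteration and exit
 * once bit 63 of the tag is clear. Returns the tag and stores the
 * WQP value read in the same iteration.
 */
static inline uint64_t
poll_get_work(const struct fake_ws *ws, uint64_t *wqp)
{
	uint64_t tag;

	assert(ws != NULL && wqp != NULL);
	do {
		tag = ws->tag;
		*wqp = ws->wqp;
	} while (tag & TAG_BIT63);

	return tag;
}
```

Unlike the removed sevl/wfe sequence, this loop never parks the core, so it
relies on the commit message's assumption that work is usually available.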
