Hi Konstantin, 

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.anan...@intel.com>
> Sent: Thursday, October 24, 2019 6:21 PM
> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>;
> dev@dpdk.org
> Cc: nd <n...@arm.com>; tho...@monjalon.net;
> step...@networkplumber.org; hemant.agra...@nxp.com;
> jer...@marvell.com; pbhagavat...@marvell.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Ruifeng Wang (Arm Technology China)
> <ruifeng.w...@arm.com>; Phil Yang (Arm Technology China)
> <phil.y...@arm.com>; Steve Capper <steve.cap...@arm.com>; nd
> <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v7 2/7] eal: add the APIs to wait until equal
> 
> 
> 
> Hi Gavin,
> > > > > > The rte_wait_until_equal_xx APIs abstract the functionality of
> > > > > > 'polling for a memory location to become equal to a given value'.
> > > > > >
> > > > > > Add the RTE_ARM_USE_WFE configuration entry for aarch64, disabled
> > > > > > by default. When it is enabled, the above APIs will call the WFE
> > > > > > instruction to save CPU cycles and power.
> > > > > >
> > > > > > Signed-off-by: Gavin Hu <gavin...@arm.com>
> > > > > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > > > Reviewed-by: Steve Capper <steve.cap...@arm.com>
> > > > > > Reviewed-by: Ola Liljedahl <ola.liljed...@arm.com>
> > > > > > Reviewed-by: Honnappa Nagarahalli
> > > <honnappa.nagaraha...@arm.com>
> > > > > > Reviewed-by: Phil Yang <phil.y...@arm.com>
> > > > > > Acked-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> > > > > > ---
> > > > > >  config/arm/meson.build                             |   1 +
> > > > > >  config/common_base                                 |   5 +
> > > > > >  .../common/include/arch/arm/rte_pause_64.h         |  30 ++++++
> > > > > >  lib/librte_eal/common/include/generic/rte_pause.h  | 106 +++++++++++++++++++++
> > > > > >  4 files changed, 142 insertions(+)
> > > > > >
> > > > > > diff --git a/config/arm/meson.build b/config/arm/meson.build
> > > > > > index 979018e..b4b4cac 100644
> > > > > > --- a/config/arm/meson.build
> > > > > > +++ b/config/arm/meson.build
> > > > > > @@ -26,6 +26,7 @@ flags_common_default = [
> > > > > >     ['RTE_LIBRTE_AVP_PMD', false],
> > > > > >
> > > > > >     ['RTE_SCHED_VECTOR', false],
> > > > > > +   ['RTE_ARM_USE_WFE', false],
> > > > > >  ]
> > > > > >
> > > > > >  flags_generic = [
> > > > > > diff --git a/config/common_base b/config/common_base
> > > > > > index 8ef75c2..8861713 100644
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > @@ -111,6 +111,11 @@ CONFIG_RTE_MAX_VFIO_CONTAINERS=64
> > > > > >  CONFIG_RTE_MALLOC_DEBUG=n
> > > > > >  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
> > > > > >  CONFIG_RTE_USE_LIBBSD=n
> > > > > > +# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
> > > > > > +# calling these APIs put the cores in low power state while waiting
> > > > > > +# for the memory address to become equal to the expected value.
> > > > > > +# This is supported only by aarch64.
> > > > > > +CONFIG_RTE_ARM_USE_WFE=n
> > > > > >
> > > > > >  #
> > > > > >  # Recognize/ignore the AVX/AVX512 CPU flags for performance/power testing.
> > > > > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> > > > > > index 93895d3..dabde17 100644
> > > > > > --- a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> > > > > > +++ b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> > > > > > @@ -1,5 +1,6 @@
> > > > > >  /* SPDX-License-Identifier: BSD-3-Clause
> > > > > >   * Copyright(c) 2017 Cavium, Inc
> > > > > > + * Copyright(c) 2019 Arm Limited
> > > > > >   */
> > > > > >
> > > > > >  #ifndef _RTE_PAUSE_ARM64_H_
> > > > > > @@ -17,6 +18,35 @@ static inline void rte_pause(void)
> > > > > >     asm volatile("yield" ::: "memory");
> > > > > >  }
> > > > > >
> > > > > > +#ifdef RTE_ARM_USE_WFE
> > > > > > +#define __WAIT_UNTIL_EQUAL(name, asm_op, wide, type) \
> > > > > > +static __rte_always_inline void \
> > > > > > +rte_wait_until_equal_##name(volatile type * addr, type expected) \
> > > > > > +{ \
> > > > > > +   type tmp; \
> > > > > > +   asm volatile( \
> > > > > > +           #asm_op " %" #wide "[tmp], %[addr]\n" \
> > > > > > +           "cmp    %" #wide "[tmp], %" #wide "[expected]\n"
> \
> > > > > > +           "b.eq   2f\n" \
> > > > > > +           "sevl\n" \
> > > > > > +           "1:     wfe\n" \
> > > > > > +           #asm_op " %" #wide "[tmp], %[addr]\n" \
> > > > > > +           "cmp    %" #wide "[tmp], %" #wide "[expected]\n"
> \
> > > > > > +           "bne    1b\n" \
> > > > > > +           "2:\n" \
> > > > > > +           : [tmp] "=&r" (tmp) \
> > > > > > +           : [addr] "Q"(*addr), [expected] "r"(expected) \
> > > > > > +           : "cc", "memory"); \
> > > > > > +}
> > > >
> > > > One more thought:
> > > > Why do you need to write asm code for the whole procedure?
> > > > Why not to do like linux kernel:
> > > > define wfe() and sev() macros and use them inside normal C code?
> > > >
> > > > #define sev()           asm volatile("sev" : : : "memory")
> > > > #define wfe()           asm volatile("wfe" : : : "memory")
> > > >
> > > > Then:
> > > > rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, int memorder)
> > > > {
> > > >      if (__atomic_load_n(addr, memorder) != expected) {
> > > >          sev();
> > > >          do {
> > > >              wfe();
> > > >          } while (__atomic_load_n(addr, memorder) != expected);
> > > >      }
> > > > }
> > > >
> > > > ?
> > > A really good suggestion, I made the corresponding changes to v8 already,
> > > but it missed an armv8-specific feature, as we found in an internal discussion.
> > > We call wfe to wait/sleep on the 'monitored' address; it is woken up when
> > > someone writes to the monitored address, so before wfe we have to use a
> > > load-exclusive instruction to 'monitor'.
> > > __atomic_load_n - disassembled to "ldr" - does not do so. We have to use
> > > "ldxrh" for relaxed memory ordering and "ldaxrh" for acquire ordering, in
> > > the 16-bit case for example.
> 
> Didn't realize that, sorry for the confusion caused...
Your comments are really helpful! Although we missed this point, they still
helped to get the patches into better shape (I personally like the new v9 more
than v7 😊). Really appreciated, thanks!
/Gavin
> 
> > >
> > > Let me re-think whether to come back to the full assembly procedure or to
> > > implement a 'load-exclusive' function. What do you think?
> 
> After some thought I am leaning towards the 'load-exclusive' function -
> hopefully it would help you avoid raw asm here and in other places.
> What do you think?
> Konstantin
Yes, I implemented the 'load-exclusive' function in v9, please have a review,
thanks!
Currently I did not prefix it with 'rte_' as it is not used anywhere other than
the rte_wait_until_equal APIs.
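For reference, a rough sketch of the direction v9 takes (the names and details
below are illustrative only, not the exact v9 code): a small load-exclusive
helper in inline asm plus a plain C wait loop, shown here for the 32-bit
relaxed case:

#include <stdint.h>

/* Illustrative only: the load-exclusive arms the exclusive monitor, so a
 * subsequent wfe is woken up by a write to *addr. Relaxed ordering uses
 * ldxr; an acquire variant would use ldaxr instead.
 */
static inline uint32_t
__load_exc_32(volatile uint32_t *addr)
{
        uint32_t tmp;
        asm volatile("ldxr %w[tmp], [%x[addr]]"
                : [tmp] "=&r" (tmp)
                : [addr] "r" (addr)
                : "memory");
        return tmp;
}

static inline void
wait_until_equal_32_relaxed(volatile uint32_t *addr, uint32_t expected)
{
        if (__load_exc_32(addr) != expected) {
                do {
                        /* Sleep until an event arrives, e.g. a write to
                         * the monitored address. */
                        asm volatile("wfe" : : : "memory");
                } while (__load_exc_32(addr) != expected);
        }
}

The actual patch wraps the load-exclusive in a macro so the same loop can serve
both relaxed and acquire orderings; please see v9 for the exact shape.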
Any more comments are welcome!
/Gavin
> 
> > > /Gavin
> > Forgot to mention, the kernel uses wfe() without preceding load-exclusive
> > instructions because:
> > 1) it relies on the timer to wake up, i.e. __delay()
> > 2) it explicitly calls sev to send wake events, for all kinds of locks
> > 3) IPI instructions.
> >
> > Our patches can't count on these events, due to the lack of these events or
> > the performance impact.
> > /Gavin
> > >
> > > > > > +/* Wait for *addr to be updated with expected value */
> > > > > > +__WAIT_UNTIL_EQUAL(relaxed_16, ldxrh, w, uint16_t)
> > > > > > +__WAIT_UNTIL_EQUAL(acquire_16, ldaxrh, w, uint16_t)
> > > > > > +__WAIT_UNTIL_EQUAL(relaxed_32, ldxr, w, uint32_t)
> > > > > > +__WAIT_UNTIL_EQUAL(acquire_32, ldaxr, w, uint32_t)
> > > > > > +__WAIT_UNTIL_EQUAL(relaxed_64, ldxr, x, uint64_t)
> > > > > > +__WAIT_UNTIL_EQUAL(acquire_64, ldaxr, x, uint64_t)
> > > > > > +#endif
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
