On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote: > On 1/13/2017 8:17 AM, Jerin Jacob wrote: > > On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote: > >> Hi Jerin, > >> > >> On 1/12/2017 9:17 AM, Jerin Jacob wrote: > >> <...> > >> > >>> +#include <rte_io.h> > >>> + > >>> /* CSR write macro */ > >>> -#define ADF_CSR_WR(csrAddr, csrOffset, val) \ > >>> - (void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \ > >>> - = (val))) > >>> +#define ADF_CSR_WR(csrAddr, csrOffset, val) \ > >>> + rte_write32(val, (((uint8_t *)csrAddr) + csrOffset)) > >> > >> For IA, this update introduces an extra compiler barrier (rte_io_wmb()), > >> which is indeed not a must, is this correct? > > > > AFAIK, Compiler barrier is required for IA. I am not an IA expert, if > > someone thinks it needs to changed then I can fix it in following commit > > in this patch series by making rte_io_wmb() and rte_io_rmb() as empty. > > > > Let me know. > > > > AFAIK, Linux kernel code has a barrier in readl/writel for IA. > > > > Typically we don't use any non relaxed versions in fast path.In fast > > typically all the drivers has explicit write barrier for doorbell write > > and followed by a relaxed version of write. IMO, In any event, it won't > > generate performance regression. > > > > [dpdk-master] $ git show > > 70c343bdc8c33a51a9db23cd58122bdfc120a58f > > commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f > > Author: Jerin Jacob <jerin.ja...@caviumnetworks.com> > > Date: Mon Dec 5 06:36:49 2016 +0530 > > > > eal/x86: define I/O device memory barriers for IA > > > > The patch does not provide any functional change for IA. > > I/O barriers are mapped to existing smp barriers. > > > > CC: Bruce Richardson <bruce.richard...@intel.com> > > CC: Konstantin Ananyev <konstantin.anan...@intel.com> > > Signed-off-by: Jerin Jacob <jerin.ja...@caviumnetworks.com> > > > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h > > b/lib/librte_eal/common/include/arch/x86/rte_atomic.h > > index 00b1cdf..4eac666 100644 > > --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h > > +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h > > @@ -61,6 +61,12 @@ extern "C" { > > > > #define rte_smp_rmb() rte_compiler_barrier() > > > > +#define rte_io_mb() rte_mb() > > + > > +#define rte_io_wmb() rte_compiler_barrier() > > + > > +#define rte_io_rmb() rte_compiler_barrier() > > + > > /*------------------------- 16 bit atomic operations > > * -------------------------*/ > > > > #ifndef RTE_FORCE_INTRINSICS > > > >> > >> If so, does it make sense to override these functions for x86, and make > >> rte_writeX = rte_writeX_relaxed > >> rte_readX = rte_readX_relaxed > >> > >>> > >>> /* CSR read macro */ > >>> -#define ADF_CSR_RD(csrAddr, csrOffset) \ > >>> - (*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset))) > >>> +#define ADF_CSR_RD(csrAddr, csrOffset) \ > >>> + rte_read32((((uint8_t *)csrAddr) + csrOffset)) > >> > >> This patchset both introduces new rte_readX/rte_writeX functions, also > >> applies them into drivers. > >> > >> While applying them, it changes the behavior. > >> Like above code was doing a read, but after update it does read and > >> read_memory_barrier. > >> > >> What do you think this patchset updates usage in a manner that keeps > >> behavior exact same. Like using rte_read32_relaxed for this case. > >> And doing architecture related updates in a different patchset? > > > > Need to use rte_read32 at this commit otherwise it will break for ARM. > > That's was all point for this patchset. > > Why it breaks the ARM, is it because rte_*mb() updated for ARM in this > patchset (patch 7/29) ?
Yes. > > I believe it is good to make these modifications in two phase: It is in two phases only. First introduced the API with implementation and enabled in each driver. Why did you think other-way around it is better? I can rework and test if there is any value addition. If you concerned about git bisect ability then I don't think we are loosing that in this model. Thoughts? > - First replace old usage with rte_readX/rte_writeX while keeping exact > same behavior > > - Second, do architecture specific changes. Both in eal and drivers > level if required. > > Thanks, > ferruh > > > For performance regression, we can always verify by taking delta > > between this changeset and the previous changeset. If you think, I need > > to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that > > as well. > > > > > >> > >> This both makes easy to see architecture specific updates, and makes > >> easy to trace any possible performance issues by this patchset. > >> > >>> > >>> #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL > >>> #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL > >>> > >> >