Thomas Monjalon <tho...@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
> From: Thomas Monjalon <tho...@monjalon.net> > To: Pradeep Satyanarayana <prad...@us.ibm.com>, David Wilder > <wil...@us.ibm.com> > Cc: Shahaf Shuler <shah...@mellanox.com>, Chao Zhu > <chao...@linux.vnet.ibm.com>, Dekel Peled <dek...@mellanox.com>, > dev@dpdk.org, David Christensen <d...@ibm.com>, Ori Kam > <or...@mellanox.com>, Yongseok Koh <ys...@mellanox.com>, > konstantin.anan...@intel.com, ola.liljed...@arm.com, > honnappa.nagaraha...@arm.com, bruce.richard...@intel.com > Date: 03/22/2019 01:49 AM > Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER > > We need to agree on the definitions. > Please see below, > > 22/03/2019 02:40, Pradeep Satyanarayana: > > Shahaf Shuler <shah...@mellanox.com> wrote on 03/21/2019 01:49:39 AM: > > > Pradeep Satyanarayana <prad...@us.ibm.com> wrote on Thu 3/21/2019 12:41 > > AM: > > > >> > So far, when not running on power, we used the rte_wmb for that. > > > >> On x86 and ARM systems it provided the needed guarantees. > > > >> > It is also mentioned in the barrier doxygen on ARM arch: > > > >> > " > > > >> > Write memory barrier. > > > >> > > > > >> > Guarantees that the STORE operations generated before the barrier > > > >> > occur before the STORE operations generated after. > > > >> > " > > > >> > > > > >> > It doesn't restrict to store to system memory only. > > > >> > w/ power is on somewhat different and in fact rte_mb is required. > > > >> It obviously miss the point of those barrier if we will need to use > > > >> a different barrier based on the system arch. > > > >> > > > > >> > We need to align the definition of the different barriers in DPDK: > > > >> > 1. need a clear documentation of each. this should be global and > > > >> not part of the specific implementation on each arch. > > > > > > > >A single approach may not work for all architectures. Power is different > > > >from others, so we need to be able to accommodate that. More comments > > below. > > > > > > it don't get this claim. > > > It is ok to have some differences between the different arch, but > > > here you implement a well-defined barrier - rte_wmb. > > > if you see a need we can discuss to define a **new** barrier which > > > sync STORE only to system memory, and will be able to utilize the > > > lwsync command. > > > > > > > > > > >> The global definition is in lib/librte_eal/common/include/ > > > generic/rte_atomic.h > > > >> > > > >> There are some copy/paste in Arm32 and PPC that I will remove. > > > >> > > > >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to > > > >> define a new type of barrier which will sync between both I/O and > > > >> stores to systems memory. > > > >> > > > >> The basic memory barrier of DPDK does not mention > > > >> a difference between I/O and system memory. > > > > > > > >In the case of Power, sync will cater to both I/O and system > > > memory. However, that > > > >is too big a hammer in all cases. > > > > > > rte_wmb requires such sync. you propose to have the wrong barrier in > > > favor of performance. > > > to mitigate this you can take my suggestion above and define a new, > > > more lightweight one. > > > > > > > > > > >> It is not explicit (yet) but I assume it is protecting both. > > > >> So, in my opinion, we need to make it explicit in the doc, > > > >> and fix the PPC implementation to comply with this definition. > > > >> > > > >> Anyway, I don't see any significant effort from IBM to move from > > > >> the alpha support stage to a real Open Source support. > > > >> PS: sending a mail every two months, to promise improvements, is > > > not enough! > > > > > > > > > > […] > > > > > > > > > > >We should retain lwsync, should not be removed as discussed in here: > > > > > > > >https://urldefense.proofpoint.com/v2/url? > u=http-3A__mails.dpdk.org_archives_dev_2019-2DMarch_126746.html&d=DwIFaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=co4lCofxrQP11yIVMply- > QYvsUyeKJkYY_jL1QVgeGA&m=SNGJjgGF8mHR9t- > ixYHznWvUoXvC3zlbm8q1vlS4x_s&s=TXCEGDEjCiUW1ug5kDwlfuUaqRMowGhpihjly5zEZp8&e= > > > > > > i don't agree. > > > it is very clear the rte_wmb implementation in power is broken and > > > we need to fix this right away before other customers will hit the > > > same issue. > > > > > > In the DPDK source I see a couple of different classes of memory barriers. > > I am > > not clear on the usage of these in the drivers, but I would think the > > guidelines > > to be as shown below (for Power): > > > > - rte_[rw]mb (general memory barrier) --> should be lwsync > > This is what may be discussed. > The assumption is that the general memory barrier should cover > all cases (CPU caches, SMP and I/O). > That's why we think it should "sync" for Power. In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb and retain it as lwsync. Agreed? > > > - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync > > - rte_io_[rw]mb (I/O memory barrier) --> should be sync > > - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync > > > > lwsync is appropriate for cases where CPUs are accessing cacheable memory > > (i.e. Memory Coherence Required) while the sync instruction should be used > > in all other cases. > > Thanks Pradeep prad...@us.ibm.com