On Tue, Jan 15, 2019 at 08:55:50AM +0000, Shahaf Shuler wrote: > Tuesday, January 15, 2019 10:29 AM, Ilya Maximets: > > Subject: Re: [dpdk-dev] [PATCH v2] net/virtio: add platform memory > > ordering feature support > > > > On 15.01.2019 9:33, Shahaf Shuler wrote: > > > Thursday, January 10, 2019 10:37 PM, Shahaf Shuler: > > >> Subject: RE: [dpdk-dev] [PATCH v2] net/virtio: add platform memory > > >> ordering feature support > > >> > > >> Wednesday, January 9, 2019 5:50 PM, Michael S. Tsirkin: > > >>> alejandro.luc...@netronome.com; Daniel Marcovitch > > >>> <dani...@mellanox.com> > > >>> Subject: Re: [dpdk-dev] [PATCH v2] net/virtio: add platform memory > > >>> ordering feature support > > >>> > > >>> On Wed, Jan 09, 2019 at 05:34:38PM +0300, Ilya Maximets wrote: > > >>>> virtio_mb() is really heavy. I'd like to avoid it somehow, but I > > >>>> don't know how to do this yet. > > >>> > > >>> Linux driver doesn't avoid it either. > > >> > > >> I understand v3 was merged but still would like to continue the > > >> discuss and make sure all is clear and agreed. > > >> > > >> Form patch [1] description it is very clear why we need the > > >> rte_smp_mb() barrier. > > >> However I am not sure why this barrier is interoperate into rte_mb in > > >> case of vDPA. In vDPA case, both read of the user ring and write of > > >> the avail index are for local cached memory. > > >> The only write which is to uncachable memory (device memory) is the > > >> notify itself. > > >> > > >> As I mentioned, there is a need to have a store fence before doing > > >> the notify, but from different reasons. So vDPA use case and need Is > > >> a bit different than what presented in [1]. > > > > > > Any answer? > > > It is pity if we add redundant barriers which will degrade the driver > > performance. > > > > Sorry for late responses. Have a lot of work with OVS right now. > > > > Regarding your question. > > Current code looks like this: > > > > 1. Update ring. > > 2. virtio_wmb() > > 3. Update idx. > > 4. virtio_mb() > > 5. read flags. > > 6. notify. > > > > virtio_mb() is here to avoid reordering of steps 3 and 5. > > i.e. we need a full barrier to ensure the order between store (idx update) > > and load (reading the flags). Otherwise we could miss the notification. > > We can't avoid the barrier here, because even x86 does not guarantee the > > ordering of the local load with earlier local store. > > This is clear. You need the rte_smp_mb() here. My question is why you need > the rte_mb() in case of vDPA? > As you said, all accesses are local.
Are you asking why the different barriers? Or as you asking why is a barrier needed at all? The barriers themselves are clearly needed. But in my opinion some dpdk barrier implementations are sub-optimal and too strong. For example on intel: the big question is whether anyone does any non-temporals. In absence of these, and with non-cacheable mappings on x86 wmb and rmb should be a nop, and mb should be a locked instruction. It might make sense to add rte_dma_rmb/wmb/mb. > Pasting you commit code: > /* > * Per virtio_ring.h in Linux. > * For virtio_pci on SMP, we don't need to order with respect to MMIO > * accesses through relaxed memory I/O windows, so smp_mb() et al are > * sufficient. > * > * For using virtio to talk to real devices (eg. vDPA) we do need real > * barriers. > */ > static inline void > virtio_mb(uint8_t weak_barriers) > { > if (weak_barriers) > rte_smp_mb(); > else > rte_mb(); > } > > > > > > > > >> > > >> [1] > > >> > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpa > > >> > > tches.dpdk.org%2Fpatch%2F49545%2F&data=02%7C01%7Cshahafs%40 > > mellan > > >> > > ox.com%7C01907f1b2e0e4002cb7508d67ac38a98%7Ca652971c7d2e4d9ba6a4 > > d1492 > > >> > > 56f461b%7C0%7C0%7C636831377591864200&sdata=TSpc%2Fzyq2aq0N3 > > %2Bh4o > > >> ro4std8ut%2FQU6%2BOeMDeeaQdsM%3D&reserved=0 > > >> > > >>> > > >>> -- > > >>> MST > > > > > >