On 15.01.2019 11:55, Shahaf Shuler wrote: > Tuesday, January 15, 2019 10:29 AM, Ilya Maximets: >> Subject: Re: [dpdk-dev] [PATCH v2] net/virtio: add platform memory >> ordering feature support >> >> On 15.01.2019 9:33, Shahaf Shuler wrote: >>> Thursday, January 10, 2019 10:37 PM, Shahaf Shuler: >>>> Subject: RE: [dpdk-dev] [PATCH v2] net/virtio: add platform memory >>>> ordering feature support >>>> >>>> Wednesday, January 9, 2019 5:50 PM, Michael S. Tsirkin: >>>>> alejandro.luc...@netronome.com; Daniel Marcovitch >>>>> <dani...@mellanox.com> >>>>> Subject: Re: [dpdk-dev] [PATCH v2] net/virtio: add platform memory >>>>> ordering feature support >>>>> >>>>> On Wed, Jan 09, 2019 at 05:34:38PM +0300, Ilya Maximets wrote: >>>>>> virtio_mb() is really heavy. I'd like to avoid it somehow, but I >>>>>> don't know how to do this yet. >>>>> >>>>> Linux driver doesn't avoid it either. >>>> >>>> I understand v3 was merged but still would like to continue the >>>> discuss and make sure all is clear and agreed. >>>> >>>> Form patch [1] description it is very clear why we need the >>>> rte_smp_mb() barrier. >>>> However I am not sure why this barrier is interoperate into rte_mb in >>>> case of vDPA. In vDPA case, both read of the user ring and write of >>>> the avail index are for local cached memory. >>>> The only write which is to uncachable memory (device memory) is the >>>> notify itself. >>>> >>>> As I mentioned, there is a need to have a store fence before doing >>>> the notify, but from different reasons. So vDPA use case and need Is >>>> a bit different than what presented in [1]. >>> >>> Any answer? >>> It is pity if we add redundant barriers which will degrade the driver >> performance. >> >> Sorry for late responses. Have a lot of work with OVS right now. >> >> Regarding your question. >> Current code looks like this: >> >> 1. Update ring. >> 2. virtio_wmb() >> 3. Update idx. >> 4. virtio_mb() >> 5. read flags. >> 6. notify. >> >> virtio_mb() is here to avoid reordering of steps 3 and 5. >> i.e. we need a full barrier to ensure the order between store (idx update) >> and load (reading the flags). Otherwise we could miss the notification. >> We can't avoid the barrier here, because even x86 does not guarantee the >> ordering of the local load with earlier local store. > > This is clear. You need the rte_smp_mb() here. My question is why you need > the rte_mb() in case of vDPA? > As you said, all accesses are local.
Sorry, I misunderstood your question. Memory accesses are not local here. We want the ring data/idx be visible to vDPA HW before reading the flags. > > Pasting you commit code: > /* > * Per virtio_ring.h in Linux. > * For virtio_pci on SMP, we don't need to order with respect to MMIO > * accesses through relaxed memory I/O windows, so smp_mb() et al are > * sufficient. > * > * For using virtio to talk to real devices (eg. vDPA) we do need real > * barriers. > */ > static inline void > virtio_mb(uint8_t weak_barriers) > { > if (weak_barriers) > rte_smp_mb(); > else > rte_mb(); > } > >> >>> >>>> >>>> [1] >>>> >> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpa >>>> >> tches.dpdk.org%2Fpatch%2F49545%2F&data=02%7C01%7Cshahafs%40 >> mellan >>>> >> ox.com%7C01907f1b2e0e4002cb7508d67ac38a98%7Ca652971c7d2e4d9ba6a4 >> d1492 >>>> >> 56f461b%7C0%7C0%7C636831377591864200&sdata=TSpc%2Fzyq2aq0N3 >> %2Bh4o >>>> ro4std8ut%2FQU6%2BOeMDeeaQdsM%3D&reserved=0 >>>> >>>>> >>>>> -- >>>>> MST >>> >>>