On 3/30/22 13:12, Bruce Richardson wrote:
> On Wed, Mar 30, 2022 at 12:52:15PM +0200, Ilya Maximets wrote:
>> On 3/30/22 12:41, Ilya Maximets wrote:
>>> Forking the thread to discuss a memory consistency/ordering model.
>>>
>>> AFAICT, dmadev can be anything from part of a CPU to a completely
>>> separate PCI device.  However, I don't see any memory ordering being
>>> enforced or even described in the dmadev API or documentation.
>>> Please, point me to the correct documentation, if I somehow missed it.
>>>
>>> We have a DMA device (A) and a CPU core (B) writing respectively
>>> the data and the descriptor info.  CPU core (C) is reading the
>>> descriptor and the data it points to.
>>>
>>> A few things about that process:
>>>
>>> 1. There is no memory barrier between writes A and B (Did I miss
>>>    them?).  Meaning that those operations can be seen by C in a
>>>    different order regardless of barriers issued by C and regardless
>>>    of the nature of devices A and B.
>>>
>>> 2. Even if there is a write barrier between A and B, there is
>>>    no guarantee that C will see these writes in the same order
>>>    as C doesn't use real memory barriers because vhost advertises
>>
>> s/advertises/does not advertise/
>>
>>>    VIRTIO_F_ORDER_PLATFORM.
>>>
>>> So, I'm coming to the conclusion that there is a missing write barrier
>>> on the vhost side and vhost itself must not advertise the
>>
>> s/must not/must/
>>
>> Sorry, I wrote things backwards. :)
>>
>>> VIRTIO_F_ORDER_PLATFORM, so the virtio driver can use actual memory
>>> barriers.
>>>
>>> Would like to hear some thoughts on that topic.  Is it a real issue?
>>> Is it an issue considering all possible CPU architectures and DMA
>>> HW variants?
>>>
> 
> In terms of ordering of operations using dmadev:
> 
> * Some DMA HW will perform all operations strictly in order e.g. Intel
>   IOAT, while other hardware may not guarantee order of operations/do
>   things in parallel e.g. Intel DSA. Therefore the dmadev API provides the
>   fence operation which allows the order to be enforced. The fence can be
>   thought of as a full memory barrier, meaning no jobs after the barrier can
>   be started until all those before it have completed. Obviously, for HW
>   where order is always enforced, this will be a no-op, but for hardware that
>   parallelizes, we want to reduce the fences to get best performance.
> 
> * For synchronization between DMA devices and CPUs, where a CPU can only
>   write after a DMA copy has been done, the CPU must wait for the dma
>   completion to guarantee ordering. Once the completion has been returned
>   the completed operation is globally visible to all cores.
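
IIUC, in dmadev API terms the fence and the completion you describe map
to something like the sketch below (dev_id/vchan, addresses and lengths
are placeholders I made up; return values and error checks are omitted;
this is not the actual vhost code):

#include <stdbool.h>
#include <rte_dmadev.h>

/* Enqueue the payload copy, then the descriptor copy with a fence so
 * the second copy cannot start before the first one has completed,
 * submit both, and poll until the HW reports them done. */
static void
copy_data_then_desc(int16_t dev_id, uint16_t vchan,
                    rte_iova_t data_src, rte_iova_t data_dst,
                    uint32_t data_len,
                    rte_iova_t desc_src, rte_iova_t desc_dst,
                    uint32_t desc_len)
{
        uint16_t done = 0, last_idx;
        bool error = false;

        rte_dma_copy(dev_id, vchan, data_src, data_dst, data_len, 0);
        rte_dma_copy(dev_id, vchan, desc_src, desc_dst, desc_len,
                     RTE_DMA_OP_FLAG_FENCE | RTE_DMA_OP_FLAG_SUBMIT);

        /* Per the statement above, once both completions have been
         * returned here, the copied data is globally visible. */
        while (done < 2 && !error)
                done += rte_dma_completed(dev_id, vchan, 2 - done,
                                          &last_idx, &error);
}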

Thanks for the explanation!  Some questions though:

In our case, one CPU waits for the completion while another CPU actually
uses the data.  IOW, "the CPU must wait" is a bit ambiguous.  Which CPU
must wait?
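
To make the case concrete, the pattern in question looks roughly like
this (simplified, with invented names; not the actual vhost/virtio
code):

#include <stdbool.h>
#include <rte_dmadev.h>

/* 'payload' is the buffer written by the DMA device (A); 'flag'
 * stands in for the descriptor/used-ring update done by core B.
 * Both names are invented for illustration. */
extern char payload[];
volatile uint16_t flag;

/* Runs on core B: waits for the DMA completion, then publishes. */
static void
core_b(int16_t dev_id, uint16_t vchan)
{
        uint16_t last_idx;
        bool error = false;

        while (rte_dma_completed(dev_id, vchan, 1, &last_idx,
                                 &error) == 0 && !error)
                ;
        /* Does B need a write barrier here, or is the completion
         * alone enough to guarantee that 'payload' is visible to
         * every core by the time 'flag' is? */
        flag = 1;
}

/* Runs on core C: polls the flag, then reads the data it describes. */
static void
core_c(void)
{
        while (flag == 0)
                ;
        /* 'payload' is read here.  C issues no real read barrier,
         * since VIRTIO_F_ORDER_PLATFORM is not advertised. */
}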

Or should it be "Once the completion is visible on any core, the completed
operation is globally visible to all cores." ?
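
And if the guarantee only holds for the core that polled the
completion, I'd expect the fix on the vhost side to look roughly like
the sketch below (invented helper, not the actual vhost code; whether
rte_smp_wmb() is sufficient or a full rte_wmb() is needed is exactly
part of the question):

#include <stdint.h>
#include <rte_atomic.h>

/* Invented illustration of the barrier being discussed: order the
 * DMA-written data before the store that makes the descriptor
 * visible to the reader. */
static inline void
publish_used_idx(volatile uint16_t *used_idx, uint16_t new_idx)
{
        rte_smp_wmb();  /* or rte_wmb(); see the question above */
        *used_idx = new_idx;
}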

And the main question:
  Are these synchronization claims documented somewhere?

Best regards, Ilya Maximets.
