On Thu, 2024-07-04 at 21:16 +0100, Anton Ivanov wrote:
> > > -	qi->queue_depth = 0;
> > > +	wmb(); /* Ensure that RX processing elsewhere sees the changes */
> > > +	atomic_set(&qi->queue_depth, 0);
> > >  }
> > I don't understand this.
> >
> > prep_queue_for_rx() is called by vector_mmsg_rx(), not in parallel or in
> > a different thread or something, inside the NAPI polling. All it does is
> > reset the queue to empty with all the SKBs allocated.
> >
> > After that, prep_queue_for_rx() calls uml_vector_recvmmsg() [1] which
> > fills the SKBs, and then vector_mmsg_rx() itself consumes them by going
> > over it and calling napi_gro_receive() (or whatever else).
> >
> > There's no parallelism here? The RX queue wouldn't even need locking or
> > atomic instructions at all at this point, since it's just "refill
> > buffers, fill buffers, release buffers to network stack".
> >
> > What am I missing?
>
> You are not missing anything.
>
> The rx and tx are using the same infra to map vectors to skbs arrays.
>
> I can make the RX use a set of lockless and atomicless functions, but
> this means duplicating some of the code.

Right. No need to duplicate I'd say, but I don't see then a need to add
a wmb() barrier here? It's confusing me because there's nothing
accessing it in parallel.

johannes