Hi Honnappa,

Inline comments...

> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Saturday, September 19, 2020 12:49 AM
> To: Phil Yang <phil.y...@arm.com>; Jakub Grajciar -X (jgrajcia - PANTHEON
> TECH SRO at Cisco) <jgraj...@cisco.com>; dev@dpdk.org
> Cc: Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>; Honnappa
> Nagarahalli <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH] net/memif: relax barrier for zero copy path
>
> Hi Jakub,
> I am trying to review this patch. I am having difficulty in
> understanding
> the implementation for the queue/ring, appreciate if you could help me
> understand the logic.

A 'ring' is a ring buffer holding packet descriptors. These descriptors hold
metadata about the packet (packet buffer address, length, etc.). 'Queues' are a
representation of rings and buffers (plus some metadata). In more detail, one
ring (S2M) and the packet buffers allocated for it are represented as a
'tx queue' for the slave and an 'rx queue' for the master.

>
> 1) The S2M queues - are used to send packets from slave to master. My
> understanding is that, the slave thread would call 'eth_memif_tx_zc' and the
> master thread would call 'eth_memif_rx_zc'. Is this correct?
> 2) The M2S queues - are used to send packets from master to slave. Here the
> slave thread would call 'eth_memif_rx_zc' and the master thread would call
> 'eth_memif_tx_zc'. Is this correct?

This is indeed correct.

>
> Thank you,
> Honnappa
>
> > -----Original Message-----
> > From: Phil Yang <phil.y...@arm.com>
> > Sent: Friday, September 11, 2020 12:38 AM
> > To: jgraj...@cisco.com; dev@dpdk.org
> > Cc: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> > <ruifeng.w...@arm.com>; nd <n...@arm.com>
> > Subject: [PATCH] net/memif: relax barrier for zero copy path
> >
> > Using 'rte_mb' to synchronize the shared ring head/tail between
> > producer and consumer will stall the pipeline and damage performance
> > on the weak memory model platforms, such like aarch64.
> >
> > Relax the expensive barrier with c11 atomic with explicit memory
> > ordering can improve 3.6% performance on throughput.

My question here is: `rte_mb` is supposed to make sure that the head/tail
pointers are not updated before the packets are written into shared memory.
Do the atomic operations ensure that the packets are written into shared memory
before the head/tail pointers are updated?
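
To make the ordering question concrete, here is a minimal sketch of a
single-producer/single-consumer descriptor ring, assuming the relaxed version
relies on C11 store-release/load-acquire on head and tail (the patch body is
not quoted in this mail, so this is an assumption). The names `struct desc`,
`struct ring`, `ring_produce`, `ring_consume` and `RING_SIZE` are hypothetical
and do not reflect the actual memif ring layout.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size; must be a power of two */

/* Hypothetical descriptor: metadata only, not the real memif descriptor. */
struct desc {
	uint32_t offset;
	uint32_t length;
};

/* Hypothetical ring: head is written by the producer, tail by the consumer. */
struct ring {
	_Atomic uint16_t head;
	_Atomic uint16_t tail;
	struct desc desc[RING_SIZE];
};

/* Producer side: fill the descriptor first, then publish it via head. */
static int ring_produce(struct ring *r, uint32_t offset, uint32_t length)
{
	uint16_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store of tail. */
	uint16_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if ((uint16_t)(head - tail) == RING_SIZE)
		return -1; /* ring is full */

	struct desc *d = &r->desc[head & (RING_SIZE - 1)];
	d->offset = offset;
	d->length = length;

	/* Release: the descriptor writes above cannot be reordered after
	 * this store, so a consumer that observes the new head also
	 * observes the descriptor contents. */
	atomic_store_explicit(&r->head, (uint16_t)(head + 1),
			      memory_order_release);
	return 0;
}

/* Consumer side: observe head first, then read the descriptor. */
static int ring_consume(struct ring *r, struct desc *out)
{
	uint16_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire pairs with the producer's release store of head. */
	uint16_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return -1; /* ring is empty */

	*out = r->desc[tail & (RING_SIZE - 1)];

	/* Release: the descriptor read above completes before the slot
	 * is handed back to the producer. */
	atomic_store_explicit(&r->tail, (uint16_t)(tail + 1),
			      memory_order_release);
	return 0;
}
```

In the C11 memory model, a store with `memory_order_release` guarantees that
all writes made earlier in program order by the same thread are visible to a
thread whose `memory_order_acquire` load reads the stored value. Whether the
patch pairs every head/tail update this way is what needs to be confirmed.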