> -----Original Message-----
> From: Lance Richardson <lance.richard...@broadcom.com>
> Sent: Thursday, July 8, 2021 9:51 PM
> To: Zhang, Qi Z <qi.z.zh...@intel.com>
> Cc: Joyce Kong <joyce.k...@arm.com>; Xing, Beilei <beilei.x...@intel.com>;
> ruifeng.w...@arm.com; honnappa.nagaraha...@arm.com; Richardson, Bruce
> <bruce.richard...@intel.com>; Zhang, Helin <helin.zh...@intel.com>;
> dev@dpdk.org; sta...@dpdk.org; n...@arm.com
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with
> thread fence
>
> On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zh...@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Joyce Kong <joyce.k...@arm.com>
> > > Sent: Tuesday, July 6, 2021 2:54 PM
> > > To: Xing, Beilei <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>;
> > > ruifeng.w...@arm.com; honnappa.nagaraha...@arm.com; Richardson, Bruce
> > > <bruce.richard...@intel.com>; Zhang, Helin <helin.zh...@intel.com>
> > > Cc: dev@dpdk.org; sta...@dpdk.org; n...@arm.com
> > > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> > >
> > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring scan,
> > > if there is no synchronization point.
> > >
> > > Signed-off-by: Joyce Kong <joyce.k...@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > ---
> > > drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > > index 9aaabfd92..86e2f083e 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > > > I40E_RXD_QW1_STATUS_SHIFT;
> > > }
> > >
> > > - rte_smp_rmb();
> > > + /* This barrier is to order loads of different words in the descriptor */
> > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >
> > Now for x86, you actually replace a compiler barrier with a memory fence,
> > this may have potential performance impact which need additional resource to
> > investigate
>
> No memory fence instruction is generated for
> __ATOMIC_ACQUIRE on x86 for any version of gcc
> or clang that I've tried, based on experiments here:
>
> https://godbolt.org/z/Yxr1vGhKP

Nice tool! I tried writing some dummy code with and without __atomic_thread_fence(__ATOMIC_ACQUIRE), but I didn't see any difference in the generated assembly. Does that mean __atomic_thread_fence(__ATOMIC_ACQUIRE) does nothing on x86?
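
For reference, here is a minimal sketch of the kind of dummy code that could be used for this comparison. The descriptor layout, the DEMO_DONE flag, and the demo_poll() helper are hypothetical names made up for illustration; they are not the real i40e structures. As far as I understand, x86 hardware already keeps loads ordered with respect to other loads, so the acquire fence needs no instruction there, but it should still stop the compiler from hoisting the data load above the status load, which is the same job the compiler barrier did before.

```c
#include <stdint.h>

/* Hypothetical two-word descriptor; NOT the real i40e layout. */
struct demo_desc {
	uint64_t status; /* assumed to be written last by the producer */
	uint64_t data;   /* assumed valid only once DEMO_DONE is set */
};

#define DEMO_DONE 0x1ULL

/*
 * Returns 1 and stores the payload in *out if the descriptor is done,
 * otherwise returns 0 and leaves *out untouched.
 */
static int demo_poll(struct demo_desc *d, uint64_t *out)
{
	/* Relaxed atomic load of the status word. */
	uint64_t status = __atomic_load_n(&d->status, __ATOMIC_RELAXED);

	if (!(status & DEMO_DONE))
		return 0;

	/*
	 * Acquire fence: loads after this point may not be moved before
	 * the status load above. On x86 this is expected to emit no
	 * instruction and act only as a compiler barrier.
	 */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);

	*out = d->data;
	return 1;
}
```

Compiling demo_poll() with and without the fence line at -O2 for x86 and for aarch64 should make the difference visible: I would expect identical x86 output in a function this small, and an extra barrier instruction on aarch64.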