Hi, Phil Very nice comment, I found it is useful, I'm on moving mlx5 PMD to use C11 atomic primitives, thank you. But, don't you think it is an overkill for testpmd case and would make code wordy and less readable ? How does testpmd start the forwarding? Let's have a glance at launch_packet_forwarding():
- initialize forwarding with port_fwd_begin() - It is tx_only_begin for txonly mode, master core context - launch the forwarding cores rte_eal_remote_launch(), the forwarding will happen in forward core context Let's reconsider: - tx_only_begin() is called once in master core context, forwarding is not launched yet, there is NO any concurrent access to variables, we can set ones in any way, NO atomic, volatile, etc. is needed at all. We should do setup and commit all our memory writes - rte_wmb() seems to be the very native way to do it once for all preceding writes. Performance can be neglected here - it is not a datapath. - pkt_burst_prepare() is being executed in the forwarding core context, and perform only READ access to the global variables, those are stable while forwarding lasts, and can be considered as constants. No atomic, barriers, etc. are needed either. I agree - the volatile is redundant ("over-reassurance"), let's remove it. But atomic access - not needed, would make code complicated. What do you think? Do you insist on atomic-barriered access implementing? With best regards, Slava > -----Original Message----- > From: Phil Yang <phil.y...@arm.com> > Sent: Tuesday, July 28, 2020 19:24 > To: Slava Ovsiienko <viachesl...@mellanox.com> > Cc: Matan Azrad <ma...@mellanox.com>; Raslan Darawsheh > <rasl...@mellanox.com>; Thomas Monjalon <tho...@monjalon.net>; > ferruh.yi...@intel.com; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>; dev@dpdk.org; nd > <n...@arm.com> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp > intitialization > > > -----Original Message----- > > From: dev <dev-boun...@dpdk.org> On Behalf Of Viacheslav Ovsiienko > > Sent: Monday, July 27, 2020 11:27 PM > > To: dev@dpdk.org > > Cc: ma...@mellanox.com; rasl...@mellanox.com; > tho...@monjalon.net; > > ferruh.yi...@intel.com > > Subject: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp > > intitialization > > > > The testpmd application forwards data in multiple threads. > > In the txonly mode the Tx timestamps must be initialized on per thread > > basis to provide phase shift for the packet burst being sent. This per > > thread initialization was performed on zero value of the variable in > > thread local storage and happened only once after testpmd forwarding > > start. Executing "start" and "stop" commands did not cause thread > > local variables zeroing and wrong timestamp values were used. > > I think it is too heavy to use rte_wmb() to guarantee the visibility of > 'timestamp_init_req' updating for subsequent read operations. > We can use C11 atomics with explicit memory ordering instead of rte_wmb() > to achieve the same goal. > > > > > Fixes: 4940344dab1d ("app/testpmd: add Tx scheduling command") > > > > Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com> > > --- > > app/test-pmd/txonly.c | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c index > > 97f4a45..415431d 100644 > > --- a/app/test-pmd/txonly.c > > +++ b/app/test-pmd/txonly.c > > @@ -55,9 +55,13 @@ > > static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. > > */ RTE_DEFINE_PER_LCORE(uint64_t, timestamp_qskew); > > /**< Timestamp offset per queue */ > > +RTE_DEFINE_PER_LCORE(uint32_t, timestamp_idone); /**< Timestamp > init > > done. */ > > + > > static uint64_t timestamp_mask; /**< Timestamp dynamic flag mask */ > > static int32_t timestamp_off; /**< Timestamp dynamic field offset */ > > static bool timestamp_enable; /**< Timestamp enable */ > > +static volatile uint32_t timestamp_init_req; > > If we use C11 atomic builtins for 'timestamp_init_req' accessing, the volatile > key word becomes unnecessary. > Because they will generate same instructions. > > > + /**< Timestamp initialization request. */ > > static uint64_t timestamp_initial[RTE_MAX_ETHPORTS]; > > > > static void > > @@ -229,7 +233,8 @@ > > rte_be64_t ts; > > } timestamp_mark; > > > > - if (unlikely(!skew)) { > > + if (unlikely(timestamp_init_req != > > if (unlikely(__atomic_load_n(×tamp_init_req, __ATOMIC_RELAXED) != > > > + RTE_PER_LCORE(timestamp_idone))) { > > struct rte_eth_dev *dev = &rte_eth_devices[fs- > > >tx_port]; > > unsigned int txqs_n = dev->data->nb_tx_queues; > > uint64_t phase = tx_pkt_times_inter * fs->tx_queue / > @@ -241,6 > > +246,7 @@ > > skew = timestamp_initial[fs->tx_port] + > > tx_pkt_times_inter + phase; > > RTE_PER_LCORE(timestamp_qskew) = skew; > > + RTE_PER_LCORE(timestamp_idone) = > > timestamp_init_req; > > > RTE_PER_LCORE(timestamp_idone) = > __atomic_load_n(×tamp_init_req, __ATOMIC_RELAXED); > > > > } > > timestamp_mark.pkt_idx = rte_cpu_to_be_16(idx); > > timestamp_mark.queue_idx = rte_cpu_to_be_16(fs- > > >tx_queue); > > @@ -426,6 +432,9 @@ > > timestamp_mask && > > timestamp_off >= 0 && > > !rte_eth_read_clock(pi, ×tamp_initial[pi]); > > + if (timestamp_enable) > > + timestamp_init_req++; > > > __atomic_add_fetch(×tamp_init_req, 1, __ATOMIC_ACQ_REL); > > > > + rte_wmb(); > > We can remove it now. > > > Thanks, > Phil