> -----Original Message----- > From: dev <dev-boun...@dpdk.org> On Behalf Of Viacheslav Ovsiienko > Sent: Monday, July 27, 2020 11:27 PM > To: dev@dpdk.org > Cc: ma...@mellanox.com; rasl...@mellanox.com; tho...@monjalon.net; > ferruh.yi...@intel.com > Subject: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp > intitialization > > The testpmd application forwards data in multiple threads. > In the txonly mode the Tx timestamps must be initialized > on per thread basis to provide phase shift for the packet > burst being sent. This per thread initialization was performed > on zero value of the variable in thread local storage and > happened only once after testpmd forwarding start. Executing > "start" and "stop" commands did not cause thread local variables > zeroing and wrong timestamp values were used.
I think it is too heavy to use rte_wmb() to guarantee the visibility of 'timestamp_init_req' updating for subsequent read operations. We can use C11 atomics with explicit memory ordering instead of rte_wmb() to achieve the same goal. > > Fixes: 4940344dab1d ("app/testpmd: add Tx scheduling command") > > Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com> > --- > app/test-pmd/txonly.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c > index 97f4a45..415431d 100644 > --- a/app/test-pmd/txonly.c > +++ b/app/test-pmd/txonly.c > @@ -55,9 +55,13 @@ > static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */ > RTE_DEFINE_PER_LCORE(uint64_t, timestamp_qskew); > /**< Timestamp offset per queue */ > +RTE_DEFINE_PER_LCORE(uint32_t, timestamp_idone); /**< Timestamp init > done. */ > + > static uint64_t timestamp_mask; /**< Timestamp dynamic flag mask */ > static int32_t timestamp_off; /**< Timestamp dynamic field offset */ > static bool timestamp_enable; /**< Timestamp enable */ > +static volatile uint32_t timestamp_init_req; If we use C11 atomic builtins for 'timestamp_init_req' accessing, the volatile key word becomes unnecessary. Because they will generate same instructions. > + /**< Timestamp initialization request. */ > static uint64_t timestamp_initial[RTE_MAX_ETHPORTS]; > > static void > @@ -229,7 +233,8 @@ > rte_be64_t ts; > } timestamp_mark; > > - if (unlikely(!skew)) { > + if (unlikely(timestamp_init_req != if (unlikely(__atomic_load_n(×tamp_init_req, __ATOMIC_RELAXED) != > + RTE_PER_LCORE(timestamp_idone))) { > struct rte_eth_dev *dev = &rte_eth_devices[fs- > >tx_port]; > unsigned int txqs_n = dev->data->nb_tx_queues; > uint64_t phase = tx_pkt_times_inter * fs->tx_queue > / > @@ -241,6 +246,7 @@ > skew = timestamp_initial[fs->tx_port] + > tx_pkt_times_inter + phase; > RTE_PER_LCORE(timestamp_qskew) = skew; > + RTE_PER_LCORE(timestamp_idone) = > timestamp_init_req; RTE_PER_LCORE(timestamp_idone) = __atomic_load_n(×tamp_init_req, __ATOMIC_RELAXED); > } > timestamp_mark.pkt_idx = rte_cpu_to_be_16(idx); > timestamp_mark.queue_idx = rte_cpu_to_be_16(fs- > >tx_queue); > @@ -426,6 +432,9 @@ > timestamp_mask && > timestamp_off >= 0 && > !rte_eth_read_clock(pi, ×tamp_initial[pi]); > + if (timestamp_enable) > + timestamp_init_req++; __atomic_add_fetch(×tamp_init_req, 1, __ATOMIC_ACQ_REL); > + rte_wmb(); We can remove it now. Thanks, Phil