Hi Slava, Thanks for your analysis. Some comments inline.
> -----Original Message----- > From: Slava Ovsiienko <viachesl...@mellanox.com> > Sent: Wednesday, July 29, 2020 4:08 PM > To: Phil Yang <phil.y...@arm.com> > Cc: Matan Azrad <ma...@mellanox.com>; Raslan Darawsheh > <rasl...@mellanox.com>; tho...@monjalon.net; ferruh.yi...@intel.com; > Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; nd > <n...@arm.com>; dev@dpdk.org; nd <n...@arm.com> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp > intitialization > > Hi, Phil > > Very nice comment, I found it is useful, I'm on moving mlx5 PMD to use C11 > atomic primitives, thank you. > But, don't you think it is an overkill for testpmd case and would make code > wordy and less readable ? > How does testpmd start the forwarding? Let's have a glance at > launch_packet_forwarding(): > > - initialize forwarding with port_fwd_begin() - It is tx_only_begin for txonly > mode, master core context > - launch the forwarding cores rte_eal_remote_launch(), the forwarding will > happen in forward core context > > Let's reconsider: > > - tx_only_begin() is called once in master core context, forwarding is not > launched yet, there is NO any > concurrent access to variables, we can set ones in any way, NO atomic, > volatile, etc. is needed at all. > We should do setup and commit all our memory writes - rte_wmb() seems I thought this barrier in this patch was only for synchronizing the 'timestamp_init_req' between master core and forwarding cores. But in fact, it is more than that. I am clear now. I think it might be helpful to add some comments to this barrier. > to be the very native way > to do it once for all preceding writes. Performance can be neglected here - it > is not a datapath. Agreed. It is not in the datapath and it won't hurt performance. > > - pkt_burst_prepare() is being executed in the forwarding core context, and > perform only READ access > to the global variables, those are stable while forwarding lasts, and can be > considered as constants. > No atomic, barriers, etc. are needed either. I agree - the volatile is > redundant ("over-reassurance"), let's > remove it. But atomic access - not needed, would make code complicated. If we remove the volatile qualifier, then these atomic builtins becomes unnecessary. Because all these 32-bits READs are atomic. Thanks, Phil > > What do you think? Do you insist on atomic-barriered access implementing? > > With best regards, > Slava > > > -----Original Message----- > > From: Phil Yang <phil.y...@arm.com> > > Sent: Tuesday, July 28, 2020 19:24 > > To: Slava Ovsiienko <viachesl...@mellanox.com> > > Cc: Matan Azrad <ma...@mellanox.com>; Raslan Darawsheh > > <rasl...@mellanox.com>; Thomas Monjalon <tho...@monjalon.net>; > > ferruh.yi...@intel.com; Honnappa Nagarahalli > > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>; dev@dpdk.org; > nd > > <n...@arm.com> > > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp > > intitialization > > > > > -----Original Message----- > > > From: dev <dev-boun...@dpdk.org> On Behalf Of Viacheslav Ovsiienko > > > Sent: Monday, July 27, 2020 11:27 PM > > > To: dev@dpdk.org > > > Cc: ma...@mellanox.com; rasl...@mellanox.com; > > tho...@monjalon.net; > > > ferruh.yi...@intel.com > > > Subject: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp > > > intitialization > > > > > > The testpmd application forwards data in multiple threads. > > > In the txonly mode the Tx timestamps must be initialized on per thread > > > basis to provide phase shift for the packet burst being sent. This per > > > thread initialization was performed on zero value of the variable in > > > thread local storage and happened only once after testpmd forwarding > > > start. Executing "start" and "stop" commands did not cause thread > > > local variables zeroing and wrong timestamp values were used. > > > > I think it is too heavy to use rte_wmb() to guarantee the visibility of > > 'timestamp_init_req' updating for subsequent read operations. > > We can use C11 atomics with explicit memory ordering instead of > rte_wmb() > > to achieve the same goal. > > > > > > > > Fixes: 4940344dab1d ("app/testpmd: add Tx scheduling command") > > > > > > Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com> > > > --- > > > app/test-pmd/txonly.c | 11 ++++++++++- > > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > > > diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c index > > > 97f4a45..415431d 100644 > > > --- a/app/test-pmd/txonly.c > > > +++ b/app/test-pmd/txonly.c > > > @@ -55,9 +55,13 @@ > > > static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. > > > */ RTE_DEFINE_PER_LCORE(uint64_t, timestamp_qskew); > > > /**< Timestamp offset per queue */ > > > +RTE_DEFINE_PER_LCORE(uint32_t, timestamp_idone); /**< Timestamp > > init > > > done. */ > > > + > > > static uint64_t timestamp_mask; /**< Timestamp dynamic flag mask */ > > > static int32_t timestamp_off; /**< Timestamp dynamic field offset */ > > > static bool timestamp_enable; /**< Timestamp enable */ > > > +static volatile uint32_t timestamp_init_req; > > > > If we use C11 atomic builtins for 'timestamp_init_req' accessing, the > > volatile > > key word becomes unnecessary. > > Because they will generate same instructions. > > > > > + /**< Timestamp initialization request. */ > > > static uint64_t timestamp_initial[RTE_MAX_ETHPORTS]; > > > > > > static void > > > @@ -229,7 +233,8 @@ > > > rte_be64_t ts; > > > } timestamp_mark; > > > > > > - if (unlikely(!skew)) { > > > + if (unlikely(timestamp_init_req != > > > > if (unlikely(__atomic_load_n(×tamp_init_req, > __ATOMIC_RELAXED) != > > > > > + RTE_PER_LCORE(timestamp_idone))) { > > > struct rte_eth_dev *dev = &rte_eth_devices[fs- > > > >tx_port]; > > > unsigned int txqs_n = dev->data->nb_tx_queues; > > > uint64_t phase = tx_pkt_times_inter * fs->tx_queue > / > > @@ -241,6 > > > +246,7 @@ > > > skew = timestamp_initial[fs->tx_port] + > > > tx_pkt_times_inter + phase; > > > RTE_PER_LCORE(timestamp_qskew) = skew; > > > + RTE_PER_LCORE(timestamp_idone) = > > > timestamp_init_req; > > > > > > RTE_PER_LCORE(timestamp_idone) = > > __atomic_load_n(×tamp_init_req, __ATOMIC_RELAXED); > > > > > > > } > > > timestamp_mark.pkt_idx = rte_cpu_to_be_16(idx); > > > timestamp_mark.queue_idx = rte_cpu_to_be_16(fs- > > > >tx_queue); > > > @@ -426,6 +432,9 @@ > > > timestamp_mask && > > > timestamp_off >= 0 && > > > !rte_eth_read_clock(pi, ×tamp_initial[pi]); > > > + if (timestamp_enable) > > > + timestamp_init_req++; > > > > > > __atomic_add_fetch(×tamp_init_req, 1, __ATOMIC_ACQ_REL); > > > > > > > + rte_wmb(); > > > > We can remove it now. > > > > > > Thanks, > > Phil