Hi Slava,

Thanks for your analysis. Some comments inline.

> -----Original Message-----
> From: Slava Ovsiienko <viachesl...@mellanox.com>
> Sent: Wednesday, July 29, 2020 4:08 PM
> To: Phil Yang <phil.y...@arm.com>
> Cc: Matan Azrad <ma...@mellanox.com>; Raslan Darawsheh
> <rasl...@mellanox.com>; tho...@monjalon.net; ferruh.yi...@intel.com;
> Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; nd
> <n...@arm.com>; dev@dpdk.org; nd <n...@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp
> intitialization
> 
> Hi, Phil
> 
> Very nice comment, I found it is useful, I'm on moving mlx5 PMD to use C11
> atomic primitives, thank you.
> But, don't you think it is an overkill for testpmd case and would make code
> wordy and less readable ?
> How does testpmd start the forwarding? Let's have a glance at
> launch_packet_forwarding():
> 
> - initialize forwarding with port_fwd_begin() - It is tx_only_begin for txonly
> mode, master core context
> - launch the forwarding cores rte_eal_remote_launch(), the forwarding will
> happen in forward core context
> 
> Let's reconsider:
> 
>  - tx_only_begin() is called once in master core context, forwarding is not
> launched yet, there is NO any
>  concurrent access to variables, we can set ones in any way, NO atomic,
> volatile, etc. is needed at all.
>  We should do setup and commit all our memory writes - rte_wmb() seems

I thought this barrier in this patch was only for synchronizing the 
'timestamp_init_req' between master core and forwarding cores.
But in fact, it is more than that.  I am clear now.
I think it might be helpful to add some comments to this barrier. 


> to be the very native way
> to do it once for all preceding writes. Performance can be neglected here - it
> is not a datapath.

Agreed. It is not in the datapath and it won't hurt performance.

> 
> - pkt_burst_prepare() is being executed in the forwarding core context, and
> perform only READ access
>  to the global variables, those are stable while forwarding lasts, and can be
> considered as constants.
>  No atomic, barriers, etc. are needed either. I agree - the volatile is
> redundant ("over-reassurance"), let's
>  remove it. But atomic access - not needed, would make code complicated.

If we remove the volatile qualifier, then these atomic builtins becomes 
unnecessary.
Because all these 32-bits READs are atomic.

Thanks,
Phil

> 
> What do you think? Do you insist on atomic-barriered access implementing?
> 
> With best regards,
> Slava
> 
> > -----Original Message-----
> > From: Phil Yang <phil.y...@arm.com>
> > Sent: Tuesday, July 28, 2020 19:24
> > To: Slava Ovsiienko <viachesl...@mellanox.com>
> > Cc: Matan Azrad <ma...@mellanox.com>; Raslan Darawsheh
> > <rasl...@mellanox.com>; Thomas Monjalon <tho...@monjalon.net>;
> > ferruh.yi...@intel.com; Honnappa Nagarahalli
> > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>; dev@dpdk.org;
> nd
> > <n...@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp
> > intitialization
> >
> > > -----Original Message-----
> > > From: dev <dev-boun...@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> > > Sent: Monday, July 27, 2020 11:27 PM
> > > To: dev@dpdk.org
> > > Cc: ma...@mellanox.com; rasl...@mellanox.com;
> > tho...@monjalon.net;
> > > ferruh.yi...@intel.com
> > > Subject: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp
> > > intitialization
> > >
> > > The testpmd application forwards data in multiple threads.
> > > In the txonly mode the Tx timestamps must be initialized on per thread
> > > basis to provide phase shift for the packet burst being sent. This per
> > > thread initialization was performed on zero value of the variable in
> > > thread local storage and happened only once after testpmd forwarding
> > > start. Executing "start" and "stop" commands did not cause thread
> > > local variables zeroing and wrong timestamp values were used.
> >
> > I think it is too heavy to use rte_wmb() to guarantee the visibility of
> > 'timestamp_init_req' updating for subsequent read operations.
> > We can use C11 atomics with explicit memory ordering instead of
> rte_wmb()
> > to achieve the same goal.
> >
> > >
> > > Fixes: 4940344dab1d ("app/testpmd: add Tx scheduling command")
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com>
> > > ---
> > >  app/test-pmd/txonly.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c index
> > > 97f4a45..415431d 100644
> > > --- a/app/test-pmd/txonly.c
> > > +++ b/app/test-pmd/txonly.c
> > > @@ -55,9 +55,13 @@
> > >  static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets.
> > > */  RTE_DEFINE_PER_LCORE(uint64_t, timestamp_qskew);
> > >                                   /**< Timestamp offset per queue */
> > > +RTE_DEFINE_PER_LCORE(uint32_t, timestamp_idone); /**< Timestamp
> > init
> > > done. */
> > > +
> > >  static uint64_t timestamp_mask; /**< Timestamp dynamic flag mask */
> > > static int32_t timestamp_off; /**< Timestamp dynamic field offset */
> > > static bool timestamp_enable; /**< Timestamp enable */
> > > +static volatile uint32_t timestamp_init_req;
> >
> > If we use C11 atomic builtins for 'timestamp_init_req' accessing, the 
> > volatile
> > key word becomes unnecessary.
> > Because they will generate same instructions.
> >
> > > +                          /**< Timestamp initialization request. */
> > >  static uint64_t timestamp_initial[RTE_MAX_ETHPORTS];
> > >
> > >  static void
> > > @@ -229,7 +233,8 @@
> > >                   rte_be64_t ts;
> > >           } timestamp_mark;
> > >
> > > -         if (unlikely(!skew)) {
> > > +         if (unlikely(timestamp_init_req !=
> >
> > if (unlikely(__atomic_load_n(&timestamp_init_req,
> __ATOMIC_RELAXED) !=
> >
> > > +                 RTE_PER_LCORE(timestamp_idone))) {
> > >                   struct rte_eth_dev *dev = &rte_eth_devices[fs-
> > > >tx_port];
> > >                   unsigned int txqs_n = dev->data->nb_tx_queues;
> > >                   uint64_t phase = tx_pkt_times_inter * fs->tx_queue
> /
> > @@ -241,6
> > > +246,7 @@
> > >                   skew = timestamp_initial[fs->tx_port] +
> > >                          tx_pkt_times_inter + phase;
> > >                   RTE_PER_LCORE(timestamp_qskew) = skew;
> > > +                 RTE_PER_LCORE(timestamp_idone) =
> > > timestamp_init_req;
> >
> >
> > RTE_PER_LCORE(timestamp_idone) =
> > __atomic_load_n(&timestamp_init_req, __ATOMIC_RELAXED);
> >
> >
> > >           }
> > >           timestamp_mark.pkt_idx = rte_cpu_to_be_16(idx);
> > >           timestamp_mark.queue_idx = rte_cpu_to_be_16(fs-
> > > >tx_queue);
> > > @@ -426,6 +432,9 @@
> > >                      timestamp_mask &&
> > >                      timestamp_off >= 0 &&
> > >                      !rte_eth_read_clock(pi, &timestamp_initial[pi]);
> > > + if (timestamp_enable)
> > > +         timestamp_init_req++;
> >
> >
> > __atomic_add_fetch(&timestamp_init_req, 1, __ATOMIC_ACQ_REL);
> >
> >
> > > + rte_wmb();
> >
> > We can remove it now.
> >
> >
> > Thanks,
> > Phil

Reply via email to