Hi Pavan,

> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> Sent: Saturday, October 26, 2019 1:06 AM
> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>;
> jer...@marvell.com
> Cc: dev@dpdk.org; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting for
> head
> 
> Hi Gavin,
> >Hi Pavan,
> >
> >> -----Original Message-----
> >> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> >> Sent: Friday, October 25, 2019 12:26 PM
> >> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>;
> >> jer...@marvell.com
> >> Cc: dev@dpdk.org; nd <n...@arm.com>
> >> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >> waiting for head
> >>
> >> Hi Gavin,
> >>
> >> >-----Original Message-----
> >> >From: dev <dev-boun...@dpdk.org> On Behalf Of Gavin Hu (Arm
> >> >Technology China)
> >> >Sent: Thursday, October 24, 2019 9:23 PM
> >> >To: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>;
> >> >Jerin Jacob Kollanukkaran <jer...@marvell.com>
> >> >Cc: dev@dpdk.org; nd <n...@arm.com>
> >> >Subject: Re: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >> >waiting for head
> >> >
> >> >Hi Pavan,
> >> >
> >> >> -----Original Message-----
> >> >> From: pbhagavat...@marvell.com <pbhagavat...@marvell.com>
> >> >> Sent: Thursday, October 24, 2019 12:13 AM
> >> >> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>;
> >> >> jer...@marvell.com; Pavan Nikhilesh <pbhagavat...@marvell.com>
> >> >> Cc: dev@dpdk.org
> >> >> Subject: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting
> >> >> for head
> >> >>
> >> >> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >> >>
> >> >> Use wfe to save power while waiting for tag to become head.
> >> >>
> >> >> SSO signals EVENTI to allow cores to exit from wfe when they
> >> >> are waiting for specific operations in which one of them is
> >> >> setting HEAD bit in GWS_TAG.
> >> >>
> >> >> Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >> >> ---
> >> >>  drivers/event/octeontx2/otx2_worker.h | 30 ++++++++++++++++++++++++---
> >> >>  1 file changed, 27 insertions(+), 3 deletions(-)
> >> >>
> >> >> diff --git a/drivers/event/octeontx2/otx2_worker.h
> >> >> b/drivers/event/octeontx2/otx2_worker.h
> >> >> index 4e971f27c..7a55caca5 100644
> >> >> --- a/drivers/event/octeontx2/otx2_worker.h
> >> >> +++ b/drivers/event/octeontx2/otx2_worker.h
> >> >> @@ -226,10 +226,34 @@ otx2_ssogws_swtag_wait(struct otx2_ssogws *ws)
> >> >>  }
> >> >>
> >> >>  static __rte_always_inline void
> >> >> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t wait_flag)
> >> >> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
> >> >>  {
> >> >> -       while (wait_flag && !(otx2_read64(ws->tag_op) & BIT_ULL(35)))
> >> >> +#ifdef RTE_ARCH_ARM64
> >> >> +       uint64_t tag;
> >> >> +
> >> >> +       asm volatile (
> >> >> +                       "       ldr %[tag], [%[tag_op]]         \n"
> >> >"ldxr" should be used: an exclusive load is required to "monitor" the
> >> >location, and a subsequent write to the location clears the exclusive
> >> >monitor, so a wake-up event is generated implicitly.
> >>
> >> As I have mentioned in the commit log:
> >> "SSO signals EVENTI to allow cores to exit from wfe when they
> >> are waiting for specific operations in which one of them is
> >> setting HEAD bit in GWS_TAG."
> >If you have other expected wake-up sources, that is OK. Just curious: is
> >this signal explicitly sent to quit WFE?
> 
> AFAIK yes, explicitly sent to quit WFE.
> 
> >Just wondering: implicit event (clearing of the exclusive monitor) vs.
> >explicit signal, which has the shorter latency?
> 
> Not really sure, but SSO has a dedicated bus inside each core.
That's ok.
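For comparison with the explicit EVENTI approach in the patch, a minimal C sketch of the implicit wake-up variant discussed above: an exclusive load (ldxr) marks the location in the exclusive monitor, and a store to it by another observer clears the monitor and generates a wake-up event for wfe. Assumptions: `wait_bit_set` is a hypothetical helper, not part of the otx2 driver; it only makes sense for memory that the global exclusive monitor tracks (ordinary cacheable memory), which, as noted in the thread, is not required for the SSO `tag_op` address; the non-arm64 branch is a plain polling fallback.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the otx2 driver): wait until bit `bit`
 * of *addr is set and return the value observed with that bit set.
 * Bit 35 would correspond to the HEAD bit tested in the patch.
 */
static inline uint64_t wait_bit_set(const volatile uint64_t *addr,
                                    unsigned int bit)
{
    assert(bit < 64);
    const uint64_t mask = UINT64_C(1) << bit;
    uint64_t val;

#if defined(__aarch64__)
    /*
     * ldxr marks addr in the exclusive monitor. When a store by another
     * observer clears the monitor, a wake-up event is generated, so wfe
     * returns without anyone issuing an explicit sev.
     */
    __asm__ volatile("ldxr %0, [%1]" : "=r"(val) : "r"(addr) : "memory");
    while (!(val & mask)) {
        /* wfe may also return on unrelated events, so always re-check. */
        __asm__ volatile("wfe" ::: "memory");
        __asm__ volatile("ldxr %0, [%1]" : "=r"(val) : "r"(addr) : "memory");
    }
#else
    /* Portable fallback: plain volatile polling, no low-power wait. */
    while (!((val = *addr) & mask))
        ;
#endif
    return val;
}
```

Re-arming the monitor with a fresh ldxr after every wfe is what keeps a wake-up from being missed: if the store lands between the load and the wfe, the event register is already set and wfe returns immediately.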
> 
> >/Gavin
> >>
> >> The address need not be tracked by the global monitor.
> >>
> >> >You can find more explanation here:
> >> >https://urldefense.proofpoint.com/v2/url?u=http-3A__inbox.dpdk.org_dev_AM0PR08MB5363F9D1BA158B66B803EA068F6B0-40AM0PR08MB5363.eurprd08.prod.outlook.com_&d=DwIFAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=JMzT-4V2megNsFYxaO0V2wE0-GlK9UPUvE1K0pPA9aQ&s=JajU2VklhV_jFE0WKAZ076KjjWymIC-iTiJXU0Vwxr4&e=
> >> >/Gavin
> >> >> +                       "       tbnz %[tag], 35, done%=         \n"
> >> >> +                       "       sevl                            \n"
> >> >> +                       "rty%=: wfe                             \n"
> >> >> +                       "       ldr %[tag], [%[tag_op]]         \n"
> >> >> +                       "       tbz %[tag], 35, rty%=           \n"
> >> >> +                       "done%=:                                \n"
> >> >> +                       : [tag] "=&r" (tag)
> >> >> +                       : [tag_op] "r" (ws->tag_op)
> >> >> +                       );
> >> >> +#else
> >> >> +       /* Wait for the HEAD to be set */
> >> >> +       while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
> >> >>                 ;
> >> >> +#endif
> >> >> +}
> >> >> +
> >> >> +static __rte_always_inline void
> >> >> +otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t wait_flag)
> >> >> +{
> >> >> +       if (wait_flag)
> >> >> +               otx2_ssogws_head_wait(ws);
> >> >>
> >> >>         rte_cio_wmb();
> >> >What ordering does this barrier try to keep? If there is a write followed
> >> >by a wait for some kind of response, should this barrier move before
> >> >otx2_ssogws_head_wait?
> >>
> >> The barrier is used to flush the write buffer out to the LLC (the octeontx2
> >> point of coherence) so that NIX Tx picks up all the modifications done to
> >> the packet.
> 
> >Looking at the otx2_ssogws_event_tx function, at the point of
> >rte_cio_wmb only the header has been written so far?
> >Should the barrier be delayed until after the whole packet is written, just
> >before the submission?
> 
> We only care that the writes to the actual packet buffer (e.g. the start of
> the Ethernet header) are committed.
> The rest of the mbuf fields are translated into a HW command after the
> barrier and written to an LMTLINE using ldeor.
> 
> >If NIX does not fall within the SMP configuration, should it be
> >rte_io_wmb instead?
> 
> Octeontx2 has only a single shareability domain, i.e. it makes no
> distinction between the outer and inner shareable domains.
> Since all IO devices are interpreted to be in the outer shareable domain, we
> prefer to use rte_cio_(r/w)mb for IO devices.
Yes, for devices that are an integral part of the outer shareable domain,
rte_cio_(r/w)mb is sufficient.
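As a rough illustration of the ordering described above (packet-buffer writes committed before the Tx command is built from the mbuf fields), a minimal C sketch. Assumptions: `struct tx_cmd`, `io_store_barrier` and `sketch_tx_prepare` are hypothetical stand-ins; the command goes to ordinary memory rather than an LMTLINE; and the arm64 barrier is `dmb oshst`, which, to my understanding, is what rte_cio_wmb() expands to on arm64 in DPDK of this period. The non-arm64 fence is only an approximation for CPU observers and implies nothing about device ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical Tx command; the real driver builds a NIX send descriptor. */
struct tx_cmd {
    uint64_t buf_addr;
    uint32_t len;
};

/* Order earlier stores before later ones for outer-shareable observers. */
static inline void io_store_barrier(void)
{
#if defined(__aarch64__)
    /* Assumed to match rte_cio_wmb() on arm64 (store-only, outer shareable). */
    __asm__ volatile("dmb oshst" ::: "memory");
#else
    /*
     * Approximation: orders stores for other CPUs only (strictly, in C11,
     * relative to a later atomic store); no device-ordering guarantee.
     */
    atomic_thread_fence(memory_order_release);
#endif
}

/* Hypothetical: commit packet-buffer writes, then describe the buffer. */
static void sketch_tx_prepare(uint8_t *pkt, const uint8_t *hdr,
                              size_t hdr_len, struct tx_cmd *cmd)
{
    assert(pkt != NULL && hdr != NULL && cmd != NULL);

    memcpy(pkt, hdr, hdr_len); /* packet data, e.g. the Ethernet header */
    io_store_barrier();        /* visible before the command refers to it */

    /* mbuf fields -> command, written after the barrier */
    cmd->buf_addr = (uint64_t)(uintptr_t)pkt;
    cmd->len = (uint32_t)hdr_len;
}
```

Only the packet-data stores need to be ordered ahead of the command; the command itself is ordered against the device by whatever submission mechanism follows, which this sketch does not model.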
> 
> Regards,
> Pavan.
> 
> >> >>  }
> >> >> @@ -258,7 +282,7 @@ otx2_ssogws_event_tx(struct otx2_ssogws *ws,
> >> >> struct rte_event ev[],
> >> >>
> >> >>         /* Perform header writes before barrier for TSO */
> >> >>         otx2_nix_xmit_prepare_tso(m, flags);
> >> >> -       otx2_ssogws_head_wait(ws, !ev->sched_type);
> >> >> +       otx2_ssogws_order(ws, !ev->sched_type);
> >> >>         otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
> >> >>
> >> >>         if (flags & NIX_TX_MULTI_SEG_F) {
> >> >> --
> >> >> 2.17.1

Reviewed-by: Gavin Hu <gavin...@arm.com>
