Hi Pavan,

> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> Sent: Saturday, October 26, 2019 1:06 AM
> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>;
> jer...@marvell.com
> Cc: dev@dpdk.org; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting for
> head
>
> Hi Gavin,
> >Hi Pavan,
> >
> >> -----Original Message-----
> >> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> >> Sent: Friday, October 25, 2019 12:26 PM
> >> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>;
> >> jer...@marvell.com
> >> Cc: dev@dpdk.org; nd <n...@arm.com>
> >> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >> waiting for head
> >>
> >> Hi Gavin,
> >>
> >> >-----Original Message-----
> >> >From: dev <dev-boun...@dpdk.org> On Behalf Of Gavin Hu (Arm
> >> >Technology China)
> >> >Sent: Thursday, October 24, 2019 9:23 PM
> >> >To: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>;
> >> >Jerin Jacob Kollanukkaran <jer...@marvell.com>
> >> >Cc: dev@dpdk.org; nd <n...@arm.com>
> >> >Subject: Re: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >> >waiting for head
> >> >
> >> >Hi Pavan,
> >> >
> >> >> -----Original Message-----
> >> >> From: pbhagavat...@marvell.com <pbhagavat...@marvell.com>
> >> >> Sent: Thursday, October 24, 2019 12:13 AM
> >> >> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>;
> >> >> jer...@marvell.com; Pavan Nikhilesh <pbhagavat...@marvell.com>
> >> >> Cc: dev@dpdk.org
> >> >> Subject: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >> >> waiting for head
> >> >>
> >> >> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >> >>
> >> >> Use wfe to save power while waiting for tag to become head.
> >> >>
> >> >> SSO signals EVENTI to allow cores to exit from wfe when they
> >> >> are waiting for specific operations in which one of them is
> >> >> setting HEAD bit in GWS_TAG.
> >> >>
> >> >> Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> >> >> ---
> >> >>  drivers/event/octeontx2/otx2_worker.h | 30 ++++++++++++++++++++++++---
> >> >>  1 file changed, 27 insertions(+), 3 deletions(-)
> >> >>
> >> >> diff --git a/drivers/event/octeontx2/otx2_worker.h
> >> >> b/drivers/event/octeontx2/otx2_worker.h
> >> >> index 4e971f27c..7a55caca5 100644
> >> >> --- a/drivers/event/octeontx2/otx2_worker.h
> >> >> +++ b/drivers/event/octeontx2/otx2_worker.h
> >> >> @@ -226,10 +226,34 @@ otx2_ssogws_swtag_wait(struct otx2_ssogws *ws)
> >> >>  }
> >> >>
> >> >>  static __rte_always_inline void
> >> >> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t wait_flag)
> >> >> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
> >> >>  {
> >> >> -        while (wait_flag && !(otx2_read64(ws->tag_op) & BIT_ULL(35)))
> >> >> +#ifdef RTE_ARCH_ARM64
> >> >> +        uint64_t tag;
> >> >> +
> >> >> +        asm volatile (
> >> >> +                "        ldr %[tag], [%[tag_op]]  \n"
> >> >"ldxr" should be used, exclusive-load is required to "monitor" the
> >> >location, then a write to the location will cause clear of the exclusive
> >> >monitor, thus a wake up event is generated implicitly.
> >>
> >> As I have mentioned in the commit log:
> >> "SSO signals EVENTI to allow cores to exit from wfe when they
> >> are waiting for specific operations in which one of them is
> >> setting HEAD bit in GWS_TAG."
> >If you have other expected wake up sources, that is ok. Just curious is
> >this signal explicitly sent to quit WFE?
>
> AFAIK yes, explicitly sent to quit WFE.
>
> >Just wondering, implicit event(Clear of exclusive monitor) vs explicit
> >signal, which has shorter latency?
>
> Not really sure but SSO has dedicated bus inside each core.

That's ok.

>
> >/Gavin
> >>
> >> The address need not be tracked by the global monitor.
> >>
> >> >You can find more explanation here:
> >> >https://urldefense.proofpoint.com/v2/url?u=http-3A__inbox.dpdk.org_dev_AM0PR08MB5363F9D1BA158B66B803EA068F6B0-40AM0PR08MB5363.eurprd08.prod.outlook.com_&d=DwIFAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=JMzT-4V2megNsFYxaO0V2wE0-GlK9UPUvE1K0pPA9aQ&s=JajU2VklhV_jFE0WKAZ076KjjWymIC-iTiJXU0Vwxr4&e=
> >> >/Gavin
> >> >> +                "        tbnz %[tag], 35, done%=  \n"
> >> >> +                "        sevl                     \n"
> >> >> +                "rty%=:  wfe                      \n"
> >> >> +                "        ldr %[tag], [%[tag_op]]  \n"
> >> >> +                "        tbz %[tag], 35, rty%=    \n"
> >> >> +                "done%=:                          \n"
> >> >> +                : [tag] "=&r" (tag)
> >> >> +                : [tag_op] "r" (ws->tag_op)
> >> >> +        );
> >> >> +#else
> >> >> +        /* Wait for the HEAD to be set */
> >> >> +        while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
> >> >>                  ;
> >> >> +#endif
> >> >> +}
> >> >> +
> >> >> +static __rte_always_inline void
> >> >> +otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t wait_flag)
> >> >> +{
> >> >> +        if (wait_flag)
> >> >> +                otx2_ssogws_head_wait(ws);
> >> >>
> >> >>          rte_cio_wmb();
> >> >What ordering does this barrier try to keep? If there is a write followed
> >> >by a wait for some kind of response, should this barrier move before
> >> >otx2_ssogws_head_wait?
> >>
> >> The barrier is used to flush out the write buffer to LLC (the octeontx2
> >> point of coherence) so that NIX Tx picks up all the modifications done
> >> to the packet.
>
> >Looking at the otx2_ssogws_event_tx function, so far at the point of
> >rte_cio_wmb, only the header is written?
> >Should it be delayed until after the whole packet is written and before the
> >submission?
>
> We only care that the writes to the actual packet buffer, e.g. the start of
> the Ethernet header, are committed.
> The rest of the mbuf fields are translated into a HW command after the
> barrier and written to a LMTLINE using ldeor.
>
> >If NIX is not falling within the SMP configuration, should it be
> >rte_io_wmb instead?
>
> Octeontx2 has only a single shareability domain, i.e. it makes no
> distinction between outer and inner shareable domains.
> Since all IO devices are interpreted to be in the outer shareable domain,
> we like to use rte_cio_(r/w)mb for IO devices.

Yes, for an integral part of the outer shareable domain, rte_cio_(r/w)mb is sufficient.

>
> Regards,
> Pavan.
>
> >> >>  }
> >> >> @@ -258,7 +282,7 @@ otx2_ssogws_event_tx(struct otx2_ssogws *ws,
> >> >> struct rte_event ev[],
> >> >>
> >> >>          /* Perform header writes before barrier for TSO */
> >> >>          otx2_nix_xmit_prepare_tso(m, flags);
> >> >> -        otx2_ssogws_head_wait(ws, !ev->sched_type);
> >> >> +        otx2_ssogws_order(ws, !ev->sched_type);
> >> >>          otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
> >> >>
> >> >>          if (flags & NIX_TX_MULTI_SEG_F) {
> >> >> --
> >> >> 2.17.1

Reviewed-by: Gavin Hu <gavin...@arm.com>
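
For anyone reading this thread later, the control flow discussed above (wait for the HEAD bit only when asked to, then issue a write barrier before the Tx command is built) can be sketched in portable C roughly as below. This is only an illustration and not the driver code: `read_tag`, `head_wait`, `order_before_tx` and `HEAD_BIT` are hypothetical names, the GWS_TAG register is modelled as a plain atomic variable, and a C11 release fence stands in for rte_cio_wmb(). The wfe/EVENTI wake-up from the patch is not modelled; the loop simply polls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 35 of the tag word is the HEAD bit tested in the patch. */
#define HEAD_BIT (UINT64_C(1) << 35)

/* Hypothetical stand-in for otx2_read64(ws->tag_op). */
static inline uint64_t read_tag(_Atomic uint64_t *tag_op)
{
	return atomic_load_explicit(tag_op, memory_order_relaxed);
}

/* Poll until HEAD is set; returns how many extra polls were needed. */
static inline unsigned long head_wait(_Atomic uint64_t *tag_op)
{
	unsigned long spins = 0;

	assert(tag_op != NULL);
	while (!(read_tag(tag_op) & HEAD_BIT))
		spins++;
	return spins;
}

/* Wait only if wait_flag is set, then order earlier stores
 * (assumption: a release fence approximates rte_cio_wmb() here). */
static inline void order_before_tx(_Atomic uint64_t *tag_op, int wait_flag)
{
	if (wait_flag)
		head_wait(tag_op);
	atomic_thread_fence(memory_order_release);
}
```

With the HEAD bit already set, head_wait() returns immediately, which corresponds to the early tbnz exit in the arm64 assembly; with wait_flag == 0 only the barrier is issued.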