On Wed, 2021-09-01 at 14:44 +0000, Xueming(Steven) Li wrote:
> 
> > -----Original Message-----
> > From: dev <dev-boun...@dpdk.org> On Behalf Of Xueming(Steven) Li
> > Sent: Sunday, August 29, 2021 3:08 PM
> > To: Jerin Jacob <jerinjac...@gmail.com>
> > Cc: Jack Min <jack...@nvidia.com>; dpdk-dev <dev@dpdk.org>; Xiaoyun Li 
> > <xiaoyun...@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd 
> > wrapper function
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > Sent: Thursday, August 26, 2021 7:28 PM
> > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > Cc: Jack Min <jack...@nvidia.com>; dpdk-dev <dev@dpdk.org>; Xiaoyun Li
> > > <xiaoyun...@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd
> > > wrapper function
> > > 
> > > On Wed, Aug 18, 2021 at 7:38 PM Xueming(Steven) Li <xuemi...@nvidia.com> 
> > > wrote:
> > > > 
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > Sent: Wednesday, August 18, 2021 7:48 PM
> > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > Cc: Jack Min <jack...@nvidia.com>; dpdk-dev <dev@dpdk.org>;
> > > > > Xiaoyun Li <xiaoyun...@intel.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common
> > > > > fwd wrapper function
> > > > > 
> > > > > On Wed, Aug 18, 2021 at 4:57 PM Xueming(Steven) Li 
> > > > > <xuemi...@nvidia.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > > > Sent: Tuesday, August 17, 2021 5:37 PM
> > > > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > > > Cc: Jack Min <jack...@nvidia.com>; dpdk-dev <dev@dpdk.org>;
> > > > > > > Xiaoyun Li <xiaoyun...@intel.com>
> > > > > > > Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add
> > > > > > > common fwd wrapper function
> > > > > > > 
> > > > > > > On Wed, Aug 11, 2021 at 7:35 PM Xueming Li <xuemi...@nvidia.com> 
> > > > > > > wrote:
> > > > > > > > 
> > > > > > > > From: Xiaoyu Min <jack...@nvidia.com>
> > > > > > > > 
> > > > > > > > > > Added an inline common wrapper function for all fwd engines,
> > > > > > > > > > which does the following in common:
> > > > > > > > 
> > > > > > > > 1. get_start_cycles
> > > > > > > > 2. rte_eth_rx_burst(...,nb_pkt_per_burst)
> > > > > > > > > > 3. if rxq_share, do forward_shared_rxq(), otherwise do fwd directly
> > > > > > > > > > 4. get_end_cycles
> > > > > > > > 
> > > > > > > > Signed-off-by: Xiaoyu Min <jack...@nvidia.com>
> > > > > > > > ---
> > > > > > > >  app/test-pmd/testpmd.h | 24 ++++++++++++++++++++++++
> > > > > > > >  1 file changed, 24 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> > > > > > > > index
> > > > > > > > 13141dfed9..b685ac48d6 100644
> > > > > > > > --- a/app/test-pmd/testpmd.h
> > > > > > > > +++ b/app/test-pmd/testpmd.h
> > > > > > > > @@ -1022,6 +1022,30 @@ void add_tx_dynf_callback(portid_t
> > > > > > > > portid); void remove_tx_dynf_callback(portid_t portid);  int
> > > > > > > > update_jumbo_frame_offload(portid_t portid);
> > > > > > > > 
> > > > > > > > +static inline void
> > > > > > > > > > +do_burst_fwd(struct fwd_stream *fs, packet_fwd_cb fwd)
> > > > > > > > > > +{
> > > > > > > > +       struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> > > > > > > > +       uint16_t nb_rx;
> > > > > > > > +       uint64_t start_tsc = 0;
> > > > > > > > +
> > > > > > > > +       get_start_cycles(&start_tsc);
> > > > > > > > +
> > > > > > > > +       /*
> > > > > > > > +        * Receive a burst of packets and forward them.
> > > > > > > > +        */
> > > > > > > > +       nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
> > > > > > > > +                       pkts_burst, nb_pkt_per_burst);
> > > > > > > > +       inc_rx_burst_stats(fs, nb_rx);
> > > > > > > > +       if (unlikely(nb_rx == 0))
> > > > > > > > +               return;
> > > > > > > > +       if (unlikely(rxq_share > 0))
> > > > > > > 
> > > > > > > See below. It reads a global memory.
> > > > > > > 
> > > > > > > > +               forward_shared_rxq(fs, nb_rx, pkts_burst, fwd);
> > > > > > > > +       else
> > > > > > > > +               (*fwd)(fs, nb_rx, pkts_burst);
> > > > > > > 
> > > > > > > New function pointer in fastpath.
> > > > > > > 
> > > > > > > IMO, We should not create performance regression for the existing 
> > > > > > > forward engine.
> > > > > > > Can we have a new forward engine just for shared memory testing?
> > > > > > 
> > > > > > Yes, I am fully aware of the performance concern; the global could be
> > > > > > defined around record_core_cycles to minimize the impact.
> > > > > > Based on test data, the impact is almost invisible in legacy mode.
> > > > > 
> > > > > Are you saying there is zero % regression? If not, could you share 
> > > > > the data?
> > > > 
> > > > Almost zero. Here is a quick single-core rxonly result:
> > > >         32.2 Mpps, 58.9 cycles/packet
> > > > With the patch reverted from rxonly.c:
> > > >         32.1 Mpps, 59.9 cycles/packet
> > > > That comparison didn't make sense, and I realized I had used batch mbuf
> > > > free in the first run; applying it here as well:
> > > >         32.2 Mpps, 58.9 cycles/packet
> > > > There were small digit jumps between testpmd restarts, so I picked the
> > > > best run. The results are almost the same; it seems the per-packet cost
> > > > is small enough.
> > > > BTW, I'm testing with the default burst size and queue depth.
> > > 
> > > I tested this on octeontx2 with iofwd, single core, 100Gbps.
> > > Without this patch - 73.5 Mpps. With this patch - 72.8 Mpps.
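(For scale, that is roughly a 1% drop: (73.5 - 72.8) / 73.5 ≈ 0.95%.)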
> > > 
> > > We are taking the shared queue as a runtime option without a separate fwd
> > > engine, and to have zero performance impact with no compile-time flag, I
> > > think the only way is a function template.
> > > Example change to outline the function template principle:
> > > 
> > > static inline
> > > __pkt_burst_io_forward(struct fwd_stream *fs, const u64 flag) {
> > > 
> > > Introduce new checks under
> > > if (flags & SHARED_QUEUE)
> > > 
> > > 
> > > }
> > > 
> > > Have two versions of io_fwd_engine.packet_fwd per engine.
> > > 
> > > - first version
> > > static pkt_burst_io_forward(struct fwd_stream *fs) {
> > >         return __pkt_burst_io_forward(fs, 0); }
> > > 
> > > - Second version
> > > static pkt_burst_io_forward_shared_queue(struct fwd_stream *fs) {
> > >         return __pkt_burst_io_forward(fs, SHARED_QUEUE); }
> > > 
> > > 
> > > Update io_fwd_engine.packet_fwd in the slow path to the respective version
> > > based on the offload.
> > > 
> > > If the shared offload is not selected, pkt_burst_io_forward() will be
> > > selected, and __pkt_burst_io_forward() will be a compile-time version of
> > > !SHARED_QUEUE, i.e. the same as the existing code.
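
To make the template principle above concrete, here is a minimal sketch of how
the two versions could look for the io engine. It is only an illustration under
the assumptions noted in the comments: the SHARED_QUEUE flag value, the
forward_shared_rxq() call, and the per-stream bookkeeping are placeholders, not
the actual patch.

/* Sketch only; relies on the testpmd.h context for the helpers used below. */
#include "testpmd.h"

#define SHARED_QUEUE	(1ULL << 0)	/* illustrative flag value */

static inline void
__pkt_burst_io_forward(struct fwd_stream *fs, const uint64_t flags)
{
	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
	uint16_t nb_rx, nb_tx;
	uint64_t start_tsc = 0;

	get_start_cycles(&start_tsc);
	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
			pkts_burst, nb_pkt_per_burst);
	inc_rx_burst_stats(fs, nb_rx);
	if (unlikely(nb_rx == 0))
		return;
	fs->rx_packets += nb_rx;

	/*
	 * 'flags' is a compile-time constant in each caller below, so the
	 * compiler folds this branch away and the !SHARED_QUEUE version is
	 * the same as the existing code after inlining.
	 */
	if (flags & SHARED_QUEUE) {
		/* shared-rxq dispatch; the exact call is an assumption here */
		forward_shared_rxq(fs, nb_rx, pkts_burst);
	} else {
		/* plain io forwarding: send the burst as-is to the peer port */
		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
				pkts_burst, nb_rx);
		inc_tx_burst_stats(fs, nb_tx);
		fs->tx_packets += nb_tx;
		if (unlikely(nb_tx < nb_rx)) {
			fs->fwd_dropped += (nb_rx - nb_tx);
			do {
				rte_pktmbuf_free(pkts_burst[nb_tx]);
			} while (++nb_tx < nb_rx);
		}
	}
	get_end_cycles(fs, start_tsc);
}

/* First version: plain io forwarding, selected by default. */
static void
pkt_burst_io_forward(struct fwd_stream *fs)
{
	__pkt_burst_io_forward(fs, 0);
}

/* Second version: chosen in the slow path when shared rxq is enabled. */
static void
pkt_burst_io_forward_shared_queue(struct fwd_stream *fs)
{
	__pkt_burst_io_forward(fs, SHARED_QUEUE);
}

Whichever version is picked is decided once in the slow path when forwarding is
started, so the default fast path keeps exactly one statically known
packet_fwd implementation with no runtime flag check.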
> > 
> > Thanks for the testing and the suggestion. So the only difference in the code
> > above is that the access to rxq_share is changed to a function parameter,
> > right? Have you tested the performance of this? If not, I can verify it.
> 
> The performance result looks better after removing this wrapper and hiding the
> global variable access as you suggested, thanks!
> I also tried adding an rxq_share bit field to struct fwd_stream; it gives the
> same result as the static function selection and needs fewer changes.

The changes are reflected in v3; I also consolidated the patches for each
forwarding engine into one. Please check.
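
For reference, here is a minimal sketch of the bit-field idea mentioned above;
the field name, its width, and the place where it is set are assumptions for
illustration, not necessarily what v3 does.

#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the testpmd names, reduced to what the idea needs. */
static uint16_t rxq_share;	/* global runtime option set at startup */

struct fwd_stream_sketch {
	uint16_t rx_port;
	uint16_t rx_queue;
	uint16_t rxq_share:1;	/* per-stream copy of the global option */
};

/* Slow path: copy the global into each stream when streams are configured. */
static void
stream_init_sketch(struct fwd_stream_sketch *fs)
{
	fs->rxq_share = (rxq_share > 0);
}

/* Fast path: the branch reads state that sits next to the rest of the stream
 * data, instead of touching a separate global variable per burst. */
static inline bool
use_shared_rxq(const struct fwd_stream_sketch *fs)
{
	return fs->rxq_share;
}

Because the field is written only at configuration time, the per-burst cost is
a single well-predicted branch either way, which matches the observation above
that it performs the same as the static function selection.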

> > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > From a test perspective, it is better to have all forward engines able
> > > > > > to verify shared rxq; the test team wants to run the regression with
> > > > > > less impact. I hope to have a solution that utilizes all forwarding
> > > > > > engines seamlessly.
> > > > > 
> > > > > Yes, it is a good goal. testpmd forwarding performance is used as a
> > > > > synthetic benchmark by everyone.
> > > > > I think we are aligned on not having any regression for the generic
> > > > > forward engine.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > > > +       get_end_cycles(fs, start_tsc);
> > > > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /*
> > > > > > > >   * Work-around of a compilation error with ICC on invocations 
> > > > > > > > of the
> > > > > > > >   * rte_be_to_cpu_16() function.
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > > > 
