> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Thursday, August 26, 2021 7:28 PM
> To: Xueming(Steven) Li <xuemi...@nvidia.com>
> Cc: Jack Min <jack...@nvidia.com>; dpdk-dev <dev@dpdk.org>; Xiaoyun Li <xiaoyun...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function
>
> On Wed, Aug 18, 2021 at 7:38 PM Xueming(Steven) Li <xuemi...@nvidia.com> wrote:
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > Sent: Wednesday, August 18, 2021 7:48 PM
> > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > Cc: Jack Min <jack...@nvidia.com>; dpdk-dev <dev@dpdk.org>; Xiaoyun Li <xiaoyun...@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function
> > >
> > > On Wed, Aug 18, 2021 at 4:57 PM Xueming(Steven) Li <xuemi...@nvidia.com> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > Sent: Tuesday, August 17, 2021 5:37 PM
> > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > Cc: Jack Min <jack...@nvidia.com>; dpdk-dev <dev@dpdk.org>; Xiaoyun Li <xiaoyun...@intel.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function
> > > > >
> > > > > On Wed, Aug 11, 2021 at 7:35 PM Xueming Li <xuemi...@nvidia.com> wrote:
> > > > > >
> > > > > > From: Xiaoyu Min <jack...@nvidia.com>
> > > > > >
> > > > > > Added an inline common wrapper function for all fwd engines
> > > > > > which do the following in common:
> > > > > >
> > > > > > 1. get_start_cycles
> > > > > > 2. rte_eth_rx_burst(...,nb_pkt_per_burst)
> > > > > > 3. if rxq_share do forward_shared_rxq(), otherwise do fwd directly
> > > > > > 4. get_end_cycle
> > > > > >
> > > > > > Signed-off-by: Xiaoyu Min <jack...@nvidia.com>
> > > > > > ---
> > > > > >  app/test-pmd/testpmd.h | 24 ++++++++++++++++++++++++
> > > > > >  1 file changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> > > > > > index 13141dfed9..b685ac48d6 100644
> > > > > > --- a/app/test-pmd/testpmd.h
> > > > > > +++ b/app/test-pmd/testpmd.h
> > > > > > @@ -1022,6 +1022,30 @@ void add_tx_dynf_callback(portid_t portid);
> > > > > >  void remove_tx_dynf_callback(portid_t portid);
> > > > > >  int update_jumbo_frame_offload(portid_t portid);
> > > > > >
> > > > > > +static inline void
> > > > > > +do_burst_fwd(struct fwd_stream *fs, packet_fwd_cb fwd)
> > > > > > +{
> > > > > > +        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> > > > > > +        uint16_t nb_rx;
> > > > > > +        uint64_t start_tsc = 0;
> > > > > > +
> > > > > > +        get_start_cycles(&start_tsc);
> > > > > > +
> > > > > > +        /*
> > > > > > +         * Receive a burst of packets and forward them.
> > > > > > +         */
> > > > > > +        nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
> > > > > > +                        pkts_burst, nb_pkt_per_burst);
> > > > > > +        inc_rx_burst_stats(fs, nb_rx);
> > > > > > +        if (unlikely(nb_rx == 0))
> > > > > > +                return;
> > > > > > +        if (unlikely(rxq_share > 0))
> > > > >
> > > > > See below. It reads a global memory.
> > > > >
> > > > > > +                forward_shared_rxq(fs, nb_rx, pkts_burst, fwd);
> > > > > > +        else
> > > > > > +                (*fwd)(fs, nb_rx, pkts_burst);
> > > > >
> > > > > New function pointer in fastpath.
> > > > >
> > > > > IMO, We should not create performance regression for the existing
> > > > > forward engine.
> > > > > Can we have a new forward engine just for shared memory testing?
> > > >
> > > > Yes, fully aware of the performance concern, the global could be
> > > > defined around record_core_cycles to minimize the impacts.
> > > > Based on test data, the impacts are almost invisible in legacy mode.
> > >
> > > Are you saying there is zero % regression? If not, could you share the
> > > data?
> >
> > Almost zero, here is a quick single core result of rxonly:
> >         32.2Mpps, 58.9cycles/packet
> > Revert the patch to rxonly.c:
> >         32.1Mpps, 59.9cycles/packet
> > The result doesn't make sense and I realized that I used batch mbuf free,
> > apply it now:
> >         32.2Mpps, 58.9cycles/packet
> > There were small digit jumps between testpmd restarts, I picked the best one.
> > The results are almost the same, it seems the per-packet cost is small enough.
> > BTW, I'm testing with the default burst size and queue depth.
>
> I tested this on octeontx2 with iofwd with a single core at 100Gbps:
> Without this patch - 73.5mpps
> With this patch - 72.8mpps
>
> We are taking the shared queue runtime option without a separate fwd engine,
> and to have zero performance impact and no compile time flag, then I think
> the only way is to have a function template.
> Example change to outline the function template principle:
>
> static inline
> __pkt_burst_io_forward(struct fwd_stream *fs, const u64 flags)
> {
>         /* Introduce new checks under */
>         if (flags & SHARED_QUEUE)
>                 ...
> }
>
> Have two versions of io_fwd_engine.packet_fwd per engine.
>
> - first version
> static pkt_burst_io_forward(struct fwd_stream *fs)
> {
>         return __pkt_burst_io_forward(fs, 0);
> }
>
> - second version
> static pkt_burst_io_forward_shared_queue(struct fwd_stream *fs)
> {
>         return __pkt_burst_io_forward(fs, SHARED_QUEUE);
> }
>
> Update io_fwd_engine.packet_fwd in slowpath to the respective version based
> on the offload.
>
> If shared offload is not selected, pkt_burst_io_forward() will be selected and
> __pkt_burst_io_forward() will be a compile-time version of !SHARED_QUEUE, aka
> the same as the existing code.
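>
> For example (illustrative only, the exact slow-path hook is up to the
> implementation), the selection could be done once when forwarding starts,
> using the rxq_share option from this series:
>
>         if (rxq_share > 0)
>                 io_fwd_engine.packet_fwd = pkt_burst_io_forward_shared_queue;
>         else
>                 io_fwd_engine.packet_fwd = pkt_burst_io_forward;
>
> That way the fast path never reads the global.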
Thanks for the testing and the suggestion. So the only difference here in the
code above is that the access to rxq_share is changed to a function parameter,
right? Have you tested its performance? If not, I could verify.

> > > >
> > > > From test perspective, better to have all forward engines verify
> > > > shared rxq, the test team wants to run the regression with less
> > > > impact. Hope to have a solution to utilize all forwarding engines
> > > > seamlessly.
> > >
> > > Yes, it is a good goal. testpmd forward performance is used as a
> > > synthetic benchmark by everyone.
> > > I think we are aligned to not have any regression for the generic
> > > forward engine.
> > >
> > > > > >
> > > > > > +        get_end_cycles(fs, start_tsc);
> > > > > > +}
> > > > > > +
> > > > > >  /*
> > > > > >   * Work-around of a compilation error with ICC on invocations of the
> > > > > >   * rte_be_to_cpu_16() function.
> > > > > > --
> > > > > > 2.25.1
> > > > > >
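
To double-check my reading of the template idea against the do_burst_fwd()
wrapper above, a rough sketch could look like the below. All new names
(SHARED_RXQ, do_burst_fwd_tmpl, io_forward_cb) are placeholders for
illustration only, not from the patch, and it assumes the helpers and globals
already used by do_burst_fwd():

/* Placeholder flag, not a real testpmd define. */
#define SHARED_RXQ (1ULL << 0)

static __rte_always_inline void
do_burst_fwd_tmpl(struct fwd_stream *fs, packet_fwd_cb fwd, const uint64_t flags)
{
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
        uint16_t nb_rx;
        uint64_t start_tsc = 0;

        get_start_cycles(&start_tsc);
        nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
                        pkts_burst, nb_pkt_per_burst);
        inc_rx_burst_stats(fs, nb_rx);
        if (unlikely(nb_rx == 0))
                return;
        /*
         * 'flags' is a compile-time constant in each caller, so this branch
         * is folded away after inlining; the non-shared version keeps the
         * existing code path with no extra check and no read of the
         * rxq_share global.
         */
        if (flags & SHARED_RXQ)
                forward_shared_rxq(fs, nb_rx, pkts_burst, fwd);
        else
                (*fwd)(fs, nb_rx, pkts_burst);
        get_end_cycles(fs, start_tsc);
}

/*
 * io_forward_cb stands for the engine-specific burst handler (e.g. the body
 * factored out of iofwd.c); the name is only for illustration.
 */
static void
pkt_burst_io_forward(struct fwd_stream *fs)
{
        do_burst_fwd_tmpl(fs, io_forward_cb, 0);
}

static void
pkt_burst_io_forward_shared_rxq(struct fwd_stream *fs)
{
        do_burst_fwd_tmpl(fs, io_forward_cb, SHARED_RXQ);
}

If that matches your intention, I can run iofwd with the non-shared wrapper on
our setup to confirm there is no regression against the current code.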