<snip>

> 
> On 3/6/2023 1:26 PM, Morten Brørup wrote:
> >> From: Ferruh Yigit [mailto:ferruh.yi...@amd.com]
> >> Sent: Monday, 6 March 2023 13.49
> >>
> >> On 1/4/2023 8:21 AM, Morten Brørup wrote:
> >>>> From: Feifei Wang [mailto:feifei.wa...@arm.com]
> >>>> Sent: Wednesday, 4 January 2023 08.31
> >>>>
> >>>> Add 'tx_fill_sw_ring' and 'rx_flush_descriptor' API into direct
> >>>> rearm mode for separate Rx and Tx Operation. And this can support
> >>>> different multiple sources in direct rearm mode. For examples, Rx
> >>>> driver is ixgbe, and Tx driver is i40e.
> >>>>
> >>>> Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> >>>> Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com>
> >>>> Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> >>>> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> >>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> >>>> ---
> >>>
> >>> This feature looks very promising for performance. I am pleased to
> >>> see
> >> progress on it.
> >>>
> >>
> >> Hi Morten,
> >>
> >> Yes it brings some performance, but not to generic use case, only to
> >> specific and constraint use case.
> >
> > I got the impression that the supported use case is a prominent and 
> > important
> use case.
> >
> 
> Can you please give real life samples for this use case, other than just 
> showing
> better performance number in the test bench? This helps to understand the
> reasoning better.
The very first patch started off with a constrained but prominent use case. 
Though, DPU based PCIe cards running DPDK applications with 1 or max 2 ports 
being used in tons of data centers is not a secret anymore and not a small use 
case that can be ignored.
However, the design of the patch has changed significantly from then. Now the 
solution can be applied to any generic use case that uses run-to-completion 
model of DPDK. i.e. the mapping of the RX and TX ports can be done dynamically 
in the data plane threads. There is no need of static configuration from 
control plane.

On the test bench, we need to make up our mind. When we see improvements, we 
say it is just a test bench. On other occasions when the test bench does not 
show any improvements (but improvements are shown by other metrics), we say the 
test bench does not show any improvements.

> 
> > This is the primary argument for considering such a complex non-generic
> feature.
I am not sure what is the complexity here, can you please elaborate?
I see other patches/designs (ex: proactive error recovery) which are way more 
complex to understand and comprehend.

> >
> >>
> >> And changes are relatively invasive comparing the usecase it
> >> supports, like it adds new two inline datapath functions and a new dev_ops.
> >>
> >> I am worried the unnecessary complexity and possible regressions in
> >> the fundamental and simple parts of the project, with a good
> >> intention to gain a few percentage performance in a specific usecase,
> >> can hurt the project.
I agree that we are touching some fundamental parts of the project. But, we 
also need to realize that those fundamental parts were not developed on 
architectures that have joined the project way later. Similarly, the use cases 
have evolved significantly from the original intended use cases. We cannot hold 
on to those fundamental designs if they affect the performance on other 
architectures while addressing prominent new use cases.
Please note that this patch does not break any existing features or affect 
their performance in any negative way. The generic and originally intended use 
cases can benefit from this feature.

> >>
> >>
> >> I can see this is compared to MBUF_FAST_FREE feature, but
> >> MBUF_FAST_FREE is just an offload benefiting from existing offload
> >> infrastructure, which requires very small update and logically change
> >> in application and simple to implement in the drivers. So, they are
> >> not same from complexity perspective.
> >>
> >> Briefly, I am not comfortable with this change, I would like to see
> >> an explicit approval and code review from techboard to proceed.
> >
> > I agree that the complexity is very high, and thus requires extra 
> > consideration.
> Your suggested techboard review and approval process seems like a good
> solution.
We can add to the agenda for the next Techboard meeting.

> >
> > And the performance benefit of direct rearm should be compared to the
> performance using the new zero-copy mempool API.
> >
> > -Morten
> >

Reply via email to