> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Wednesday, 11 May 2022 00.02
> 
> (apologies for the late response, this one slipped my mind)
> 
> Appreciate if others could weigh their opinions.
> 
> <snip>
> >
> > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > > Sent: Thursday, 21 April 2022 04.35
> > > >
> > > > > From: Feifei Wang [mailto:feifei.wa...@arm.com]
> > > > > Sent: Wednesday, 20 April 2022 10.17
> > > > >
> > > > > Enable direct rearm mode. The mapping is decided in the data
> plane
> > > > > based on the first packet received.
> > > >
> > > > I usually don't care much about l3fwd, but putting configuration
> > > changes in the
> > > > fast path is just wrong!
> > > I would say it depends. In this case the cycles consumed by the API
> > > are very less and configuration data is very small and is already
> in
> > > the cache as PMD has accessed the same data structure.
> > >
> > > If the configuration needs more cycles than a typical (depending on
> > > the
> > > application) data plane packet processing needs or brings in
> enormous
> > > amount of data in to the cache, it should not be done on the data
> > > plane.
> > >
> >
> > As a matter of principle, configuration changes should be done
> outside the fast
> > path.
> >
> > If we allow an exception for this feature, it will set a bad
> precedent about
> > where to put configuration code.
> I think there are other examples though not exactly the same. For ex:
> the seqlock, we cannot have a scheduled out writer while holding the
> lock. But, it was mentioned that this can be over come easily by
> running the writer on an isolated core (which to me breaks some
> principles).

Referring to a bad example (which breaks some principles) does not change my 
opinion. ;-)

> 
> >
> > > >
> > > > Also, l3fwd is often used for benchmarking, and this small piece
> of
> > > code in the
> > > > fast path will affect benchmark results (although only very
> little).
> > > We do not see any impact on the performance numbers. The reason for
> > > putting in the data plane was it covers wider use case in this
> L3fwd
> > > application. If the app were to be simple, the configuration could
> be
> > > done from the control plane. Unfortunately, the performance of
> L3fwd
> > > application matters.
> > >
> >
> > Let's proceed down that path for the sake of discussion... Then the
> fast path is
> > missing runtime verification that all preconditions for using
> remapping are
> > present at any time.
> Agree, few checks (ensuring that TX and RX buffers are from the same
> pool, ensuring tx_rs_thresh is same as RX rearm threshold) are missing.
> We will add these, it is possible to add these checks outside the
> packet processing loop.
> 
> >
> > > >
> > > > Please move it out of the fast path.
> >
> > BTW, this patch does not call the rte_eth_direct_rxrearm_enable() to
> enable
> > the feature.
> >
> > And finally, this feature should be disabled by default, and only
> enabled by a
> > command line parameter or similar. Otherwise, future l3fwd NIC
> performance
> > reports will provide misleading performance results, if the feature
> is utilized.
> > Application developers, when comparing NIC performance results, don't
> care
> > about the performance for this unique use case; they care about the
> > performance for the generic use case.
> >
> I think this feature is similar to fast free feature
> (RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) as you have mentioned in the other
> thread. It should be handled similar to how fast free feature is
> handled.

I agree with this comparison.

Quickly skimming l3fwd/main.c reveals that RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE is 
used without checking preconditions, and thus might be buggy. E.g. what happens 
when the NICs support RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE, and l3fwd is run with 
the "per-port-pool" command line option? Obviously, the "direct rearm" patch 
should not be punished because of bugs in similar features ("fast free"). But 
it is not a valid reason to allow similar bugs. You mentioned above that 
precondition checking will be added, so pardon me for ranting a bit here. ;-)

Furthermore, if using l3fwd for NIC performance reports, I find the results 
misleading if application specific optimizations are used without mentioning it 
in the report. This applies to both "fast free" and "direct rearm" 
optimizations - they only work in specific application scenarios, and thus the 
l3fwd NIC performance test should be run without these optimization, or at 
least mention that the report only covers these specific applications. Which is 
why I prefer that such optimizations must be explicitly enabled through a 
command line parameter, and not used in testing for official NIC performance 
reports. Taking one step back, the real problem here is that an *example* 
application is used for NIC performance testing, and this is the main reason 
for my performance related objections. I should probably object to using l3fwd 
for NIC performance testing instead.

I don't feel strongly about l3fwd, so I will not object to the l3fwd patch. 
Just providing some feedback. :-)


Reply via email to