27/10/2021 16:10, Van Haaren, Harry:
> From: Aman Kumar <aman.ku...@vvdntech.in>
> On Wed, Oct 27, 2021 at 5:53 PM Ananyev, Konstantin
> <mailto:konstantin.anan...@intel.com> wrote:
> > > > Hi Mattias,
> > > >
> > > > 6) What is the use-case for this? When would a user *want* to
> > > > use this instead of rte_memcpy()?
> > > > If the data being loaded is relevant to datapath/packets,
> > > > presumably other packets might require the loaded data, so
> > > > temporal (normal) loads should be used to cache the source data?
> > >
> > > I'm not sure if your first question is rhetorical or not, but a
> > > memcpy() in an NT variant is certainly useful. One use case for a
> > > memcpy() with temporal loads and non-temporal stores is if you
> > > need to archive packet payload for (distant, potential) future
> > > use, and want to avoid causing unnecessary LLC evictions while
> > > doing so.
> >
> > Yes, I agree that there are certainly benefits in using
> > cache-locality hints. There is an open question around whether the
> > src or dst or both are non-temporal.
> >
> > In the implementation of this patch, the NT/T type of store is
> > reversed from your use-case:
> > 1) Loads are NT (so loaded data is not cached for future packets)
> > 2) Stores are T (so copied/dst data is now resident in L1/L2)
> >
> > In theory there might even be valid uses for this type of memcpy,
> > where loaded data is not needed again soon and stored data is
> > referenced again soon, although I cannot think of any while typing
> > this mail.
> >
> > I think some use-case examples, and clear documentation on when/how
> > to choose between rte_memcpy() and any (potential future)
> > rte_memcpy_nt() variants, are required to progress this patch.
> >
> > Assuming a strong use-case exists, and it can be clearly indicated
> > to users of DPDK APIs which rte_memcpy() to use, we can look at the
> > technical details around enabling the implementation.
>
> [Konstantin wrote]:
> +1 here.
> Function behaviour and restrictions (the src parameter needs to be
> 16/32 B aligned, etc.), along with expected usage scenarios, have to
> be documented properly.
> Again, as Harry pointed out, I don't see any AMD-specific
> instructions in this function, so presumably such a function can go
> into the __AVX2__ code block and no new defines will be required.
>
> [Aman wrote]:
> Agreed that the APIs are generic, but we've kept them under an AMD
> flag for a simple reason: they are NOT tested on any other platform.
> A use-case showing how to use this was planned earlier for the mlx5
> PMD but dropped in this version of the patch, as the data path of
> mlx5 is going to be refactored soon and this may not be useful for
> future versions of mlx5 (>22.02).
> Ref link:
> https://patchwork.dpdk.org/project/dpdk/patch/20211019104724.19416-2-aman.ku...@vvdntech.in/
> (we plan to adapt this in a future version)
> The patch in the link basically enhances the mlx5 MPRQ implementation
> for our specific use-case, and with a 128 B packet size we achieve
> ~60% better performance. We understand that the use of this copy
> function should be documented, which we plan to do along with a few
> other platform-specific optimizations in future versions of DPDK. As
> this does not conflict with other platforms, can we still keep it
> under the AMD flag for now, as suggested by Thomas?
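To make the T/NT trade-off discussed above concrete, a minimal sketch
of the two variants with AVX2 intrinsics could look as follows. This
is illustrative only, not the code from the patch: the function names
are made up, both buffers are assumed to be 32 B aligned, len is
assumed to be a multiple of 32, and tail/misalignment handling is
omitted.

  #include <stddef.h>
  #include <stdint.h>
  #include <immintrin.h> /* build with AVX2 enabled, e.g. -mavx2 */

  /* Archiving use-case: temporal loads, non-temporal stores. The
   * source stays cache-resident; the copy bypasses the LLC. */
  static void
  copy_t_load_nt_store(void *dst, const void *src, size_t len)
  {
          const uint8_t *s = src;
          uint8_t *d = dst;

          for (size_t i = 0; i < len; i += 32) {
                  __m256i v = _mm256_load_si256((const __m256i *)(s + i));
                  _mm256_stream_si256((__m256i *)(d + i), v); /* vmovntdq */
          }
          _mm_sfence(); /* order the NT stores before later stores */
  }

  /* The patch's reversed variant: non-temporal loads, temporal
   * stores. The source is not cached; the destination lands in
   * L1/L2. */
  static void
  copy_nt_load_t_store(void *dst, const void *src, size_t len)
  {
          const uint8_t *s = src;
          uint8_t *d = dst;

          for (size_t i = 0; i < len; i += 32) {
                  __m256i v = _mm256_stream_load_si256(
                          (const __m256i *)(s + i)); /* vmovntdqa */
                  _mm256_store_si256((__m256i *)(d + i), v);
          }
  }

Note that vmovntdqa is only guaranteed to behave non-temporally on
write-combining memory; on ordinary write-back memory it may act as a
normal load, which is one more reason the intended usage needs to be
documented.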
I said I could merge if there was no objection.
I had overlooked that it adds completely new functions to the API.
And the comments go in the direction of what I asked in the previous
version: what is specific to AMD here?
Now, seeing the valid objections, I agree it should be reworked.
We must provide applications with an API which is generic, stable and
well documented.

> [HvH wrote]:
> As an open-source community, any contribution should aim to improve
> the whole.
> In the past, numerous improvements have been merged into DPDK that
> improve performance.
> Sometimes these are architecture-specific (x86/arm/ppc), sometimes
> they are ISA-specific (SSE, AVX512, NEON).
>
> I am not familiar with any case in DPDK where there is a #ifdef based
> on a *specific platform*.
> A quick "grep" through the "dpdk/lib" directory does not show any
> place where PMD or generic code has been explicitly optimized for a
> *specific platform*.
>
> Obviously, in cases where an ISA feature either exists or does not
> exist, yes, there is an optimization to enable it.
> But this is not exposed as a top-level compile-time option; it uses
> runtime CPU ISA detection.
>
> Please take a step back from the code, and look at what this patch
> asks of DPDK:
> "Please accept & maintain these changes upstream, which benefit only
> platform X, even though these ISA features are also available on
> other platforms."
>
> Other patches that enhance the performance of DPDK ask this:
> "Please accept & maintain these changes upstream, which benefit all
> platforms that have ISA capability X."
>
> === Question: "As this does not conflict with other platforms, can we
> still keep it under the AMD flag for now?"
> I feel the contribution is too specific to a platform. Make it
> generic by enabling it at the ISA-capability level.
>
> Please, yes, do contribute to the DPDK community by improving the
> performance of a PMD by enabling/leveraging ISA features.
> But do so in a way that does not benefit only a specific platform; do
> so in a way that enhances all of DPDK, as other patches have done for
> the DPDK that this patch is built on.
>
> If you have concerns that the PMD maintainers will not accept the
> changes due to potential regressions on other platforms, then discuss
> those concerns, make a plan for how to validate performance, and work
> towards a solution.
>
> === Regarding specifically the request "can we still keep it under
> the AMD flag for now":
> I do not believe we should introduce APIs for specific platforms.
> DPDK's EAL is an abstraction layer.
> The value of EAL is to provide a common abstraction. This
> platform-specific flag breaks the abstraction, and results in
> packaging issues, as well as API/ABI instability based on the
> -Dcpu_instruction_set choice.
> So, no, we should not introduce APIs based on any compile-time flag.

I agree.
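For reference, the runtime ISA detection that Harry describes is the
usual pattern in DPDK; a sketch (the memcpy_nt_* names are
hypothetical, and RTE_CPUFLAG_AVX2 is only available on x86 builds)
could look like this:

  #include <stddef.h>
  #include <string.h>
  #include <rte_common.h>
  #include <rte_cpuflags.h>

  typedef void (*memcpy_nt_fn)(void *dst, const void *src, size_t len);

  /* Baseline that works on every platform. */
  static void
  memcpy_nt_generic(void *dst, const void *src, size_t len)
  {
          memcpy(dst, src, len);
  }

  /* Stand-in for an AVX2 NT implementation (e.g. the loops sketched
   * earlier); reduced to a plain copy here so the sketch links. */
  static void
  memcpy_nt_avx2(void *dst, const void *src, size_t len)
  {
          memcpy(dst, src, len);
  }

  static memcpy_nt_fn memcpy_nt_impl = memcpy_nt_generic;

  /* Run once at startup: pick the best implementation the running
   * CPU supports, instead of deciding at compile time. */
  RTE_INIT(memcpy_nt_select)
  {
          if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
                  memcpy_nt_impl = memcpy_nt_avx2;
  }

With this pattern a single binary carries both implementations and the
choice is made once at startup, so no platform-specific compile-time
flag (and no platform-specific API) is needed.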