From: Aman Kumar <aman.ku...@vvdntech.in>
Sent: Wednesday, October 27, 2021 2:35 PM
To: Ananyev, Konstantin <konstantin.anan...@intel.com>
Cc: Van Haaren, Harry <harry.van.haa...@intel.com>; mattias.ronnblom <mattias.ronnb...@ericsson.com>; Thomas Monjalon <tho...@monjalon.net>; dev@dpdk.org; viachesl...@nvidia.com; Burakov, Anatoly <anatoly.bura...@intel.com>; Song, Keesang <keesang.s...@amd.com>; jerinjac...@gmail.com; Richardson, Bruce <bruce.richard...@intel.com>; honnappa.nagaraha...@arm.com; Ruifeng Wang <ruifeng.w...@arm.com>; David Christensen <d...@linux.vnet.ibm.com>; david.march...@redhat.com; step...@networkplumber.org
Subject: Re: [dpdk-dev] [PATCH v4 2/2] lib/eal: add temporal store memcpy support for AMD platform

> > > > Hi Mattias,
> > > >
> > > > 6) What is the use-case for this? When would a user *want* to use this
> > > > instead of rte_memcpy()?
> > > > If the data being loaded is relevant to datapath/packets, presumably
> > > > other packets might require the loaded data, so temporal (normal) loads
> > > > should be used to cache the source data?
> > >
> > > I'm not sure if your first question is rhetorical or not, but a memcpy()
> > > in an NT variant is certainly useful. One use case for a memcpy() with
> > > temporal loads and non-temporal stores is if you need to archive packet
> > > payload for (distant, potential) future use, and want to avoid causing
> > > unnecessary LLC evictions while doing so.
> >
> > Yes, I agree that there are certainly benefits in using cache-locality hints.
> > There is an open question around whether the src or dst or both are non-temporal.
> >
> > In the implementation of this patch, the NT/T type of store is reversed from
> > your use-case:
> > 1) Loads are NT (so loaded data is not cached for future packets)
> > 2) Stores are T (so copied/dst data is now resident in L1/L2)
> >
> > In theory there might even be valid uses for this type of memcpy, where loaded
> > data is not needed again soon and stored data is referenced again soon,
> > although I cannot think of any while typing this mail..
> >
> > I think some use-case examples, and clear documentation on when/how to choose
> > between rte_memcpy() or any (potential future) rte_memcpy_nt() variants, are
> > required to progress this patch.
> >
> > Assuming a strong use-case exists, and it can be clearly indicated to users of
> > DPDK APIs which rte_memcpy() to use, we can look at technical details around
> > enabling the implementation.
> >
>
> +1 here.
> Function behaviour and restrictions (src parameter needs to be 16/32 B aligned,
> etc.), along with expected usage scenarios, have to be documented properly.
> Again, as Harry pointed out, I don't see any AMD-specific instructions in this
> function, so presumably such a function can go into an __AVX2__ code block and
> no new defines will be required.
>
> Agreed that the APIs are generic, but we've kept them under an AMD flag for the
> simple reason that they are NOT tested on any other platform.
> A use-case showing how to use this was planned earlier for the mlx5 PMD, but it
> was dropped in this version of the patch, as the data path of mlx5 is going to
> be refactored soon and may not be useful for future versions of mlx5 (>22.02).
> Ref link:
> https://patchwork.dpdk.org/project/dpdk/patch/20211019104724.19416-2-aman.ku...@vvdntech.in/
> (we have a plan to adapt this in a future version)
> The patch in the link basically enhances the mlx5 MPRQ implementation for our
> specific use-case, and with a 128B packet size we achieve ~60% better perf.
> We understand that the use of this copy function should be documented, which we
> plan to do along with a few other platform-specific optimizations in future
> versions of DPDK. As this does not conflict with other platforms, can we still
> keep it under the AMD flag for now, as suggested by Thomas?

From what I read above, the patch is in a sort of half-ready stage.
Why rush here and try to push into DPDK things that don't fulfil DPDK policy?
Probably better to do all the missing parts first (docs, tests, etc.) and then
come up with an updated version.
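
For reference, a minimal sketch of the two copy flavours being discussed, written
with plain AVX2 intrinsics only (which is also why, as noted above, such a function
could live under an __AVX2__ block rather than behind a new platform define). This
is an illustration, not the patch code: the names nt_load_copy()/nt_store_copy()
are hypothetical, and the sketch assumes 32-byte-aligned pointers, a length that is
a multiple of 32 bytes, and a build with -mavx2.

#include <stddef.h>
#include <immintrin.h>

/*
 * NT loads + temporal stores: the direction taken by this patch.
 * The source gets a "do not cache" hint; the copy stays cache-resident.
 */
static void
nt_load_copy(void *dst, const void *src, size_t len)
{
	const __m256i *s = (const __m256i *)src;   /* assumed 32B-aligned */
	__m256i *d = (__m256i *)dst;

	for (size_t i = 0; i < len / 32; i++) {
		/* VMOVNTDQA: non-temporal hint on the load */
		__m256i v = _mm256_stream_load_si256(s + i);
		/* regular store: destination lands in L1/L2 */
		_mm256_storeu_si256(d + i, v);
	}
}

/*
 * Temporal loads + NT stores: the "archive packet payload without
 * polluting the LLC" use case Mattias describes above.
 */
static void
nt_store_copy(void *dst, const void *src, size_t len)
{
	const __m256i *s = (const __m256i *)src;
	__m256i *d = (__m256i *)dst;               /* assumed 32B-aligned */

	for (size_t i = 0; i < len / 32; i++) {
		__m256i v = _mm256_loadu_si256(s + i);
		/* VMOVNTDQ: write around the cache hierarchy */
		_mm256_stream_si256(d + i, v);
	}
	/* NT stores are weakly ordered; fence before the data is consumed */
	_mm_sfence();
}

Either flavour runs on any x86 CPU with AVX2; the trade-off is only about which
side of the copy is expected to be touched again soon.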