> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
> Sent: Wednesday, October 27, 2021 12:42 PM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>; Thomas Monjalon
> <tho...@monjalon.net>; Aman Kumar <aman.ku...@vvdntech.in>
> Cc: dev@dpdk.org; viachesl...@nvidia.com; Burakov, Anatoly
> <anatoly.bura...@intel.com>; Song, Keesang <keesang.s...@amd.com>;
> jerinjac...@gmail.com; Ananyev, Konstantin <konstantin.anan...@intel.com>;
> Richardson, Bruce <bruce.richard...@intel.com>;
> honnappa.nagaraha...@arm.com; Ruifeng Wang <ruifeng.w...@arm.com>;
> David Christensen <d...@linux.vnet.ibm.com>; david.march...@redhat.com;
> step...@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v4 2/2] lib/eal: add temporal store memcpy
> support for AMD platform
>
> On 2021-10-27 13:03, Van Haaren, Harry wrote:
> >> -----Original Message-----
<snip>

Hi Mattias,

> > 6) What is the use-case for this? When would a user *want* to use this
> > instead of rte_memcpy()?
> > If the data being loaded is relevant to datapath/packets, presumably other
> > packets might require the loaded data, so temporal (normal) loads should
> > be used to cache the source data?
>
> I'm not sure if your first question is rhetorical or not, but a memcpy()
> in a NT variant is certainly useful. One use case for a memcpy() with
> temporal loads and non-temporal stores is if you need to archive packet
> payload for (distant, potential) future use, and want to avoid causing
> unnecessary LLC evictions while doing so.

Yes, I agree that there are certainly benefits in using cache-locality hints.
There is an open question around whether the src, the dst, or both are
non-temporal.

In the implementation of this patch, the NT/T type of store is reversed from
your use-case:
1) Loads are NT (so loaded data is not cached for future packets)
2) Stores are T (so copied/dst data is now resident in L1/L2)

In theory there might even be valid uses for this type of memcpy, where loaded
data is not needed again soon and stored data is referenced again soon,
although I cannot think of any here while typing this mail.

I think some use-case examples, and clear documentation on when/how to choose
between rte_memcpy() and any (potential future) rte_memcpy_nt() variants, are
required to progress this patch.

Assuming a strong use-case exists, and it can be clearly indicated to users of
DPDK APIs which rte_memcpy() variant to use, we can look at technical details
around enabling the implementation.

-Harry

<snip remaining points>
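
To make the src/dst distinction concrete, here is a minimal sketch of the
variant Mattias describes (temporal loads, non-temporal stores). It is an
illustration only and not code from the patch: the name
copy_t_load_nt_store() is hypothetical and is not a DPDK API, and the sketch
assumes x86 with SSE2, falling back to a plain memcpy() elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not a DPDK API): copy len bytes using normal
 * (temporal) loads from src and non-temporal stores to dst, so the
 * destination lines are not pulled into the cache hierarchy.
 */
static void
copy_t_load_nt_store(void *dst, const void *src, size_t len)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

#if defined(__SSE2__)
	/* _mm_stream_si128() needs a 16-byte aligned destination, so
	 * copy the unaligned head (if any) with ordinary stores first. */
	size_t head = (16 - ((uintptr_t)d & 15)) & 15;

	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	while (len >= 16) {
		/* Temporal (normal) unaligned load of the source. */
		__m128i v = _mm_loadu_si128((const __m128i *)s);

		/* Non-temporal store: bypasses the caches on write. */
		_mm_stream_si128((__m128i *)d, v);
		d += 16;
		s += 16;
		len -= 16;
	}
	/* Order the weakly-ordered NT stores before later stores. */
	_mm_sfence();
#endif
	/* Tail bytes, or the whole copy when SSE2 is unavailable. */
	memcpy(d, s, len);
}
```

The variant in this patch would swap the two hints (NT loads, temporal
stores); which combination is appropriate depends on whether the source or
the destination is expected to be touched again soon.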