> Hi Mattias,
>
> > > 6) What is the use-case for this? When would a user *want* to use this
> > > instead of rte_memcpy()?
> > > If the data being loaded is relevant to datapath/packets, presumably
> > > other packets might require the loaded data, so temporal (normal) loads
> > > should be used to cache the source data?
> >
> > I'm not sure if your first question is rhetorical or not, but a memcpy()
> > in a NT variant is certainly useful. One use case for a memcpy() with
> > temporal loads and non-temporal stores is if you need to archive packet
> > payload for (distant, potential) future use, and want to avoid causing
> > unnecessary LLC evictions while doing so.
>
> Yes I agree that there are certainly benefits in using cache-locality hints.
> There is an open question around if the src or dst or both are non-temporal.
>
> In the implementation of this patch, the NT/T type of store is reversed from
> your use-case:
> 1) Loads are NT (so loaded data is not cached for future packets)
> 2) Stores are T (so copied/dst data is now resident in L1/L2)
>
> In theory there might even be valid uses for this type of memcpy where loaded
> data is not needed again soon and stored data is referenced again soon,
> although I cannot think of any here while typing this mail..
>
> I think some use-case examples, and clear documentation on when/how to choose
> between rte_memcpy() or any (potential future) rte_memcpy_nt() variants is
> required to progress this patch.
>
> Assuming a strong use-case exists, and it can be clearly indicated to users
> of DPDK APIs which rte_memcpy() to use, we can look at technical details
> around enabling the implementation.
>
+1 here. Function behaviour and restrictions (src parameter needs to be 16/32 B aligned, etc.), along with expected usage scenarios, have to be documented properly. Again, as Harry pointed out, I don't see any AMD-specific instructions in this function, so presumably such a function can go into the __AVX2__ code block and no new defines will be required.
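
To make the two load/store combinations discussed above concrete, here is a rough sketch. It is not the code from the patch and not a DPDK API: the function names are made up for illustration, it uses 128-bit SSE intrinsics rather than AVX2 to keep it short, it assumes len is a multiple of 16, and it falls back to plain memcpy() when the intrinsics are not enabled at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_storeu_si128, _mm_stream_si128 */
#endif
#if defined(__SSE4_1__)
#include <smmintrin.h>  /* _mm_stream_load_si128 (MOVNTDQA) */
#endif

/* Hypothetical helper: temporal loads, non-temporal stores.
 * This is the "archive payload without polluting the cache" case.
 * Assumption: dst is 16-byte aligned and len is a multiple of 16. */
static void copy_t_load_nt_store(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(len % 16 == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
    _mm_sfence();  /* order the NT stores before any later stores */
#else
    memcpy(dst, src, len);  /* fallback: no cache-locality hint */
#endif
}

/* Hypothetical helper: non-temporal loads, temporal stores.
 * This is the combination described for the patch.
 * Assumption: src is 16-byte aligned and len is a multiple of 16. */
static void copy_nt_load_t_store(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)src & 15) == 0);
    assert(len % 16 == 0);
#if defined(__SSE4_1__)
    __m128i *d = (__m128i *)dst;
    __m128i *s = (__m128i *)(uintptr_t)src;  /* intrinsic takes a non-const pointer on some compilers */
    for (size_t i = 0; i < len / 16; i++)
        _mm_storeu_si128(&d[i], _mm_stream_load_si128(&s[i]));
#else
    memcpy(dst, src, len);  /* fallback: no cache-locality hint */
#endif
}
```

The alignment asserts show the kind of restriction that would need to be spelled out in the API documentation: the NT side of each copy (dst in the first helper, src in the second) is the one that carries the alignment requirement.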