27/10/2021 16:10, Van Haaren, Harry:
> From: Aman Kumar <aman.ku...@vvdntech.in>
> On Wed, Oct 27, 2021 at 5:53 PM Ananyev, Konstantin
> <mailto:konstantin.anan...@intel.com> wrote:
> > > > Hi Mattias,
> > > >
> > > > 6) What is the use-case for this? When would a user *want* to
> > > > use this instead of rte_memcpy()?
> > > > If the data being loaded is relevant to datapath/packets,
> > > > presumably other packets might require the loaded data, so
> > > > temporal (normal) loads should be used to cache the source data?
> > >
> > > I'm not sure if your first question is rhetorical or not, but a
> > > memcpy() in an NT variant is certainly useful. One use case for a
> > > memcpy() with temporal loads and non-temporal stores is if you
> > > need to archive packet payload for (distant, potential) future
> > > use, and want to avoid causing unnecessary LLC evictions while
> > > doing so.
> >
> > Yes, I agree that there are certainly benefits in using
> > cache-locality hints. There is an open question around whether the
> > src or dst or both are non-temporal.
> >
> > In the implementation of this patch, the NT/T type of store is
> > reversed from your use-case:
> > 1) Loads are NT (so loaded data is not cached for future packets)
> > 2) Stores are T (so copied/dst data is now resident in L1/L2)
> >
> > In theory there might even be valid uses for this type of memcpy,
> > where loaded data is not needed again soon and stored data is
> > referenced again soon, although I cannot think of any while typing
> > this mail.
> >
> > I think some use-case examples, and clear documentation on when/how
> > to choose between rte_memcpy() and any (potential future)
> > rte_memcpy_nt() variants, are required to progress this patch.
> >
> > Assuming a strong use-case exists, and it can be clearly indicated
> > to users of DPDK APIs which rte_memcpy() to use, we can look at the
> > technical details around enabling the implementation.
>
> [Konstantin wrote]:
> +1 here.
> Function behaviour and restrictions (the src parameter needs to be
> 16/32 B aligned, etc.), along with expected usage scenarios, have to
> be documented properly.
> Again, as Harry pointed out, I don't see any AMD-specific
> instructions in this function, so presumably such a function can go
> into the __AVX2__ code block and no new defines will be required.
>
> [Aman wrote]:
> Agreed that the APIs are generic, but we've kept them under an AMD
> flag for a simple reason: they are NOT tested on any other platform.
> A use-case showing how to use this was planned earlier for the mlx5
> PMD but dropped in this version of the patch, as the data path of
> mlx5 is going to be refactored soon and this may not be useful for
> future versions of mlx5 (>22.02).
> Ref link:
> https://patchwork.dpdk.org/project/dpdk/patch/20211019104724.19416-2-aman.ku...@vvdntech.in/
> (we plan to adapt this in a future version)
> The patch in the link basically enhances the mlx5 MPRQ implementation
> for our specific use-case, and with a 128 B packet size we achieve
> ~60% better performance. We understand that the use of this copy
> function should be documented, which we plan to do along with a few
> other platform-specific optimizations in future versions of DPDK. As
> this does not conflict with other platforms, can we still keep it
> under the AMD flag for now, as suggested by Thomas?
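To make the T/NT trade-off discussed above concrete, a minimal sketch
of the two variants with AVX2 intrinsics could look as follows. This
is illustrative only, not the code from the patch: the function names
are made up, both buffers are assumed to be 32 B aligned, len is
assumed to be a multiple of 32, and tail/misalignment handling is
omitted.

  #include <stddef.h>
  #include <stdint.h>
  #include <immintrin.h> /* build with AVX2 enabled, e.g. -mavx2 */

  /* Archiving use-case: temporal loads, non-temporal stores. The
   * source stays cache-resident; the copy bypasses the LLC. */
  static void
  copy_t_load_nt_store(void *dst, const void *src, size_t len)
  {
          const uint8_t *s = src;
          uint8_t *d = dst;

          for (size_t i = 0; i < len; i += 32) {
                  __m256i v = _mm256_load_si256((const __m256i *)(s + i));
                  _mm256_stream_si256((__m256i *)(d + i), v); /* vmovntdq */
          }
          _mm_sfence(); /* order the NT stores before later stores */
  }

  /* The patch's reversed variant: non-temporal loads, temporal
   * stores. The source is not cached; the destination lands in
   * L1/L2. */
  static void
  copy_nt_load_t_store(void *dst, const void *src, size_t len)
  {
          const uint8_t *s = src;
          uint8_t *d = dst;

          for (size_t i = 0; i < len; i += 32) {
                  __m256i v = _mm256_stream_load_si256(
                          (const __m256i *)(s + i)); /* vmovntdqa */
                  _mm256_store_si256((__m256i *)(d + i), v);
          }
  }

Note that vmovntdqa is only guaranteed to behave non-temporally on
write-combining memory; on ordinary write-back memory it may act as a
normal load, which is one more reason the intended usage needs to be
documented.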
I said I could merge if there was no objection.
I had overlooked that it adds completely new functions to the API.
And the comments go in the direction of what I asked in the previous
version: what is specific to AMD here?
Now, seeing the valid objections, I agree it should be reworked.
We must provide applications with an API which is generic, stable and
well documented.

> [HvH wrote]:
> As an open-source community, any contribution should aim to improve
> the whole.
> In the past, numerous improvements have been merged into DPDK that
> improve performance.
> Sometimes these are architecture-specific (x86/arm/ppc), sometimes
> they are ISA-specific (SSE, AVX512, NEON).
>
> I am not familiar with any case in DPDK where there is a #ifdef based
> on a *specific platform*.
> A quick "grep" through the "dpdk/lib" directory does not show any
> place where PMD or generic code has been explicitly optimized for a
> *specific platform*.
>
> Obviously, in cases where an ISA feature either exists or does not
> exist, yes, there is an optimization to enable it.
> But this is not exposed as a top-level compile-time option; it uses
> runtime CPU ISA detection.
>
> Please take a step back from the code, and look at what this patch
> asks of DPDK:
> "Please accept & maintain these changes upstream, which benefit only
> platform X, even though these ISA features are also available on
> other platforms."
>
> Other patches that enhance the performance of DPDK ask this:
> "Please accept & maintain these changes upstream, which benefit all
> platforms that have ISA capability X."
>
> === Question: "As this does not conflict with other platforms, can we
> still keep it under the AMD flag for now?"
> I feel the contribution is too specific to a platform. Make it
> generic by enabling it at the ISA-capability level.
>
> Please, yes, do contribute to the DPDK community by improving the
> performance of a PMD by enabling/leveraging ISA features.
> But do so in a way that does not benefit only a specific platform; do
> so in a way that enhances all of DPDK, as other patches have done for
> the DPDK that this patch is built on.
>
> If you have concerns that the PMD maintainers will not accept the
> changes due to potential regressions on other platforms, then discuss
> those concerns, make a plan for how to validate performance, and work
> towards a solution.
>
> === Regarding specifically the request "can we still keep it under
> the AMD flag for now":
> I do not believe we should introduce APIs for specific platforms.
> DPDK's EAL is an abstraction layer.
> The value of EAL is to provide a common abstraction. This
> platform-specific flag breaks the abstraction, and results in
> packaging issues, as well as API/ABI instability based on the
> -Dcpu_instruction_set choice.
> So, no, we should not introduce APIs based on any compile-time flag.

I agree.
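For reference, the runtime ISA detection that Harry describes is the
usual pattern in DPDK; a sketch (the memcpy_nt_* names are
hypothetical, and RTE_CPUFLAG_AVX2 is only available on x86 builds)
could look like this:

  #include <stddef.h>
  #include <string.h>
  #include <rte_common.h>
  #include <rte_cpuflags.h>

  typedef void (*memcpy_nt_fn)(void *dst, const void *src, size_t len);

  /* Baseline that works on every platform. */
  static void
  memcpy_nt_generic(void *dst, const void *src, size_t len)
  {
          memcpy(dst, src, len);
  }

  /* Stand-in for an AVX2 NT implementation (e.g. the loops sketched
   * earlier); reduced to a plain copy here so the sketch links. */
  static void
  memcpy_nt_avx2(void *dst, const void *src, size_t len)
  {
          memcpy(dst, src, len);
  }

  static memcpy_nt_fn memcpy_nt_impl = memcpy_nt_generic;

  /* Run once at startup: pick the best implementation the running
   * CPU supports, instead of deciding at compile time. */
  RTE_INIT(memcpy_nt_select)
  {
          if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
                  memcpy_nt_impl = memcpy_nt_avx2;
  }

With this pattern a single binary carries both implementations and the
choice is made once at startup, so no platform-specific compile-time
flag (and no platform-specific API) is needed.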