> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Monday, 25 July 2022 03.18
>
[...]

> > Yes, x86 needs 16B alignment for NT load/stores But that's supposed to be arch
> > specific limitation, that we probably want to hide, no?

Correct. However, optional hints for optimization purposes will be available. And it is up to the architecture-specific implementation to make the best use of these hints, or just ignore them.

> > Inside the function can check alignment of both src and dst and decide should it
> > use NT load/store instructions or just do normal copy.
> IMO, the normal copy should not be done by this API under any
> conditions. Why not let the application call memcpy/rte_memcpy when the
> NT copy is not applicable? It helps the programmer to understand and
> debug the issues much easier.

Yes, the programmer must choose between normal memcpy() and non-temporal rte_memcpy_nt(). I am offering new functions, not modifying memcpy() or rte_memcpy().

And rte_memcpy_nt() will silently fall back to normal memcpy() if non-temporal copying is unavailable, e.g. on POWER and RISC-V architectures, which don't have NT load/store instructions.
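
To illustrate the silent fallback described above, here is a minimal sketch. It is not the proposed rte_memcpy_nt() implementation: the name sketch_memcpy_nt() and its three-argument signature are hypothetical and carry no hint flags. It assumes an x86 build with SSE2, uses non-temporal stores only (the loads are ordinary aligned loads), and takes the NT path only when both pointers and the length are multiples of 16 bytes. On any other architecture, or for any other alignment, it simply calls memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper for illustration only; not the API proposed in
 * this thread. Copies len bytes from src to dst, using non-temporal
 * stores where that is possible and a normal memcpy() otherwise.
 */
static inline void
sketch_memcpy_nt(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
	/*
	 * MOVNTDQ requires a 16-byte aligned destination, and the aligned
	 * load used here requires a 16-byte aligned source. This sketch
	 * also requires len to be a multiple of 16 to keep the loop simple.
	 */
	if ((((uintptr_t)dst | (uintptr_t)src) & 15) == 0 && (len & 15) == 0) {
		const __m128i *s = (const __m128i *)src;
		__m128i *d = (__m128i *)dst;
		size_t i;

		for (i = 0; i < len / 16; i++) {
			__m128i v = _mm_load_si128(s + i); /* normal aligned load */
			_mm_stream_si128(d + i, v);        /* non-temporal store */
		}
		/* Order the weakly-ordered NT stores before later stores. */
		_mm_sfence();
		return;
	}
#endif
	/* No NT instructions available, or alignment unsuitable. */
	memcpy(dst, src, len);
}
```

The caller sees the same result on either path; only the cache behaviour differs. Whether the fallback should be silent, as in this sketch, or left to the application is exactly the design question under discussion.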