> Calls to rte_memcpy for 1 < n < 16 could result in unaligned
> loads/stores, which is undefined behaviour according to the C
> standard, and in strict aliasing violations.
>
> The code was changed to use a packed structure that allows aliasing
> (using the __may_alias__ attribute) to perform the load/store
> operations. This results in code that has the same performance as the
> original and that is also C standards-compliant.
>
> Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
> Cc: Xiaoyun Li <xiaoyun...@intel.com>
> Cc: sta...@dpdk.org
>
> Signed-off-by: Luc Pelletier <lucp.at.w...@gmail.com>
> ---
> v7:
> * Fix coding style issue by adding a new __rte_may_alias macro rather
>   than using __attribute__ directly.
>
> v6:
> * Refocus to fix the strict aliasing problems discovered following
>   discussions in this thread.
> * Modify the code to use __may_alias__ and a packed structure. This
>   fixes both the undefined behaviour of unaligned access (which is not
>   the main concern) and the strict aliasing violations (which can cause
>   major bugs, as demonstrated in a previous message in this thread).
> * Rename the new function from rte_mov15_or_less_unaligned to
>   rte_mov15_or_less.
> * Modify the code that copies <= 15 bytes to call rte_mov15_or_less.
>
> v5:
> * Replace the assembly with pure C code that uses a packed structure
>   to make unaligned loads conform to the C standard.
>
> v4:
> * Add volatile to asm statements, which is required under gcc.
>
> v3:
> * Remove the for loop and go back to using assembly.
>
> v2:
> * Replace the assembly with a regular for loop that copies bytes one
>   by one.
>
> v1:
> * Fix undefined behaviour of unaligned stores/loads by using assembly
>   to perform stores/loads.
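For context, the approach described above boils down to something like the
sketch below. Identifiers here are illustrative stand-ins (per the v6/v7
notes, the actual patch adds an __rte_may_alias macro and an
rte_mov15_or_less() helper), and details may differ from the applied patch:

#include <stddef.h>
#include <stdint.h>

/* Stand-ins for DPDK's __rte_packed / __rte_may_alias macros. */
#define demo_packed    __attribute__((__packed__))
#define demo_may_alias __attribute__((__may_alias__))

/*
 * Copy n <= 15 bytes. Every multi-byte load/store goes through a
 * packed, may_alias struct: packing drops the type's alignment
 * requirement to 1 (no unaligned-access UB), and may_alias exempts
 * the accesses from strict aliasing rules, while the compiler can
 * still emit single word-sized moves on x86.
 */
static inline void *
mov15_or_less(void *dst, const void *src, size_t n)
{
        struct u64_alias { uint64_t val; } demo_packed demo_may_alias;
        struct u32_alias { uint32_t val; } demo_packed demo_may_alias;
        struct u16_alias { uint16_t val; } demo_packed demo_may_alias;
        void *ret = dst;

        if (n & 8) {
                ((struct u64_alias *)dst)->val =
                        ((const struct u64_alias *)src)->val;
                src = (const uint8_t *)src + 8;
                dst = (uint8_t *)dst + 8;
        }
        if (n & 4) {
                ((struct u32_alias *)dst)->val =
                        ((const struct u32_alias *)src)->val;
                src = (const uint8_t *)src + 4;
                dst = (uint8_t *)dst + 4;
        }
        if (n & 2) {
                ((struct u16_alias *)dst)->val =
                        ((const struct u16_alias *)src)->val;
                src = (const uint8_t *)src + 2;
                dst = (uint8_t *)dst + 2;
        }
        if (n & 1)
                *(uint8_t *)dst = *(const uint8_t *)src;
        return ret;
}

Decomposing n into its 8/4/2/1 bit components gives at most four branches
and four fixed-size moves for any length up to 15, which is presumably how
the patch keeps the performance of the original code (as the commit message
claims) while avoiding both a byte-by-byte loop (the v2 approach) and
inline assembly (v1/v3).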
LGTM, thanks for highlighting and fixing the problem.

Acked-by: Konstantin Ananyev <konstantin.anan...@intel.com>
Tested-by: Konstantin Ananyev <konstantin.anan...@intel.com>

As a side note, we probably need to check other similar places in the
DPDK code.