https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
Bug ID: 99754
Summary: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: e...@coeus-group.com
Target Milestone: ---

Created attachment 50470
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50470&action=edit
Trivial patch

_mm_loadu_si16 and _mm_loadu_si32 were implemented in GCC 11, but incorrectly. The value pointed to by the argument is supposed to go in the first element, but _mm_set_epi16 / _mm_set_epi32 take their arguments in reverse order (highest element first), so in GCC the loaded value ends up in the *last* element.

The most straightforward solution would be to change the _mm_set_* calls so the input is used for the last argument instead of the first (patch attached).

FWIW, here is LLVM's implementation:
<https://github.com/llvm/llvm-project/blob/a76d0207d5f94af698525d7dc1f0953ed35901a6/clang/lib/Headers/emmintrin.h#L1670-L1710>.
I've verified that LLVM's implementation matches ICC's.
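To illustrate the argument-order issue described above, here is a minimal sketch in C. It is neither the attached patch nor the code in GCC's emmintrin.h; the helper names (load_u16_low, load_u16_reversed, load_u32_low, lane_order_demo) are hypothetical and exist only for this example. It assumes an x86 target with SSE2 and relies on the documented behaviour that the last argument of _mm_set_epi16 / _mm_set_epi32 fills element 0.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: put the 16-bit value at *p into element 0 and zero
 * the rest. The value is passed as the LAST argument of _mm_set_epi16,
 * because that argument fills the lowest element. */
static __m128i load_u16_low(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);  /* unaligned-safe read */
    return _mm_set_epi16(0, 0, 0, 0, 0, 0, 0, (short)v);
}

/* Hypothetical helper reproducing the reported mistake: the value is passed
 * as the FIRST argument, so it lands in element 7 (the highest element). */
static __m128i load_u16_reversed(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return _mm_set_epi16((short)v, 0, 0, 0, 0, 0, 0, 0);
}

/* Hypothetical 32-bit counterpart of load_u16_low. */
static __m128i load_u32_low(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return _mm_set_epi32(0, 0, 0, (int)v);
}

/* Checks which element each variant actually fills. */
static void lane_order_demo(void)
{
    const uint16_t x = 0x1234;
    const uint32_t y = 0x12345678;

    __m128i good = load_u16_low(&x);
    __m128i bad  = load_u16_reversed(&x);

    assert(_mm_extract_epi16(good, 0) == 0x1234);  /* first element */
    assert(_mm_extract_epi16(bad, 0) == 0);        /* first element is empty */
    assert(_mm_extract_epi16(bad, 7) == 0x1234);   /* value is in the last one */
    assert(_mm_cvtsi128_si32(load_u32_low(&y)) == 0x12345678);
}
```

Calling lane_order_demo() from a test program built for an SSE2-capable target should pass all four assertions, which shows why moving the input to the last argument of the _mm_set_* call gives the expected layout.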