https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93588
--- Comment #4 from Alex Reinking <alex.reinking at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> >The intrinsics are supposed to map to the corresponding assembly
> >instructions, no?
> NO, it is an interface to what the assembly instructions do; not always 1-1
> mapping. Have you benchmarked both versions?

Thanks for clarifying that for me... I'll try to figure out inline assembly in
the meantime. I have benchmarked both versions.

> Also what happens if you use -march=native on your machines? Do the
> benchmark for that version is the best?

I have used -march=native on both machines. The Xeon is Haswell, so
-march=native implies -march=haswell -mtune=haswell; it performs poorly unless
I additionally pass -mtune=skylake. The i9 is Skylake-X, which tunes like
Skylake, and it performs well. The code that uses vmovupd runs twice as fast on
both machines.