https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93588

--- Comment #4 from Alex Reinking <alex.reinking at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> >The intrinsics are supposed to map to the corresponding assembly 
> >instructions, no?
> NO, it is an interface to what the assembly instructions do; not always 1-1
> mapping.  Have you benchmarked both versions?

Thanks for clarifying that for me. I'll try to figure out inline assembly in
the meantime. I have benchmarked both versions.

> Also what happens if you use -march=native on your machines?  Do the
> benchmark for that version is the best?

I have used -march=native on both machines. The Xeon is Haswell, so
-march=native implies -march=haswell and -mtune=haswell; it performs poorly
unless I additionally pass -mtune=skylake. The i9 is Skylake-X, which is tuned
like Skylake, and it performs well.

The code that uses vmovupd runs twice as fast on both machines.
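
For anyone who wants to look at the code generation themselves, here is a
minimal sketch (not the code from this report). The function name
sum4_unaligned is hypothetical; the only assumption is that the hot path uses
an unaligned 256-bit load through _mm256_loadu_pd. As far as I understand, GCC
may emit either a single vmovupd or a split pair of 128-bit loads for that
intrinsic depending on tuning (see -mavx256-split-unaligned-load), so the
generated assembly is worth checking under each -mtune value.

```c
#include <immintrin.h>

/* Hypothetical helper, not taken from the bug report: sums four doubles
   starting at p, which does not need to be 32-byte aligned. */
__attribute__((target("avx")))
double sum4_unaligned(const double *p)
{
    __m256d v  = _mm256_loadu_pd(p);            /* unaligned 256-bit load */
    __m128d lo = _mm256_castpd256_pd128(v);     /* elements 0 and 1 */
    __m128d hi = _mm256_extractf128_pd(v, 1);   /* elements 2 and 3 */
    __m128d s  = _mm_add_pd(lo, hi);            /* {p0+p2, p1+p3} */
    __m128d sw = _mm_unpackhi_pd(s, s);         /* {p1+p3, p1+p3} */
    return _mm_cvtsd_f64(_mm_add_sd(s, sw));    /* (p0+p2) + (p1+p3) */
}
```

Compiling this with something like "gcc -O2 -mavx -S" once with -mtune=generic
and once with -mtune=skylake, then comparing how the load is emitted, should
show whether the tuning changes the instruction selection on a given GCC
version.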
