On 2/16/19, H.J. Lu <hjl.to...@gmail.com> wrote:
> On x86-64, since __m64 is returned and passed in XMM registers, we can
> emulate MMX intrinsics with SSE instructions. To support it, we added
>
>   #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
>
>   ;; Define instruction set of MMX instructions
>   (define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
>     (const_string "base"))
>
>   (eq_attr "mmx_isa" "native")
>     (symbol_ref "!TARGET_MMX_WITH_SSE")
>   (eq_attr "mmx_isa" "x64")
>     (symbol_ref "TARGET_MMX_WITH_SSE")
>   (eq_attr "mmx_isa" "x64_avx")
>     (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
>   (eq_attr "mmx_isa" "x64_noavx")
>     (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
>
> We added SSE emulation to MMX patterns and disabled MMX alternatives with
> TARGET_MMX_WITH_SSE.
>
> Most MMX instructions have equivalent SSE versions, and the results of some
> SSE versions need to be reshuffled to the right order for MMX. There are
> a couple of tricky cases:
>
> 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent. We emulate MMX
> maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
> mask operand and handle unmapped bits 64:127 at memory address by
> adjusting source and mask operands together with memory address.
>
> 2. MMX movntq is emulated with SSE2 DImode movnti, which is available
> in 64-bit mode.
>
> 3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
> SSE emulation must clear the bit 4 in the shuffle control mask.
>
> 4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
> the upper 64 bits of destination XMM register.
>
> Tests are also added to check each SSE emulation of MMX intrinsics.
>
> There are no regressions on i686 and x86-64. For x86-64, GCC is also
> tested with
>
>   --with-arch=native --with-cpu=native
>
> on AVX2 and AVX512F machines.
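
To illustrate tricky case 3 in the quoted cover letter, here is a minimal
scalar sketch in plain C; it is a model of the instruction semantics, not
the patch's actual RTL. It assumes that "bit 4" means the fourth bit of
each control byte (bit 3 counting from zero), so that the 128-bit shuffle
can only index the low 8 bytes. The 0xf7 mask and all function names
below are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of 64-bit (MMX) pshufb: the low 3 bits of each control
   byte select one of 8 source bytes; bit 7 set zeroes the lane.  */
static void mmx_pshufb_model(uint8_t dst[8], const uint8_t src[8],
                             const uint8_t ctrl[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x07];
}

/* Scalar model of 128-bit (SSE) pshufb: same, but with a 4-bit index.  */
static void sse_pshufb_model(uint8_t dst[16], const uint8_t src[16],
                             const uint8_t ctrl[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
}

/* Hypothetical emulation: place the 64-bit operands in the low halves of
   128-bit values, clear bit 3 of every control byte (assumed mask 0xf7),
   run the 128-bit shuffle, and keep only the low 8 result bytes.  */
static void mmx_pshufb_via_sse(uint8_t dst[8], const uint8_t src[8],
                               const uint8_t ctrl[8])
{
    uint8_t xsrc[16], xctrl[16], xdst[16];

    memset(xsrc, 0xAA, sizeof xsrc);   /* upper half holds junk on purpose */
    memcpy(xsrc, src, 8);
    memset(xctrl, 0, sizeof xctrl);
    for (int i = 0; i < 8; i++)
        xctrl[i] = ctrl[i] & 0xf7;     /* index can no longer reach bytes 8..15 */

    sse_pshufb_model(xdst, xsrc, xctrl);
    memcpy(dst, xdst, 8);
}

/* Returns 1 if the emulation agrees with the MMX model for every possible
   control byte value in every lane, 0 otherwise.  */
int pshufb_emulation_matches(void)
{
    const uint8_t src[8] = {10, 11, 12, 13, 14, 15, 16, 17};

    for (int lane = 0; lane < 8; lane++) {
        for (int c = 0; c < 256; c++) {
            uint8_t ctrl[8] = {0, 1, 2, 3, 4, 5, 6, 7};
            uint8_t want[8], got[8];

            ctrl[lane] = (uint8_t)c;
            mmx_pshufb_model(want, src, ctrl);
            mmx_pshufb_via_sse(got, src, ctrl);
            if (memcmp(want, got, 8) != 0)
                return 0;
        }
    }
    return 1;
}
```

Without the masking step, a control byte such as 0x0b would select byte 11
of the 128-bit source (the junk upper half) instead of byte 3, which is
why the emulation cannot pass the control operand through unchanged.
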

An idea that would take the patch a step further, also on 32-bit targets:

*Assuming* that operations on XMM registers are as fast as (or perhaps
faster than) operations on MMX registers, we can change the mmx_isa
attribute in e.g.

+  "@
+   p<logic>\t{%2, %0|%0, %2}
+   p<logic>\t{%2, %0|%0, %2}
+   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")

to:

  [(set_attr "isa" "*,noavx,avx")
   (set_attr "mmx_isa" "native,*,*")]

So, for x86_64 everything stays the same, but for x86_32 we now allow
intrinsics to use XMM registers in addition to MMX registers. We can't
disable MMX for x86_32 anyway due to ISA constraints (and some tricky
cases, e.g. movnti, which works only for 64-bit targets, and e.g. maskmovq
& similar, which are more efficient with MMX regs), but RA has much
more freedom to allocate the most effective register set even for
32-bit targets.

WDYT?

Uros.

>
> H.J. Lu (40):
>   i386: Allow MMX register modes in SSE registers
>   i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
>   i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
>   i386: Emulate MMX plusminus/sat_plusminus with SSE
>   i386: Emulate MMX mulv4hi3 with SSE
>   i386: Emulate MMX smulv4hi3_highpart with SSE
>   i386: Emulate MMX mmx_pmaddwd with SSE
>   i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
>   i386: Emulate MMX <any_logic><mode>3 with SSE
>   i386: Emulate MMX mmx_andnot<mode>3 with SSE
>   i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
>   i386: Emulate MMX vec_dupv2si with SSE
>   i386: Emulate MMX pshufw with SSE
>   i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
>   i386: Emulate MMX sse_cvtpi2ps with SSE
>   i386: Emulate MMX mmx_pextrw with SSE
>   i386: Emulate MMX mmx_pinsrw with SSE
>   i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
>   i386: Emulate MMX mmx_pmovmskb with SSE
>   i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
>   i386: Emulate MMX maskmovq with SSE2 maskmovdqu
>   i386: Emulate MMX mmx_uavgv8qi3 with SSE
>   i386: Emulate MMX mmx_uavgv4hi3 with SSE
>   i386: Emulate MMX mmx_psadbw with SSE
>   i386: Emulate MMX movntq with SSE2 movntidi
>   i386: Emulate MMX umulv1siv1di3 with SSE2
>   i386: Make _mm_empty () as NOP for TARGET_MMX_WITH_SSE
>   i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
>   i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
>   i386: Emulate MMX ssse3_pmaddubsw with SSE
>   i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
>   i386: Emulate MMX pshufb with SSE version
>   i386: Emulate MMX ssse3_psign<mode>3 with SSE
>   i386: Emulate MMX ssse3_palignrdi with SSE
>   i386: Emulate MMX abs<mode>2 with SSE
>   i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
>   i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
>   i386: Allow MMX intrinsic emulation with SSE
>   i386: Enable TM MMX intrinsics with SSE2
>   i386: Add tests for MMX intrinsic emulations with SSE
>
> Uros Bizjak (1):
>   Prevent allocation of MMX registers with TARGET_MMX_WITH_SSE
>
>  gcc/config/i386/constraints.md | 6 +
>  gcc/config/i386/i386-builtin.def | 126 +--
>  gcc/config/i386/i386-c.c | 2 +
>  gcc/config/i386/i386-protos.h | 4 +
>  gcc/config/i386/i386.c | 189 +++-
>  gcc/config/i386/i386.h | 2 +
>  gcc/config/i386/i386.md | 17 +
>  gcc/config/i386/mmintrin.h | 12 +-
>  gcc/config/i386/mmx.md | 984 ++++++++++++------
>  gcc/config/i386/predicates.md | 7 +
>  gcc/config/i386/sse.md | 359 +++++--
>  gcc/config/i386/xmmintrin.h | 61 ++
>  gcc/testsuite/gcc.target/i386/mmx-vals.h | 77 ++
>  gcc/testsuite/gcc.target/i386/pr82483-1.c | 2 +-
>  gcc/testsuite/gcc.target/i386/pr82483-2.c | 2 +-
>  gcc/testsuite/gcc.target/i386/sse2-mmx-10.c | 43 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-11.c | 39 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-12.c | 42 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-13.c | 40 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-14.c | 31 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-15.c | 36 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-16.c | 40 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-17.c | 51 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-18a.c | 14 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-18b.c | 7 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-18c.c | 7 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-19a.c | 14 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-19b.c | 7 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-19c.c | 7 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-19d.c | 7 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-19e.c | 7 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-2.c | 12 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-20.c | 12 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-21.c | 13 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-22.c | 14 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-3.c | 13 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-4.c | 4 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-5.c | 11 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-6.c | 11 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-7.c | 13 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-8.c | 4 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-9.c | 79 ++
>  .../gcc.target/i386/sse2-mmx-cvtpi2ps.c | 43 +
>  .../gcc.target/i386/sse2-mmx-cvtps2pi.c | 36 +
>  .../gcc.target/i386/sse2-mmx-cvttps2pi.c | 36 +
>  .../gcc.target/i386/sse2-mmx-maskmovq.c | 99 ++
>  .../gcc.target/i386/sse2-mmx-packssdw.c | 52 +
>  .../gcc.target/i386/sse2-mmx-packsswb.c | 52 +
>  .../gcc.target/i386/sse2-mmx-packuswb.c | 52 +
>  .../gcc.target/i386/sse2-mmx-paddb.c | 48 +
>  .../gcc.target/i386/sse2-mmx-paddd.c | 48 +
>  .../gcc.target/i386/sse2-mmx-paddq.c | 43 +
>  .../gcc.target/i386/sse2-mmx-paddsb.c | 48 +
>  .../gcc.target/i386/sse2-mmx-paddsw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-paddusb.c | 48 +
>  .../gcc.target/i386/sse2-mmx-paddusw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-paddw.c | 48 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c | 44 +
>  .../gcc.target/i386/sse2-mmx-pandn.c | 44 +
>  .../gcc.target/i386/sse2-mmx-pavgb.c | 52 +
>  .../gcc.target/i386/sse2-mmx-pavgw.c | 52 +
>  .../gcc.target/i386/sse2-mmx-pcmpeqb.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pcmpeqd.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pcmpeqw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pcmpgtb.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pcmpgtd.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pcmpgtw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pextrw.c | 59 ++
>  .../gcc.target/i386/sse2-mmx-pinsrw.c | 61 ++
>  .../gcc.target/i386/sse2-mmx-pmaddwd.c | 47 +
>  .../gcc.target/i386/sse2-mmx-pmaxsw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pmaxub.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pminsw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pminub.c | 48 +
>  .../gcc.target/i386/sse2-mmx-pmovmskb.c | 46 +
>  .../gcc.target/i386/sse2-mmx-pmulhuw.c | 51 +
>  .../gcc.target/i386/sse2-mmx-pmulhw.c | 53 +
>  .../gcc.target/i386/sse2-mmx-pmullw.c | 52 +
>  .../gcc.target/i386/sse2-mmx-pmuludq.c | 47 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-por.c | 44 +
>  .../gcc.target/i386/sse2-mmx-psadbw.c | 58 ++
>  .../gcc.target/i386/sse2-mmx-pshufw.c | 248 +++++
>  .../gcc.target/i386/sse2-mmx-pslld.c | 52 +
>  .../gcc.target/i386/sse2-mmx-pslldi.c | 153 +++
>  .../gcc.target/i386/sse2-mmx-psllq.c | 47 +
>  .../gcc.target/i386/sse2-mmx-psllqi.c | 245 +++++
>  .../gcc.target/i386/sse2-mmx-psllw.c | 52 +
>  .../gcc.target/i386/sse2-mmx-psllwi.c | 105 ++
>  .../gcc.target/i386/sse2-mmx-psrad.c | 52 +
>  .../gcc.target/i386/sse2-mmx-psradi.c | 153 +++
>  .../gcc.target/i386/sse2-mmx-psraw.c | 52 +
>  .../gcc.target/i386/sse2-mmx-psrawi.c | 105 ++
>  .../gcc.target/i386/sse2-mmx-psrld.c | 52 +
>  .../gcc.target/i386/sse2-mmx-psrldi.c | 153 +++
>  .../gcc.target/i386/sse2-mmx-psrlq.c | 47 +
>  .../gcc.target/i386/sse2-mmx-psrlqi.c | 245 +++++
>  .../gcc.target/i386/sse2-mmx-psrlw.c | 52 +
>  .../gcc.target/i386/sse2-mmx-psrlwi.c | 105 ++
>  .../gcc.target/i386/sse2-mmx-psubb.c | 48 +
>  .../gcc.target/i386/sse2-mmx-psubd.c | 48 +
>  .../gcc.target/i386/sse2-mmx-psubq.c | 43 +
>  .../gcc.target/i386/sse2-mmx-psubusb.c | 48 +
>  .../gcc.target/i386/sse2-mmx-psubusw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-psubw.c | 48 +
>  .../gcc.target/i386/sse2-mmx-punpckhbw.c | 53 +
>  .../gcc.target/i386/sse2-mmx-punpckhdq.c | 47 +
>  .../gcc.target/i386/sse2-mmx-punpckhwd.c | 49 +
>  .../gcc.target/i386/sse2-mmx-punpcklbw.c | 53 +
>  .../gcc.target/i386/sse2-mmx-punpckldq.c | 47 +
>  .../gcc.target/i386/sse2-mmx-punpcklwd.c | 49 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c | 44 +
>  gcc/testsuite/gcc.target/i386/sse2-mmx.c | 1 -
>  112 files changed, 6418 insertions(+), 493 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/mmx-vals.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19d.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19e.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-20.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-22.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-9.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtpi2ps.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtps2pi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvttps2pi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-maskmovq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packssdw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packsswb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packuswb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pandn.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pextrw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaddwd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxsw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxub.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminsw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminub.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmovmskb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhuw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmullw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmuludq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-por.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psadbw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pshufw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslld.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslldi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllqi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllwi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrad.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psradi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psraw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrawi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrld.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrldi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlqi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlwi.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhbw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhdq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhwd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklbw.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckldq.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklwd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c
>
> --
> 2.20.1
>
>