On Wed, May 15, 2019 at 2:29 PM Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> "H.J. Lu" <hjl.to...@gmail.com> writes:
> > On Thu, Feb 7, 2019 at 9:49 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >>
> >> Standard scalar operation patterns which preserve the rest of the vector
> >> look like
> >>
> >> (vec_merge:V2DF
> >>   (vec_duplicate:V2DF
> >>     (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ])
> >>                           (parallel [ (const_int 0 [0])]))
> >>            (reg:DF 87))
> >>   (reg/v:V2DF 85 [ x ])
> >>   (const_int 1 [0x1])]))
> >>
> >> Add such patterns to i386 backend and convert VEC_CONCAT patterns to
> >> standard scalar operation patterns.
>
> It looks like there's some variety in the patterns used, e.g.:
>
> (define_insn
>   "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
>   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
>         (vec_merge:VF_128
>           (smaxmin:VF_128
>             (match_operand:VF_128 1 "register_operand" "0,v")
>             (match_operand:VF_128 2 "vector_operand" "xBm,<round_saeonly_scalar_constraint>"))
>           (match_dup 1)
>           (const_int 1)))]
>   "TARGET_SSE"
>   "@
>    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
>    v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_scalar_mask_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_saeonly_scalar_mask_op3>}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sse")
>    (set_attr "btver2_sse_attr" "maxmin")
>    (set_attr "prefix" "<round_saeonly_scalar_prefix>")
>    (set_attr "mode" "<ssescalarmode>")])
>
> makes the operand a full vector operation, which seems simpler.

This pattern is used to implement scalar smaxmin intrinsics.

> The above would then be:
>
> (vec_merge:V2DF
>   (op:V2DF
>     (reg:V2DF 85)
>     (vec_duplicate:V2DF (reg:DF 87)))
>   (reg/v:V2DF 85 [ x ])
>   (const_int 1 [0x1])]))
>
> I guess technically the two have different faulting behaviour though,
> since the smaxmin gets applied to all elements, not just element 0.

This is the issue.  We don't use the correct mode for scalar instructions:

---
#include <immintrin.h>

__m128d
foo1 (__m128d x, double *p)
{
  __m128d y = _mm_load_sd (p);
  return _mm_max_pd (x, y);
}
---

        movq    (%rdi), %xmm1
        maxpd   %xmm1, %xmm0
        ret

Here is the updated patch to add standard floating point scalar
operation patterns to i386 backend.  Then we can do

---
#include <immintrin.h>

extern __inline __m128d
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_new_mm_max_pd (__m128d __A, __m128d __B)
{
  __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
  return __A;
}

__m128d
foo2 (__m128d x, double *p)
{
  __m128d y = _mm_load_sd (p);
  return _new_mm_max_pd (x, y);
}
---

        maxsd   (%rdi), %xmm0
        ret

We should use generic vector operations to implement i386 intrinsics
as much as we can.

> The patch seems very specific.  E.g. why just PLUS, MINUS, MULT and DIV?

This patch only adds +, -, *, /, > and <.  We can add more if there
are testcases for them.

> Thanks,
> Richard
>
> >>
> >> gcc/
> >>
> >>     PR target/54855
> >>     * simplify-rtx.c (simplify_binary_operation_1): Convert
> >>     VEC_CONCAT patterns to standard scalar operation
> >>     patterns.
> >>     * config/i386/sse.md (*<sse>_vm<plusminus_insn><mode>3): New.
> >>     (*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
> >>
> >> gcc/testsuite/
> >>
> >>     PR target/54855
> >>     * gcc.target/i386/pr54855-1.c: New test.
> >>     * gcc.target/i386/pr54855-2.c: Likewise.
> >>     * gcc.target/i386/pr54855-3.c: Likewise.
> >>     * gcc.target/i386/pr54855-4.c: Likewise.
> >>     * gcc.target/i386/pr54855-5.c: Likewise.
> >>     * gcc.target/i386/pr54855-6.c: Likewise.
> >>     * gcc.target/i386/pr54855-7.c: Likewise.
> >
> > PING:
> >
> > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00398.html

Thanks.

-- 
H.J.
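For readers less familiar with RTL, the two vec_merge forms discussed in this thread can be modelled in plain C.  This is only a sketch under stated assumptions: the v2df struct and the helper names (model_vec_merge, model_vec_duplicate, scalar_max_lane0, full_max_then_merge) are made up for illustration and are not GCC internals, and "op" is fixed to a simple greater-than max.  Both forms produce the same lane values; the difference raised in the review is that the second form applies the operation to every lane before discarding all but lane 0.

```c
/* Hypothetical two-lane stand-in for a V2DF register.  */
typedef struct { double lane[2]; } v2df;

/* Model of (vec_merge a b mask): lane i is taken from A when bit i of
   MASK is set, otherwise from B.  */
static v2df
model_vec_merge (v2df a, v2df b, unsigned mask)
{
  v2df r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = (mask & (1u << i)) ? a.lane[i] : b.lane[i];
  return r;
}

/* Model of (vec_duplicate s): broadcast the scalar S to both lanes.  */
static v2df
model_vec_duplicate (double s)
{
  v2df r = { { s, s } };
  return r;
}

/* Scalar form: the operation is applied to lane 0 only, then merged
   back into X so that lane 1 is preserved.  */
static v2df
scalar_max_lane0 (v2df x, double s)
{
  double m = x.lane[0] > s ? x.lane[0] : s;
  return model_vec_merge (model_vec_duplicate (m), x, 1);
}

/* Full-vector form: the operation is applied to every lane first and
   only lane 0 of the result is kept by the merge.  */
static v2df
full_max_then_merge (v2df x, double s)
{
  v2df d = model_vec_duplicate (s);
  v2df m;
  for (int i = 0; i < 2; i++)
    m.lane[i] = x.lane[i] > d.lane[i] ? x.lane[i] : d.lane[i];
  return model_vec_merge (m, x, 1);
}
```

With x = { 1.0, 5.0 } and s = 3.0, both helpers return { 3.0, 5.0 }; the model cannot show the faulting difference, only that the visible lane values agree.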
From 5d91bf264c89541a79ca8f9121264416ce307420 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sun, 3 Feb 2019 09:16:23 -0800
Subject: [PATCH] i386: Generate standard floating point scalar operation patterns

Standard floating point scalar operation patterns for combiner, which
preserve the rest of the vector, look like

(vec_merge:V2DF
  (vec_duplicate:V2DF (reg:DF 87))
  (reg/v:V2DF 85 [ x ])
  (const_int 1 [0x1])]))

and

(vec_merge:V2DF
  (vec_duplicate:V2DF
    (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ])
                          (parallel [ (const_int 0 [0])]))
           (reg:DF 87))
  (reg/v:V2DF 85 [ x ])
  (const_int 1 [0x1])]))

This patch adds and generates such standard floating point scalar
operation patterns for +, -, *, /, > and <.

Tested on x86-64.

gcc/

	PR target/54855
	* config/i386/i386-expand.c (ix86_expand_vector_set): Generate
	standard scalar operation pattern for V2DF.
	* config/i386/sse.md (*<sse>_vm<plusminus_insn><mode>3): New.
	(*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
	(*ieee_<ieee_maxmin><mode>3): Likewise.
	(vec_setv2df_0): Likewise.

gcc/testsuite/

	PR target/54855
	* gcc.target/i386/pr54855-1.c: New test.
	* gcc.target/i386/pr54855-2.c: Likewise.
	* gcc.target/i386/pr54855-3.c: Likewise.
	* gcc.target/i386/pr54855-4.c: Likewise.
	* gcc.target/i386/pr54855-5.c: Likewise.
	* gcc.target/i386/pr54855-6.c: Likewise.
	* gcc.target/i386/pr54855-7.c: Likewise.
	* gcc.target/i386/pr54855-8.c: Likewise.
	* gcc.target/i386/pr54855-9.c: Likewise.
	* gcc.target/i386/pr54855-10.c: Likewise.
---
 gcc/config/i386/i386-expand.c              | 12 +++
 gcc/config/i386/sse.md                     | 88 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr54855-1.c  | 16 ++++
 gcc/testsuite/gcc.target/i386/pr54855-10.c | 13 ++++
 gcc/testsuite/gcc.target/i386/pr54855-2.c  | 15 ++++
 gcc/testsuite/gcc.target/i386/pr54855-3.c  | 14 ++++
 gcc/testsuite/gcc.target/i386/pr54855-4.c  | 14 ++++
 gcc/testsuite/gcc.target/i386/pr54855-5.c  | 16 ++++
 gcc/testsuite/gcc.target/i386/pr54855-6.c  | 14 ++++
 gcc/testsuite/gcc.target/i386/pr54855-7.c  | 14 ++++
 gcc/testsuite/gcc.target/i386/pr54855-8.c  | 14 ++++
 gcc/testsuite/gcc.target/i386/pr54855-9.c  | 14 ++++
 12 files changed, 244 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-9.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 87e0973e1ca..18ab22cacb0 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14092,6 +14092,17 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
       return;
 
     case E_V2DFmode:
+      /* NB: For ELT == 0, use standard scalar operation patterns which
+         preserve the rest of the vector for combiner:
+
+         (vec_merge:V2DF
+           (vec_duplicate:V2DF (reg:DF))
+           (reg:V2DF)
+           (const_int 1))
+       */
+      if (elt == 0)
+        goto do_vec_merge;
+
       {
         rtx op0, op1;
 
@@ -14389,6 +14400,7 @@ quarter:
     }
   else if (use_vec_merge)
     {
+do_vec_merge:
       tmp = gen_rtx_VEC_DUPLICATE (mode, val);
       tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
                                GEN_INT (HOST_WIDE_INT_1U << elt));
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 677e7023eb2..f36537dfb3f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1802,6 +1802,28 @@
    (set_attr "type" "sseadd")
    (set_attr "mode" "<MODE>")])
 
+;; Standard scalar operation patterns which preserve the rest of the
+;; vector for combiner.
+(define_insn "*<sse>_vm<plusminus_insn><mode>3"
+  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+        (vec_merge:VF_128
+          (vec_duplicate:VF_128
+            (plusminus:<ssescalarmode>
+              (vec_select:<ssescalarmode>
+                (match_operand:VF_128 1 "register_operand" "0,v")
+                (parallel [(const_int 0)]))
+              (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
+          (match_dup 1)
+          (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
+   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
         (vec_merge:VF_128
@@ -1856,6 +1878,29 @@
    (set_attr "type" "ssemul")
    (set_attr "mode" "<MODE>")])
 
+;; Standard scalar operation patterns which preserve the rest of the
+;; vector for combiner.
+(define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
+  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+        (vec_merge:VF_128
+          (vec_duplicate:VF_128
+            (multdiv:<ssescalarmode>
+              (vec_select:<ssescalarmode>
+                (match_operand:VF_128 1 "register_operand" "0,v")
+                (parallel [(const_int 0)]))
+              (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
+          (match_dup 1)
+          (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   <multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
+   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse<multdiv_mnemonic>")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "btver2_decode" "direct,double")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
         (vec_merge:VF_128
@@ -2205,6 +2250,30 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
+;; Standard scalar operation patterns which preserve the rest of the
+;; vector for combiner.
+(define_insn "*ieee_<ieee_maxmin><mode>3"
+  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+        (vec_merge:VF_128
+          (vec_duplicate:VF_128
+            (unspec:<ssescalarmode>
+              [(vec_select:<ssescalarmode>
+                 (match_operand:VF_128 1 "register_operand" "0,v")
+                 (parallel [(const_int 0)]))
+               (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")]
+              IEEE_MAXMIN))
+          (match_dup 1)
+          (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   <ieee_maxmin><ssescalarmodesuffix>\t{%2, %0|%0, %2}
+   v<ieee_maxmin><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "btver2_sse_attr" "maxmin")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
         (vec_merge:VF_128
@@ -7881,6 +7950,25 @@
   [(set (match_dup 0) (match_dup 1))]
   "operands[0] = adjust_address (operands[0], <ssescalarmode>mode, 0);")
 
+;; Standard scalar operation patterns which preserve the rest of the
+;; vector for combiner.
+(define_insn "vec_setv2df_0"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,v,x,v")
+        (vec_merge:V2DF
+          (vec_duplicate:V2DF
+            (match_operand:DF 2 "nonimmediate_operand" " x,v,m,m"))
+          (match_operand:V2DF 1 "register_operand" " 0,v,0,v")
+          (const_int 1)))]
+  "TARGET_SSE2"
+  "@
+   movsd\t{%2, %0|%0, %2}
+   vmovsd\t{%2, %1, %0|%0, %1, %2}
+   movlpd\t{%2, %0|%0, %2}
+   vmovlpd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "mode" "DF")])
+
 (define_expand "vec_set<mode>"
   [(match_operand:V 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-1.c b/gcc/testsuite/gcc.target/i386/pr54855-1.c
new file mode 100644
index 00000000000..693aafa09ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "addsd" 1 } } */
+/* { dg-final { scan-assembler-not "movapd" } } */
+/* { dg-final { scan-assembler-not "movsd" } } */
+
+typedef double __v2df __attribute__ ((__vector_size__ (16)));
+typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128d
+_mm_add_sd (__m128d x, __m128d y)
+{
+  __m128d z = __extension__ (__m128d)(__v2df)
+    { (((__v2df) x)[0] + ((__v2df) y)[0]), ((__v2df) x)[1] };
+  return z;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-10.c b/gcc/testsuite/gcc.target/i386/pr54855-10.c
new file mode 100644
index 00000000000..9e08a85723e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-10.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "movlpd" 1 } } */
+/* { dg-final { scan-assembler-not "movsd" } } */
+
+typedef double vec __attribute__((vector_size(16)));
+
+vec
+foo (vec x, double *a)
+{
+  x[0] = *a;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-2.c b/gcc/testsuite/gcc.target/i386/pr54855-2.c
new file mode 100644
index 00000000000..20c6f8eb529
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "mulsd" 1 } } */
+/* { dg-final { scan-assembler-not "movapd" } } */
+/* { dg-final { scan-assembler-not "movsd" } } */
+
+typedef double __v2df __attribute__ ((__vector_size__ (16)));
+
+__v2df
+_mm_mul_sd (__v2df x, __v2df y)
+{
+  __v2df z = x;
+  z[0] = x[0] * y[0];
+  return z;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-3.c b/gcc/testsuite/gcc.target/i386/pr54855-3.c
new file mode 100644
index 00000000000..3c15dfc93d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "subsd" 1 } } */
+/* { dg-final { scan-assembler-not "movapd" } } */
+/* { dg-final { scan-assembler-not "movsd" } } */
+
+typedef double vec __attribute__((vector_size(16)));
+
+vec
+foo (vec x)
+{
+  x[0] -= 1.;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-4.c b/gcc/testsuite/gcc.target/i386/pr54855-4.c
new file mode 100644
index 00000000000..32eb28e852a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "subsd" 1 } } */
+/* { dg-final { scan-assembler-not "movapd" } } */
+/* { dg-final { scan-assembler-not "movsd" } } */
+
+typedef double vec __attribute__((vector_size(16)));
+
+vec
+foo (vec x, double a)
+{
+  x[0] -= a;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-5.c b/gcc/testsuite/gcc.target/i386/pr54855-5.c
new file mode 100644
index 00000000000..e06999074e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-5.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "subsd" 1 } } */
+/* { dg-final { scan-assembler-times "mulpd" 1 } } */
+/* { dg-final { scan-assembler-not "movapd" } } */
+/* { dg-final { scan-assembler-not "movsd" } } */
+
+typedef double __v2df __attribute__ ((__vector_size__ (16)));
+
+__v2df
+foo (__v2df x, __v2df y)
+{
+  x[0] -= y[0];
+  x *= y;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-6.c b/gcc/testsuite/gcc.target/i386/pr54855-6.c
new file mode 100644
index 00000000000..8f44d17b6d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "divss" 1 } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "movss" } } */
+
+typedef float vec __attribute__((vector_size(16)));
+
+vec
+foo (vec x, float f)
+{
+  x[0] /= f;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-7.c b/gcc/testsuite/gcc.target/i386/pr54855-7.c
new file mode 100644
index 00000000000..a551bd5c92f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-7.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "divss" 1 } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "movss" } } */
+
+typedef float vec __attribute__((vector_size(16)));
+
+vec
+foo (vec x)
+{
+  x[0] /= 2.1f;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-8.c b/gcc/testsuite/gcc.target/i386/pr54855-8.c
new file mode 100644
index 00000000000..7602dc293a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "maxsd" 1 } } */
+/* { dg-final { scan-assembler-not "movapd" } } */
+/* { dg-final { scan-assembler-not "movsd" } } */
+
+typedef double vec __attribute__((vector_size(16)));
+
+vec
+foo (vec x, double a)
+{
+  x[0] = x[0] > a ? x[0] : a;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-9.c b/gcc/testsuite/gcc.target/i386/pr54855-9.c
new file mode 100644
index 00000000000..40add5f6763
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+/* { dg-final { scan-assembler-times "minss" 1 } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "movss" } } */
+
+typedef float vec __attribute__((vector_size(16)));
+
+vec
+foo (vec x, float a)
+{
+  x[0] = x[0] < a ? x[0] : a;
+  return x;
+}
-- 
2.20.1