Re: VIS2 pattern review

David Miller Thu, 13 Oct 2011 12:56:25 -0700

From: Richard Henderson <r...@redhat.com>
Date: Wed, 12 Oct 2011 17:49:19 -0700


> The comment for fpmerge_vis is not correct.
> I believe that the operation is representable with
> 
>   (vec_select:V8QI
>     (vec_concat:V8QI
>       (match_operand:V4QI 1 ...)
>       (match_operand:V4QI 2 ...)
>     (parallel [
>       0 4 1 5 2 6 3 7
>       ]))
> 
> which can be used as the basis for both of the
> 
>   vec_interleave_lowv8qi
>   vec_interleave_highv8qi
> 
> named patterns.

Agreed.

> AFAICS, this needs an unspec, like fmul8x16al.
> Similarly for fmul8sux16_vis, fmuld8sux16_vis,

Yes, as we found all the partitioned multiplies need to be unspecs.

>> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
>> (define_code_attr vis3_addsub_ss_insn
>>   [(ss_plus "fpadds") (ss_minus "fpsubs")])
>> 
>> (define_insn "<vis3_addsub_ss_insn><vbits>_vis"
>>   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
>>         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" 
>> "<vconstr>")
>>                              (match_operand:VASS 2 "register_operand" 
>> "<vconstr>")))]
>>   "TARGET_VIS3"
>>   "<vis3_addsub_ss_insn><vbits>\t%1, %2, %0")
> 
> These should be exposed as "ssadd<mode>3" "sssub<mode>3".

Agreed.

I'm currently regstrapping the patch at the end of this mail and will
commit it to trunk if no regressions pop up.

I'll look into the rest of your feedback.  But I want to look into a more
fundamental issue with VIS support before moving much further.

I worked for several evenings on adding support for the VIS3
instructions that move directly between float and integer regs.  I tried
really hard to get the compiler to do something sensible but it's next
to impossible for two reasons:

1) We don't represent single entry vectors using vector modes, we just
   use SImode, DImode etc.

   I think this is a huge mistake, because the compiler now thinks it
   can do "SImode stuff" in the float regs.  Other backends are able
   to segregate vector vs. non-vector operations by using the single
   entry vector modes.

2) In addition to that, because of how we number the registers for
   allocation on sparc for leaf functions, the compiler starts trying
   to reload SImode and DImode values into the floating point registers
   before trying to use the non-leaf integer regs.

   Because the leaf allocation order is "leaf integer regs", "float
   regs", "non-leaf integer regs".

Even if I jacked up the register move cost for the cases where the
VIS3 instructions applied, it still did these kinds of reloads.

This also gets reload into trouble because it believes that if it can
move an address value (say Pmode == SImode) from one place to another,
then a plus on the same operands can be performed (with perhaps minor
reloading).  But that doesn't work when this "move" is "move float reg
to int reg" and therefore the operands are "%f12" and "%g3".  It
tries to do things like "(plus:SI (reg:SI %f12) (reg:SI %g3))"

All of these troubles would be eliminated if we used vector modes for
all the VIS operations instead of using SImode and DImode for the single
entry vector cases.

Unfortunately, that would involve some ABI changes for the VIS
builtins.  I'm trending towards considering just changing things
anyways since the VIS intrinsics were next to unusable beforehand.
I've scoured the net for examples of people actually using the GCC
intrinsics before all of my recent changes, and they all fall into two
categories: 1) they use inline assembler because the VIS intrinsics
don't work and 2) they try to use the intrinsics but the code is
disabled because it "doesn't work".

--------------------
Fix the RTL of some sparc VIS patterns.

        * config/sparc/sparc.md (UNSPEC_FPMERGE): Delete.
        (UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New
        unspecs.
        (fpmerge_vis): Remove inaccurate comment, represent using vec_select
        of a vec_concat.
        (vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns.
        (fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis):
        Reimplement as unspecs and remove inaccurate comments.
        (vis3_shift_patname): New code attr.
        (<vis3_shift_insn><vbits>_vis): Rename to 
"v<vis3_shift_patname><mode>3".
        (vis3_addsub_ss_patname): New code attr.
        (<vis3_addsub_ss_insn><vbits>_vis): Rename to 
"<vis3_addsub_ss_patname><mode>".

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 017594f..ae36634 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,18 @@
 2011-10-12  David S. Miller  <da...@davemloft.net>
 
+       * config/sparc/sparc.md (UNSPEC_FPMERGE): Delete.
+       (UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New
+       unspecs.
+       (fpmerge_vis): Remove inaccurate comment, represent using vec_select
+       of a vec_concat.
+       (vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns.
+       (fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis):
+       Reimplement as unspecs and remove inaccurate comments.
+       (vis3_shift_patname): New code attr.
+       (<vis3_shift_insn><vbits>_vis): Rename to 
"v<vis3_shift_patname><mode>3".
+       (vis3_addsub_ss_patname): New code attr.
+       (<vis3_addsub_ss_insn><vbits>_vis): Rename to 
"<vis3_addsub_ss_patname><mode>".
+
        * config/sparc/sparc.h: Do not force TARGET_VIS3 and TARGET_FMAF
        to zero when assembler lacks support for such instructions.
        * config/sparc/sparc.c (sparc_option_override): Clear MASK_VIS3
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index fa790b3..7211658 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -9492,21 +9492,21 @@ sparc_vis_init_builtins (void)
       def_builtin_const ("__builtin_vis_fchksm16", CODE_FOR_fchksm16_vis,
                         v4hi_ftype_v4hi_v4hi);
 
-      def_builtin_const ("__builtin_vis_fsll16", CODE_FOR_fsll16_vis,
+      def_builtin_const ("__builtin_vis_fsll16", CODE_FOR_vashlv4hi3,
                         v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fslas16", CODE_FOR_fslas16_vis,
+      def_builtin_const ("__builtin_vis_fslas16", CODE_FOR_vssashlv4hi3,
                         v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fsrl16", CODE_FOR_fsrl16_vis,
+      def_builtin_const ("__builtin_vis_fsrl16", CODE_FOR_vlshrv4hi3,
                         v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fsra16", CODE_FOR_fsra16_vis,
+      def_builtin_const ("__builtin_vis_fsra16", CODE_FOR_vashrv4hi3,
                         v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fsll32", CODE_FOR_fsll32_vis,
+      def_builtin_const ("__builtin_vis_fsll32", CODE_FOR_vashlv2si3,
                         v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fslas32", CODE_FOR_fslas32_vis,
+      def_builtin_const ("__builtin_vis_fslas32", CODE_FOR_vssashlv2si3,
                         v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fsrl32", CODE_FOR_fsrl32_vis,
+      def_builtin_const ("__builtin_vis_fsrl32", CODE_FOR_vlshrv2si3,
                         v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fsra32", CODE_FOR_fsra32_vis,
+      def_builtin_const ("__builtin_vis_fsra32", CODE_FOR_vashrv2si3,
                         v2si_ftype_v2si_v2si);
 
       if (TARGET_ARCH64)
@@ -9523,21 +9523,21 @@ sparc_vis_init_builtins (void)
       def_builtin_const ("__builtin_vis_fpsub64", CODE_FOR_fpsub64_vis,
                         di_ftype_di_di);
 
-      def_builtin_const ("__builtin_vis_fpadds16", CODE_FOR_fpadds16_vis,
+      def_builtin_const ("__builtin_vis_fpadds16", CODE_FOR_ssaddv4hi,
                         v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fpadds16s", CODE_FOR_fpadds16s_vis,
+      def_builtin_const ("__builtin_vis_fpadds16s", CODE_FOR_ssaddv2hi,
                         v2hi_ftype_v2hi_v2hi);
-      def_builtin_const ("__builtin_vis_fpsubs16", CODE_FOR_fpsubs16_vis,
+      def_builtin_const ("__builtin_vis_fpsubs16", CODE_FOR_sssubv4hi,
                         v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fpsubs16s", CODE_FOR_fpsubs16s_vis,
+      def_builtin_const ("__builtin_vis_fpsubs16s", CODE_FOR_sssubv2hi,
                         v2hi_ftype_v2hi_v2hi);
-      def_builtin_const ("__builtin_vis_fpadds32", CODE_FOR_fpadds32_vis,
+      def_builtin_const ("__builtin_vis_fpadds32", CODE_FOR_ssaddv2si,
                         v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fpadds32s", CODE_FOR_fpadds32s_vis,
+      def_builtin_const ("__builtin_vis_fpadds32s", CODE_FOR_ssaddsi,
                         v1si_ftype_v1si_v1si);
-      def_builtin_const ("__builtin_vis_fpsubs32", CODE_FOR_fpsubs32_vis,
+      def_builtin_const ("__builtin_vis_fpsubs32", CODE_FOR_sssubv2si,
                         v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fpsubs32s", CODE_FOR_fpsubs32s_vis,
+      def_builtin_const ("__builtin_vis_fpsubs32s", CODE_FOR_sssubsi,
                         v1si_ftype_v1si_v1si);
 
       if (TARGET_ARCH64)
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 24993fb..fb8fe8b 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -53,7 +53,7 @@
    (UNSPEC_FPACK32             41)
    (UNSPEC_FPACKFIX            42)
    (UNSPEC_FEXPAND             43)
-   (UNSPEC_FPMERGE             44)
+   (UNSPEC_MUL16AU             44)
    (UNSPEC_MUL16AL             45)
    (UNSPEC_MUL8UL              46)
    (UNSPEC_MULDUL              47)
@@ -89,6 +89,9 @@
    (UNSPEC_FHADD               83)
    (UNSPEC_FHSUB               84)
    (UNSPEC_XMUL                        85)
+   (UNSPEC_MUL8                        86)
+   (UNSPEC_MUL8SU              87)
+   (UNSPEC_MULDSU              88)
   ])
 
 (define_constants
@@ -8004,36 +8007,64 @@
  [(set_attr "type" "fga")
   (set_attr "fptype" "double")])
 
-;; It may be possible to describe this operation as (1 indexed):
-;; (vec_select (vec_duplicate (vec_duplicate (vec_concat 1 2)))
-;;  1,5,10,14,19,23,28,32)
-;; Note that (vec_merge:V8QI [(V4QI) (V4QI)] (10101010 = 170) doesn't work
-;; because vec_merge expects all the operands to be of the same type.
 (define_insn "fpmerge_vis"
   [(set (match_operand:V8QI 0 "register_operand" "=e")
-        (unspec:V8QI [(match_operand:V4QI 1 "register_operand" "f")
-                      (match_operand:V4QI 2 "register_operand" "f")]
-         UNSPEC_FPMERGE))]
+        (vec_select:V8QI
+          (vec_concat:V8QI (match_operand:V4QI 1 "register_operand" "f")
+                           (match_operand:V4QI 2 "register_operand" "f"))
+          (parallel [(const_int 0) (const_int 4)
+                     (const_int 1) (const_int 5)
+                     (const_int 2) (const_int 6)
+                    (const_int 3) (const_int 7)])))]
  "TARGET_VIS"
  "fpmerge\t%1, %2, %0"
  [(set_attr "type" "fga")
   (set_attr "fptype" "double")])
 
+(define_insn "vec_interleave_lowv8qi"
+  [(set (match_operand:V8QI 0 "register_operand" "=e")
+        (vec_select:V8QI
+          (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
+                            (match_operand:V8QI 2 "register_operand" "f"))
+          (parallel [(const_int 0) (const_int 8)
+                     (const_int 1) (const_int 9)
+                     (const_int 2) (const_int 10)
+                    (const_int 3) (const_int 11)])))]
+ "TARGET_VIS"
+ "fpmerge\t%L1, %L2, %0"
+ [(set_attr "type" "fga")
+  (set_attr "fptype" "double")])
+
+(define_insn "vec_interleave_highv8qi"
+  [(set (match_operand:V8QI 0 "register_operand" "=e")
+        (vec_select:V8QI
+          (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
+                            (match_operand:V8QI 2 "register_operand" "f"))
+          (parallel [(const_int 4) (const_int 12)
+                     (const_int 5) (const_int 13)
+                     (const_int 6) (const_int 14)
+                    (const_int 7) (const_int 15)])))]
+ "TARGET_VIS"
+ "fpmerge\t%H1, %H2, %0"
+ [(set_attr "type" "fga")
+  (set_attr "fptype" "double")])
+
 ;; Partitioned multiply instructions
 (define_insn "fmul8x16_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")
-        (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
-                   (match_operand:V4HI 2 "register_operand" "e")))]
+        (unspec:V4HI [(match_operand:V4QI 1 "register_operand" "f")
+                      (match_operand:V4HI 2 "register_operand" "e")]
+         UNSPEC_MUL8))]
   "TARGET_VIS"
   "fmul8x16\t%1, %2, %0"
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
-;; Only one of the following two insns can be a multiply.
 (define_insn "fmul8x16au_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")
-        (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
-                   (match_operand:V2HI 2 "register_operand" "f")))]
+        (unspec:V4HI [(match_operand:V4QI 1 "register_operand" "f")
+                      (match_operand:V2HI 2 "register_operand" "f")]
+         UNSPEC_MUL16AU))]
   "TARGET_VIS"
   "fmul8x16au\t%1, %2, %0"
   [(set_attr "type" "fpmul")
@@ -8049,11 +8080,11 @@
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
-;; Only one of the following two insns can be a multiply.
 (define_insn "fmul8sux16_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")
-        (mult:V4HI (match_operand:V8QI 1 "register_operand" "e")
-                   (match_operand:V4HI 2 "register_operand" "e")))]
+        (unspec:V4HI [(match_operand:V8QI 1 "register_operand" "e")
+                      (match_operand:V4HI 2 "register_operand" "e")]
+         UNSPEC_MUL8SU))]
   "TARGET_VIS"
   "fmul8sux16\t%1, %2, %0"
   [(set_attr "type" "fpmul")
@@ -8069,11 +8100,11 @@
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
-;; Only one of the following two insns can be a multiply.
 (define_insn "fmuld8sux16_vis"
   [(set (match_operand:V2SI 0 "register_operand" "=e")
-        (mult:V2SI (match_operand:V4QI 1 "register_operand" "f")
-                   (match_operand:V2HI 2 "register_operand" "f")))]
+        (unspec:V2SI [(match_operand:V4QI 1 "register_operand" "f")
+                      (match_operand:V2HI 2 "register_operand" "f")]
+         UNSPEC_MULDSU))]
   "TARGET_VIS"
   "fmuld8sux16\t%1, %2, %0"
   [(set_attr "type" "fpmul")
@@ -8440,8 +8471,10 @@
 (define_code_iterator vis3_shift [ashift ss_ashift lshiftrt ashiftrt])
 (define_code_attr vis3_shift_insn
   [(ashift "fsll") (ss_ashift "fslas") (lshiftrt "fsrl") (ashiftrt "fsra")])
+(define_code_attr vis3_shift_patname
+  [(ashift "ashl") (ss_ashift "ssashl") (lshiftrt "lshr") (ashiftrt "ashr")])
    
-(define_insn "<vis3_shift_insn><vbits>_vis"
+(define_insn "v<vis3_shift_patname><mode>3"
   [(set (match_operand:V64N8 0 "register_operand" "=<vconstr>")
         (vis3_shift:V64N8 (match_operand:V64N8 1 "register_operand" 
"<vconstr>")
                           (match_operand:V64N8 2 "register_operand" 
"<vconstr>")))]
@@ -8490,8 +8523,10 @@
 (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
 (define_code_attr vis3_addsub_ss_insn
   [(ss_plus "fpadds") (ss_minus "fpsubs")])
+(define_code_attr vis3_addsub_ss_patname
+  [(ss_plus "ssadd") (ss_minus "sssub")])
 
-(define_insn "<vis3_addsub_ss_insn><vbits>_vis"
+(define_insn "<vis3_addsub_ss_patname><mode>"
   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" 
"<vconstr>")
                              (match_operand:VASS 2 "register_operand" 
"<vconstr>")))]

Re: VIS2 pattern review

Reply via email to