I've committed this patch to add a missing vector operator on amdgcn.

The architecture has no 64-bit vector not instruction, so we had no insn for it. The vectorizer handled the missing pattern badly, and as a result the v64df_pow function ended up with a 2MB stack frame. This is a problem when you typically have over 3000 threads and only want to allocate 32k of stack space each! The new pattern is split after reload into two 32-bit nots, one for each half of the 64-bit operand.
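
For anyone who wants to see why the split is safe, here is a minimal C sketch of what it computes per lane. It is only a model, not GCC code: the names not64_via_halves, v64_not64 and LANES are made up for illustration, and it assumes that part 0 is the low 32 bits and part 1 the high 32 bits. Bitwise not works on each bit independently, so the result is the same whichever half is numbered first.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 64  /* assumed lane count, matching the "v64" in v64df_pow */

/* Hypothetical helper, not part of GCC: NOT one 64-bit lane using only
   32-bit operations, the way the split emits two SImode nots.  */
static uint64_t
not64_via_halves (uint64_t x)
{
  uint32_t lo = (uint32_t) x;          /* part 0 (assumed low half) */
  uint32_t hi = (uint32_t) (x >> 32);  /* part 1 (assumed high half) */

  lo = ~lo;
  hi = ~hi;

  return ((uint64_t) hi << 32) | lo;
}

/* Hypothetical model of the whole vector op: apply the split to every lane.  */
static void
v64_not64 (uint64_t dst[LANES], const uint64_t src[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      dst[i] = not64_via_halves (src[i]);
      /* The two half-width nots must agree with a plain 64-bit not.  */
      assert (dst[i] == ~src[i]);
    }
}
```

Calling v64_not64 on any 64-element array should fill dst with the bitwise complement of src without tripping the assert.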

Andrew

amdgcn: Add 64-bit vector not

gcc/ChangeLog:

        * config/gcn/gcn-valu.md (one_cmpl<mode>2<exec>): New.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 44d107145db..c0b43fcfb64 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2791,6 +2791,23 @@ (define_expand "neg<mode>2"
     DONE;
   })
 
+(define_insn_and_split "one_cmpl<mode>2<exec>"
+  [(set (match_operand:V_DI 0 "register_operand"  "=   v")
+        (not:V_DI
+          (match_operand:V_DI 1 "gcn_alu_operand" "vSvDB")))]
+  ""
+  "#"
+  "reload_completed"
+  [(set (match_dup 3) (not:<VnSI> (match_dup 5)))
+   (set (match_dup 4) (not:<VnSI> (match_dup 6)))]
+  {
+    operands[3] = gcn_operand_part (<VnDI>mode, operands[0], 0);
+    operands[4] = gcn_operand_part (<VnDI>mode, operands[0], 1);
+    operands[5] = gcn_operand_part (<VnDI>mode, operands[1], 0);
+    operands[6] = gcn_operand_part (<VnDI>mode, operands[1], 1);
+  }
+  [(set_attr "type" "mult")])
+
 ;; }}}
 ;; {{{ FP binops - special cases
 
