I've committed this patch to add a missing vector operator on amdgcn.
The architecture has no 64-bit NOT instruction, so we had no insn for
it. The vectorizer coped badly with that gap, and the v64df_pow function
ended up with a 2MB stack frame. This is a problem when you typically
have over 3000 threads and only want to allocate 32k of stack space for
each one: at 2MB per thread the total would come to around 6GB.
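
For illustration only, here is a minimal C sketch of the idea behind the
split, shown for a single 64-bit lane: a 64-bit NOT gives the same result
as two independent 32-bit NOTs on the two halves of the value. The
function names are hypothetical and this is not GCC code; the real split
happens on RTL after reload, as in the pattern below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute a 64-bit NOT using only 32-bit NOTs,
   one per half, the way a target without a 64-bit NOT would.  */
uint64_t not64_via_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* low 32 bits */
    uint32_t hi = (uint32_t)(x >> 32);  /* high 32 bits */

    lo = (uint32_t)~lo;                 /* first 32-bit NOT */
    hi = (uint32_t)~hi;                 /* second 32-bit NOT */

    return ((uint64_t)hi << 32) | lo;   /* recombine the halves */
}

/* Hypothetical self-check: compare against the native 64-bit operator.  */
void check_not64_via_halves(void)
{
    assert(not64_via_halves(0) == ~(uint64_t)0);
    assert(not64_via_halves(0x0123456789ABCDEFULL)
           == ~(uint64_t)0x0123456789ABCDEFULL);
}
```

Because the two halves do not depend on each other, the operation needs
no carry or temporary storage, which is why it can be expressed as two
plain 32-bit instructions.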
Andrew
amdgcn: Add 64-bit vector not
gcc/ChangeLog:
* config/gcn/gcn-valu.md (one_cmpl<mode>2<exec>): New.
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 44d107145db..c0b43fcfb64 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2791,6 +2791,23 @@ (define_expand "neg<mode>2"
DONE;
})
+(define_insn_and_split "one_cmpl<mode>2<exec>"
+ [(set (match_operand:V_DI 0 "register_operand" "= v")
+ (not:V_DI
+ (match_operand:V_DI 1 "gcn_alu_operand" "vSvDB")))]
+ ""
+ "#"
+ "reload_completed"
+ [(set (match_dup 3) (not:<VnSI> (match_dup 5)))
+ (set (match_dup 4) (not:<VnSI> (match_dup 6)))]
+ {
+ operands[3] = gcn_operand_part (<VnDI>mode, operands[0], 0);
+ operands[4] = gcn_operand_part (<VnDI>mode, operands[0], 1);
+ operands[5] = gcn_operand_part (<VnDI>mode, operands[1], 0);
+ operands[6] = gcn_operand_part (<VnDI>mode, operands[1], 1);
+ }
+ [(set_attr "type" "mult")])
+
;; }}}
;; {{{ FP binops - special cases