Hi,

This patch makes sure that allocno copies are not created for unordered modes. The testcases in the PR highlighted a case where an allocno copy was being created for:
(insn 121 120 123 11 (parallel [
            (set (reg:VNx2QI 217)
                (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 0)))
            (clobber (scratch:VNx16BI))
        ]) 4750 {*vec_duplicatevnx2qi_reg}
     (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
        (nil)))

As the compiler detected that the vec_duplicate<mode>_reg pattern allowed the input and output operands to be of the same register class, it tried to create an allocno copy for these two operands, stripping subregs in the process. However, this meant that the copy was between VNx2QI and SI, whose mode precisions are unordered.

So at compile time we do not know which of the two modes is smaller, and knowing that is a requirement when updating allocno copy costs.
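
To make "unordered" concrete, here is a minimal standalone sketch of the comparison. It is not GCC's poly_int code: poly2, eval and the constants are hypothetical stand-ins, and the VNx2QI precision of 16 + 16x bits (x being the number of extra 128-bit blocks in the SVE vector, known only at run time) is my assumption for illustration.

```cpp
#include <cassert>

// Simplified model of a two-coefficient poly_int: value = c0 + c1 * x,
// where x >= 0 is only known at run time.  Illustrative stand-in only,
// not GCC's poly_int implementation.
struct poly2 {
  long c0;  // constant term
  long c1;  // coefficient of the runtime indeterminate x
};

// Value of p for one concrete x.
constexpr long eval(poly2 p, long x) { return p.c0 + p.c1 * x; }

// True if a <= b for every x >= 0.
constexpr bool known_le(poly2 a, poly2 b) {
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// True if one of the two values is known to be <= the other for every x >= 0.
constexpr bool ordered_p(poly2 a, poly2 b) {
  return known_le(a, b) || known_le(b, a);
}

// Assumed precisions in bits (hypothetical constants for illustration).
constexpr poly2 qi_prec{8, 0};        // QI: fixed 8 bits
constexpr poly2 si_prec{32, 0};       // SI: fixed 32 bits
constexpr poly2 vnx2qi_prec{16, 16};  // VNx2QI: assumed 16 + 16x bits

// Two fixed-size modes are always ordered.
static_assert(ordered_p(qi_prec, si_prec), "QI vs SI is ordered");
// The variable-length mode is neither always <= nor always >= SI.
static_assert(!ordered_p(vnx2qi_prec, si_prec), "VNx2QI vs SI is unordered");

// Shows the ordering flipping as x grows, which is why no compile-time
// answer to "which mode is smaller" exists.
void check_order_flips() {
  assert(eval(vnx2qi_prec, 0) < eval(si_prec, 0));  // 16 < 32
  assert(eval(vnx2qi_prec, 2) > eval(si_prec, 2));  // 48 > 32
}
```

Under these assumptions the pair is smaller than SI for the minimum vector length and larger for longer vectors, so the patch simply declines to create the copy when ordered_p is false.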

Regression tested on aarch64-linux-gnu.

Is this OK for trunk (and, after a week, for a backport to gcc-10)?

Regards,
Andre


gcc/ChangeLog:
2021-02-19  Andre Vieira  <andre.simoesdiasvie...@arm.com>

        PR rtl-optimization/98791
        * ira-conflicts.c (process_regs_for_copy): Don't create allocno copies for unordered modes.

gcc/testsuite/ChangeLog:
2021-02-19  Andre Vieira  <andre.simoesdiasvie...@arm.com>

        PR rtl-optimization/98791
        * gcc.target/aarch64/sve/pr98791.c: New test.

diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p,
       ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
       ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
 
-      if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2)
+      if (!allocnos_conflict_for_copy_p (a1, a2)
+         && offset1 == offset2
+         && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)),
+                       GET_MODE_PRECISION (ALLOCNO_MODE (a2))))
        {
          cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn,
                                     ira_curr_loop_tree_node);
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
new file mode 100644
index 0000000000000000000000000000000000000000..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/98791  */
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */
+extern char a[], b[];
+short c, d;
+long *e;
+void f() {
+  for (int g; g < c; g += 1) {
+    a[g] = d;
+    b[g] = e[g];
+  }
+}
