Richard Sandiford wrote:

> Conceptually what we're saying here is that if the given classes
> include both GENERAL_REGS and FP_REGS, we'll choose between them
> based on the mode of the register.  And that makes sense for any
> class that includes both GENERAL_REGS and FP_REGS.  We could write
> it that way if it seems better, i.e.:
>
>  if (!reg_class_subset_p (GENERAL_REGS, ...)
>      || !reg_class_subset_p (FP_REGS, ...))
>    ...
>
> That way we don't mention any union classes, and I think the meaning
> is clear in the context of eventually returning GENERAL_REGS or FP_REGS.
>
> reg_class_subset_p tests for the normal inclusive subset relation
> rather than "strict subset".

Right, checking for a subset of GENERAL_REGS and FP_REGS does make sense
and is more clear as well. It appears to behave identically, so here is the new 
version:


A recent commit removing '*' from the md files caused a large regression in 
h264ref.
It turns out aarch64_ira_change_pseudo_allocno_class is no longer effective 
after the
SVE changes, and the combination results in the regression.  This patch fixes 
it by
explicitly checking for a subset of GENERAL_REGS and FP_REGS.
Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.

Passes regress, OK for commit? Since it is a regression introduced in GCC8, OK 
to
backport to GCC8?

ChangeLog:
2018-05-25  Wilco Dijkstra  <wdijk...@arm.com>

        * config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
        Check for subset of GENERAL_REGS and FP_REGS.
        * config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase cost of 
r=w alternative.

--
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>"
 ;; is guaranteed so upper bits should be considered undefined.
 ;; RTL uses GCC vector extension indices throughout so flip only for assembly.
 (define_insn "aarch64_get_lane<mode>"
-  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, 
Utv")
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, 
Utv")
        (vec_select:<VEL>
          (match_operand:VALL_F16 1 "register_operand" "w, w, w")
          (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
47d98dfd095cdcd15908a86091cf2f8a4d6137b1..6e7722187f0f79195c8b6c43f463a3ac9aa61742
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char 
*msg)
 }
 
 /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
-   The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have
-   the same cost even if ALL_REGS has a much larger cost.  ALL_REGS is also
-   used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory
-   cost (in this case the best class is the lowest cost one).  Using ALL_REGS
-   irrespectively of its cost results in bad allocations with many redundant
-   int<->FP moves which are expensive on various cores.
-   To avoid this we don't allow ALL_REGS as the allocno class, but force a
-   decision between FP_REGS and GENERAL_REGS.  We use the allocno class if it
-   isn't ALL_REGS.  Similarly, use the best class if it isn't ALL_REGS.
-   Otherwise set the allocno class depending on the mode.
+   The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and
+   GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much
+   higher cost.  POINTER_AND_FP_REGS is also used if the cost of both FP_REGS
+   and GENERAL_REGS is lower than the memory cost (in this case the best class
+   is the lowest cost one).  Using POINTER_AND_FP_REGS irrespectively of its
+   cost results in bad allocations with many redundant int<->FP moves which
+   are expensive on various cores.
+   To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but
+   force a decision between FP_REGS and GENERAL_REGS.  We use the allocno class
+   if it isn't POINTER_AND_FP_REGS.  Similarly, use the best class if it isn't
+   POINTER_AND_FP_REGS.  Otherwise set the allocno class depending on the mode.
    The result of this is that it is no longer inefficient to have a higher
    memory move cost than the register move cost.
 */
@@ -1079,10 +1080,12 @@ aarch64_ira_change_pseudo_allocno_class (int regno, 
reg_class_t allocno_class,
 {
   machine_mode mode;
 
-  if (allocno_class != ALL_REGS)
+  if (reg_class_subset_p (allocno_class, GENERAL_REGS)
+      || reg_class_subset_p (allocno_class, FP_REGS))
     return allocno_class;
 
-  if (best_class != ALL_REGS)
+  if (reg_class_subset_p (best_class, GENERAL_REGS)
+      || reg_class_subset_p (best_class, FP_REGS))
     return best_class;
 
   mode = PSEUDO_REGNO_MODE (regno);

Reply via email to