Wilco Dijkstra <wilco.dijks...@arm.com> writes: > A recent commit removing '*' from the md files caused a large regression > in h264ref. > It turns out aarch64_ira_change_pseudo_allocno_class is no longer > effective after the > SVE changes, and the combination results in the regression. This patch > fixes it by > using the new POINTER_AND_FP_REGS register class which is now used > instead of ALL_REGS. > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. > > Passes regress, OK for commit? > > Since it is a regression introduced in GCC8, OK to backport to GCC8? > > ChangeLog: > 2018-05-22 Wilco Dijkstra <wdijk...@arm.com> > > * config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class): > Use POINTER_AND_FP_REGSinstead of ALL_REGS. > * config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase > cost of r=w alternative. > -- > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38 > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>" > ;; is guaranteed so upper bits should be considered undefined. > ;; RTL uses GCC vector extension indices throughout so flip only for > assembly. > (define_insn "aarch64_get_lane<mode>" > - [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, > Utv") > + [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, > Utv") > (vec_select:<VEL> > (match_operand:VALL_F16 1 "register_operand" "w, w, w") > (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))] > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > 47d98dfd095cdcd15908a86091cf2f8a4d6137b1..a119760c7f332aded200fa1b5bcfb1bbac7b6420 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const > char *msg) > } > > /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS. > - The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have > - the same cost even if ALL_REGS has a much larger cost. ALL_REGS is also > - used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory > - cost (in this case the best class is the lowest cost one). Using ALL_REGS > - irrespectively of its cost results in bad allocations with many redundant > - int<->FP moves which are expensive on various cores. > - To avoid this we don't allow ALL_REGS as the allocno class, but force a > - decision between FP_REGS and GENERAL_REGS. We use the allocno class if it > - isn't ALL_REGS. Similarly, use the best class if it isn't ALL_REGS. > - Otherwise set the allocno class depending on the mode. > + The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and > + GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much > + higher cost. POINTER_AND_FP_REGS is also used if the cost of both FP_REGS > + and GENERAL_REGS is lower than the memory cost (in this case the best > class > + is the lowest cost one). Using POINTER_AND_FP_REGS irrespectively of its > + cost results in bad allocations with many redundant int<->FP moves which > + are expensive on various cores. > + To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but > + force a decision between FP_REGS and GENERAL_REGS. We use the allocno > class > + if it isn't POINTER_AND_FP_REGS. Similarly, use the best class if it > isn't > + POINTER_AND_FP_REGS. Otherwise set the allocno class depending on the > mode. > The result of this is that it is no longer inefficient to have a higher > memory move cost than the register move cost. > */ > @@ -1079,10 +1080,10 @@ aarch64_ira_change_pseudo_allocno_class (int regno, > reg_class_t allocno_class, > { > machine_mode mode; > > - if (allocno_class != ALL_REGS) > + if (allocno_class != POINTER_AND_FP_REGS) > return allocno_class; > > - if (best_class != ALL_REGS) > + if (best_class != POINTER_AND_FP_REGS) > return best_class; > > mode = PSEUDO_REGNO_MODE (regno);
I think it'd be better to use !reg_class_subset_p (POINTER_AND_FP_REGS, ...) instead of ... != POINTER_AND_FP_REGS, since this in principle still applies to ALL_REGS too. FWIW, the patch looks good to me with that change. Thanks, Richard