This patch makes a number of corrections to rs6000_register_move_cost,
adds a new register union class, GEN_OR_VSX_REGS, and adjusts insn
alternative to suit.

The patch initially just corrected register move cost when direct
moves are available, but that resulted in regressions.  Inspection of
those regressions showed ALL_REGS being used as the register allocno
class, which isn't ideal.  gcc/doc/tm.texi says: "You should define a
class for the union of two classes whenever some instruction allows
both classes".  Thus, define GEN_OR_VSX_REGS for the register
allocator.  (IRA wants to use the union of two register classes when
the costs of the classes are below memory cost, which happens more
often with the low direct move cost.)

As per https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89271#c11 we ought
to be returning the minimal cost for union classes.  That can be done
by rs6000_register_move_cost testing for vsx first, where the number
of regs for a given mode might be smaller than the same mode in gprs,
and adding another test to the LINK_OR_CTR_REGS case to exclude
SPEC_OR_GEN_REGS and NON_FLOAT_REGS.

I removed the VECTOR_MEM_VSX_P test since that leads to silly results
for scalar mode moves between altivec and float when TARGET_VSX.  eg.
rs6000_register_move_cost:, ret=2, mode=DF, from=FLOAT_REGS, to=FLOAT_REGS
rs6000_register_move_cost:, ret=16, mode=DF, from=FLOAT_REGS, to=ALTIVEC_REGS
rs6000_register_move_cost:, ret=2, mode=DF, from=FLOAT_REGS, to=VSX_REGS

The patch also fixes wrong results for moves within and between any of
the non-gpr, non-vsx special reg classes.  The comment about "moving
between two similar registers is just one instruction" is false.  We
can't move lr to ctr directly, for example.  I believe the intent of
the "reg_classes_intersect_p (to, from)" was to cover moves within
float or altivec, so I moved that test inside the code handling vsx,
and made sure the intersection wasn't anything besides vsx by masking
off everything else.  Masking isn't strictly necessary at the moment,
but would be if we create a GEN_OR_ALTIVEC_REGS class some time in the
future.

TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order
to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS
as an allocno class.  It is similar to the aarch64 version but without
any selection by regno mode if the best class is a union class.

Without the .md file tweaks, these tests fail:
+FAIL: gcc.target/powerpc/p9-dimode1.c scan-assembler-not \\mmtvsrd\\M
+FAIL: gcc.target/powerpc/p9-splat-4.c scan-assembler mtvsrdd
+FAIL: gcc.target/powerpc/pr81348.c scan-assembler \\mlxsihzx\\M
+FAIL: gcc.target/powerpc/pr81348.c scan-assembler \\mvextsh2d\\M
+FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler-not mtvsrwz
+FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler vspltisb
+FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler-not mtvsrwz
+FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler vspltisw
+FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler-not mtvsrwz
+FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler vspltish
+FAIL: gcc.target/powerpc/vsx-simode3.c scan-assembler lxsiwzx

Also, using a register move cost of 2 for for power9 direct moves
gives these fails, even with the .md file tweaks:
+FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler-not mtvsrwz
+FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler vspltisb
+FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler xxspltib
+FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler-not mtvsrwz
+FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler vspltisw
+FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler-not mtvsrwz
+FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler vspltish
+FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler xxspltib
+FAIL: gcc.target/powerpc/vsx-himode3.c scan-assembler lxsihzx
+FAIL: gcc.target/powerpc/vsx-qimode3.c scan-assembler lxsibzx
These can be all be fixed by removing "?"s disparaging vector
alternatives in movsi_internal1 and mov<mode>_internal.

As is, net testsuite change for powerpc64le-linux all langs bootstrap
at r269867 is:
-FAIL: gcc.target/powerpc/fold-vec-extract-double.p8.c scan-assembler-times 
\\mmtvsrd\\M 1
-FAIL: gcc.target/powerpc/fold-vec-extract-double.p8.c scan-assembler-times 
\\mrldic\\M|\\mrlwinm\\M 1
-FAIL: gcc.target/powerpc/fold-vec-extract-double.p8.c scan-assembler-times 
\\mvslo\\M 1
-FAIL: gcc.target/powerpc/fold-vec-extract-double.p8.c scan-assembler-times 
\\mxori\\M 1
-FAIL: gcc.target/powerpc/fold-vec-extract-double.p8.c scan-assembler-times 
\\mxxlor\\M 2
-FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p9.c scan-assembler-times 
\\mmfvsrd\\M 6
-FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p9.c scan-assembler-times 
\\mmtvsrdd\\M 3
-FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p9.c scan-assembler-times 
\\mrldic\\M 3
-FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p9.c scan-assembler-times 
\\mvslo\\M 3
-FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p9.c scan-assembler-times 
\\mxori\\M 3
-FAIL: gcc.target/powerpc/fold-vec-insert-char-p9.c scan-assembler-times 
\\mmtvsrwz\\M 4
-FAIL: gcc.target/powerpc/fold-vec-insert-short-p9.c scan-assembler-times 
\\mmtvsrwz\\M 4
-FAIL: gcc.target/powerpc/vsx-himode2.c scan-assembler mtvsrwz
-FAIL: gcc.target/powerpc/vsx-qimode2.c scan-assembler mtvsrwz
-FAIL: gcc.target/powerpc/vsx-simode2.c scan-assembler mtvsrwz

OK assuming Pat's latest spec test run doesn't contain any nasty
surprises?

        PR target/89271
        * config/rs6000/rs6000.h (enum reg_class, REG_CLASS_NAMES),
        (REG_CLASS_CONTENTS): Add GEN_OR_VSX_REGS class.
        * config/rs6000/darwin.h (PREFERRED_RELOAD_CLASS): Treat
        GEN_OR_VSX_REGS like GEN_OR_FLOAT_REGS.
        * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Likewise.
        (rs6000_register_move_cost): Correct cost for general <-> vsx
        when direct moves are available.  Cost union classes at minimal
        cost for any reg in the class.  Correct calculation for moves
        between vsx, float, and altivec.  Don't return a low cost for
        moves between special regs.
        (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Define.
        (rs6000_ira_change_pseudo_allocno_class): New function.
        * config/rs6000/rs6000.md (movsi_internal1, mov<mode>_internal),
        (movdi_internal32, movdi_internal64): Remove '*' from vsx register
        alternatives.
        * config/rs6000/vsx.md (vsx_splat_<mode>_reg): Don't deprecate
        we <- b alternative.

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index 9fb36e41e7d..20c59f89c8f 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands;
       && reg_class_subset_p (BASE_REGS, (CLASS)))              \
    ? BASE_REGS                                                 \
    : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT                        \
-      && (CLASS) == GEN_OR_FLOAT_REGS)                         \
+      && ((CLASS) == GEN_OR_FLOAT_REGS                         \
+         || (CLASS) == GEN_OR_VSX_REGS))                       \
    ? GENERAL_REGS                                              \
    : (CLASS))
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index cc8dc941537..fbc16c65de4 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1737,6 +1737,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #define TARGET_REGISTER_MOVE_COST rs6000_register_move_cost
 #undef TARGET_MEMORY_MOVE_COST
 #define TARGET_MEMORY_MOVE_COST rs6000_memory_move_cost
+#undef TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
+#define TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS \
+  rs6000_ira_change_pseudo_allocno_class
 #undef TARGET_CANNOT_COPY_INSN_P
 #define TARGET_CANNOT_COPY_INSN_P rs6000_cannot_copy_insn_p
 #undef TARGET_RTX_COSTS
@@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum reg_class 
rclass)
       return NO_REGS;
     }
 
-  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
+  if (GET_MODE_CLASS (mode) == MODE_INT
+      && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS))
     return GENERAL_REGS;
 
   return rclass;
@@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode,
                           reg_class_t from, reg_class_t to)
 {
   int ret;
+  reg_class_t rclass;
 
   if (TARGET_DEBUG_COST)
     dbg_cost_ctrl++;
 
+  /* If we have VSX, we can easily move between FPR or Altivec registers,
+     otherwise we can only easily move within classes.
+     Do this first so we give best-case answers for union classes
+     containing both gprs and vsx regs.  */
+  HARD_REG_SET to_vsx, from_vsx;
+  COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]);
+  AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]);
+  COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]);
+  AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]);
+  if (!hard_reg_set_empty_p (to_vsx)
+      && !hard_reg_set_empty_p (from_vsx)
+      && (TARGET_VSX
+         || hard_reg_set_intersect_p (to_vsx, from_vsx)))
+    {
+      int reg = FIRST_ALTIVEC_REGNO;
+      if (TARGET_VSX
+         || (TEST_HARD_REG_BIT (to_vsx, FIRST_FPR_REGNO)
+             && TEST_HARD_REG_BIT (from_vsx, FIRST_FPR_REGNO)))
+       reg = FIRST_FPR_REGNO;
+      ret = 2 * hard_regno_nregs (reg, mode);
+    }
+
   /*  Moves from/to GENERAL_REGS.  */
-  if (reg_classes_intersect_p (to, GENERAL_REGS)
-      || reg_classes_intersect_p (from, GENERAL_REGS))
+  else if ((rclass = from, reg_classes_intersect_p (to, GENERAL_REGS))
+          || (rclass = to, reg_classes_intersect_p (from, GENERAL_REGS)))
     {
-      reg_class_t rclass = from;
-
-      if (! reg_classes_intersect_p (to, GENERAL_REGS))
-       rclass = to;
-
       if (rclass == FLOAT_REGS || rclass == ALTIVEC_REGS || rclass == VSX_REGS)
-       ret = (rs6000_memory_move_cost (mode, rclass, false)
-              + rs6000_memory_move_cost (mode, GENERAL_REGS, false));
+       {
+         if (TARGET_DIRECT_MOVE
+             && (rs6000_tune == PROCESSOR_POWER8
+                 || rs6000_tune == PROCESSOR_POWER9))
+           {
+             if (rs6000_tune == PROCESSOR_POWER8)
+               ret = 4 * hard_regno_nregs (0, mode);
+             else
+               ret = 3 * hard_regno_nregs (0, mode);
+             /* SFmode requires a conversion when moving between gprs
+                and vsx.  */
+             if (mode == SFmode)
+               ret += 2;
+           }
+         else
+           ret = (rs6000_memory_move_cost (mode, rclass, false)
+                  + rs6000_memory_move_cost (mode, GENERAL_REGS, false));
+       }
 
       /* It's more expensive to move CR_REGS than CR0_REGS because of the
         shift.  */
@@ -34994,7 +35032,8 @@ rs6000_register_move_cost (machine_mode mode,
                || rs6000_tune == PROCESSOR_POWER7
                || rs6000_tune == PROCESSOR_POWER8
                || rs6000_tune == PROCESSOR_POWER9)
-              && reg_classes_intersect_p (rclass, LINK_OR_CTR_REGS))
+              && reg_classes_intersect_p (rclass, LINK_OR_CTR_REGS)
+              && !reg_classes_intersect_p (rclass, GENERAL_REGS))
         ret = 6 * hard_regno_nregs (0, mode);
 
       else
@@ -35002,16 +35041,6 @@ rs6000_register_move_cost (machine_mode mode,
        ret = 2 * hard_regno_nregs (0, mode);
     }
 
-  /* If we have VSX, we can easily move between FPR or Altivec registers.  */
-  else if (VECTOR_MEM_VSX_P (mode)
-          && reg_classes_intersect_p (to, VSX_REGS)
-          && reg_classes_intersect_p (from, VSX_REGS))
-    ret = 2 * hard_regno_nregs (FIRST_FPR_REGNO, mode);
-
-  /* Moving between two similar registers is just one instruction.  */
-  else if (reg_classes_intersect_p (to, from))
-    ret = (FLOAT128_2REG_P (mode)) ? 4 : 2;
-
   /* Everything else has to go through GENERAL_REGS.  */
   else
     ret = (rs6000_register_move_cost (mode, GENERAL_REGS, to)
@@ -35064,6 +35093,64 @@ rs6000_memory_move_cost (machine_mode mode, 
reg_class_t rclass,
   return ret;
 }
 
+/* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
+
+   The register allocator chooses GEN_OR_VSX_REGS for the allocno
+   class if GENERAL_REGS and VSX_REGS cost is lower than the memory
+   cost.  This happens a lot when TARGET_DIRECT_MOVE makes the register
+   move cost between GENERAL_REGS and VSX_REGS low.
+
+   It might seem reasonable to use a union class.  After all, if usage
+   of vsr is low and gpr high, it might make sense to spill gpr to vsr
+   rather than memory.  However, in cases where register pressure of
+   both is high, like the cactus_adm spec test, allowing
+   GEN_OR_VSX_REGS as the allocno class results in bad decisions in
+   the first scheduling pass.  This is partly due to an allocno of
+   GEN_OR_VSX_REGS wrongly contributing to the GENERAL_REGS pressure
+   class, which gives too high a pressure for GENERAL_REGS and too low
+   for VSX_REGS.  So, force a choice of the subclass here.
+
+   The best class is also the union if GENERAL_REGS and VSX_REGS have
+   the same cost.  In that case we do use GEN_OR_VSX_REGS as the
+   allocno class, since trying to narrow down the class by regno mode
+   is prone to error.  For example, SImode is allowed in VSX regs and
+   in some cases (eg. gcc.target/powerpc/p9-xxbr-3.c do_bswap32_vect)
+   it would be wrong to choose an allocno of GENERAL_REGS based on
+   SImode.  */
+
+static reg_class_t
+rs6000_ira_change_pseudo_allocno_class (int regno ATTRIBUTE_UNUSED,
+                                       reg_class_t allocno_class,
+                                       reg_class_t best_class)
+{
+  switch (allocno_class)
+    {
+    default:
+      break;
+
+    case GEN_OR_VSX_REGS:
+      /* best_class must be a subset of allocno_class.  */
+      gcc_checking_assert (best_class == GEN_OR_VSX_REGS
+                          || best_class == GEN_OR_FLOAT_REGS
+                          || best_class == VSX_REGS
+                          || best_class == ALTIVEC_REGS
+                          || best_class == FLOAT_REGS
+                          || best_class == GENERAL_REGS
+                          || best_class == BASE_REGS);
+      /* Use best_class but choose wider classes when copying from the
+        wider class to best_class is cheap.  This mimics IRA choice
+        of allocno class.  */
+      if (best_class == BASE_REGS)
+       return GENERAL_REGS;
+      if (TARGET_VSX
+         && (best_class == FLOAT_REGS || best_class == ALTIVEC_REGS))
+       return VSX_REGS;
+      return best_class;
+    }
+
+  return allocno_class;
+}
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index ff9449c2d45..f720368cfb4 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1139,6 +1139,7 @@ enum reg_class
   VSCR_REGS,
   SPR_REGS,
   GEN_OR_FLOAT_REGS,
+  GEN_OR_VSX_REGS,
   LINK_REGS,
   CTR_REGS,
   LINK_OR_CTR_REGS,
@@ -1168,6 +1169,7 @@ enum reg_class
   "VSCR_REGS",                                                         \
   "SPR_REGS",                                                          \
   "GEN_OR_FLOAT_REGS",                                                 \
+  "GEN_OR_VSX_REGS",                                                   \
   "LINK_REGS",                                                         \
   "CTR_REGS",                                                          \
   "LINK_OR_CTR_REGS",                                                  \
@@ -1206,6 +1208,8 @@ enum reg_class
   { 0x00000000, 0x00000000, 0x00000000, 0x00010000 },                  \
   /* GEN_OR_FLOAT_REGS.  */                                            \
   { 0xffffffff, 0xffffffff, 0x00000008, 0x00008000 },                  \
+  /* GEN_OR_VSX_REGS.  */                                              \
+  { 0xffffffff, 0xffffffff, 0xffffe008, 0x00009fff },                  \
   /* LINK_REGS.  */                                                    \
   { 0x00000000, 0x00000000, 0x00000002, 0x00000000 },                  \
   /* CTR_REGS.  */                                                     \
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b8dd85905f3..c4fddbe6f35 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6832,10 +6832,10 @@ (define_insn "movsi_low"
 ;;             MF%1         MT%0         NOP
 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-               "=r,         r,           r,           ?*wI,        ?*wH,
+               "=r,         r,           r,           ?wI,         ?wH,
                 m,          ?Z,          ?Z,          r,           r,
-                r,          ?*wIwH,      ?*wJwK,      ?*wJwK,      ?*wu,
-                ?*wJwK,     ?*wH,        ?*wK,        ?*wIwH,      ?r,
+                r,          ?wIwH,       ?wJwK,       ?wJwK,       ?wu,
+                ?wJwK,      ?wH,         ?wK,         ?wIwH,       ?r,
                 r,          *h,          *h")
 
        (match_operand:SI 1 "input_operand"
@@ -7106,13 +7106,13 @@ (define_expand "mov<mode>"
 ;;             MTVSRWZ     MF%1       MT%1       NOP
 (define_insn "*mov<mode>_internal"
   [(set (match_operand:QHI 0 "nonimmediate_operand"
-               "=r,        r,         ?*wJwK,    m,         Z,         r,
-                ?*wJwK,    ?*wJwK,    ?*wJwK,    ?*wK,      ?*wK,      r,
-                ?*wJwK,    r,         *c*l,      *h")
+               "=r,        r,         ?wJwK,     m,         Z,         r,
+                ?wJwK,     ?wJwK,     ?wJwK,     ?wK,       ?wK,       r,
+                ?wJwK,     r,         *c*l,      *h")
 
        (match_operand:QHI 1 "input_operand"
                "r,         m,         Z,         r,         wJwK,      i,
-                wJwK,      O,         wM,        wB,        wS,        ?*wJwK,
+                wJwK,      O,         wM,        wB,        wS,        ?wJwK,
                 r,         *h,        r,         0"))]
 
   "gpc_reg_operand (operands[0], <MODE>mode)
@@ -8673,8 +8673,8 @@ (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
           r,         wY,        Z,         ^wb,       $wv,       ^wi,
-          *wo,       *wo,       *wv,       *wi,       *wi,       *wv,
-          *wv")
+          wo,        wo,        wv,        wi,        *i,        wv,
+          wv")
 
        (match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
@@ -8753,9 +8753,9 @@ (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
                "=YZ,       r,         r,         r,         r,          r,
                 m,         ^d,        ^d,        wY,        Z,          $wb,
-                $wv,       ^wi,       *wo,       *wo,       *wv,        *wi,
-                *wi,       *wv,       *wv,       r,         *h,         *h,
-                ?*r,       ?*wg,      ?*r,       ?*wj")
+                $wv,       ^wi,       wo,        wo,        wv,         wi,
+                wi,        wv,        wv,        r,         *h,         *h,
+                ?r,        ?wg,       ?r,        ?wj")
 
        (match_operand:DI 1 "input_operand"
                "r,         YZ,        r,         I,         L,          nF,
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f81d5fb1009..0d3d902cb86 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4106,7 +4106,7 @@ (define_expand "vsx_splat_<mode>"
 })
 
 (define_insn "vsx_splat_<mode>_reg"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSX_D:VSa>,?we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSX_D:VSa>,we")
        (vec_duplicate:VSX_D
         (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VSX_D:VS_64reg>,b")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"

-- 
Alan Modra
Australia Development Lab, IBM

Reply via email to