Unfortunately I accidentally dropped this patch series without posting V2
after Kewen's patch review. :-(  Here's V2 with the suggested changes made.
Here is the thread from the V1 patch submission:

https://inbox.sourceware.org/gcc-patches/1f32e2bf-83c2-4664-b7f3-4a6996978...@linux.ibm.com/


Changes from V1:
* Delete unneeded UNSPEC_MMA_EXTRACT.
* Use 16 rather than GET_MODE_SIZE (V16QImode) as requested by Kewen.
* Remove unneeded change to rs6000_modes_tieable_p.
* Rebase to current trunk.

PR109116 exposes an issue where using unspecs to access each vector component
of an opaque mode variable leads to unneeded register copies, because our rtl
optimizers cannot handle unspecs.  Instead, use subregs to access each vector
register component of the opaque mode variable, which our optimizers know how
to handle and optimize.

I did not include a test case with the patch, since writing a test case that
attempts to ensure we don't emit unneeded register copies is nearly impossible
since those copies can still be generated for reasons other than the causes
in this patch.  I have verified that this patch does improve code generation
for some unit tests and our AI libraries team has confirmed that performance
of their tests improved when using this patch.

This passed bootstrap and regtesting with no regressions on powerpc64le-linux
and powerpc64-linux.  Ok for trunk?

Peter


gcc/
        PR target/109116
        * config/rs6000/mma.md (unspec): Delete UNSPEC_MMA_EXTRACT.
        (vsx_disassemble_pair): Expand into a vector register sized subreg.
        (mma_disassemble_acc): Likewise.
        (*vsx_disassemble_pair): Delete.
        (*mma_disassemble_acc): Likewise.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 50e577ab44d..85f3a926682 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -30,7 +30,6 @@ (define_constants [(MAX_MMA_OPERANDS 7)])
 
 (define_c_enum "unspec"
   [UNSPEC_VSX_ASSEMBLE
-   UNSPEC_MMA_EXTRACT
    UNSPEC_MMA_PMXVBF16GER2
    UNSPEC_MMA_PMXVBF16GER2NN
    UNSPEC_MMA_PMXVBF16GER2NP
@@ -398,29 +397,8 @@ (define_expand "vsx_disassemble_pair"
    (match_operand 2 "const_0_to_1_operand")]
   "TARGET_MMA"
 {
-  rtx src;
-  int regoff = INTVAL (operands[2]);
-  src = gen_rtx_UNSPEC (V16QImode,
-                       gen_rtvec (2, operands[1], GEN_INT (regoff)),
-                       UNSPEC_MMA_EXTRACT);
-  emit_move_insn (operands[0], src);
-  DONE;
-})
-
-(define_insn_and_split "*vsx_disassemble_pair"
-  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
-       (unspec:V16QI [(match_operand:OO 1 "vsx_register_operand" "wa")
-                     (match_operand 2 "const_0_to_1_operand")]
-                     UNSPEC_MMA_EXTRACT))]
-  "TARGET_MMA
-   && vsx_register_operand (operands[1], OOmode)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  int reg = REGNO (operands[1]);
-  int regoff = INTVAL (operands[2]);
-  rtx src = gen_rtx_REG (V16QImode, reg + regoff);
+  int regoff = INTVAL (operands[2]) * 16;
+  rtx src = simplify_gen_subreg (V16QImode, operands[1], OOmode, regoff);
   emit_move_insn (operands[0], src);
   DONE;
 })
@@ -472,29 +450,8 @@ (define_expand "mma_disassemble_acc"
    (match_operand 2 "const_0_to_3_operand")]
   "TARGET_MMA"
 {
-  rtx src;
-  int regoff = INTVAL (operands[2]);
-  src = gen_rtx_UNSPEC (V16QImode,
-                       gen_rtvec (2, operands[1], GEN_INT (regoff)),
-                       UNSPEC_MMA_EXTRACT);
-  emit_move_insn (operands[0], src);
-  DONE;
-})
-
-(define_insn_and_split "*mma_disassemble_acc"
-  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
-       (unspec:V16QI [(match_operand:XO 1 "fpr_reg_operand" "d")
-                     (match_operand 2 "const_0_to_3_operand")]
-                     UNSPEC_MMA_EXTRACT))]
-  "TARGET_MMA
-   && fpr_reg_operand (operands[1], XOmode)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  int reg = REGNO (operands[1]);
-  int regoff = INTVAL (operands[2]);
-  rtx src = gen_rtx_REG (V16QImode, reg + regoff);
+  int regoff = INTVAL (operands[2]) * 16;
+  rtx src = simplify_gen_subreg (V16QImode, operands[1], XOmode, regoff);
   emit_move_insn (operands[0], src);
   DONE;
 })

Reply via email to