Unfortunately I accidentally dropped this patch series without posting V2 after Kewen's patch review. :-( Here's V2 with the suggested changes made. Here is the thread from the V1 patch submission:
https://inbox.sourceware.org/gcc-patches/1f32e2bf-83c2-4664-b7f3-4a6996978...@linux.ibm.com/ Changes from V1: * Delete unneeded UNSPEC_MMA_EXTRACT. * Use 16 rather than GET_MODE_SIZE (V16QImode) as requested by Kewen. * Remove unneeded change to rs6000_modes_tieable_p. * Rebase to current trunk. PR109116 exposes an issue where using unspecs to access each vector component of an opaque mode variable leads to unneeded register copies, because our rtl optimizers cannot handle unspecs. Instead, use subregs to access each vector register component of the opaque mode variable, which our optimizers know how to handle and optimize. I did not include a test case with the patch, since writing a test case that attempts to ensure we don't emit unneeded register copies is nearly impossible since those copies can still be generated for reasons other than the causes in this patch. I have verified that this patch does improve code generation for some unit tests and our AI libraries team has confirmed that performance of their tests improved when using this patch. This passed bootstrap and regtesting with no regressions on powerpc64le-linux and powerpc64-linux. Ok for trunk? Peter gcc/ PR target/109116 * config/rs6000/mma.md (unspec): Delete UNSPEC_MMA_EXTRACT. (vsx_disassemble_pair): Expand into a vector register sized subreg. (mma_disassemble_acc): Likewise. (*vsx_disassemble_pair): Delete. (*mma_disassemble_acc): Likewise. diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 50e577ab44d..85f3a926682 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -30,7 +30,6 @@ (define_constants [(MAX_MMA_OPERANDS 7)]) (define_c_enum "unspec" [UNSPEC_VSX_ASSEMBLE - UNSPEC_MMA_EXTRACT UNSPEC_MMA_PMXVBF16GER2 UNSPEC_MMA_PMXVBF16GER2NN UNSPEC_MMA_PMXVBF16GER2NP @@ -398,29 +397,8 @@ (define_expand "vsx_disassemble_pair" (match_operand 2 "const_0_to_1_operand")] "TARGET_MMA" { - rtx src; - int regoff = INTVAL (operands[2]); - src = gen_rtx_UNSPEC (V16QImode, - gen_rtvec (2, operands[1], GEN_INT (regoff)), - UNSPEC_MMA_EXTRACT); - emit_move_insn (operands[0], src); - DONE; -}) - -(define_insn_and_split "*vsx_disassemble_pair" - [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa") - (unspec:V16QI [(match_operand:OO 1 "vsx_register_operand" "wa") - (match_operand 2 "const_0_to_1_operand")] - UNSPEC_MMA_EXTRACT))] - "TARGET_MMA - && vsx_register_operand (operands[1], OOmode)" - "#" - "&& reload_completed" - [(const_int 0)] -{ - int reg = REGNO (operands[1]); - int regoff = INTVAL (operands[2]); - rtx src = gen_rtx_REG (V16QImode, reg + regoff); + int regoff = INTVAL (operands[2]) * 16; + rtx src = simplify_gen_subreg (V16QImode, operands[1], OOmode, regoff); emit_move_insn (operands[0], src); DONE; }) @@ -472,29 +450,8 @@ (define_expand "mma_disassemble_acc" (match_operand 2 "const_0_to_3_operand")] "TARGET_MMA" { - rtx src; - int regoff = INTVAL (operands[2]); - src = gen_rtx_UNSPEC (V16QImode, - gen_rtvec (2, operands[1], GEN_INT (regoff)), - UNSPEC_MMA_EXTRACT); - emit_move_insn (operands[0], src); - DONE; -}) - -(define_insn_and_split "*mma_disassemble_acc" - [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa") - (unspec:V16QI [(match_operand:XO 1 "fpr_reg_operand" "d") - (match_operand 2 "const_0_to_3_operand")] - UNSPEC_MMA_EXTRACT))] - "TARGET_MMA - && fpr_reg_operand (operands[1], XOmode)" - "#" - "&& reload_completed" - [(const_int 0)] -{ - int reg = REGNO (operands[1]); - int regoff = INTVAL (operands[2]); - rtx src = gen_rtx_REG (V16QImode, reg + regoff); + int regoff = INTVAL (operands[2]) * 16; + rtx src = simplify_gen_subreg (V16QImode, operands[1], XOmode, regoff); emit_move_insn (operands[0], src); DONE; })