This patch is my attempt to address the compile-time hog issue in PR rtl-optimization/110587. Richard Biener's analysis shows that compilation of pr28071.c with -O0 currently spends ~70% in timer "LRA non-specific" due to return_regno_p failing to filter a large number of calls to regno_in_use_p, resulting in quadratic behaviour.
For this pathological test case, things can be improved significantly. Although the return register (%rax) is indeed mentioned a large number of times in this function, due to inlining, the inlined functions access the returned register in TImode, whereas the current function returns a DImode. Hence the check to see if we're the last SET of the return register, which should be followed by a USE, can be improved by also testing the mode. Implementation-wise, rather than pass an additional mode parameter to LRA's local return_regno_p function, which only has a single caller, it's more convenient to pass the rtx REG_P, and from this extract both the REGNO and the mode in the callee, and rename this function to return_reg_p. The good news is that with this change "LRA non-specific" drops from 70% to 13%. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, with no new failures. Ok for mainline? 2023-07-22 Roger Sayle <ro...@nextmovesoftware.com> gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * lra-spills.cc (return_regno_p): Change argument and rename to... (return_reg_p): Check if the given register RTX has the same REGNO and machine mode as the function's return value. (lra_final_code_change): Update call to return_reg_p. Thanks in advance, Roger --
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc index 3a7bb7e..ae147ad 100644 --- a/gcc/lra-spills.cc +++ b/gcc/lra-spills.cc @@ -705,10 +705,10 @@ alter_subregs (rtx *loc, bool final_p) return res; } -/* Return true if REGNO is used for return in the current - function. */ +/* Return true if register REG, known to be REG_P, is used for return + in the current function. */ static bool -return_regno_p (unsigned int regno) +return_reg_p (rtx reg) { rtx outgoing = crtl->return_rtx; @@ -716,7 +716,8 @@ return_regno_p (unsigned int regno) return false; if (REG_P (outgoing)) - return REGNO (outgoing) == regno; + return REGNO (outgoing) == REGNO (reg) + && GET_MODE (outgoing) == GET_MODE (reg); else if (GET_CODE (outgoing) == PARALLEL) { int i; @@ -725,7 +726,9 @@ return_regno_p (unsigned int regno) { rtx x = XEXP (XVECEXP (outgoing, 0, i), 0); - if (REG_P (x) && REGNO (x) == regno) + if (REG_P (x) + && REGNO (x) == REGNO (reg) + && GET_MODE (x) == GET_MODE (reg)) return true; } } @@ -821,7 +824,7 @@ lra_final_code_change (void) if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat)) && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat)) - && (! return_regno_p (REGNO (SET_SRC (pat))) + && (! return_reg_p (SET_SRC (pat)) || ! regno_in_use_p (insn, REGNO (SET_SRC (pat))))) { lra_invalidate_insn_data (insn);