This patch is my attempt to address the compile-time hog issue
in PR rtl-optimization/110587.  Richard Biener's analysis shows that
compilation of pr28071.c with -O0 currently spends ~70% of its time in
the "LRA non-specific" timer, because return_regno_p fails to filter out
a large number of calls to regno_in_use_p, resulting in quadratic behaviour.

For this pathological test case, things can be improved significantly.
Although the return register (%rax) is indeed mentioned a large
number of times in this function, due to inlining, the inlined functions
access the returned register in TImode, whereas the current function
returns a DImode value.  Hence the check to see if we're the last SET of the
return register, which should be followed by a USE, can be improved
by also testing the mode.  Implementation-wise, rather than pass an
additional mode parameter to LRA's local return_regno_p function, which
only has a single caller, it's more convenient to pass the REG rtx itself,
extract both the REGNO and the mode from it in the callee, and rename
the function to return_reg_p.
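
To illustrate the idea outside of GCC, here is a minimal standalone
sketch.  It is not the patch itself: the toy_mode and toy_reg types and
all toy_* function names are hypothetical stand-ins for machine modes and
REG rtxes, and register number 0 is just a placeholder for the return
register.  It only shows how comparing the mode as well as the register
number filters out same-numbered registers accessed in a different mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's machine modes and REG rtxes; these names are
   hypothetical and exist only for this sketch.  */
enum toy_mode { TOY_DImode, TOY_TImode };

struct toy_reg
{
  unsigned int regno;
  enum toy_mode mode;
};

/* Old-style check: only the register number is compared.  OUTGOING
   models the function's return register, or NULL if there is none.  */
static bool
toy_return_regno_p (const struct toy_reg *outgoing, unsigned int regno)
{
  return outgoing != NULL && outgoing->regno == regno;
}

/* New-style check: register number and mode must both match.  REG must
   be non-NULL, mirroring the "known to be REG_P" precondition.  */
static bool
toy_return_reg_p (const struct toy_reg *outgoing, const struct toy_reg *reg)
{
  assert (reg != NULL);
  return outgoing != NULL
         && outgoing->regno == reg->regno
         && outgoing->mode == reg->mode;
}

/* Count how many of the N registers in REGS pass the new-style check,
   i.e. how many would still need the expensive follow-up scan.  */
static size_t
toy_count_candidates (const struct toy_reg *outgoing,
                      const struct toy_reg *regs, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (toy_return_reg_p (outgoing, &regs[i]))
      count++;
  return count;
}
```

In this model, a TImode access to register 0 passes the old
number-only check but is rejected by the new one when the function
returns register 0 in DImode, so it never reaches the costly scan.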

The good news is that with this change "LRA non-specific" drops from
70% to 13%.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures.  Ok for mainline?


2023-07-22  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        PR middle-end/28071
        PR rtl-optimization/110587
        * lra-spills.cc (return_regno_p): Change argument and rename to...
        (return_reg_p): Check if the given register RTX has the same
        REGNO and machine mode as the function's return value.
        (lra_final_code_change): Update call to return_reg_p.


Thanks in advance,
Roger
--

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e..ae147ad 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,10 +705,10 @@ alter_subregs (rtx *loc, bool final_p)
   return res;
 }
 
-/* Return true if REGNO is used for return in the current
-   function.  */
+/* Return true if register REG, known to be REG_P, is used for return
+   in the current function.  */
 static bool
-return_regno_p (unsigned int regno)
+return_reg_p (rtx reg)
 {
   rtx outgoing = crtl->return_rtx;
 
@@ -716,7 +716,8 @@ return_regno_p (unsigned int regno)
     return false;
 
   if (REG_P (outgoing))
-    return REGNO (outgoing) == regno;
+    return REGNO (outgoing) == REGNO (reg)
+          && GET_MODE (outgoing) == GET_MODE (reg);
   else if (GET_CODE (outgoing) == PARALLEL)
     {
       int i;
@@ -725,7 +726,9 @@ return_regno_p (unsigned int regno)
        {
          rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
 
-         if (REG_P (x) && REGNO (x) == regno)
+         if (REG_P (x)
+             && REGNO (x) == REGNO (reg)
+             && GET_MODE (x) == GET_MODE (reg))
            return true;
        }
     }
@@ -821,7 +824,7 @@ lra_final_code_change (void)
          if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
              && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
              && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
-             && (! return_regno_p (REGNO (SET_SRC (pat)))
+             && (! return_reg_p (SET_SRC (pat))
                  || ! regno_in_use_p (insn, REGNO (SET_SRC (pat)))))
            {
              lra_invalidate_insn_data (insn);
