> Am 27.07.2023 um 19:12 schrieb Roger Sayle <ro...@nextmovesoftware.com>: > > > Hi Richard, > > You're 100% right. It’s possible to significantly clean-up this code, > replacing > the body of the conditional with a call to force_reg and simplifying the > conditions > under which it is called. These improvements are implemented in the patch > below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and > make -k check, both with and without -m32, as usual. > > Interestingly, the CONCAT clause afterwards is still required (I've learned > something > new), as calling force_reg (or gen_reg_rtx) with HCmode, actually returns a > CONCAT > instead of a REG, Heh, interesting. > so although the code looks dead, it's required to build libgcc during > a bootstrap. But the remaining clean-up is good, reducing the number of > source lines > and making the logic easier to understand. > > Ok for mainline? Ok. Thanks, Richard > 2023-07-27 Roger Sayle <ro...@nextmovesoftware.com> > Richard Biener <rguent...@suse.de> > > gcc/ChangeLog > PR middle-end/28071 > PR rtl-optimization/110587 > * expr.cc (emit_group_load_1): Simplify logic for calling > force_reg on ORIG_SRC, to avoid making a copy if the source > is already in a pseudo register. > > Roger > -- > >> -----Original Message----- >> From: Richard Biener <richard.guent...@gmail.com> >> Sent: 25 July 2023 12:50 >> >>> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle <ro...@nextmovesoftware.com> >>> wrote: >>> >>> This patch is the third in series of fixes for PR >>> rtl-optimization/110587, a compile-time regression with -O0, that >>> attempts to address the underlying cause. As noted previously, the >>> pathological test case pr28071.c contains a large number of useless >>> register-to-register moves that can produce quadratic behaviour (in >>> LRA). These move are generated during RTL expansion in >>> emit_group_load_1, where the middle-end attempts to simplify the >>> source before calling extract_bit_field. This is reasonable if the >>> source is a complex expression (from before the tree-ssa optimizers), >>> or a SUBREG, or a hard register, but it's not particularly useful to >>> copy a pseudo register into a new pseudo register. This patch eliminates >>> that >> redundancy. >>> >>> The -fdump-tree-expand for pr28071.c compiled with -O0 currently >>> contains 777K lines, with this patch it contains 717K lines, i.e. >>> saving about 60K lines (admittedly of debugging text output, but it makes >>> the >> point). >>> >>> >>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >>> and make -k check, both with and without --target_board=unix{-m32} >>> with no new failures. Ok for mainline? >>> >>> As always, I'm happy to revert this change quickly if there's a >>> problem, and investigate why this additional copy might (still) be >>> needed on other >>> non-x86 targets. >> >> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, >> tree type, >> be loaded directly into the destination. */ >> src = orig_src; >> if (!MEM_P (orig_src) >> + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src)) >> && (!CONSTANT_P (orig_src) >> || (GET_MODE (orig_src) != mode >> && GET_MODE (orig_src) != VOIDmode))) >> >> so that means the code guarded by the conditional could instead be >> transformed >> to >> >> src = force_reg (mode, orig_src); >> >> ? Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) != >> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the >> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) >> == >> VOIDmode ? mode : GET_MODE (orig_src), orig_src)) >> >> Otherwise I think this is OK, as said, using force_reg somehow would improve >> readability here I think. >> >> I also wonder how the >> >> else if (GET_CODE (src) == CONCAT) >> >> case will ever trigger with the current code. >> >> Richard. >> >>> >>> 2023-07-25 Roger Sayle <ro...@nextmovesoftware.com> >>> >>> gcc/ChangeLog >>> PR middle-end/28071 >>> PR rtl-optimization/110587 >>> * expr.cc (emit_group_load_1): Avoid copying a pseudo register into >>> a new pseudo register, i.e. only copy hard regs into a new pseudo. >>> >>> > > <patchhg4.txt>
Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
Richard Biener via Gcc-patches Thu, 27 Jul 2023 12:05:49 -0700
- [PATCH] PR rtl-optimization/110587: Reduce ... Roger Sayle
- Re: [PATCH] PR rtl-optimization/110587... Richard Biener via Gcc-patches
- RE: [PATCH] PR rtl-optimization/11... Roger Sayle
- Re: [PATCH] PR rtl-optimizatio... Richard Biener via Gcc-patches