https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863

--- Comment #47 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:abe07a74bb7a2692eff2af151ca54e749ed5eba6

commit r11-7246-gabe07a74bb7a2692eff2af151ca54e749ed5eba6
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Feb 15 15:05:22 2021 +0000

    rtl-ssa: Reduce the amount of temporary memory needed [PR98863]

    The rtl-ssa code uses an on-the-side IL and needs to build that IL
    for each block and RTL insn.  I'd originally not used the classical
    dominance frontier method for placing phis on the basis that it seemed
    like more work in this context: we're having to visit everything in
    an RPO walk anyway, so for non-backedge cases we can tell immediately
    whether a phi node is needed.  We then speculatively created phis for
    registers that are live across backedges and simplified them later.
    This avoided having to walk most of the IL twice (once to build the
    initial IL, and once to link uses to phis).

    However, as shown in PR98863, this leads to excessive temporary
    memory in extreme cases, since we had to record the value of
    every live register on exit from every block.  In that PR,
    there were many registers that were live (but unused) across
    a large region of code.

    This patch does use the classical approach to placing phis, but tries
    to use the existing DF defs information to avoid two walks of the IL.
    We still use the previous approach for memory, since there is no
    up-front information to indicate whether a block defines memory or not.
    However, since memory is just treated as a single unified thing
    (like for gimple vops), memory doesn't suffer from the same
    scalability problems as registers.

    With this change, fwprop no longer seems to be a memory-hog outlier
    in the PR: the maximum RSS is similar with and without fwprop.

    The PR also shows the problems inherent in using bitmap operations
    involving the live-in and live-out sets, which in the testcase are
    very large.  I've therefore tried to reduce those operations to the
    bare minimum.

    The patch also includes other compile-time optimisations motivated
    by the PR; see the changelog for details.

    I tried adding:

        for (int i = 0; i < 200; ++i)
          {
            crtl->ssa = new rtl_ssa::function_info (cfun);
            delete crtl->ssa;
          }

    to fwprop.c to stress the code.  fwprop then took 35% of the compile
    time for the problematic partition in the PR (measured on a release
    build).  fwprop takes less than .5% of the compile time when running
    normally.

    The command:

      git diff 0b76990a9d75d97b84014e37519086b81824c307~ gcc/fwprop.c | \
        patch -p1 -R

    still gives a working compiler that uses the old fwprop.c.  The compile
    time with that version is very similar.

    For a more reasonable testcase like optabs.ii at -O, I saw a 6.7%
    compile time regression with the loop above added (i.e. creating
    the info 201 times per pass instead of once per pass).  That goes
    down to 4.8% with -O -g.  I can't measure a significant difference
    with a normal compiler (no 200-iteration loop).

    So I think that (as expected) the patch does make things a bit
    slower in the normal case.  But like Richi says, peak memory usage
    is harder for users to work around than slighter slower compile times.

    gcc/
            PR rtl-optimization/98863
            * rtl-ssa/functions.h (function_info::bb_live_out_info): Delete.
            (function_info::build_info): Turn into a declaration, moving the
            definition to internals.h.
            (function_info::bb_walker): Declare.
            (function_info::create_reg_use): Likewise.
            (function_info::calculate_potential_phi_regs): Take a build_info
            parameter.
            (function_info::place_phis, function_info::create_ebbs): Declare.
            (function_info::calculate_ebb_live_in_for_debug): Likewise.
            (function_info::populate_backedge_phis): Delete.
            (function_info::start_block, function_info::end_block): Declare.
            (function_info::populate_phi_inputs): Delete.
            (function_info::m_potential_phi_regs): Move information to
build_info.
            * rtl-ssa/internals.h: New file.
            (function_info::bb_phi_info): New class.
            (function_info::build_info): Moved from functions.h.
            Add a constructor and destructor.
            (function_info::build_info::ebb_use): Delete.
            (function_info::build_info::ebb_def): Likewise.
            (function_info::build_info::bb_live_out): Likewise.
            (function_info::build_info::tmp_ebb_live_in_for_debug): New
variable.
            (function_info::build_info::potential_phi_regs): Likewise.
            (function_info::build_info::potential_phi_regs_for_debug):
Likewise.
            (function_info::build_info::ebb_def_regs): Likewise.
            (function_info::build_info::bb_phis): Likewise.
            (function_info::build_info::bb_mem_live_out): Likewise.
            (function_info::build_info::bb_to_rpo): Likewise.
            (function_info::build_info::def_stack): Likewise.
            (function_info::build_info::old_def_stack_limit): Likewise.
            * rtl-ssa/internals.inl
(function_info::build_info::record_reg_def):
            Remove the regno argument.  Push the previous definition onto the
            definition stack where necessary.
            * rtl-ssa/accesses.cc: Include internals.h.
            * rtl-ssa/changes.cc: Likewise.
            * rtl-ssa/blocks.cc: Likewise.
            (function_info::build_info::build_info): Define.
            (function_info::build_info::~build_info): Likewise.
            (function_info::bb_walker): New class.
            (function_info::bb_walker::bb_walker): Define.
            (function_info::add_live_out_use): Convert a logarithmic-complexity
            test into a linear one.  Allow the same definition to be passed
            multiple times.
            (function_info::calculate_potential_phi_regs): Moved from
            functions.cc.  Take a build_info parameter and store the
            information there instead.
            (function_info::place_phis): New function.
            (function_info::add_entry_block_defs): Update call to
record_reg_def.
            (function_info::calculate_ebb_live_in_for_debug): New function.
            (function_info::add_phi_nodes): Use bb_phis to decide which
            registers need phi nodes and initialize ebb_def_regs accordingly.
            Do not add degenerate phis here.
            (function_info::add_artificial_accesses): Use create_reg_use.
            Assert that all definitions are listed in the DF LR sets.
            Update call to record_reg_def.
            (function_info::record_block_live_out): Record live-out register
            values in the phis of successor blocks.  Use the live-out set
            when processing the last block in an EBB, instead of always
            using the live-in sets of successor blocks.  AND the live sets
            with the set of registers that have been defined in the EBB,
            rather than with all potential phi registers.  Cope correctly
            with branches back to the start of the current EBB.
            (function_info::start_block): New function.
            (function_info::end_block): Likewise.
            (function_info::populate_phi_inputs): Likewise.
            (function_info::create_ebbs): Likewise.
            (function_info::process_all_blocks): Rewrite into a multi-phase
            process.
            * rtl-ssa/functions.cc: Include internals.h.
            (function_info::calculate_potential_phi_regs): Move to blocks.cc.
            (function_info::init_function_data): Remove caller.
            * rtl-ssa/insns.cc: Include internals.h
            (function_info::create_reg_use): New function.  Lazily any
            degenerate phis needed by the linear RPO view.
            (function_info::record_use): Use create_reg_use.  When processing
            debug uses, use potential_phi_regs and test it before checking
            whether the register is live on entry to the current EBB.  Lazily
            calculate ebb_live_in_for_debug.
            (function_info::record_call_clobbers): Update call to
record_reg_def.
            (function_info::record_def): Likewise.

Reply via email to