https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #23 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Martin Jambor <jamb...@gcc.gnu.org>:

https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363

commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
Author: Martin Jambor <mjam...@suse.cz>
Date:   Fri Nov 24 17:32:35 2023 +0100

    sra: SRA of non-escaped aggregates passed by reference to calls

    PR109849 shows that a loop that heavily pushes and pops from a stack
    implemented by a C++ std::vector results in slow code, mainly because
    the vector structure is not split by SRA and so we end up with many loads
    and stores into it.  This is because it is passed by reference
    to (re)allocation methods and so needs to live in memory, even though
    it does not escape from them and so we could SRA it if we
    re-constructed it before the call and then separated it to distinct
    replacements afterwards.
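
For illustration, the kind of loop described above might look like the following sketch. It is a hypothetical kernel, not the testcase added by the patch, and the name sum_with_stack is invented here. The local vector only has its address passed to the (re)allocation path inside push_back, so it does not escape.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical micro-kernel (not the testcase from the patch): an explicit
// stack kept in a local std::vector that is pushed to and popped from in a
// tight loop.
std::uint64_t sum_with_stack(unsigned n)
{
    // Local aggregate; typically holds begin, end and end-of-storage pointers.
    std::vector<unsigned> stack;
    std::uint64_t sum = 0;

    for (unsigned i = 0; i < n; ++i) {
        // push_back may call an out-of-line reallocation routine that
        // receives the address of `stack`, which is what used to force the
        // whole structure to live in memory.
        stack.push_back(i);
        stack.push_back(i + 1);

        // Pop the most recently pushed element again.
        sum += stack.back();
        stack.pop_back();
    }

    // Drain the elements that are still on the stack.
    while (!stack.empty()) {
        sum += stack.back();
        stack.pop_back();
    }
    return sum;
}
```

Whether a particular compiler version keeps the vector's fields in registers for this exact code depends on inlining decisions and options, so treat it only as a picture of the access pattern.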

    This patch does exactly that, first relaxing the selection of
    candidates to also include those which are addressable but do not
    escape and then adding code to deal with the calls.  The
    micro-benchmark that is also the (scan-dump) testcase in this patch
    runs twice as fast with the patch as with current trunk.  Honza measured
    its effect on the libjxl benchmark and it almost closes the
    performance gap between Clang and GCC while not requiring excessive
    inlining and thus code growth.
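
A source-level analogue of the transformation can be sketched by hand as below. This is only an assumption-laden illustration: SRA works on GIMPLE, not on C++ source, and Buf, grow and fill_and_sum are hypothetical names that do not come from the patch. The scalars data, size and cap play the role of the replacements; the aggregate is re-constructed right before the call that takes it by reference and split again right after it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical three-field aggregate, loosely modelled on a vector.
struct Buf {
    int *data;
    std::size_t size;
    std::size_t cap;
};

// Takes the aggregate by reference but does not retain its address,
// so the aggregate does not escape.
void grow(Buf &b)
{
    std::size_t new_cap = b.cap ? b.cap * 2 : 4;
    int *p = static_cast<int *>(std::realloc(b.data, new_cap * sizeof(int)));
    if (!p)
        std::abort();
    b.data = p;
    b.cap = new_cap;
}

long fill_and_sum(std::size_t n)
{
    Buf b{nullptr, 0, 0};

    // Scalar "replacements" for the fields of b.
    int *data = b.data;
    std::size_t size = b.size;
    std::size_t cap = b.cap;

    for (std::size_t i = 0; i < n; ++i) {
        if (size == cap) {
            // Re-construct the aggregate before the call ...
            b.data = data;
            b.size = size;
            b.cap = cap;
            grow(b);
            // ... and separate it into the replacements again afterwards.
            data = b.data;
            size = b.size;
            cap = b.cap;
        }
        // The hot path touches only the scalars, never b itself.
        data[size++] = static_cast<int>(i);
    }

    long sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += data[i];
    std::free(data);
    return sum;
}
```

The stores and loads around grow are the cost paid for keeping the common path free of memory accesses to the aggregate.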

    The patch disallows creation of replacements for such aggregates which
    are also accessed with a precision smaller than their size because I
    have observed that this led to excessive zero-extending of data
    leading to slow-downs of perlbench (on some CPUs).  Apart from this
    case I have not noticed any regressions, at least not so far.

    Gimple call argument flags can tell if an argument is unused (and then
    we do not need to generate any statements for it) or if it is not
    written to and then we do not need to generate statements loading
    replacements from the original aggregate after the call statement.
    Unfortunately, we cannot symmetrically use a flag saying that an
    aggregate is not read to avoid re-constructing the aggregate before
    the call, because the flags do not tell which parts of the aggregate
    were not written to.  We therefore load all replacements after the
    call, and so all of them need to have the correct value before it.
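
The decision described in this paragraph can be summarised with a small sketch. The types ArgFlags and CallArgPlan and the function plan_for are hypothetical and exist only for this illustration; they are not the data structures or flag names used in tree-sra.cc.

```cpp
// Hypothetical summary of what is known about one call argument.
struct ArgFlags {
    bool unused;       // the callee does not use the argument at all
    bool not_written;  // the callee does not write to the aggregate
    bool not_read;     // the callee does not read the aggregate
};

// Which statements would be generated around the call for that argument.
struct CallArgPlan {
    bool rebuild_before;  // store replacements into the aggregate before the call
    bool reload_after;    // load replacements from the aggregate after the call
};

inline CallArgPlan plan_for(const ArgFlags &f)
{
    CallArgPlan plan = {false, false};
    if (f.unused)
        return plan;  // nothing needs to be generated

    // not_read is deliberately ignored: if the callee may write to the
    // aggregate, every replacement is reloaded after the call, so the
    // aggregate must hold correct values in the parts the callee leaves alone.
    plan.rebuild_before = true;
    plan.reload_after = !f.not_written;
    return plan;
}
```

Note the asymmetry: not_written removes the loads after the call, while not_read removes nothing.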

    This version of the patch also takes care to avoid attempts to modify
    abnormal edges, something which was missing in the previous version.

    gcc/ChangeLog:

    2023-11-23  Martin Jambor  <mjam...@suse.cz>

            PR middle-end/109849
            * tree-sra.cc (passed_by_ref_in_call): New.
            (sra_initialize): Allocate passed_by_ref_in_call.
            (sra_deinitialize): Free passed_by_ref_in_call.
            (create_access): Add decl pool candidates only if they are not
            already candidates.
            (build_access_from_expr_1): Bail out on ADDR_EXPRs.
            (build_access_from_call_arg): New function.
            (asm_visit_addr): Rename to scan_visit_addr, change the
            disqualification dump message.
            (scan_function): Check taken addresses for all non-call statements,
            including phi nodes.  Process all call arguments, including the
            static chain, with build_access_from_call_arg.
            (maybe_add_sra_candidate): Relax need_to_live_in_memory check to
            allow non-escaped local variables.
            (sort_and_splice_var_accesses): Disallow smaller-than-precision
            replacements for aggregates passed by reference to functions.
            (sra_modify_expr): Use a separate stmt iterator for adding
            statements before the processed statement and after it.
            (enum out_edge_check): New type.
            (abnormal_edge_after_stmt_p): New function.
            (sra_modify_call_arg): New function.
            (sra_modify_assign): Adjust calls to sra_modify_expr.
            (sra_modify_function_body): Likewise, use sra_modify_call_arg to
            process call arguments, including the static chain.

    gcc/testsuite/ChangeLog:

    2023-11-23  Martin Jambor  <mjam...@suse.cz>

            PR middle-end/109849
            * g++.dg/tree-ssa/pr109849.C: New test.
            * g++.dg/tree-ssa/sra-eh-1.C: Likewise.
            * gcc.dg/tree-ssa/pr109849.c: Likewise.
            * gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise.
            * gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
