Hi Vladimir,
Hi Jeff,

Richard and Alexander have reviewed this patch and [I assume] have no
further comments.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
wrote:

> This patch avoids sched-deps.cc:find_inc() creating an exponential
> number of dependencies, which become memory and compilation-time hogs.
> Consider example (simplified from PR96388) ...
> ===
> sp=sp-4 // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4 // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
>
> [For simplicity, let's assume find_inc(backwards==true)].
> In this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.
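[Editorial aside, not part of the quoted patch: the blow-up above is easy to
model outside GCC.  A hypothetical back-of-the-envelope sketch in plain
Python -- the function name and the counting are illustrative only, not GCC
code:]

```python
# Illustrative model only (not GCC code): count the extra dependencies
# created for the sp_insnA/sp_insnB example above.
def extra_deps(n, m):
    # Each of the N mem_insnA* gains a dependency on sp_insnB ...
    deps_on_sp_insnB = n
    # ... and every one of the M mem_insnB* then picks up those N
    # dependencies, giving the N*M term that dominates.
    deps_for_mem_insnB = n * m
    return deps_on_sp_insnB + deps_for_mem_insnB

print(extra_deps(10_000, 10_000))  # ~10^8 at the scale seen in the PR
```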
>
> In PR96388's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out on -ftime-report or memory usage.
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/96388
>         PR rtl-optimization/111554
>         * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
>  gcc/sched-deps.cc | 48 +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..005fc0f567e 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
>  /* Once a suitable mem reference has been found and the corresponding data
>     in MII has been filled in, this function is called to find a suitable
>     add or inc insn involving the register we found in the memory
> -   reference.  */
> +   reference.
> +   If successful, this function will create additional dependencies between
> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
> +*/
>
>  static bool
>  find_inc (struct mem_inc_info *mii, bool backwards)
>  {
>    sd_iterator_def sd_it;
>    dep_t dep;
> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> -  sd_it = sd_iterator_start (mii->mem_insn,
> -                            backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>    while (sd_iterator_cond (&sd_it, &dep))
>      {
>        dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>        rtx_insn *pro = DEP_PRO (dep);
>        rtx_insn *con = DEP_CON (dep);
> -      rtx_insn *inc_cand = backwards ? pro : con;
> +      rtx_insn *inc_cand;
> +      int n_inc_deps;
> +
>        if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
>         goto next;
> +
> +      if (backwards)
> +       {
> +         inc_cand = pro;
> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> +       }
> +      else
> +       {
> +         inc_cand = con;
> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> +       }
> +
> +      /* In the FOR_EACH_DEP loop below we will create an additional
> +        n_inc_deps dependencies for mem_insn.  This by itself is not a
> +        problem, since each mem_insn will have only a few inc_insns
> +        associated with it.  However, if we consider that a single inc_insn
> +        may have a lot of mem_insns, AND, on top of that, a few other
> +        inc_insns associated with it -- those _other inc_insns_ will get
> +        (n_mem_deps * number of MEM insns) dependencies created for them.
> +        This may cause an exponential growth of memory usage and
> +        scheduling time.  See PR96388 for details.
> +        We [heuristically] use n_inc_deps as a proxy for the number of MEM
> +        insns, and drop opportunities for breaking modifiable_mem
> +        dependencies when dependency lists grow beyond reasonable size.  */
> +      if (n_mem_deps * n_inc_deps
> +         >= param_max_pending_list_length * param_max_pending_list_length)
> +       goto next;
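[Editorial aside, not part of the quoted patch: the cutoff condition can be
sketched outside GCC.  A minimal model in plain Python -- not GCC code, and
it assumes --param max-pending-list-length keeps its documented default of
32:]

```python
# Model of the guard above (illustrative only, not GCC code).
# Assumes the documented default of --param max-pending-list-length=32.
MAX_PENDING_LIST_LENGTH = 32

def drop_opportunity(n_mem_deps, n_inc_deps, limit=MAX_PENDING_LIST_LENGTH):
    """True when find_inc would skip this candidate ('goto next')."""
    return n_mem_deps * n_inc_deps >= limit * limit

assert not drop_opportunity(10, 10)   # 100 < 1024: still optimized
assert drop_opportunity(100, 100)     # 10000 >= 1024: dropped
```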
> +
>        if (parse_add_or_inc (mii, inc_cand, backwards))
>         {
>           struct dep_replacement *desc;
> @@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>           desc->insn = mii->mem_insn;
>           move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
>                          INSN_SPEC_BACK_DEPS (con));
> +
> +         /* Make sure that n_inc_deps above is consistent with dependencies
> +            we create.  */
> +         gcc_assert (mii->inc_insn == inc_cand);
> +
>           if (backwards)
>             {
>               FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org
