> On Nov 20, 2023, at 17:45, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Mon, Nov 20, 2023 at 2:42 PM Maxim Kuvyrkov
> <maxim.kuvyr...@linaro.org> wrote:
>> 
>>> On Nov 20, 2023, at 17:09, Richard Biener <richard.guent...@gmail.com> 
>>> wrote:
>>> 
>>> On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov
>>> <maxim.kuvyr...@linaro.org> wrote:
>>>> 
>>>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>>>> of dependencies, which become memory and compilation time hogs.
>>>> Consider example (simplified from PR96388) ...
>>>> ===
>>>> sp=sp-4 // sp_insnA
>>>> mem_insnA1[sp+A1]
>>>> ...
>>>> mem_insnAN[sp+AN]
>>>> sp=sp-4 // sp_insnB
>>>> mem_insnB1[sp+B1]
>>>> ...
>>>> mem_insnBM[sp+BM]
>>>> ===
>>>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>>>> to be able to pass sp_insnA, and, while doing this, will create
>>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>>>> backward dependencies.
>>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>>>> dependencies.
>>>> 
>>>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
>>>> 30GB and compilation time of 30 minutes, with sched2 accounting for
>>>> 95% of both metrics.  After this patch the RAM usage is down to 1GB
>>>> and compilation time is down to 3-4 minutes, with sched2 no longer
>>>> standing out on -ftime-report or memory usage.
>>>> 
>>>> gcc/ChangeLog:
>>>> 
>>>>       PR rtl-optimization/96388
>>>>       PR rtl-optimization/111554
>>>>       * sched-deps.cc (find_inc): Avoid exponential behavior.
>>>> ---
>>>> gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
>>>> 1 file changed, 41 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>>>> index c23218890f3..397bb9fd462 100644
>>>> --- a/gcc/sched-deps.cc
>>>> +++ b/gcc/sched-deps.cc
>>>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, 
>>>> rtx_insn *insn, bool before_mem)
>>>> /* Once a suitable mem reference has been found and the corresponding data
>>>>   in MII has been filled in, this function is called to find a suitable
>>>>   add or inc insn involving the register we found in the memory
>>>> -   reference.  */
>>>> +   reference.
>>>> +   If successful, this function will create additional dependencies 
>>>> between
>>>> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if 
>>>> backwards)
>>>> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if 
>>>> !backwards).
>>>> +*/
>>>> 
>>>> static bool
>>>> find_inc (struct mem_inc_info *mii, bool backwards)
>>>> {
>>>>  sd_iterator_def sd_it;
>>>>  dep_t dep;
>>>> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : 
>>>> SD_LIST_FORW;
>>>> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>>>> 
>>>> -  sd_it = sd_iterator_start (mii->mem_insn,
>>>> -                            backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
>>>> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>>>>  while (sd_iterator_cond (&sd_it, &dep))
>>>>    {
>>>>      dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>>>>      rtx_insn *pro = DEP_PRO (dep);
>>>>      rtx_insn *con = DEP_CON (dep);
>>>> -      rtx_insn *inc_cand = backwards ? pro : con;
>>>> +      rtx_insn *inc_cand;
>>>> +      int n_inc_deps;
>>>> +
>>>> +      if (backwards)
>>>> +       {
>>>> +         inc_cand = pro;
>>>> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
>>>> +       }
>>>> +      else
>>>> +       {
>>>> +         inc_cand = con;
>>>> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
>>>> +       }
>>>> +
>>>> +      /* In the FOR_EACH_DEP loop below we will create additional 
>>>> n_inc_deps
>>>> +        for mem_insn.  This by itself is not a problem, since each 
>>>> mem_insn
>>>> +        will have only a few inc_insns associated with it.  However, if
>>>> +        we consider that a single inc_insn may have a lot of mem_insns, 
>>>> AND,
>>>> +        on top of that, a few other inc_insns associated with it --
>>>> +        those _other inc_insns_ will get (n_mem_deps * number of MEM 
>>>> insns)
>>>> +        dependencies created for them.  This may cause an exponential
>>>> +        growth of memory usage and scheduling time.
>>>> +        See PR96388 for details.
>>>> +        We [heuristically] use n_inc_deps as a proxy for the number of MEM
>>>> +        insns, and drop opportunities for breaking modifiable_mem 
>>>> dependencies
>>>> +        when dependency lists grow beyond reasonable size.  */
>>>> +      if (n_mem_deps * n_inc_deps
>>>> +         >= param_max_pending_list_length * param_max_pending_list_length)
>>>> +       goto next;
>>>> +
>>>>      if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
>>> 
>>> it looks like this check is a lot cheaper than computing sd_lists_size so
>>> can we keep that first?  sd_lists_size might be even more expensive
>>> than parse_add_or_inc,
>> 
>> sd_lists_size() is cheap, it doesn't walk or count dependencies; it just 
>> adds 2-3 integers together.
>> 
>> The reason why sd_lists_size() has a loop is to be able to transparently 
>> handle dependencies split among sub-lists, e.g., 
>> sd_lists_size(SD_LIST_HARD_BACK | SD_LIST_SPEC_BACK) will return total 
>> number of backward dependencies.
> 
> I see.  It still better to move the DEP_NONREG/DEP_MULTIPLE checks,
> they do not depend on any
> of the code above it, no?
Yes.  Adjusted in v2.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

Reply via email to