> On Nov 20, 2023, at 17:45, Richard Biener <richard.guent...@gmail.com> wrote: > > On Mon, Nov 20, 2023 at 2:42 PM Maxim Kuvyrkov > <maxim.kuvyr...@linaro.org> wrote: >> >>> On Nov 20, 2023, at 17:09, Richard Biener <richard.guent...@gmail.com> >>> wrote: >>> >>> On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov >>> <maxim.kuvyr...@linaro.org> wrote: >>>> >>>> This patch avoids sched-deps.cc:find_inc() creating exponential number >>>> of dependencies, which become memory and compilation time hogs. >>>> Consider example (simplified from PR96388) ... >>>> === >>>> sp=sp-4 // sp_insnA >>>> mem_insnA1[sp+A1] >>>> ... >>>> mem_insnAN[sp+AN] >>>> sp=sp-4 // sp_insnB >>>> mem_insnB1[sp+B1] >>>> ... >>>> mem_insnBM[sp+BM] >>>> === >>>> ... in this example find_modifiable_mems() will arrange for mem_insnA* >>>> to be able to pass sp_insnA, and, while doing this, will create >>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB >>>> is a consumer of sp_insnA. After this sp_insnB will have N new >>>> backward dependencies. >>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create >>>> N new dependencies for _every_ mem_insnB*. This gets us N*M new >>>> dependencies. >>>> >>>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of >>>> 30GB and compilation time of 30 minutes, with sched2 accounting for >>>> 95% of both metrics. After this patch the RAM usage is down to 1GB >>>> and compilation time is down to 3-4 minutes, with sched2 no longer >>>> standing out on -ftime-report or memory usage. >>>> >>>> gcc/ChangeLog: >>>> >>>> PR rtl-optimization/96388 >>>> PR rtl-optimization/111554 >>>> * sched-deps.cc (find_inc): Avoid exponential behavior. >>>> --- >>>> gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++---- >>>> 1 file changed, 41 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc >>>> index c23218890f3..397bb9fd462 100644 >>>> --- a/gcc/sched-deps.cc >>>> +++ b/gcc/sched-deps.cc >>>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, >>>> rtx_insn *insn, bool before_mem) >>>> /* Once a suitable mem reference has been found and the corresponding data >>>> in MII has been filled in, this function is called to find a suitable >>>> add or inc insn involving the register we found in the memory >>>> - reference. */ >>>> + reference. >>>> + If successful, this function will create additional dependencies >>>> between >>>> + - mii->inc_insn's producers and mii->mem_insn as a consumer (if >>>> backwards) >>>> + - mii->inc_insn's consumers and mii->mem_insn as a producer (if >>>> !backwards). >>>> +*/ >>>> >>>> static bool >>>> find_inc (struct mem_inc_info *mii, bool backwards) >>>> { >>>> sd_iterator_def sd_it; >>>> dep_t dep; >>>> + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : >>>> SD_LIST_FORW; >>>> + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); >>>> >>>> - sd_it = sd_iterator_start (mii->mem_insn, >>>> - backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); >>>> + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); >>>> while (sd_iterator_cond (&sd_it, &dep)) >>>> { >>>> dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); >>>> rtx_insn *pro = DEP_PRO (dep); >>>> rtx_insn *con = DEP_CON (dep); >>>> - rtx_insn *inc_cand = backwards ? pro : con; >>>> + rtx_insn *inc_cand; >>>> + int n_inc_deps; >>>> + >>>> + if (backwards) >>>> + { >>>> + inc_cand = pro; >>>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); >>>> + } >>>> + else >>>> + { >>>> + inc_cand = con; >>>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); >>>> + } >>>> + >>>> + /* In the FOR_EACH_DEP loop below we will create additional >>>> n_inc_deps >>>> + for mem_insn. This by itself is not a problem, since each >>>> mem_insn >>>> + will have only a few inc_insns associated with it. However, if >>>> + we consider that a single inc_insn may have a lot of mem_insns, >>>> AND, >>>> + on top of that, a few other inc_insns associated with it -- >>>> + those _other inc_insns_ will get (n_mem_deps * number of MEM >>>> insns) >>>> + dependencies created for them. This may cause an exponential >>>> + growth of memory usage and scheduling time. >>>> + See PR96388 for details. >>>> + We [heuristically] use n_inc_deps as a proxy for the number of MEM >>>> + insns, and drop opportunities for breaking modifiable_mem >>>> dependencies >>>> + when dependency lists grow beyond reasonable size. */ >>>> + if (n_mem_deps * n_inc_deps >>>> + >= param_max_pending_list_length * param_max_pending_list_length) >>>> + goto next; >>>> + >>>> if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) >>> >>> it looks like this check is a lot cheaper than computing sd_lists_size so >>> can we keep that first? sd_lists_size might be even more expensive >>> than parse_add_or_inc, >> >> sd_lists_size() is cheap, it doesn't walk or count dependencies; it just >> adds 2-3 integers together. >> >> The reason why sd_lists_size() has a loop is to be able to transparently >> handle dependencies split among sub-lists, e.g., >> sd_lists_size(SD_LIST_HARD_BACK | SD_LIST_SPEC_BACK) will return total >> number of backward dependencies. > > I see. It still better to move the DEP_NONREG/DEP_MULTIPLE checks, > they do not depend on any > of the code above it, no?
Yes. Adjusted in v2. Thanks, -- Maxim Kuvyrkov https://www.linaro.org