https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #10)
> The partially reduced (In reply to Martin Liška from comment #9)
> > Created attachment 48962 [details]
> > Partially reduced test-case
> > 
> > The reduction is quite stuck at this point.
> 
> No longer keys on -fPIC though, so the bisection for this is likely wrong.
> -fno-schedule-insns2 improves it from 18s to 5s compile time and from
> 1.1GB of peak RSS to 320MB.
> 
>  scheduling 2                       :  12.69 ( 71%)   0.10 ( 67%)  12.79 ( 70%)   11128 kB ( 16%)
> 
> -fmem-report doesn't show anything interesting, looking for heap allocations
> now to find the offender.
> 
> Can you bisect your reduced testcase again?  GCC 8.4 behaves the same for it
> rather than being good but GCC 4.8.5 is fine.

For the testcase most of the time is spent in constrain_operands and
update_conflict_hard_regno_costs.  The main issue looks to be a very
long chain of dependences: 27000 schedule_insn calls turn into
10 000 000 calls to try_ready, which means the sd_iterator walks over
many dependent instructions without stopping at "common dependences".
That's likely also the source of the memory use (the dn_pool), though
memory reporting with --enable-gather-detailed-mem-stats doesn't seem
to work for this pool?

dep_node   sched-deps.c:4107 (sched_deps_init)   1         0 :  0.0%        0         0 :  0.0%          80
deps_list  sched-deps.c:4105 (sched_deps_init)   1         0 :  0.0%     2179k      136k:  0.9%          16
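
To make the schedule_insn vs. try_ready ratio above concrete, here is a
toy model.  It is not GCC code: the names ToyCounts and
simulate_toy_schedule and the "window" parameter are invented for
illustration, and it assumes each insn simply has a forward dependence
on each of the next "window" insns.  Scheduling an insn then visits all
of its forward dependences, one visit standing in for one try_ready call.

```cpp
#include <cstddef>
#include <vector>

// Toy model, not GCC code: counts how often dependent insns get visited.
struct ToyCounts {
  std::size_t schedule_calls = 0;   // one per scheduled insn
  std::size_t try_ready_calls = 0;  // one per forward dependence visited
};

// Hypothetical helper: insn i has a forward dependence on each of the
// next `window` insns (clamped at the end of the block).
ToyCounts simulate_toy_schedule(std::size_t n_insns, std::size_t window) {
  // forward_deps[i] lists the insns that depend on insn i.
  std::vector<std::vector<std::size_t>> forward_deps(n_insns);
  for (std::size_t i = 0; i < n_insns; ++i)
    for (std::size_t j = i + 1; j < n_insns && j <= i + window; ++j)
      forward_deps[i].push_back(j);

  ToyCounts counts;
  for (std::size_t i = 0; i < n_insns; ++i) {
    ++counts.schedule_calls;
    for (std::size_t consumer : forward_deps[i]) {
      (void)consumer;  // a real scheduler would re-check readiness here
      ++counts.try_ready_calls;
    }
  }
  return counts;
}
```

With window == 1 (a plain chain) the visit count stays at n_insns - 1;
with a wide window it grows roughly as n_insns * window.  The figures
quoted above correspond to roughly 370 try_ready calls per schedule_insn
call (10 000 000 / 27000).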

There are also 10 million dep_replacement nodes, all allocated
individually via XCNEW; another object_allocator would be more
efficient here, I guess.  Could it be that sched-deps makes a tree out
of a dependence graph?
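
For reference, the difference between one zeroing heap allocation per
node and a pool can be sketched as below.  This is a minimal
illustration and not GCC's object_allocator: ToyNode and ToyNodePool are
hypothetical names, and the node layout is made up.  The idea is only
that nodes are carved out of large blocks and recycled through a free
list.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for a small per-dependence record; not GCC's type.
struct ToyNode {
  void *a = nullptr;
  void *b = nullptr;
  int c = 0;
};

// Minimal pool sketch: one heap allocation per block of nodes instead of
// one per node; released nodes go onto a free list for reuse.
class ToyNodePool {
 public:
  explicit ToyNodePool(std::size_t nodes_per_block = 1024)
      : nodes_per_block_(nodes_per_block == 0 ? 1 : nodes_per_block) {}

  ToyNode *allocate() {
    if (free_list_ == nullptr) grow();
    Slot *slot = free_list_;
    free_list_ = slot->next;
    return new (slot) ToyNode();  // fresh node with all fields cleared
  }

  void release(ToyNode *node) {
    node->~ToyNode();
    Slot *slot = reinterpret_cast<Slot *>(node);
    slot->next = free_list_;  // push the slot back for reuse
    free_list_ = slot;
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  // A slot either links into the free list or holds a live ToyNode.
  union Slot {
    Slot *next;
    alignas(ToyNode) unsigned char storage[sizeof(ToyNode)];
  };

  void grow() {
    blocks_.push_back(std::make_unique<Slot[]>(nodes_per_block_));
    Slot *block = blocks_.back().get();
    for (std::size_t i = 0; i < nodes_per_block_; ++i) {
      block[i].next = free_list_;
      free_list_ = &block[i];
    }
  }

  std::size_t nodes_per_block_;
  Slot *free_list_ = nullptr;
  std::vector<std::unique_ptr<Slot[]>> blocks_;
};
```

With 10 million nodes, a pool like this would issue one heap request per
block rather than one per node; whether that matters in practice here
would need measuring.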

CCing the only active haifa scheduler maintainer...
