On Wed, May 12, 2021 at 11:03 AM Andrew Pinski <pins...@gmail.com> wrote:
>
> On Wed, May 12, 2021 at 1:19 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Tue, May 11, 2021 at 5:01 PM Segher Boessenkool
> > <seg...@kernel.crashing.org> wrote:
> > >
> > > On Tue, May 04, 2021 at 10:40:38AM +0200, Richard Biener via Gcc wrote:
> > > > > On Mon, May 3, 2021 at 11:10 PM Andrew Pinski via Gcc <gcc@gcc.gnu.org> wrote:
> > > > >   I noticed my (highly, -j24) parallel build of GCC is serialized on
> > > > > compiling gimple-match.c.  Has anyone looked into splitting this
> > > > > generated file into multiple files?
> > > >
> > > > There were threads about this in the past, yes.  There's the
> > > > possibility to use LTO for this as well (also mentioned in those
> > > > threads).  Note it's not easy to split in a meaningful way in
> > > > genmatch.c
> > >
> > > But it will have to be handled at least somewhat soon: on not huge
> > > parallelism (-j120 for example) building *-match.c takes longer than
> > > building everything else in gcc/ together (wallclock time), and it is a
> > > huge part of regstrap time (bigger than running all of the testsuite!)
> >
> > I would classify -j120 as "huge parallelism" ;)  Testing time still
> > dominates my builds (with -j24) where bootstrap takes ~20 mins
> > but testing another 40.
>
> For me, it is around 1 hour bootstrapping and 1 hour testing.
>
> > Is it building stage2 gimple-match.c that you are worried about?
> > (it's built using the -O0 compiled stage1 compiler - but we at
> > least should end up using -fno-checking for this build)
>
> Yes.  On the machine I was using it takes 15 minutes to compile
> gimple-match.c, which dominates the whole bootstrap time.
> Everything else was done in 1-3 minutes at most.
> This is on an aarch64 machine with 24 cores (not threads).

I'm usually using STAGE1_CFLAGS="-O2" to speed up the
"useless" part of the bootstrap cycle...

> Thanks,
> Andrew Pinski
>
> >
> > Maybe you can do some experiments - like add
> > -fno-inline-functions-called-once and change
> > genmatch.c:3766 to split out single uses as well
> > (should decrease function sizes).
> >
> > There's the option to make all functions external in
> > gimple-match.c so splitting the file at arbitrary points
> > will be possible (directly from genmatch), we'll need
> > some internal header with all declarations then
> > as well or alternatively some clever logic in
> > genmatch to only externalize functions needed from
> > multiple split files.
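
A rough Python sketch of the "make everything external and split at
arbitrary points" idea above.  This is not what genmatch does; the function
names, sizes and the placeholder prototype are all made up for
illustration.  It balances functions across N output files by size and
lists the declarations a shared internal header would need.

```python
import heapq

def split_functions(sizes, n_shards):
    """Greedily assign functions (largest first) to the least-loaded shard."""
    heap = [(0, i) for i in range(n_shards)]  # (current load, shard index)
    heapq.heapify(heap)
    shards = [[] for _ in range(n_shards)]
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return shards

def shared_header(sizes):
    """Declarations for every function, since any shard may call any other.

    The prototype is a placeholder, not the real generated signature.
    """
    return "\n".join(f"extern bool {name} (void);" for name in sorted(sizes))

# Hypothetical function sizes (e.g. in lines of generated code).
sizes = {"fn_a": 900, "fn_b": 500, "fn_c": 400, "fn_d": 300, "fn_e": 100}
shards = split_functions(sizes, 2)
print(shards)
print(shared_header(sizes))
```

Externalizing only the functions that are actually called across shards
would need a call graph on top of this, which the sketch leaves out.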
> >
> > That said - ideas to reduce the size of the generated
> > code are welcome as well.
> >
> > There's also pattern ordering in match.pd that can
> > make a difference because we're honoring
> > first-match and thus have to re-start matching from
> > outermost on conflicts (most of the time the actual
> > order in match.pd is just random).  If you add -v
> > to genmatch then you'll see
> >
> > /home/rguenther/src/gcc3/gcc/match.pd:6092:10 warning: failed to merge decision tree node
> >    (cmp (op@3 @0 INTEGER_CST@1) INTEGER_CST@2)
> >          ^
> > /home/rguenther/src/gcc3/gcc/match.pd:4263:11 warning: with the following
> >     (cmp (op @0 REAL_CST@1) REAL_CST@2)
> >           ^
> > /home/rguenther/src/gcc3/gcc/match.pd:5164:6 warning: because of the following which serves as ordering barrier
> >  (eq @0 integer_onep)
> >      ^
> >
> > that means that the simple (eq @0 integer_onep) should match after 4263
> > but before 6092 (only the latter will actually match the same - the
> > former has REAL_CST@2 but 5164 uses a predicate integer_onep).  This
> > causes us to emit three switch (code){ case EQ_EXPR: } instead of one.
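
A toy Python model of the barrier effect described above.  It is only a
sketch under simplifying assumptions, not genmatch's actual data structures:
each pattern is reduced to the kind of node it wants at one operand
position, and a catch-all capture (like the @0 in the 5164 pattern) stops
later patterns from merging with nodes that come before it.  The labels are
the match.pd line numbers from the warning; the "op" kind and the helper
name are invented here.

```python
CAPTURE = "capture"  # a catch-all operand such as @0

def append_nodes(patterns):
    """Build the sibling list for one operand position, in pattern order.

    A pattern merges into an earlier sibling of the same kind unless a
    capture sits between them, because the capture must be tried first.
    """
    kids = []  # list of (kind, [pattern labels])
    for label, kind in patterns:
        merged = False
        if kind != CAPTURE:
            for kid_kind, labels in reversed(kids):
                if kid_kind == CAPTURE:
                    break  # ordering barrier: cannot look past it
                if kid_kind == kind:
                    labels.append(label)
                    merged = True
                    break
        if not merged:
            kids.append((kind, [label]))
    return kids

# Order as in the warning: 5164 sits between the two similar patterns.
as_written = [(4263, "op"), (5164, CAPTURE), (6092, "op")]
# 5164 moved ahead of 4263; per the text those two cannot match the same input.
reordered = [(5164, CAPTURE), (4263, "op"), (6092, "op")]

print(len(append_nodes(as_written)), append_nodes(as_written))
print(len(append_nodes(reordered)), append_nodes(reordered))
```

In this model the original order leaves three sibling nodes, while the
reordered list lets 4263 and 6092 share one.  Whether such a move is safe
in match.pd still depends on which patterns can really match the same
input.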
> >
> > There might be legitimate cases of such order constraints but most of them
> > are spurious.  "Fixing" them will also make the matching process faster, but
> > it's quite some legwork where moving a pattern can fix one occurrence but
> > result in new others.
> >
> > For me building stage3 gimple-match.o (on a fully loaded system.. :/) is
> >
> > 95.05user 0.42system 1:35.51elapsed 99%CPU (0avgtext+0avgdata 929400maxresident)k
> > 0inputs+0outputs (0major+393349minor)pagefaults 0swaps
> >
> > and when I use -Wno-error -flto=24 -flinker-output=nolto-rel -r
> >
> > 139.95user 1.79system 0:25.92elapsed 546%CPU (0avgtext+0avgdata 538852maxresident)k
> > 0inputs+0outputs (0major+1139679minor)pagefaults 0swaps
> >
> > the issue of course is that we can't use this for the stage1 build
> > (unless we detect working GCC LTO in the host compiler setup).  I
> > suppose those measurements show the lower bound of what should be
> > possible with splitting up the file (LTO splits to 128 pieces), so for
> > me it's a 4x speedup in wallclock time despite the overhead of LTO,
> > which is quite noticeable.  -fno-checking also makes a dramatic
> > difference for me.
> >
> > Richard.
> >
> > >
> > > Segher
