On Tue, May 11, 2021 at 5:01 PM Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
>
> On Tue, May 04, 2021 at 10:40:38AM +0200, Richard Biener via Gcc wrote:
> > On Mon, May 3, 2021 at 11:10 PM Andrew Pinski via Gcc <gcc@gcc.gnu.org>
> > wrote:
> > > I noticed my (highly, -j24) parallel build of GCC is serialized on
> > > compiling gimple-match.c.  Has anyone looked into splitting this
> > > generated file into multiple files?
> >
> > There were threads about this in the past, yes.  There's the
> > possibility to use LTO for this as well (also mentioned in those
> > threads).  Note it's not easy to split in a meaningful way in
> > genmatch.c
>
> But it will have to be handled at least somewhat soon: on not huge
> parallelism (-j120 for example) building *-match.c takes longer than
> building everything else in gcc/ together (wallclock time), and it is a
> huge part of regstrap time (bigger than running all of the testsuite!)

I would classify -j120 as "huge parallelism" ;)  Testing time still
dominates my builds (with -j24) where bootstrap takes ~20 mins but
testing another 40.

Is it building stage2 gimple-match.c that you are worried about?
(it's built using the -O0 compiled stage1 compiler - but we at least
should end up using -fno-checking for this build)

Maybe you can do some experiments - like add
-fno-inline-functions-called-once and change genmatch.c:3766 to split
out single uses as well (should decrease function sizes).

There's the option to make all functions external in gimple-match.c
so splitting the file at arbitrary points will be possible (directly
from genmatch).  We'll need some internal header with all declarations
then as well, or alternatively some clever logic in genmatch to only
externalize functions needed from multiple split files.

That said - ideas to reduce the size of the generated code are welcome
as well.

There's also pattern ordering in match.pd that can make a difference
because we're honoring first-match and thus have to re-start matching
from outermost on conflicts (most of the time the actual order in
match.pd is just random).  If you add -v to genmatch then you'll see

/home/rguenther/src/gcc3/gcc/match.pd:6092:10 warning: failed to merge decision tree node
 (cmp (op@3 @0 INTEGER_CST@1) INTEGER_CST@2)
         ^
/home/rguenther/src/gcc3/gcc/match.pd:4263:11 warning: with the following
 (cmp (op @0 REAL_CST@1) REAL_CST@2)
          ^
/home/rguenther/src/gcc3/gcc/match.pd:5164:6 warning: because of the following which serves as ordering barrier
 (eq @0 integer_onep)
     ^

that means that the simple (eq @0 integer_onep) should match after
4263 but before 6092 (only the latter will actually match the same -
the former has REAL_CST@2 but 5164 uses a predicate integer_onep).
This causes us to emit three switch (code){ case EQ_EXPR: } instead
of one.

There might be legitimate cases of such order constraints but most of
them are spurious.  "Fixing" them will also make the matching process
faster, but it's quite some legwork where moving a pattern can fix one
occurrence but result in new others.

For me building stage3 gimple-match.o (on a fully loaded system.. :/) is

95.05user 0.42system 1:35.51elapsed 99%CPU (0avgtext+0avgdata 929400maxresident)k
0inputs+0outputs (0major+393349minor)pagefaults 0swaps

and when I use -Wno-error -flto=24 -flinker-output=nolto-rel -r

139.95user 1.79system 0:25.92elapsed 546%CPU (0avgtext+0avgdata 538852maxresident)k
0inputs+0outputs (0major+1139679minor)pagefaults 0swaps

The issue of course is that we can't use this for the stage1 build
(unless we detect working GCC LTO in the host compiler setup).  I suppose
those measurements show the lower bound of what should be possible with
splitting up the file (LTO splits to 128 pieces), so for me it's roughly
a 4x speedup in wallclock time despite the overhead of LTO, which is
quite noticeable.  -fno-checking also makes a dramatic difference for me.

Richard.

>
> Segher