On Mon, 6 May 2019, Martin Liška wrote:

> On 5/2/19 3:18 PM, Richard Biener wrote:
> > On Mon, 29 Apr 2019, Martin Liška wrote:
> >
> >> On 9/10/18 1:43 PM, Martin Liška wrote:
> >>> On 09/04/2018 05:07 PM, Martin Liška wrote:
> >>>> - in order to achieve real speed up we need to split also other
> >>>> generated (and also dwarf2out.c, i386.c, ..) files:
> >>>> here I'm most concerned about insn-recog.c, which can't be split the
> >>>> same way without ending up with a single huge SCC component.
> >>>
> >>> About the insn-recog.c file: all functions are static and using SCC
> >>> one ends up with all functions in one component. In order to split
> >>> the callgraph one needs to promote some functions to be extern and
> >>> then split would be possible.
> >>> In order to do that we'll probably need to teach splitter how to do
> >>> partitioning based on minimal number of edges to be removed.
> >>>
> >>> I need to inspire in lto_balanced_map, or is there some simple
> >>> algorithm I can start with?
> >>>
> >>> Martin
> >>>
> >>
> >> I'm adding here Richard Sandiford as he wrote majority of
> >> gcc/genrecog.c file.
> >> As mentioned, I'm seeking for a way how to split the generated file.
> >> Or how to learn the generator to process a reasonable splitting.
> >
> > Somewhen earlier this year I've done the experiment with using
> > a compile with -flto -fno-fat-lto-objects
>
> -fno-fat-lto-objects is default, isn't it?

Where linker plugin support is detected, yes.

> > and a link
> > via -flto -r -flinker-output=rel into the object file.  This cut
> > compile-time more than in half with less maintenance overhead.
>
> Can you please provide exact command line how to do that?

gcc t.c -o t.o -flto=8 -r -flinker-output=nolto-rel

there's an annoying warning:

cc1plus: warning: command line option ‘-flinker-output=nolto-rel’ is valid
for LTO but not for C++

which can be avoided by splitting the above into a compile and
a separate LTO "link" step.  Using -Wl,-flinker-.... doesn't work
unfortunately (ld doesn't understand it).

Using installed GCC 9.1 compiling trunk gimple-match.c with -O2 -g
takes 58.7s while with the LTO trick it takes 23.3s (combined
CPU time is up to 96s).  That was with -flto=8 on a CPU with
4 physical and 8 logical cores.

As it includes -g it includes the debug copy dance as well.

> bloaty gimple-match.o -- gimple-match.o.nolto
     VM SIZE                                                          FILE SIZE
 ++++++++++++++ GROWING                                            ++++++++++++++
  [ = ]       0 .rela.debug_info                                   +3.62Mi   +45%
  [ = ]       0 .rela.debug_ranges                                  +161Ki  +1.8%
  [ = ]       0 .debug_str                                         +95.8Ki   +19%
  [ = ]       0 .rela.text                                         +77.6Ki   +10%
  [ = ]       0 .debug_ranges                                      +58.9Ki  +1.7%
  [ = ]       0 .symtab                                            +22.9Ki   +68%
  [ = ]       0 .debug_abbrev                                      +21.1Ki  +394%
  [ = ]       0 .strtab                                            +11.4Ki  +9.5%
  +8.1% +5.34Ki .eh_frame                                          +5.34Ki  +8.1%
   +84% +4.09Ki .rodata.str1.8                                     +4.09Ki   +84%
  [ = ]       0 .rela.text.unlikely                                +3.87Ki  +1.0%
  [ = ]       0 .rela.debug_aranges                                +3.68Ki  +872%
  [ = ]       0 .debug_aranges                                     +3.02Ki +10e2%
   +42% +2.59Ki .rodata.str1.1                                     +2.59Ki   +42%
  +0.2% +2.41Ki [Other]                                            +2.45Ki  +0.2%
  [ = ]       0 .rela.debug_line                                   +2.09Ki   +16%
  [ = ]       0 .rela.eh_frame                                     +1.17Ki  +4.3%
  [NEW] +1.09Ki .rodata._Z7get_defPFP9tree_nodeS0_ES0_.str1.8      +1.09Ki  [NEW]
  [ = ]       0 .shstrtab                                             +784   +44%
  [ = ]       0 [ELF Headers]                                         +768   +16%
  [ = ]       0 .comment                                              +666 +37e2%

 -------------- SHRINKING                                          --------------
  [ = ]       0 .debug_line                                         -256Ki -17.3%
  [ = ]       0 .rela.debug_loc                                    -73.6Ki  -0.6%
  [ = ]       0 .debug_info                                        -63.4Ki  -1.6%
  [ = ]       0 .debug_loc                                         -39.3Ki  -0.6%

  +1.1% +15.5Ki TOTAL                                              +3.67Mi  +7.8%

.debug_line probably shrinks because we drop columns with LTO.

Richard.