On Mon, 3 Sep 2018, Martin Liška wrote: > On 09/03/2018 04:00 PM, Richard Biener wrote: > > On Mon, 3 Sep 2018, Martin Liška wrote: > > > >> On 09/03/2018 02:54 PM, Martin Liška wrote: > >>> On 09/03/2018 02:41 PM, Richard Biener wrote: > >>>> On Mon, 3 Sep 2018, Martin Liška wrote: > >>>> > >>>>> On 04/25/2018 01:42 PM, Richard Biener wrote: > >>>>>> > >>>>>> The following patch^Whack splits $subject files into three, one > >>>>>> for the predicates (due to an implementation detail) and two for > >>>>>> the rest - for now into similar LOC size files. > >>>>>> > >>>>>> I'd like to get help on the makefile changes to make them less > >>>>>> verbose, somehow globbing the -[12p] parts. > >>>>>> > >>>>>> Also you can see the split point is manually chosen which means > >>>>>> it will bitrot. Timings for the stage2 compiles on a x86_64 > >>>>>> box are > >>>>>> > >>>>>> gimple-match-p.c 5s > >>>>>> generic-match-p.c 3s > >>>>>> gimple-match-1.c 85s > >>>>>> generic-match-1.c 56s > >>>>>> gimple-match-2.c 82s > >>>>>> generic-match-2.c 31s > >>>>>> > >>>>>> the required header files are quite big (and of course everything > >>>>>> needs to be exported without the analysis work becoming too > >>>>>> cumbersome), > >>>>>> it's 3342 LOC for gimple-match-head.h and 1556 LOC for > >>>>>> generic-match-head.h > >>>>>> > >>>>>> The machine I tested is quite fast so the 80ish second timings are > >>>>>> still > >>>>>> too slow I guess and thus splitting up into four files for gimple and > >>>>>> three files for generic looks better. > >>>>>> > >>>>>> Note we lose some inlining/cloning capability in the splitting process > >>>>>> (I see quite a bit of constprop/isra work being done on the generated > >>>>>> files). I didn't try to measure the runtime impact though. > >>>>>> > >>>>>> The patch still needs quite some TLC, it really is a bit hacky but I'd > >>>>>> like to get feedback on the approach and I didn't want to spend time > >>>>>> on programatically finding optimal split points (so everything is > >>>>>> output > >>>>>> in the same semi-random order as before). > >>>>>> > >>>>>> Richard. ... > >>>>> I took a look at gimple-match.c and what about doing a split in > >>>>> following way: > >>>>> - all gimple_simplify_$number going into a separate header file (~12000 > >>>>> LOC) > >>>>> - all the function can be marked as static inline > >>>>> - all other gimple_simplify_$code can be split into arbitrary number of > >>>>> parts > >>>>> - we have 287 such functions where each function only calls > >>>>> gimple_simplify_$number and > >>>>> on average there 10 of such calls > >>>>> - that would allow to remove most of gimple_simplify_$number functions > >>>>> from the header file > >>>>> > >>>>> Richi do you think it will be viable? > >>>> > >>>> That relies on the cgraph code DCEing all unused gimple_simplify_$number > >>>> functions from the header fast as they are now effectively duplicated > >>>> into all parts, correct? Also I'm not sure if we actually want to inline > >>>> them... they are split out to get both code size and compile-time > >>>> under control. Unfortunately we have still high max-inline-insns-single > >>>> which is used for inline marked functions. > >>>> > >>>> Eventually doing a "proper" partitioning algorithm is viable, that is, > >>>> partition based on gimple_simplify_$code and put gimple_simplify_$number > >>>> where they are used. If they are used across different codes then > >>>> merge those partitions. I guess you'll see that that'll merge the > >>>> biggest _$code parititions :/ (MINUS_EXPR, PLUS_EXPR). > >>> > >>> Yes, that should be much better. I'm attaching a 'callgraph' that was > >>> done by grepping. > >>> Function starting at the beginning of a line is function definition, with > >>> an indentation > >>> one can see calls. > >>> > >>> Yes, PLUS and MINUS call ~20 gimple_simplify_$number calls. > >>> > >>> Well, generating some simple call graph format for the source file and a > >>> source file > >>> annotation of nodes can be input for a partitioning tool that can do the > >>> split. > >>> > >>> Issue with the generated files is that one needs to fix the most > >>> offenders (*-match.c, insn-recog.c, insn-emit.c, ..) > >>> in order to see some improvement. > >>> > >>> Looking at insn-recog.c, maybe similar callgraph-based split can be done > >>> for recog_$number functions? > >>> > >>> Martin > >>> > >>>> > >>>> Richard. > >>>> > >>> > >> > >> I'm sending SCC components for gimple-match.c. So there are 3 quite big > >> one and rest is small. It's questionable > >> whether partitioning based on that will provide desired speed up? > > > > When I experimented split based on # of functions wasn't working well, > > only split based on # of lines did. I'd still expect that eventually > > basing the split on the SCC components makes sense if you use say, > > the biggest 4 (but measure size in # lines) and merge the rest evenly. > > I see! Note that shrinking gimple-match.o 4 times will be probably sufficient > for general > speed up: > https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440 > > > > > It would be nice if that all would be scriptable instead of coding it > > into genmatch.c but that's of course possible as well - just add > > some extra "passes" over code-gen as I did in the hac^Wpatch. You > > That would be my plan, genmatch can mark in C comments function that can be > partitioned > and callgraph of these functions. > > > could use graphds.c routines to compute SCCs for example. Knowing > > # lines beforehand is a bit hard though - code-generating into > > a set of character buffers might be possible but I wired everything > > to use FILE ... (and no stringstreams in the C library). > > And no, please do not convert to C++ streams ;)) > > ... and a C++ splitter can do the rest: read content, do SCC, split to N parts > and stream out. > > I can work on that. Questionable is still Makefile integration of such > parallelism?
Well, just hard-code the number of pieces and thus the pices to compile in the end... Of course we shouldn't add any additional build dependencies (a "C++ splitter"). Adding additional annotation to the genmatch generated sources may be OK but then eventually doing everything in genmatch isn't too complicated (putting aside that # of lines metric...). Richard.