On 9/4/18 5:07 PM, Martin Liška wrote: > On 09/03/2018 04:43 PM, Richard Biener wrote: >> On Mon, 3 Sep 2018, Martin Liška wrote: >> >>> On 09/03/2018 04:00 PM, Richard Biener wrote: >>>> On Mon, 3 Sep 2018, Martin Liška wrote: >>>> >>>>> On 09/03/2018 02:54 PM, Martin Liška wrote: >>>>>> On 09/03/2018 02:41 PM, Richard Biener wrote: >>>>>>> On Mon, 3 Sep 2018, Martin Liška wrote: >>>>>>> >>>>>>>> On 04/25/2018 01:42 PM, Richard Biener wrote: >>>>>>>>> >>>>>>>>> The following patch^Whack splits $subject files into three, one >>>>>>>>> for the predicates (due to an implementation detail) and two for >>>>>>>>> the rest - for now into similar LOC size files. >>>>>>>>> >>>>>>>>> I'd like to get help on the makefile changes to make them less >>>>>>>>> verbose, somehow globbing the -[12p] parts. >>>>>>>>> >>>>>>>>> Also you can see the split point is manually chosen which means >>>>>>>>> it will bitrot. Timings for the stage2 compiles on a x86_64 >>>>>>>>> box are >>>>>>>>> >>>>>>>>> gimple-match-p.c 5s >>>>>>>>> generic-match-p.c 3s >>>>>>>>> gimple-match-1.c 85s >>>>>>>>> generic-match-1.c 56s >>>>>>>>> gimple-match-2.c 82s >>>>>>>>> generic-match-2.c 31s >>>>>>>>> >>>>>>>>> the required header files are quite big (and of course everything >>>>>>>>> needs to be exported without the analysis work becoming too >>>>>>>>> cumbersome), >>>>>>>>> it's 3342 LOC for gimple-match-head.h and 1556 LOC for >>>>>>>>> generic-match-head.h >>>>>>>>> >>>>>>>>> The machine I tested is quite fast so the 80ish second timings are >>>>>>>>> still >>>>>>>>> too slow I guess and thus splitting up into four files for gimple and >>>>>>>>> three files for generic looks better. >>>>>>>>> >>>>>>>>> Note we lose some inlining/cloning capability in the splitting process >>>>>>>>> (I see quite a bit of constprop/isra work being done on the generated >>>>>>>>> files). I didn't try to measure the runtime impact though. >>>>>>>>> >>>>>>>>> The patch still needs quite some TLC, it really is a bit hacky but I'd >>>>>>>>> like to get feedback on the approach and I didn't want to spend time >>>>>>>>> on programatically finding optimal split points (so everything is >>>>>>>>> output >>>>>>>>> in the same semi-random order as before). >>>>>>>>> >>>>>>>>> Richard. >> ... >>>>>>>> I took a look at gimple-match.c and what about doing a split in >>>>>>>> following way: >>>>>>>> - all gimple_simplify_$number going into a separate header file >>>>>>>> (~12000 LOC) >>>>>>>> - all the function can be marked as static inline >>>>>>>> - all other gimple_simplify_$code can be split into arbitrary number >>>>>>>> of parts >>>>>>>> - we have 287 such functions where each function only calls >>>>>>>> gimple_simplify_$number and >>>>>>>> on average there 10 of such calls >>>>>>>> - that would allow to remove most of gimple_simplify_$number functions >>>>>>>> from the header file >>>>>>>> >>>>>>>> Richi do you think it will be viable? >>>>>>> >>>>>>> That relies on the cgraph code DCEing all unused gimple_simplify_$number >>>>>>> functions from the header fast as they are now effectively duplicated >>>>>>> into all parts, correct? Also I'm not sure if we actually want to >>>>>>> inline >>>>>>> them... they are split out to get both code size and compile-time >>>>>>> under control. Unfortunately we have still high max-inline-insns-single >>>>>>> which is used for inline marked functions. >>>>>>> >>>>>>> Eventually doing a "proper" partitioning algorithm is viable, that is, >>>>>>> partition based on gimple_simplify_$code and put gimple_simplify_$number >>>>>>> where they are used. If they are used across different codes then >>>>>>> merge those partitions. I guess you'll see that that'll merge the >>>>>>> biggest _$code parititions :/ (MINUS_EXPR, PLUS_EXPR). >>>>>> >>>>>> Yes, that should be much better. I'm attaching a 'callgraph' that was >>>>>> done by grepping. >>>>>> Function starting at the beginning of a line is function definition, >>>>>> with an indentation >>>>>> one can see calls. >>>>>> >>>>>> Yes, PLUS and MINUS call ~20 gimple_simplify_$number calls. >>>>>> >>>>>> Well, generating some simple call graph format for the source file and a >>>>>> source file >>>>>> annotation of nodes can be input for a partitioning tool that can do the >>>>>> split. >>>>>> >>>>>> Issue with the generated files is that one needs to fix the most >>>>>> offenders (*-match.c, insn-recog.c, insn-emit.c, ..) >>>>>> in order to see some improvement. >>>>>> >>>>>> Looking at insn-recog.c, maybe similar callgraph-based split can be done >>>>>> for recog_$number functions? >>>>>> >>>>>> Martin >>>>>> >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>> >>>>> >>>>> I'm sending SCC components for gimple-match.c. So there are 3 quite big >>>>> one and rest is small. It's questionable >>>>> whether partitioning based on that will provide desired speed up? >>>> >>>> When I experimented split based on # of functions wasn't working well, >>>> only split based on # of lines did. I'd still expect that eventually >>>> basing the split on the SCC components makes sense if you use say, >>>> the biggest 4 (but measure size in # lines) and merge the rest evenly. >>> >>> I see! Note that shrinking gimple-match.o 4 times will be probably >>> sufficient for general >>> speed up: >>> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440 >>> >>>> >>>> It would be nice if that all would be scriptable instead of coding it >>>> into genmatch.c but that's of course possible as well - just add >>>> some extra "passes" over code-gen as I did in the hac^Wpatch. You >>> >>> That would be my plan, genmatch can mark in C comments function that can be >>> partitioned >>> and callgraph of these functions. >>> >>>> could use graphds.c routines to compute SCCs for example. Knowing >>>> # lines beforehand is a bit hard though - code-generating into >>>> a set of character buffers might be possible but I wired everything >>>> to use FILE ... (and no stringstreams in the C library). >>>> And no, please do not convert to C++ streams ;)) >>> >>> ... and a C++ splitter can do the rest: read content, do SCC, split to N >>> parts >>> and stream out. >>> >>> I can work on that. Questionable is still Makefile integration of such >>> parallelism? >> >> Well, just hard-code the number of pieces and thus the pices to >> compile in the end... >> >> Of course we shouldn't add any additional build dependencies >> (a "C++ splitter"). Adding additional annotation to the genmatch >> generated sources may be OK but then eventually doing everything in >> genmatch isn't too complicated (putting aside that # of lines metric...). >> >> Richard. >> > > Hi. > > There's working solution for {gimple,generic}-match.c. It introduces > splitter.cc which > parses annotated generated files and split it according to call graph > provided by annotations. > Currently I copied a SCC implementation as graphds.c has quite some > dependencies (xrealloc, bitmap,..). > Questionable how to reduce the dependencies? > > For gimple-match.c the biggest SCC component contains about 1/4 LOC, which > means maximal reasonable number > of parts is equal to 4. Doing that the biggest component is compiled with -O2 > 14.5s, which the original gimple-match.c > 50.0s. That sounds promising: > > l gimple-match-part*.c > -rw-r--r-- 1 marxin users 791442 Sep 4 17:02 gimple-match-part-0.c > -rw-r--r-- 1 marxin users 748610 Sep 4 17:02 gimple-match-part-1.c > -rw-r--r-- 1 marxin users 783749 Sep 4 17:02 gimple-match-part-2.c > -rw-r--r-- 1 marxin users 828990 Sep 4 17:02 gimple-match-part-3.c > -rw-r--r-- 1 marxin users 103130 Sep 4 17:02 gimple-match-part-footer.c > > for generic-match.c: > > l generic-match-part*.c > -rw-r--r-- 1 marxin users 528712 Sep 4 17:02 generic-match-part-0.c > -rw-r--r-- 1 marxin users 517120 Sep 4 17:02 generic-match-part-1.c > -rw-r--r-- 1 marxin users 519777 Sep 4 17:02 generic-match-part-2.c > -rw-r--r-- 1 marxin users 221971 Sep 4 17:02 generic-match-part-3.c > -rw-r--r-- 1 marxin users 14596 Sep 4 17:02 generic-match-part-footer.c > > There are some issues with functions that live in gimple-match-head.c, these > should live in a header file the parts can include. I also renamed > gimple_simplify function that live in gimple-match.c to > gimple_simplify_generated > as they clash with these that are in gimple-match-head.c. > > There are still various questions: > - how to make the split process optional, how to modify then Makefile? > - SCC implementation should be shared with graphds.c > - splitter.cc should be cleaned up > - in order to achieve real speed up we need to split also other generated > (and also dwarf2out.c, i386.c, ..) files: > here I'm most concerned about insn-recog.c, which can't be split the same way > without ending up with a single huge SCC component. > Ideas what to do with that file? > > Thoughts? > > Martin >
Long time, no see. Richi, can we please return to this in next stage1? Martin