On Mon, 3 Sep 2018, Martin Liška wrote:

> On 09/03/2018 04:00 PM, Richard Biener wrote:
> > On Mon, 3 Sep 2018, Martin Liška wrote:
> > 
> >> On 09/03/2018 02:54 PM, Martin Liška wrote:
> >>> On 09/03/2018 02:41 PM, Richard Biener wrote:
> >>>> On Mon, 3 Sep 2018, Martin Liška wrote:
> >>>>
> >>>>> On 04/25/2018 01:42 PM, Richard Biener wrote:
> >>>>>>
> >>>>>> The following patch^Whack splits $subject files into three, one
> >>>>>> for the predicates (due to an implementation detail) and two for
> >>>>>> the rest - for now into similar LOC size files.
> >>>>>>
> >>>>>> I'd like to get help on the makefile changes to make them less
> >>>>>> verbose, somehow globbing the -[12p] parts.
> >>>>>>
> >>>>>> Also you can see the split point is manually chosen which means
> >>>>>> it will bitrot.  Timings for the stage2 compiles on a x86_64
> >>>>>> box are
> >>>>>>
> >>>>>> gimple-match-p.c   5s
> >>>>>> generic-match-p.c  3s
> >>>>>> gimple-match-1.c  85s
> >>>>>> generic-match-1.c 56s
> >>>>>> gimple-match-2.c  82s
> >>>>>> generic-match-2.c 31s
> >>>>>>
> >>>>>> The required header files are quite big (and of course everything
> >>>>>> needs to be exported without the analysis work becoming too
> >>>>>> cumbersome): 3342 LOC for gimple-match-head.h and 1556 LOC for
> >>>>>> generic-match-head.h.
> >>>>>>
> >>>>>> The machine I tested on is quite fast, so the 80ish-second timings
> >>>>>> are, I guess, still too slow, and thus splitting into four files for
> >>>>>> gimple and three files for generic looks better.
> >>>>>>
> >>>>>> Note we lose some inlining/cloning capability in the splitting process
> >>>>>> (I see quite a bit of constprop/isra work being done on the generated 
> >>>>>> files).  I didn't try to measure the runtime impact though.
> >>>>>>
> >>>>>> The patch still needs quite some TLC, it really is a bit hacky, but
> >>>>>> I'd like to get feedback on the approach, and I didn't want to spend
> >>>>>> time on programmatically finding optimal split points (so everything
> >>>>>> is output in the same semi-random order as before).
> >>>>>>
> >>>>>> Richard.
...
> >>>>> I took a look at gimple-match.c; what about doing a split in the
> >>>>> following way:
> >>>>> - all gimple_simplify_$number functions go into a separate header file
> >>>>> (~12000 LOC)
> >>>>> - all of these functions can be marked as static inline
> >>>>> - all other gimple_simplify_$code functions can be split into an
> >>>>> arbitrary number of parts
> >>>>> - we have 287 such functions, where each function only calls
> >>>>> gimple_simplify_$number and on average there are 10 such calls
> >>>>> - that would allow removing most of the gimple_simplify_$number
> >>>>> functions from the header file
> >>>>>
> >>>>> Richi, do you think this would be viable?
> >>>>
> >>>> That relies on the cgraph code quickly DCEing all unused
> >>>> gimple_simplify_$number functions from the header, as they are now
> >>>> effectively duplicated into all parts, correct?  Also I'm not sure we
> >>>> actually want to inline them... they are split out to get both code
> >>>> size and compile time under control.  Unfortunately we still have a
> >>>> high max-inline-insns-single, which is used for inline-marked functions.
> >>>>
> >>>> Eventually doing a "proper" partitioning algorithm is viable, that is,
> >>>> partition based on gimple_simplify_$code and put gimple_simplify_$number
> >>>> where they are used.  If they are used across different codes then
> >>>> merge those partitions.  I guess you'll see that that'll merge the
> >>>> biggest _$code partitions :/ (MINUS_EXPR, PLUS_EXPR).
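[The partition-merging idea above (one partition per gimple_simplify_$code, merging any two partitions that share a gimple_simplify_$number helper) amounts to a union-find over the per-$code entry points. A rough sketch; the helper names and the toy callgraph below are invented for illustration, not taken from real genmatch output:]

```python
# Union-find over per-$code partitions: partitions that call the same
# $number helper are merged into one compilation unit.

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def merge_partitions(calls):
    """calls: dict mapping a $code entry point -> set of helper names."""
    parent = {code: code for code in calls}
    helper_owner = {}
    for code, helpers in calls.items():
        for h in helpers:
            if h in helper_owner:
                # Helper already seen in another partition -> merge them.
                union(parent, helper_owner[h], code)
            else:
                helper_owner[h] = code
    groups = {}
    for code in calls:
        groups.setdefault(find(parent, code), set()).add(code)
    return sorted(sorted(g) for g in groups.values())

# Toy input: PLUS and MINUS share simplify_40, so they end up together.
calls = {
    "PLUS_EXPR":  {"simplify_12", "simplify_40"},
    "MINUS_EXPR": {"simplify_40"},
    "MULT_EXPR":  {"simplify_77"},
}
print(merge_partitions(calls))
# → [['MINUS_EXPR', 'PLUS_EXPR'], ['MULT_EXPR']]
```

[This also makes the worry above concrete: one widely shared helper is enough to glue the big PLUS/MINUS partitions together.]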
> >>>
> >>> Yes, that should be much better. I'm attaching a 'callgraph' that was
> >>> made by grepping. A function starting at the beginning of a line is a
> >>> function definition; the indented names below it are its calls.
> >>>
> >>> Yes, PLUS and MINUS each make ~20 gimple_simplify_$number calls.
> >>>
> >>> Well, generating a simple callgraph format for the source file, plus a
> >>> source-file annotation of the nodes, could serve as input for a
> >>> partitioning tool that does the split.
> >>>
> >>> The issue with the generated files is that one needs to fix the worst
> >>> offenders (*-match.c, insn-recog.c, insn-emit.c, ...)
> >>> in order to see some improvement.
> >>>
> >>> Looking at insn-recog.c, maybe a similar callgraph-based split can be
> >>> done for the recog_$number functions?
> >>>
> >>> Martin
> >>>
> >>>>
> >>>> Richard.
> >>>>
> >>>
> >>
> >> I'm sending the SCC components for gimple-match.c. There are 3 quite big
> >> ones and the rest are small. It's questionable whether partitioning
> >> based on that will provide the desired speed-up.
> > 
> > When I experimented, a split based on # of functions wasn't working well;
> > only a split based on # of lines did.  I'd still expect that eventually
> > basing the split on the SCC components makes sense if you use, say,
> > the biggest 4 (but measure size in # of lines) and merge the rest evenly.
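[Balancing the split by line count rather than by function count, as suggested above, could be done greedily: sort the SCCs by size and always drop the next one into the currently smallest part. A sketch, with made-up component names and sizes:]

```python
# Greedy size-balanced assignment of SCCs to N output files, measuring
# size in lines.  The big components naturally become the seeds of their
# own parts; the small tail fills up whichever part is lightest.
import heapq

def split_by_lines(scc_sizes, nparts):
    """scc_sizes: list of (name, line_count); returns a list of
    (total_lines, part_index, members) sorted by total size."""
    heap = [(0, i, []) for i in range(nparts)]
    heapq.heapify(heap)
    for name, lines in sorted(scc_sizes, key=lambda p: -p[1]):
        total, idx, members = heapq.heappop(heap)  # smallest part so far
        members.append(name)
        heapq.heappush(heap, (total + lines, idx, members))
    return sorted(heap)

# Illustrative sizes: three big SCCs and a tail of small ones.
sizes = [("scc0", 9000), ("scc1", 7000), ("scc2", 6000),
         ("scc3", 900), ("scc4", 800), ("scc5", 700), ("scc6", 600)]
for total, idx, members in split_by_lines(sizes, 4):
    print(idx, total, members)
```

[With these numbers each big SCC gets a part of its own and the small ones collect in the fourth, which matches the "biggest 4 as seeds, merge the rest evenly" intuition, though the parts stay unbalanced when a few SCCs dominate.]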
> 
> I see! Note that shrinking gimple-match.o 4 times will probably be
> sufficient for a general speed-up:
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> 
> > 
> > It would be nice if all that were scriptable instead of coding it
> > into genmatch.c, but that's of course possible as well - just add
> > some extra "passes" over the code-gen as I did in the hac^Wpatch.  You
> 
> That would be my plan: genmatch can mark, in C comments, the functions
> that can be partitioned, together with the callgraph of these functions.
> 
> > could use graphds.c routines to compute SCCs for example.  Knowing
> > # lines beforehand is a bit hard though - code-generating into
> > a set of character buffers might be possible but I wired everything
> > to use FILE ... (and no stringstreams in the C library).
> > And no, please do not convert to C++ streams ;))
> 
> ... and a C++ splitter can do the rest: read the content, compute SCCs,
> split into N parts and stream them out.
> 
> I can work on that. What is still questionable is the Makefile
> integration of such parallelism.
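[The SCC step of that splitter pipeline is the only non-trivial part; it could be prototyped with a standard Tarjan traversal over the extracted callgraph. A minimal sketch; the function names in the toy graph are invented for illustration:]

```python
# Tarjan's algorithm over a callgraph given as dict: node -> callees.
# Returns SCCs in reverse topological order of the condensation, i.e.
# a component is emitted before any component that calls into it.

def tarjan_sccs(graph):
    index, low = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = [0]

    def dfs(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                dfs(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for v in graph:
        if v not in index:
            dfs(v)
    return sccs

# Toy callgraph: PLUS and MINUS reach each other via a shared helper,
# so all three land in one component; _9 is independent.
cg = {
    "gimple_simplify_PLUS":  ["gimple_simplify_7"],
    "gimple_simplify_7":     ["gimple_simplify_MINUS"],
    "gimple_simplify_MINUS": ["gimple_simplify_PLUS"],
    "gimple_simplify_9":     [],
}
print(tarjan_sccs(cg))
# → [['gimple_simplify_7', 'gimple_simplify_MINUS', 'gimple_simplify_PLUS'],
#    ['gimple_simplify_9']]
```

[Note that GCC already ships an SCC implementation in graphds.c, as mentioned earlier in the thread; the sketch above only shows the shape of the computation a standalone script would need.]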

Well, just hard-code the number of pieces and thus the pieces to
compile in the end...

Of course we shouldn't add any additional build dependencies
(a "C++ splitter").  Adding additional annotations to the
genmatch-generated sources may be OK, but then eventually doing everything
in genmatch isn't too complicated (putting aside that # of lines metric...).

Richard.
