On 9/4/18 5:07 PM, Martin Liška wrote:
> On 09/03/2018 04:43 PM, Richard Biener wrote:
>> On Mon, 3 Sep 2018, Martin Liška wrote:
>>
>>> On 09/03/2018 04:00 PM, Richard Biener wrote:
>>>> On Mon, 3 Sep 2018, Martin Liška wrote:
>>>>
>>>>> On 09/03/2018 02:54 PM, Martin Liška wrote:
>>>>>> On 09/03/2018 02:41 PM, Richard Biener wrote:
>>>>>>> On Mon, 3 Sep 2018, Martin Liška wrote:
>>>>>>>
>>>>>>>> On 04/25/2018 01:42 PM, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>> The following patch^Whack splits $subject files into three, one
>>>>>>>>> for the predicates (due to an implementation detail) and two for
>>>>>>>>> the rest - for now into similar LOC size files.
>>>>>>>>>
>>>>>>>>> I'd like to get help on the makefile changes to make them less
>>>>>>>>> verbose, somehow globbing the -[12p] parts.
>>>>>>>>>
>>>>>>>>> Also you can see the split point is manually chosen which means
>>>>>>>>> it will bitrot.  Timings for the stage2 compiles on a x86_64
>>>>>>>>> box are
>>>>>>>>>
>>>>>>>>> gimple-match-p.c   5s
>>>>>>>>> generic-match-p.c  3s
>>>>>>>>> gimple-match-1.c  85s
>>>>>>>>> generic-match-1.c 56s
>>>>>>>>> gimple-match-2.c  82s
>>>>>>>>> generic-match-2.c 31s
>>>>>>>>>
>>>>>>>>> the required header files are quite big (and of course everything
>>>>>>>>> needs to be exported without the analysis work becoming too 
>>>>>>>>> cumbersome),
>>>>>>>>> it's 3342 LOC for gimple-match-head.h and 1556 LOC for 
>>>>>>>>> generic-match-head.h
>>>>>>>>>
>>>>>>>>> The machine I tested is quite fast so the 80ish second timings are 
>>>>>>>>> still
>>>>>>>>> too slow I guess and thus splitting up into four files for gimple and
>>>>>>>>> three files for generic looks better.
>>>>>>>>>
>>>>>>>>> Note we lose some inlining/cloning capability in the splitting process
>>>>>>>>> (I see quite a bit of constprop/isra work being done on the generated 
>>>>>>>>> files).  I didn't try to measure the runtime impact though.
>>>>>>>>>
>>>>>>>>> The patch still needs quite some TLC, it really is a bit hacky but I'd
>>>>>>>>> like to get feedback on the approach and I didn't want to spend time
>>>>>>>>> on programatically finding optimal split points (so everything is 
>>>>>>>>> output
>>>>>>>>> in the same semi-random order as before).
>>>>>>>>>
>>>>>>>>> Richard.
>> ...
>>>>>>>> I took a look at gimple-match.c and what about doing a split in 
>>>>>>>> following way:
>>>>>>>> - all gimple_simplify_$number going into a separate header file 
>>>>>>>> (~12000 LOC)
>>>>>>>> - all the function can be marked as static inline
>>>>>>>> - all other gimple_simplify_$code can be split into arbitrary number 
>>>>>>>> of parts
>>>>>>>> - we have 287 such functions where each function only calls 
>>>>>>>> gimple_simplify_$number and
>>>>>>>>   on average there 10 of such calls
>>>>>>>> - that would allow to remove most of gimple_simplify_$number functions 
>>>>>>>> from the header file
>>>>>>>>
>>>>>>>> Richi do you think it will be viable?
>>>>>>>
>>>>>>> That relies on the cgraph code DCEing all unused gimple_simplify_$number
>>>>>>> functions from the header fast as they are now effectively duplicated
>>>>>>> into all parts, correct?  Also I'm not sure if we actually want to 
>>>>>>> inline
>>>>>>> them...  they are split out to get both code size and compile-time
>>>>>>> under control.  Unfortunately we have still high max-inline-insns-single
>>>>>>> which is used for inline marked functions.
>>>>>>>
>>>>>>> Eventually doing a "proper" partitioning algorithm is viable, that is,
>>>>>>> partition based on gimple_simplify_$code and put gimple_simplify_$number
>>>>>>> where they are used.  If they are used across different codes then
>>>>>>> merge those partitions.  I guess you'll see that that'll merge the 
>>>>>>> biggest _$code parititions :/ (MINUS_EXPR, PLUS_EXPR).
>>>>>>
>>>>>> Yes, that should be much better. I'm attaching a 'callgraph' that was 
>>>>>> done by grepping.
>>>>>> Function starting at the beginning of a line is function definition, 
>>>>>> with an indentation
>>>>>> one can see calls.
>>>>>>
>>>>>> Yes, PLUS and MINUS call ~20 gimple_simplify_$number calls.
>>>>>>
>>>>>> Well, generating some simple call graph format for the source file and a 
>>>>>> source file
>>>>>> annotation of nodes can be input for a partitioning tool that can do the 
>>>>>> split.
>>>>>>
>>>>>> Issue with the generated files is that one needs to fix the most 
>>>>>> offenders (*-match.c, insn-recog.c, insn-emit.c, ..)
>>>>>> in order to see some improvement.
>>>>>>
>>>>>> Looking at insn-recog.c, maybe similar callgraph-based split can be done 
>>>>>> for recog_$number functions?
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>
>>>>>
>>>>> I'm sending SCC components for gimple-match.c. So there are 3 quite big 
>>>>> one and rest is small. It's questionable
>>>>> whether partitioning based on that will provide desired speed up?
>>>>
>>>> When I experimented split based on # of functions wasn't working well,
>>>> only split based on # of lines did.  I'd still expect that eventually
>>>> basing the split on the SCC components makes sense if you use say,
>>>> the biggest 4 (but measure size in # lines) and merge the rest evenly.
>>>
>>> I see! Note that shrinking gimple-match.o 4 times will be probably 
>>> sufficient for general
>>> speed up:
>>> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
>>>
>>>>
>>>> It would be nice if that all would be scriptable instead of coding it
>>>> into genmatch.c but that's of course possible as well - just add
>>>> some extra "passes" over code-gen as I did in the hac^Wpatch.  You
>>>
>>> That would be my plan, genmatch can mark in C comments function that can be 
>>> partitioned
>>> and callgraph of these functions.
>>>
>>>> could use graphds.c routines to compute SCCs for example.  Knowing
>>>> # lines beforehand is a bit hard though - code-generating into
>>>> a set of character buffers might be possible but I wired everything
>>>> to use FILE ... (and no stringstreams in the C library).
>>>> And no, please do not convert to C++ streams ;))
>>>
>>> ... and a C++ splitter can do the rest: read content, do SCC, split to N 
>>> parts
>>> and stream out.
>>>
>>> I can work on that. Questionable is still Makefile integration of such 
>>> parallelism?
>>
>> Well, just hard-code the number of pieces and thus the pices to
>> compile in the end...
>>
>> Of course we shouldn't add any additional build dependencies
>> (a "C++ splitter").  Adding additional annotation to the genmatch
>> generated sources may be OK but then eventually doing everything in
>> genmatch isn't too complicated (putting aside that # of lines metric...).
>>
>> Richard.
>>
> 
> Hi.
> 
> There's working solution for {gimple,generic}-match.c. It introduces 
> splitter.cc which
> parses annotated generated files and split it according to call graph 
> provided by annotations.
> Currently I copied a SCC implementation as graphds.c has quite some 
> dependencies (xrealloc, bitmap,..).
> Questionable how to reduce the dependencies?
> 
> For gimple-match.c the biggest SCC component contains about 1/4 LOC, which 
> means maximal reasonable number
> of parts is equal to 4. Doing that the biggest component is compiled with -O2 
> 14.5s, which the original gimple-match.c
> 50.0s. That sounds promising:
> 
>  l gimple-match-part*.c
> -rw-r--r-- 1 marxin users 791442 Sep  4 17:02 gimple-match-part-0.c
> -rw-r--r-- 1 marxin users 748610 Sep  4 17:02 gimple-match-part-1.c
> -rw-r--r-- 1 marxin users 783749 Sep  4 17:02 gimple-match-part-2.c
> -rw-r--r-- 1 marxin users 828990 Sep  4 17:02 gimple-match-part-3.c
> -rw-r--r-- 1 marxin users 103130 Sep  4 17:02 gimple-match-part-footer.c
> 
> for generic-match.c:
> 
> l generic-match-part*.c
> -rw-r--r-- 1 marxin users 528712 Sep  4 17:02 generic-match-part-0.c
> -rw-r--r-- 1 marxin users 517120 Sep  4 17:02 generic-match-part-1.c
> -rw-r--r-- 1 marxin users 519777 Sep  4 17:02 generic-match-part-2.c
> -rw-r--r-- 1 marxin users 221971 Sep  4 17:02 generic-match-part-3.c
> -rw-r--r-- 1 marxin users  14596 Sep  4 17:02 generic-match-part-footer.c
> 
> There are some issues with functions that live in gimple-match-head.c, these
> should live in a header file the parts can include. I also renamed
> gimple_simplify function that live in gimple-match.c to 
> gimple_simplify_generated
> as they clash with these that are in gimple-match-head.c.
> 
> There are still various questions:
> - how to make the split process optional, how to modify then Makefile?
> - SCC implementation should be shared with graphds.c
> - splitter.cc should be cleaned up
> - in order to achieve real speed up we need to split also other generated 
> (and also dwarf2out.c, i386.c, ..) files:
> here I'm most concerned about insn-recog.c, which can't be split the same way 
> without ending up with a single huge SCC component.
> Ideas what to do with that file?
> 
> Thoughts?
> 
> Martin
> 

Long time, no see. Richi, can we please return to this in next stage1?

Martin

Reply via email to