Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

Xinliang David Li Fri, 30 Apr 2010 12:07:59 -0700

On Fri, Apr 30, 2010 at 11:12 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> >
>> > Interesting.  My plan for profiling with LTO is to ultimately make it a
>> > linktime transform.  This will be more difficult with WHOPR (i.e.
>> > instrumenting needs function bodies that are not available at WPA time),
>> > but I believe it is solvable: just assign uids to the edges and do
>> > instrumentation at ltrans.  Then we will save cgraph profile in some
>> > easier way so WHOPR can read it in and read rest of stuff in ltrans.
>> > This would involve shipping the correct profiles for given function etc
>> > so it will be a bit of implementation challenge.
>>
>> This can be tricky -- to maximize FDO benefit, the
>> profile-use/annotation needs to happen early which means
>> instrumentation also needs to happen early (to avoid cfg mismatches).
>
> I don't see much problem in this particular area.
>
> GCC optimization queue is organized in a way that we first do early
> optimizations that all are intended to be simple cleanups without size/speed
> tradeoffs.  Then we do IPA and late optimizations that are both driven by
> profile (estimated or read).
> Profile reading happens early because we use same infrastructure for gcov and
> profile feedback.  This is not giving profile feedback better benefit, quite
> the converse since early passes may not be able to update profile precisely
> and we also get higher profile overhead.
>
> So I think decoupling gcov and profile feedback and pushing profile feedback
> back in queue is going to be win.
>


There are two parts of profile feedback:

1) cfg edge count annotation:

For this part, yes, most of the early phases (other than possibly
einline-2) do not need or depend on it, so it can probably be pushed
back (in fact the static/guessed profile pass is later).

2) value profile transformations:

This part may benefit more from being done early -- not only because of
more cleanups, but also due to the requirement for getting a more
precise inline summary.
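
To make part 2 concrete, here is a rough hand-written sketch of the kind of rewrite a value profile can drive. The function names and the hot divisor value 16 are made up for illustration; this is not actual GCC output, only the general shape of specializing on the most common value.

```cpp
#include <cstdint>

// Original code: the divisor is only known at run time.
std::uint32_t scale_generic(std::uint32_t x, std::uint32_t d) {
    return x / d;
}

// What a value-profile-driven rewrite could look like, ASSUMING the
// profile reported d == 16 on almost every execution.
// (16 is a hypothetical hot value chosen for this sketch.)
std::uint32_t scale_specialized(std::uint32_t x, std::uint32_t d) {
    if (d == 16) {
        return x / 16;   // constant divisor: can become a shift
    }
    return x / d;        // fallback keeps the original semantics
}
```

The rewrite adds a branch and a block that later cleanups can simplify, and it changes the size of the function body, which is why doing it before the inline summary is computed may matter.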


> Yes, optimization must match, but with LTO this is not a problem and in
> general the early optimization should be stable wrt memory layout (nothing
> else changes).  This used to be exercised before profiling was updated to
> tree level in 4.x.


You mean CFG layout is stable? But ccp, copy_prop, dce, tail recursion
etc. can all change the cfg.
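
A small illustration of the concern, written by hand rather than taken from any compiler dump (the function names are hypothetical). If instrumentation sees the first form and profile-use sees the second, the edge counters no longer line up one to one.

```cpp
// Before cleanup: two conditionals, so an edge profiler instrumenting
// this form needs counters for both branches.
int before_cleanup(int x) {
    const int debug = 0;   // known constant
    int r = x;
    if (debug) {           // constant propagation proves this is false
        r += 100;          // dead code elimination can delete this block
    }
    if (x > 0) {
        r *= 2;
    }
    return r;
}

// After cleanup (hand-written equivalent): the first branch and its
// edges are gone, so the cfg has fewer edges than the form above.
int after_cleanup(int x) {
    int r = x;
    if (x > 0) {
        r *= 2;
    }
    return r;
}
```

Both functions compute the same result; only the shape of the cfg differs, which is the mismatch in question.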

>
> I would be very interested in the low overhead support - there is a lot to
> gain especially because the profiling results are less dependent on setup and
> can be better reused.  I know part of code was contributed (the support for
> reading not 100% valid profiles). Is there any extra info available on this?
>

For profile smoothing, Neil may point to more information.

> Main problem IMO is how to get profile into WHOPR without having function
> bodies.
> I guess we will end up with summarizing the info in WHOPR friendly way and
> letting it stream the other counters to LTRANS that will annotate the
> function body once read in from the file.
>>

I am a little lost here :)

>>
>> >
>> >> 2) comdat function resolution -- since LIPO uses aux module functions
>> >> for inlining purpose only, it has the freedom to choose which copy to
>> >> use. The current scheme chooses copy in current module with priority
>> >> for better profile data context sensitivity (see below)
>> >
>> > This is interesting.  How do you solve the problem when given comdat
>> > function "loses"? I.e. it is replaced at linktime by other function that
>> > may or may not be profiled from other unit?
>>
>> Whatever function is selected will have profile data (assuming it is
>> called at runtime) -- but the profile data are merged from different
>> contexts including from calls in different modules.   For instance, if
>> both a.C and b.C define foo, b.C:foo is selected at runtime, and
>> a.C:foo is not inlined (after instrumentation) anywhere in a.C, then
>> a.C:foo won't have any profile data, and b.C:foo has merged profile
>> data resulting from calls in both a.C and b.C.
>
> Yes, but this is what I am concerned about.  Without LTO at least when
> compiling a.C with profile feedback we will have foo with 0 counts.
> We might however work out that calls of foo are frequent and decide to
> inline foo. We will take the counts and rescale resulting in inlining
> foo optimized for size.

Not always ideal though -- scaling does not expose whether foo is hot
or not (the call edge may be cold, but is still worth inlining).

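
A rough sketch of the scaling being discussed, using a deliberately simplified model. `FunctionProfile` and `scale_for_inline` are hypothetical names for this example and do not correspond to GCC's actual data structures.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: a function profile is an entry count
// plus one execution count per basic block.
struct FunctionProfile {
    std::uint64_t entry_count;
    std::vector<std::uint64_t> block_counts;
};

// Scale the callee's block counts for one inlined call site:
//   scaled = block_count * call_edge_count / entry_count
// With entry_count == 0 there is nothing to scale from, so every
// block in the inlined copy ends up with count 0.
std::vector<std::uint64_t> scale_for_inline(const FunctionProfile& callee,
                                            std::uint64_t call_edge_count) {
    std::vector<std::uint64_t> scaled;
    scaled.reserve(callee.block_counts.size());
    for (std::uint64_t c : callee.block_counts) {
        scaled.push_back(callee.entry_count == 0
                             ? 0
                             : c * call_edge_count / callee.entry_count);
    }
    return scaled;
}
```

In this model a copy of foo with zero counts stays all-zero no matter how frequent the call edge is, and a copy with merged counts scaled by a cold edge yields small numbers regardless of how hot foo is overall.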
>
> When comdats are resolved within LTO, this will not be a big deal, but LTO
> still produces comdats that are later resolved with library etc., so we don't
> solve the problem this way.
> At the very least we should be able to figure out that we are having function
> that has no profile and do something more sane.

You mean LTO does not discard duplicate bodies? Why?

>
> Do you have any idea how common these scenarios are?

I don't have direct data, but I think it can be common.

Thanks,

David

>
> Honza
>
