Resend in plain text mode.
On Wed, Jul 27, 2016 at 9:07 AM, Xinliang David Li wrote:
> Our experience is that non-atomic counter updates (the current
> implementation) rarely result in a corrupted profile (in a heavily threaded
> environment) -- they usually result in some profile insanity
I did -- but very likely there was a process error on my side. Will
fix them soon.
David
On Thu, May 22, 2014 at 2:12 AM, Richard Biener
wrote:
> On Thu, May 22, 2014 at 10:49 AM, Paolo Carlini
> wrote:
>> Hi,
>>
>> is somebody already working on the regressions which appeared yesterday,
>> see
The fix is attached. Ok to commit?
David
On Thu, May 22, 2014 at 9:11 AM, Xinliang David Li wrote:
> I did -- but very likely there was a process error on my side. Will
> fix them soon.
>
> David
>
> On Thu, May 22, 2014 at 2:12 AM, Richard Biener
> wrote:
>> On Th
It may be helpful to document the following on msan's official page:
1) success stories (chrome land?)
2) runtime overhead comparison with valgrind
David
On Wed, Oct 1, 2014 at 9:07 AM, Kostya Serebryany wrote:
> [as text for real this time]
> Sanitizer compiler module sizes in LLVM (in lines):
Something seems wrong:
In tree_function_version:
    initialize_cfun (new_decl, old_decl,
                     old_entry_block->count);
From the above we can see that the count of new_decl's entry BB will be
the same as old_decl's (no scaling).
In copy_bb, the new BB's profile count will also be the same as ol
LBR is used for both cfg edge profiling and indirect call target value
profiling.
David
On Fri, Apr 10, 2015 at 3:26 PM, Xinliang David Li wrote:
> LBR is used for both cfg edge profiling and indirect call Target value
> profiling.
>
> David
>
> On Apr 10, 2015 10:39 AM,
On Tue, Apr 7, 2015 at 7:45 AM, Ilya Palachev wrote:
> Hi,
>
> Here are some questions about AutoFDO.
>
> On 08.05.2014 02:55, Dehao Chen wrote:
>>
>> We have open-sourced AutoFDO profile toolchain in:
>>
>> https://github.com/google/autofdo
>>
>> For GCC developers, the most important tool is cre
On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka wrote:
>> LBR is used for both cfg edge profiling and indirect call Target value
>> profiling.
> I see, that makes sense ;) I guess if we want to support profile collection
> on targets w/o this feature we could still use one of the algorithms that
> t
Note that the multi-versioning support is currently only in C++, not in C yet.
David
On Wed, Apr 29, 2015 at 1:24 AM, Christian Bruel wrote:
> Hi Ramana, Richard
>
> After playing with the attribute ((target ("[thumb,arm]")), during the
> pending review, I added the "default" selector to neutral
Using AutoFDO is one way. For PGO, you may want to try using the
__gcov_dump interface to explicitly control the timing and order of
the profile dump -- i.e., invoke __gcov_dump in the main process after
the worker processes exit and before the main process exits.
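A minimal sketch of that ordering, assuming the workers are started with POSIX fork(); the worker body `do_work` and the helper `run_workers` are hypothetical. The dump routine is passed in as a hook so the sketch links without libgcov; in an actual `-fprofile-generate` build the hook would be `__gcov_dump`, as noted in the comment.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker body standing in for the real work. */
static void do_work(int id) {
  volatile long sum = 0;
  for (long i = 0; i <= (long)id * 1000; i++)
    sum += i;
}

/* Fork n workers, wait until every one has exited, then call dump_hook once
   in the parent, before the parent itself exits. In an instrumented build
   (-fprofile-generate) the hook would be libgcov's __gcov_dump:
       extern void __gcov_dump(void);
       run_workers(4, __gcov_dump);
   Taking the hook as a parameter keeps this sketch linkable without
   libgcov. Returns the number of workers that exited with status 0. */
int run_workers(int n, void (*dump_hook)(void)) {
  for (int i = 0; i < n; i++) {
    pid_t pid = fork();
    if (pid < 0) {
      perror("fork");
      break;
    }
    if (pid == 0) {           /* child: do the work, then exit normally */
      do_work(i);
      exit(0);
    }
  }
  int ok = 0, status;
  while (wait(&status) > 0)   /* reap all children */
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
      ok++;
  if (dump_hook)
    dump_hook();              /* explicit dump, after all workers are gone */
  return ok;
}
```

Passing the hook explicitly also makes it easy to check that the dump happens exactly once, after every worker has been reaped.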
David
On Mon, Jun 1, 2015 at 8:08 PM, Peng
On Mon, Jan 31, 2011 at 9:52 AM, Joseph S. Myers
wrote:
> On Sun, 30 Jan 2011, Ian Lance Taylor wrote:
>
>> I think that the call to lang_hooks.init_option_struct must be moved
>> after the call to default_options_optimization, one way or another.
>
> No, that is wrong; by design this structure in
What is the base option set used in all the comparisons? -O2? -O3? Some
of the build-time results look weird -- e.g., adding -march speeds up
*compile time* by 35%.
David
On Tue, Feb 8, 2011 at 8:08 AM, Tony Poppleton wrote:
> Hi,
>
> The following article has a fairly comprehensive set of bench
On Wed, Apr 20, 2011 at 12:03 PM, Cary Coutant wrote:
>> This brings out 2 questions. Why don't GCC 4.4/4.6/4.7 warn it?
>> Why doesn't 64bit GCC 4.2 warn it?
>
> Good question. It seems that the difference is whether the compiler
> generates a field-by-field copy or a call to memcpy(). According
FYI. If you just want a text dump of the gcc passes and their on|off
settings, the -fdump-passes option can be used. It could be enhanced to
also dump properties and TODOs.
David
On Tue, Jul 12, 2011 at 9:07 AM, David Malcolm wrote:
> On Tue, 2011-07-12 at 09:43 +0100, Paulo J. Matos wrote:
>> On 12/07/11 08:2
On Tue, Jul 12, 2011 at 10:55 AM, David Malcolm wrote:
> On Tue, 2011-07-12 at 09:15 -0700, Xinliang David Li wrote:
>> FYI. If you just want text dump of gcc passes and their on|off
>> settings, option -fdump-passes can be used. This can be enhanced to
>> dump properties
FYI, here is the performance impact of this option on SPEC06 (built with
the google_46 compiler and measured on a core2 box). The baseline is
FDO, and the reference is FDO + reorder_with_partitioning.
xalancbmk improves by > 3.5%
perlbench improves by > 1.5%
dealII and bzip2 degrade by about 1.4%.
Note the p
On Mon, Jul 25, 2011 at 3:23 AM, Paolo Bonzini wrote:
> On 07/25/2011 06:42 AM, Xinliang David Li wrote:
>>
>> FYI the performance impact of this option with SPEC06 (built with
>> google_46 compiler and measured on a core2 box). The base line number
>> is
Without partition:
  52348639025 branches
  454417666 L1-icache-load-misses
  14470953 iTLB-load-misses
On Mon, Jul 25, 2011 at 3:23 AM, Paolo Bonzini wrote:
> On 07/25/2011 06:42 AM, Xinliang David Li wrote:
>>
>> FYI the performance impact of t
On Mon, Jul 25, 2011 at 6:30 PM, Joern Rennecke wrote:
> Quoting Xinliang David Li :
>
>> In xalancbmk, with the partition option, most of object files have
>> nonzero size cold sections generated. The text size of the binary is
>> increased to 3572728 bytes from 34667
My guess is inlining differences. Try more aggressive inline
parameters to see if that helps. Also try FDO to see whether there is any
performance difference between the two versions. You will probably need to
do first-level triage and file bug reports.
David
On Fri, Jul 29, 2011 at 10:56 AM, Oleg Smolsky
wrote
un the suite with -flto and there are no significant
> differences in performance
>
> What else is there?
>
> Oleg.
>
> On 2011/7/29 11:07, Xinliang David Li wrote:
>>
>> My guess is inlining differences. Try more aggressive inline
>> parameters to see if helps
instance set
--param large-function-insns=1
--param large-unit-insns=2
David
On Mon, Aug 1, 2011 at 11:43 AM, Oleg Smolsky wrote:
> On 2011/7/29 14:07, Xinliang David Li wrote:
>>
>> Profiling tools are your best friend here. If you don't have access to
>> any,
s much faster.
David
On Mon, Aug 1, 2011 at 11:43 AM, Oleg Smolsky wrote:
> On 2011/7/29 14:07, Xinliang David Li wrote:
>>
>> Profiling tools are your best friend here. If you don't have access to
>> any, the least you can do is to build the program with -pg option and
On Wed, Aug 3, 2011 at 2:06 PM, Jan Hubicka wrote:
>> In xalancbmk, with the partition option, most of object files have
>> nonzero size cold sections generated. The text size of the binary is
>> increased to 3572728 bytes from 3466790 bytes. Profiling the program
>> using the training input show
+Mark who has done size optimization tuning with FDO.
On Thu, Aug 4, 2011 at 7:05 AM, Mike Hommey wrote:
> Hi,
>
> We (Mozilla) are trying to get the best of the ARM toolchain for our
> Android build. I recently built an Android Native-code Development Kit
> with GCC 4.6.1 and binutils 2.21.53, i
On Fri, Aug 5, 2011 at 7:40 AM, Jan Hubicka wrote:
> Am Fri 05 Aug 2011 09:32:05 AM CEST schrieb Richard Guenther
> :
>
>> On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka wrote:
>
> Did you try using FDO with -Os? FDO should make hot code parts
> optimized similar to -O3 but leave other
On Fri, Aug 5, 2011 at 12:32 AM, Richard Guenther
wrote:
> On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka wrote:
Did you try using FDO with -Os? FDO should make hot code parts
optimized similar to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposit
>
> In a way I like the current scheme since it is simple and extending it
> should IMO have some good reason. We could refine -Os behaviour without
> changing current predicates to optimize for speed in
> a) functions declared as "hot" by user and BBs in them that are not proved
> cold.
> b) based
On Fri, Aug 5, 2011 at 3:24 PM, Jan Hubicka wrote:
>> >
>> > In a way I like the current scheme since it is simple and extending it
>> > should IMO have some good reason. We could refine -Os behaviour without
>> > changing current predicates to optimize for speed in
>> > a) functions declared as "
On Thu, Aug 11, 2011 at 7:21 AM, Mike Hommey wrote:
> On Thu, Aug 04, 2011 at 04:05:25PM +0200, Mike Hommey wrote:
>> Hi,
>>
>> We (Mozilla) are trying to get the best of the ARM toolchain for our
>> Android build. I recently built an Android Native-code Development Kit
>> with GCC 4.6.1 and binut
> * Case II - User Guided Versioning where the function bodies for each
> version differ and is provided by the user.
>
> This case pertains to multi-versioning when the source bodies of the
> two or more versions are different and are provided by the user. Here
> too, I want to use a new attribute
The specifications should apply to virtual member functions too --
though the underlying implementation for multi-versioned virtual functions
and virtual calls can be quite different.
David
On Tue, Aug 16, 2011 at 1:37 PM, Sriraman Tallam wrote:
> Hi,
>
> I am working on supporting function multi-versioni
The gist of the previous discussion is to use function overloading instead
of exposing the underlying implementation, such as builtin_dispatch, to the
user. This refined proposal has not changed in that respect, but it
elaborates on various use cases that have been carefully thought out.
Please be specific o
On Wed, Aug 17, 2011 at 8:12 AM, Richard Guenther
wrote:
> On Wed, Aug 17, 2011 at 4:52 PM, Xinliang David Li wrote:
>> The gist of previous discussion is to use function overloading instead
>> of exposing underlying implementation such as builtin_dispatch to the
>> u
On Wed, Aug 17, 2011 at 8:35 AM, Mike Hommey wrote:
> On Thu, Aug 11, 2011 at 09:27:23AM -0700, Xinliang David Li wrote:
>> > Maybe I have an idea as to why FDO doesn't work so well. Does the
>> > instrumentation code support running several times in parallel (as in,
On Thu, Aug 18, 2011 at 12:51 AM, Richard Guenther
wrote:
> On Wed, Aug 17, 2011 at 6:37 PM, Xinliang David Li wrote:
>> On Wed, Aug 17, 2011 at 8:12 AM, Richard Guenther
>> wrote:
>>> On Wed, Aug 17, 2011 at 4:52 PM, Xinliang David Li
>>> wrote:
A partial register stall happens when a partial register write is
followed by a read of the full 32-bit register. In your case, the stall
probably happens in the next iteration when 'add eax, 0Ah' executes, so
your manual patch does not work. Try changing
add al, [dx] into two instructions (assuming esi is
On Wed, Aug 24, 2011 at 12:50 PM, Oleg Smolsky
wrote:
> On 2011/8/23 11:38, Xinliang David Li wrote:
>>
>> A partial register stall happens when a partial register write is
>> followed by a read of the full 32-bit register. In your case, the stall probably
>> happens in the n
Thanks.
Can you make the test case a standalone preprocessed file (using -E)?
David
On Wed, Aug 24, 2011 at 2:26 PM, Oleg Smolsky wrote:
> On 2011/8/24 13:02, Xinliang David Li wrote:
>>>
>>> On 2011/8/23 11:38, Xinliang David Li wrote:
>>>>
>>>> P
Why is LTO/whole-program mode not used for LLVM in the peak performance
comparison? (Of course, peak performance should really use FDO.)
thanks,
David
On Wed, Sep 7, 2011 at 8:15 AM, Vladimir Makarov wrote:
> Some people asked me to do comparison of GCC-4.6 and LLVM-2.9 (both
> released this spr
The idea sounds useful to me.
Or perhaps introduce templates into C :)
David
On Thu, Dec 29, 2011 at 1:12 PM, Ian Lance Taylor wrote:
> R A writes:
>
>>> The gcc developers, and everyone else involved in the development of C
>>> as a language, are perhaps not superhuman - but I suspect their
libacml from AMD is also a good candidate to try:
http://www.ualberta.ca/AICT/RESEARCH/LinuxClusters/doc/acml350/Linking_002fWindows.html
David
On Thu, Jan 19, 2012 at 2:59 AM, Richard Guenther
wrote:
> On Thu, Jan 19, 2012 at 7:37 AM, Marc Glisse wrote:
>> On Wed, 18 Jan 2012, willus.com wrote
A similar situation happens in non-loop contexts as well. PRE commoned the
address computation without knowing about the available advanced
addressing modes, which results in an unnecessary address computation
instruction. The forward substitution code uses local heuristics and
looks at each use individually --
This optimization is usually done with whole program analysis (WPA) or
with function cloning or inlining -- e.g., 'i' is the address of a
local variable in the inlined/cloned callsite ...
David
On Sun, Jan 3, 2010 at 3:19 PM, Joshua Haberman wrote:
> By the way, here is one case I tested where I was
Diego, thanks for bringing LIPO into the discussion.
There is a common misunderstanding of LIPO. It is not about
partitioning, but about extending the single-module compilation scope
across multiple modules. For instance, for a build with a.c, b.c, c.c,
and d.c, LIPO does not partition them into
{a.c b.c}
The dead store problem seems to be a regression in SRA. In 4.4, the
struct with an array is properly expanded into scalars, allowing copy
propagation and dead code elimination; in 4.5, this does not happen. You
should file a bug.
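For readers less familiar with SRA (scalar replacement of aggregates), here is a hand-written C illustration of the pattern in question and of what the transformation conceptually produces. The struct and function names are hypothetical, and the second function is a source-level picture of the idea, not GCC's actual output.

```c
/* Hypothetical struct containing a small array. */
struct pair_buf {
  int v[2];
  int tag;
};

/* Original form: the aggregate is built, copied, and only partly read. */
int sum_aggregate(int a, int b) {
  struct pair_buf tmp;
  tmp.v[0] = a;
  tmp.v[1] = b;
  tmp.tag = 42;                /* copied below but never read */
  struct pair_buf copy = tmp;  /* whole-aggregate copy */
  return copy.v[0] + copy.v[1];
}

/* What SRA conceptually turns it into: each element becomes its own scalar,
   so copy propagation can forward a and b to the return, and the tag store,
   no longer hidden inside an aggregate copy, is trivially dead and removed. */
int sum_scalarized(int a, int b) {
  int tmp_v0 = a, tmp_v1 = b;
  int copy_v0 = tmp_v0, copy_v1 = tmp_v1;
  return copy_v0 + copy_v1;
}
```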
David
On Wed, Apr 21, 2010 at 7:30 PM, tbp wrote:
> Hello,
>
> having fin
On Thu, Apr 22, 2010 at 12:44 AM, Dave Korn
wrote:
> On 22/04/2010 03:30, tbp wrote:
>
>> What's the deal with constexpr (or what can i reasonably expect)?
>
> You can *reasonably* expect the documented behaviour from the compiler. Or
> you can *un*reasonably ignore the documentation, make ill-i
Seongbae Park wrote:
David,
Just in case you haven't noticed this thread - I figured you may want
to comment on it.
Seongbae
-- Forwarded message --
From: Manuel López-Ibáñez <[EMAIL PROTECTED]>
Date: Wed, Oct 22, 2008 at 2:00 PM
Subject: Re: conditional assigments vs.
Hi, I am going to create a gcc branch for the functionality of
lightweight IPO. The description of the project and current status can
be found in http://gcc.gnu.org/wiki/LightweightIpo. Some highlights:
1) If you already use FDO in your build, you also get IPO almost for free;
2) It is an IPO sol
Andi,
On Tue, May 5, 2009 at 1:49 AM, Andi Kleen wrote:
> Xinliang David Li writes:
>>
>> If the idea is generally accepted, I will prepare a series of patches
>> and submit them to gcc trunk.
>
> I was reading your wiki page. Interesting idea.
>
> One aspect t
On Tue, May 5, 2009 at 2:47 AM, Richard Guenther
wrote:
> On Tue, May 5, 2009 at 7:00 AM, Xinliang David Li wrote:
>> Hi, I am going to create a gcc branch for the functionality of
>> lightweight IPO. The description of the project and current status can
>> be found in ht
On Tue, May 5, 2009 at 10:38 AM, Andi Kleen wrote:
> On Tue, May 05, 2009 at 10:25:13AM -0700, Xinliang David Li wrote:
>> Andi,
>>
>> On Tue, May 5, 2009 at 1:49 AM, Andi Kleen wrote:
>> > Xinliang David Li writes:
>> >>
>> >> If the
Forgot to copy the reply to the mailing list.
David
-- Forwarded message --
From: Xinliang David Li
Date: Wed, May 6, 2009 at 10:08 AM
Subject: Re: [Announcement] Creating lightweight IPO branch
To: Richard Guenther
On Wed, May 6, 2009 at 2:00 AM, Richard Guenther
wrote
Just curious, what is the baseline size in your comparison? Did you
turn on section garbage collection (-ffunction-sections -fdata-sections -Wl,--gc-sections)?
David
On Wed, Apr 28, 2010 at 2:44 AM, Bingfeng Mei wrote:
> Thanks, I will check what I can do with collect2. LTO
> seems to save 6-9% code size for applicati
On Thu, Apr 29, 2010 at 9:28 AM, Bingfeng Mei wrote:
> I turned on -ffunction-sections and compiled with -Os.
> The size gain at -O2 is less though.
Interesting.
Thanks,
David
>
> Bingfeng
>
>> -----Original Message-----
>> From: Xinliang David Li [mailto:davi...@go
On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov wrote:
> GCC-4.5.0 and LLVM-2.7 were released recently. To understand
> where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
> for x86/x86-64 and posted the comparison of it with the
> previous GCC releases and LLVM-2.7.
>
> Eve
>>
>
> Thanks for the comments. FDO will probably improve SPEC2000 score.
> Although it is not obvious for some tests because the train data sets for
> them are different from the reference data sets and it might actually
> mislead the compiler.
>
> FDO is important for optimizations where all p
Well put. The benchmark suite should have a good mixture of
programs of different sizes. SPEC2k programs cluster at the lower
end of the spectrum, though.
David
On Thu, Apr 29, 2010 at 12:43 PM, Vladimir Makarov wrote:
> Xinliang David Li wrote:
>>>
>>> Thanks for th
On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka wrote:
>> Thanks for the comments. FDO will probably improve SPEC2000 score.
>> Although it is not obvious for some tests because the train data sets
>> for them are different from the reference data sets and it might
>> actually mislead the compiler.
I noticed eon's peak options do not include FDO. Is that intended?
David
On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka wrote:
>> Thanks for the comments. FDO will probably improve SPEC2000 score.
>> Although it is not obvious for some tests because the train data sets
>> for them are different
Thanks for the suggestion. Raksit is currently busy merging trunk
changes back into the lw-ipo branch, which can be a daunting task. After
that, this can be done. (Our internal release is based on 4.4.)
David
On Thu, Apr 29, 2010 at 2:38 PM, Steven Bosscher wrote:
> On Thu, Apr 29, 2010 at 11:27 P
On Thu, Apr 29, 2010 at 4:03 PM, Jan Hubicka wrote:
>> 2010/4/30 Jan Hubicka :
>> >> Thanks for the suggestion. Raksit currently is busy with merging trunk
>> >> changes back to lw-ipo branch which can be a daunting task. After that
>> >> this can be done. (Our internal release is based on 4.4).
On Fri, Apr 30, 2010 at 1:37 AM, Jan Hubicka wrote:
>> In theory, LIPO should not generate better results than LTO+FDO. What
>> makes LIPO attractive is that it allows distributed build from the
>> beginning. Its integration with large distributed build system is also
>> easy. Another point is th
On Fri, Apr 30, 2010 at 11:12 AM, Jan Hubicka wrote:
>> >
>> > Interesting. My plan for profiling with LTO is to ultimately make it
>> > linktime
>> > transform. This will be more difficult with WHOPR (i.e. instrumenting need
>> > function bodies that are not available at WPA time), but I belie
On Sat, May 1, 2010 at 2:36 AM, Jan Hubicka wrote:
>>
>> Vortex needs -fno-strict-aliasing. It casts between two record types
>> with one record being a 'prefix' of another.
>
> So today's runs are complete. Thanks to Richi who fixed ICE in symtab merging
> that affected perl and GCC. With vorte
On Sun, May 2, 2010 at 6:45 AM, Jan Hubicka wrote:
>> On Sat, May 1, 2010 at 2:36 AM, Jan Hubicka wrote:
>> >>
>> >> Vortex needs -fno-strict-aliasing. It casts between two record types
>> >> with one record being a 'prefix' of another.
>> >
>> > So today's runs are complete. Thanks to Richi who
On Thu, May 20, 2010 at 2:09 PM, Ian Lance Taylor wrote:
> Steven Bosscher writes:
>
>> And finally: expand. This should be just a change of IR format, from
>> GIMPLE to RTL. I have no idea why this pass always shows up in the top
>> 10 of slowest parts of GCC. Lowering passes on e.g. WHIRL, or
On Thu, May 20, 2010 at 2:18 PM, Steven Bosscher wrote:
> On Thu, May 20, 2010 at 11:14 PM, Xinliang David Li
> wrote:
>> stack variable overlay and stack slot assignments is here too.
>
> Yes, and for these I would like to add a separate timevar. Agree?
Yes. (By the way
On Fri, May 21, 2010 at 2:24 AM, Richard Guenther
wrote:
> On Thu, May 20, 2010 at 11:21 PM, Xinliang David Li
> wrote:
>> On Thu, May 20, 2010 at 2:18 PM, Steven Bosscher
>> wrote:
>>> On Thu, May 20, 2010 at 11:14 PM, Xinliang David Li
>>> wrote:
On Fri, May 21, 2010 at 10:35 AM, Richard Guenther
wrote:
> On Fri, May 21, 2010 at 7:30 PM, Xinliang David Li wrote:
>> On Fri, May 21, 2010 at 2:24 AM, Richard Guenther
>> wrote:
>>> On Thu, May 20, 2010 at 11:21 PM, Xinliang David Li
>>> wrote:
>>>
On Wed, May 26, 2010 at 2:58 AM, Richard Guenther
wrote:
> On Tue, May 25, 2010 at 10:02 PM, Easwaran Raman wrote:
>> On Fri, May 21, 2010 at 10:30 AM, Xinliang David Li
>> wrote:
>>>
>>> On Fri, May 21, 2010 at 2:24 AM, Richard Guenther
>>> wro
On Thu, May 27, 2010 at 2:38 AM, Richard Guenther
wrote:
> On Wed, May 26, 2010 at 6:05 PM, Richard Guenther
> wrote:
>> On Wed, May 26, 2010 at 5:42 PM, Xinliang David Li
>> wrote:
>>> On Wed, May 26, 2010 at 2:58 AM, Richard Guenther
>>> wrote:
Thanks for testing it out. There are probably more tuning
opportunities for Fortran (e.g., a larger solution search space, more
aggressive pruning, and more advanced loop invariants and register
pressure estimation), which I hope someone can continue working on (or
I will, if I find more time).
David
On
Another major missing piece is using profile information (if
available) so that the most frequent cases can be peeled out.
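A source-level sketch of what peeling a frequent case out of a switch looks like, assuming a profile showed `OP_ADD` to be the dominant value; the enum and both functions are hypothetical, and a compiler would do this on its internal representation rather than in the source.

```c
/* Hypothetical opcodes for a small interpreter. */
enum op { OP_ADD, OP_SUB, OP_MUL, OP_NEG };

/* Plain switch: lowered to a jump table or a comparison tree. */
int eval_switch(enum op op, int a, int b) {
  switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_NEG: return -a;
  }
  return 0;
}

/* Same semantics with the hot case (assumed from profile data) peeled out:
   the common path costs one well-predicted compare, and the remaining
   cases fall back to the original multi-way dispatch. */
int eval_peeled(enum op op, int a, int b) {
  if (__builtin_expect(op == OP_ADD, 1))
    return a + b;
  switch (op) {
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_NEG: return -a;
    default:     return 0;
  }
}
```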
David
On Fri, Aug 27, 2010 at 8:03 AM, Richard Guenther
wrote:
> On Fri, Aug 27, 2010 at 4:47 PM, Ian Lance Taylor wrote:
>> "Paulo J. Matos" writes:
>>
>>> In the first
Briefly looked at it -- the trunk gcc also regresses a lot compared to
the binary you attached. (To match your binary, I also added the
-mfpmath=387 -m32 options.)
Two problems:
1) more register spills in the trunk version -- the old compiler seems
more effective at using the fp stack registers;
2) the comp
Right -- I missed Richard's previous email regarding the options.
Thanks,
David
On Fri, Aug 27, 2010 at 5:21 PM, Andrew Pinski wrote:
> On Fri, Aug 27, 2010 at 5:12 PM, Xinliang David Li wrote:
>> Briefly looked at it -- the trunk gcc also regresses a lot compared to
>
The optimization does look bad -- splitting the backedge to allow
expression hoisting rarely removes any redundancy unless the loop
has a very short trip count. Besides, it introduces an extra copy and jump
instruction and increases register pressure.
David
On Wed, Sep 29, 2010 at 5:55 AM, Bingfeng Mei
I re-measured the performance difference using trunk gcc and trunk
clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc
because clang/llvm's type-based aliasing is incomplete and not
enabled by default. I also added -fomit-frame-pointer to clang/llvm, as
this is gcc's default. The b
On Sat, Nov 13, 2010 at 2:39 PM, Paolo Bonzini wrote:
> On 11/13/2010 10:08 PM, Xinliang David Li wrote:
>>
>> Though gcc leads LLVM in performance overall, there are a couple of
>> benchmarks gcc is worse: vpr and crafty (64bit and 32bit), parser and
>> twolf (32
Thanks, this works.
gcc vs llvm
176.gcc: +3.7%
252.eon: +6.1%
David
On Sat, Nov 13, 2010 at 3:14 PM, H.J. Lu wrote:
> On Sat, Nov 13, 2010 at 1:08 PM, Xinliang David Li wrote:
>>
>> Though gcc leads LLVM in performance overall, there are a couple of
>> benchmarks
wrote:
> Hello,
>
> On 14.11.2010 0:08, Xinliang David Li wrote:
>>
>> I re-measured the performance difference using trunk gcc and trunk
>> clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc
>> because clang/llvm's type based aliasing is not i
t be the case for gcc for now.
>
> Function inlining definitely helps. -O3 also implies vectorization and other
> stuff.
>
> Honza
>>
>> Thanks,
>>
>> David
>>
>> On Mon, Nov 15, 2010 at 4:29 AM, Andrey Belevantsev wrote:
>> > Hello,
>> >
Just measured: LTO + -O3 improves over -O2 by a decent 4.8% geomean. More
data will come later.
164.gzip   1324  1322  -0.10%
175.vpr    1694  1703   0.51%
176.gcc    2293  2347
This means -O3-level inlining should also be turned on by default for LTO
builds, as -O2 LTO performance is unimpressive.
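The geomean figures in these reports are presumably geometric means of per-benchmark score ratios. A minimal sketch of that arithmetic, assuming higher scores are better; the function names are hypothetical, and the n-th root is computed with Newton's method only to avoid a libm dependency.

```c
#include <stddef.h>

/* n-th root of x > 0 via Newton's method on f(r) = r^n - x, starting at or
   above the root so the iterates decrease monotonically toward it. */
static double nth_root(double x, size_t n) {
  double r = x > 1.0 ? x : 1.0;
  for (int it = 0; it < 200; it++) {
    double p = 1.0;                     /* p = r^(n-1) */
    for (size_t k = 1; k < n; k++)
      p *= r;
    double next = r - (p * r - x) / ((double)n * p);
    if (next == r)
      break;
    r = next;
  }
  return r;
}

/* Geometric-mean improvement, in percent, of `test` over `base`, where both
   arrays hold n > 0 per-benchmark scores (higher is better). */
double geomean_improvement_pct(const double *base, const double *test,
                               size_t n) {
  double prod = 1.0;
  for (size_t i = 0; i < n; i++)
    prod *= test[i] / base[i];          /* per-benchmark speedup ratio */
  return (nth_root(prod, n) - 1.0) * 100.0;
}
```

For example, scores of 121 and 100 against a baseline of 100 and 100 give a 10% geomean improvement.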
David
On Mon, Nov 15, 2010 at 3:36 PM, Xinliang David Li wrote:
> Just measured: lto +O3 improves over O2 by a decent 4.8% geomean. More
> data come
On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka wrote:
>> This means O3 level inlining should be turned on also for lto build by
>> default -- as -O2 lto performance is too unimpressive.
>
> I am just re-tuning the inliner and hope to get more speedups for smaller
> costs than we get right now. I h
> Fortunately linker plugin solves the problem here and this is why I want to
> have it by default. GCC then can do effectively -fwhole-program for binaries
> (since linker knows what will be bound elsewhere) and take advantage of
> visibility((hidden)) hints for shared libraries same way. Most o
On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote:
>> > Fortunately linker plugin solves the problem here and this is why I want to
>> > have it by default. GCC then can do effectively -fwhole-program for
>> > binaries
>> > (since linker knows what will be bound elsewhere) and take advantage of
More performance data:
-O2 -funroll-all-loops vs. -O2: +1.1% geomean
            -O2    -O2 -funroll-all-loops
164.gzip    1324   1336    0.94%
175.vpr     1694   1670   -1.44%
            2392   3294   37.69%
256.bzip2   1719   1956   13.77%
300.twolf   2288   2404    5.07%
David
On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li wrote:
> More performance data:
>
On Tue, Nov 16, 2010 at 6:35 AM, Jan Hubicka wrote:
>> More FDO related performance numbers
>>
>> Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance
>> by 5% geomean
>> Experiment 2: our internal gcc compiler (4.4.3 based with many local
>> patches) O2 + FDO vs O2 (trunk gcc):
…63927    590687    4.75%
175.vpr/  139321    135276   -2.90%
252.eon/  607704    585954   -3.58%
254.gap/  496262    487289   -1.81%
size_sum  4227793   4243308   0.37%
On Tue, Nov 16, 2010 at 12:26 AM, Xinliang David Li wrote:
>
On Thu, Nov 18, 2010 at 3:58 AM, Jan Hubicka wrote:
>> Some text size measurement.
>>
>> Summary:
>> 1) LTO with -O3 bloats up code considerably;
> Yes, you need either -fwhole-program or -fuse-linker-plugin to make it behave
> sanely.
>
> For Mozilla I have best experience with -fuse-linker-plugi
I found an error in my size experiment setup (shared vs. non-shared
libstdc++). Please discard the size numbers; I will remeasure.
Thanks,
David
On Thu, Nov 18, 2010 at 4:02 AM, Jan Hubicka wrote:
> Hi,
> and for size, could you please also do -Os comparisons? I am aware that -O2
> inline
On Thu, Nov 18, 2010 at 4:12 PM, Jan Hubicka wrote:
> Hi,
>> I'll get back to you with our local inlining changes. We're looking to move
>> development closer to trunk to reduce this divergence in the future.
>>
>> Our tuning was done primarily on big c++ programs. A significant size
>> improvem
New size data -- hopefully it is sane this time.
Changes in the experiment:
1) shared libstdc++ is used with trunk gcc;
2) the bfd linker is used for both trunk and the patched 4.4.3 compiler
(which previously used gold).
The size comparison for all C benchmarks in the previous report is still
valid. The following is the corr
A good optimizing compiler tries hard to preserve the restrict aliasing
information of a callee function in its inlined instance. This is usually
a hard problem because the uses of restrict-qualified pointers are now
mixed with the caller context. In many cases, the compiler may choose not
to inline the functio
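A small C sketch of the situation being described, with hypothetical function names: `scale_add` promises through `restrict` that `dst` and `src` do not overlap. Once it is inlined into `caller`, the loop works directly on the caller's pointers, and the compiler has to keep the callee's no-alias guarantee attached to the inlined body; if that information is lost, the inlined copy may be optimized less aggressively than the out-of-line one.

```c
#include <stddef.h>

/* Callee: restrict promises that dst and src do not alias within this call,
   which lets the loop be vectorized without runtime overlap checks. */
static inline void scale_add(float *restrict dst, const float *restrict src,
                             float k, size_t n) {
  for (size_t i = 0; i < n; i++)
    dst[i] += k * src[i];
}

/* Caller: its own parameters carry no restrict qualifiers. After inlining,
   the loop refers to a and b directly, so the no-alias fact must come from
   the inlined callee's scope. Callers must not pass overlapping a and b. */
void caller(float *a, const float *b, size_t n) {
  scale_add(a, b, 2.0f, n);
}
```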
As Ian said, you want to make your emulation inline functions
available only when the __SSE2__ macro is not defined, so that you get the
definitions when -msse2 is not specified but not when
-msse2 is specified. In the future, gcc may be enhanced to expose
those mm intrinsics unconditionally (…
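A minimal sketch of that guard, using a hypothetical wrapper `add4_i32`: the SSE2 intrinsics from `<emmintrin.h>` are used when `__SSE2__` is defined (for example with -msse2, and by default on x86-64), and a plain C emulation is compiled only otherwise. Guarding a wrapper, rather than defining the `_mm_*` names themselves, is an assumption about how the code is organized.

```c
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>

/* SSE2 available: add four 32-bit lanes with one vector add. */
static inline void add4_i32(int32_t *dst, const int32_t *a, const int32_t *b) {
  __m128i va = _mm_loadu_si128((const __m128i *)a);
  __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)dst, _mm_add_epi32(va, vb));
}
#else
/* No SSE2 (e.g. no -msse2 on 32-bit x86, or a non-x86 target): scalar
   emulation with the same wrapping semantics as _mm_add_epi32. */
static inline void add4_i32(int32_t *dst, const int32_t *a, const int32_t *b) {
  for (int i = 0; i < 4; i++)
    dst[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
}
#endif
```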
On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote:
>> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>>
>>> The only problem left is mixing of lto and non lto objects. this right
>>> now is not handled. IMHO still the best way to handle it is to use
>>> slim lto and then simply separate link the "left
Any comments on the following?
http://blog.regehr.org/archives/320
1) is due to the lack of non-linear induction variable support.
5) is the same problem mentioned in PR35363.
I have not looked at the details of the others -- there are probably
related missed-optimization bugs already filed.
Thanks,
David