Re: Thread-safety of a profiled binary (and GCOV runtime library)

2016-07-27 Thread Xinliang David Li
Resend in plain text mode. On Wed, Jul 27, 2016 at 9:07 AM, Xinliang David Li wrote: > Our experience is that non-atomic counter update (the current > implementation) rarely results in a corrupted profile (in a heavily threaded > environment) -- it usually results in some profile insanity

Re: New C++ IPA fails

2014-05-22 Thread Xinliang David Li
I did -- but very likely there was a process error on my side. Will fix them soon. David On Thu, May 22, 2014 at 2:12 AM, Richard Biener wrote: > On Thu, May 22, 2014 at 10:49 AM, Paolo Carlini > wrote: >> Hi, >> >> is somebody already working on the regressions which appeared yesterday, >> see

Re: New C++ IPA fails

2014-05-22 Thread Xinliang David Li
The fix is attached. Ok to commit? David On Thu, May 22, 2014 at 9:11 AM, Xinliang David Li wrote: > I did -- but very likely there was a process error on my side. Will > fix them soon. > > David > > On Thu, May 22, 2014 at 2:12 AM, Richard Biener > wrote: >> On Th

Re: msan and gcc ?

2014-10-01 Thread Xinliang David Li
It may be helpful to document the following in msan's official page: 1) success stories (chrome land?) 2) runtime overhead comparison with valgrind David On Wed, Oct 1, 2014 at 9:07 AM, Kostya Serebryany wrote: > [as text for real this time] > Sanitizer compiler module sizes in LLVM (in lines):

Re: cgraph node profile update in cgraph_rebuild_references causes a performance issue

2014-10-30 Thread Xinliang David Li
Something seems wrong: in tree_function_version: initialize_cfun (new_decl, old_decl, old_entry_block->count); From the above we can see new_decl's entry BB's count will be the same as old_decl (no scaling). In copy_bb, new BB's profile count will also be the same as ol

Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
LBR is used for both cfg edge profiling and indirect call Target value profiling. David On Fri, Apr 10, 2015 at 3:26 PM, Xinliang David Li wrote: > LBR is used for both cfg edge profiling and indirect call Target value > profiling. > > David > > On Apr 10, 2015 10:39 AM,

Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
On Tue, Apr 7, 2015 at 7:45 AM, Ilya Palachev wrote: > Hi, > > Here are some questions about AutoFDO. > > On 08.05.2014 02:55, Dehao Chen wrote: >> >> We have open-sourced AutoFDO profile toolchain in: >> >> https://github.com/google/autofdo >> >> For GCC developers, the most important tool is cre

Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka wrote: >> LBR is used for both cfg edge profiling and indirect call Target value >> profiling. > I see, that makes sense ;) I guess if we want to support profile collection > on targets w/o this feature we could still use one of the algorithms that > t

Re: interest for ARM/thumb multiversionning ?

2015-04-30 Thread Xinliang David Li
Note that the multi-versioning support is currently only in C++, not in C yet. David On Wed, Apr 29, 2015 at 1:24 AM, Christian Bruel wrote: > Hi Ramana, Richard > > After playing with the attribute ((target ("[thumb,arm]")), during the > pending review, I added the "default" selector to neutral

Re: Data race in PGO profile collection for multi-process program

2015-06-01 Thread Xinliang David Li
Using AutoFDO is one way. For PGO, you may want to try using the __gcov_dump interface to explicitly control the timing and order of the profile dump --- i.e., invoke __gcov_dump in the main process after worker processes exit and before the main process exits. David On Mon, Jun 1, 2015 at 8:08 PM, Peng

Re: Confusion in setting default options for non-C/C++ languages

2011-01-31 Thread Xinliang David Li
On Mon, Jan 31, 2011 at 9:52 AM, Joseph S. Myers wrote: > On Sun, 30 Jan 2011, Ian Lance Taylor wrote: > >> I think that the call to lang_hooks.init_option_struct must be moved >> after the call to default_options_optimization, one way or another. > > No, that is wrong; by design this structure in

Re: GCC 4.6 performance regressions

2011-02-08 Thread Xinliang David Li
What is the base option set used in all the comparisons? O2, O3? Some of the build time results look weird -- e.g., adding -march speeds up *compile time* by 35%. David On Tue, Feb 8, 2011 at 8:08 AM, Tony Poppleton wrote: > Hi, > > The following article has a fairly comprehensive set of bench

Re: GCC 4.4/4.6/4.7 uninitialized warning regression?

2011-04-20 Thread Xinliang David Li
On Wed, Apr 20, 2011 at 12:03 PM, Cary Coutant wrote: >> This brings out 2 questions.  Why don't GCC 4.4/4.6/4.7 warn it? >> Why doesn't 64bit GCC 4.2 warn it? > > Good question. It seems that the difference is whether the compiler > generates a field-by-field copy or a call to memcpy(). According

Re: A visualization of GCC's passes, as a subway map

2011-07-12 Thread Xinliang David Li
FYI. If you just want text dump of gcc passes and their on|off settings, option -fdump-passes can be used. This can be enhanced to dump properties and TODOs. David On Tue, Jul 12, 2011 at 9:07 AM, David Malcolm wrote: > On Tue, 2011-07-12 at 09:43 +0100, Paulo J. Matos wrote: >> On 12/07/11 08:2

Re: A visualization of GCC's passes, as a subway map

2011-07-12 Thread Xinliang David Li
On Tue, Jul 12, 2011 at 10:55 AM, David Malcolm wrote: > On Tue, 2011-07-12 at 09:15 -0700, Xinliang David Li wrote: >> FYI. If you just want text dump of gcc passes and their on|off >> settings, option -fdump-passes can be used. This can be enhanced to >> dump properties

Re: [RFC] Remove -freorder-blocks-and-partition

2011-07-24 Thread Xinliang David Li
FYI the performance impact of this option with SPEC06 (built with google_46 compiler and measured on a core2 box). The base line number is FDO, and ref number is FDO + reorder_with_partitioning. xalancbmk improves > 3.5% perlbench improves > 1.5% dealII and bzip2 degrades about 1.4%. Note the p

Re: [RFC] Remove -freorder-blocks-and-partition

2011-07-25 Thread Xinliang David Li
On Mon, Jul 25, 2011 at 3:23 AM, Paolo Bonzini wrote: > On 07/25/2011 06:42 AM, Xinliang David Li wrote: >> >> FYI  the performance impact of this option with SPEC06 (built with >> google_46 compiler and measured on a core2 box).  The base line number >> is

Re: [RFC] Remove -freorder-blocks-and-partition

2011-07-25 Thread Xinliang David Li
Without partition:
52348639025 branches
454417666 L1-icache-load-misses
14470953 iTLB-load-misses
On Mon, Jul 25, 2011 at 3:23 AM, Paolo Bonzini wrote: > On 07/25/2011 06:42 AM, Xinliang David Li wrote: >> >> FYI  the performance impact of t

Re: [RFC] Remove -freorder-blocks-and-partition

2011-07-26 Thread Xinliang David Li
On Mon, Jul 25, 2011 at 6:30 PM, Joern Rennecke wrote: > Quoting Xinliang David Li : > >> In xalancbmk, with the partition option, most of object files have >> nonzero size cold sections generated. The text size of the binary is >> increased to 3572728 bytes from 34667

Re: Performance degradation on g++ 4.6

2011-07-29 Thread Xinliang David Li
My guess is inlining differences. Try more aggressive inline parameters to see if it helps. Also try FDO to see if there is any performance difference between the two versions. You will probably need to do first level triage and file bug reports. David On Fri, Jul 29, 2011 at 10:56 AM, Oleg Smolsky wrote

Re: Performance degradation on g++ 4.6

2011-07-29 Thread Xinliang David Li
un the suite with -flto and there are no significant > differences in performance > > What else is there? > > Oleg. > > On 2011/7/29 11:07, Xinliang David Li wrote: >> >> My guess is inlining differences. Try more aggressive inline >> parameters to see if helps

Re: Performance degradation on g++ 4.6

2011-08-01 Thread Xinliang David Li
instance set --param large-function-insns=1 --param large-unit-insns=2 David On Mon, Aug 1, 2011 at 11:43 AM, Oleg Smolsky wrote: > On 2011/7/29 14:07, Xinliang David Li wrote: >> >> Profiling tools are your best friend here. If you don't have access to >> any,

Re: Performance degradation on g++ 4.6

2011-08-03 Thread Xinliang David Li
s much faster. David On Mon, Aug 1, 2011 at 11:43 AM, Oleg Smolsky wrote: > On 2011/7/29 14:07, Xinliang David Li wrote: >> >> Profiling tools are your best friend here. If you don't have access to >> any, the least you can do is to build the program with -pg option and

Re: [RFC] Remove -freorder-blocks-and-partition

2011-08-03 Thread Xinliang David Li
On Wed, Aug 3, 2011 at 2:06 PM, Jan Hubicka wrote: >> In xalancbmk, with the partition option, most of object files have >> nonzero size cold sections generated. The text size of the binary is >> increased to 3572728 bytes from 3466790 bytes.  Profiling the program >> using the training input show

Re: FDO and LTO on ARM

2011-08-04 Thread Xinliang David Li
+Mark who has done size optimization tuning with FDO. On Thu, Aug 4, 2011 at 7:05 AM, Mike Hommey wrote: > Hi, > > We (Mozilla) are trying to get the best of the ARM toolchain for our > Android build. I recently built an Android Native-code Development Kit > with GCC 4.6.1 and binutils 2.21.53, i

Re: FDO and LTO on ARM

2011-08-05 Thread Xinliang David Li
On Fri, Aug 5, 2011 at 7:40 AM, Jan Hubicka wrote: > Am Fri 05 Aug 2011 09:32:05 AM CEST schrieb Richard Guenther > : > >> On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka wrote: > > Did you try using FDO with -Os?  FDO should make hot code parts > optimized similar to -O3 but leave other

Re: FDO and LTO on ARM

2011-08-05 Thread Xinliang David Li
On Fri, Aug 5, 2011 at 12:32 AM, Richard Guenther wrote: > On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka wrote: Did you try using FDO with -Os?  FDO should make hot code parts optimized similar to -O3 but leave other pieces optimized for size. Using FDO with -O3 gives you the opposit

Re: FDO and LTO on ARM

2011-08-05 Thread Xinliang David Li
> > In a way I like the current scheme since it is simple and extending it > should IMO have some good reason. We could refine -Os behaviour without > changing current predicates to optimize for speed in > a) functions declared as "hot" by user and BBs in them that are not proved > cold. > b) based

Re: FDO and LTO on ARM

2011-08-08 Thread Xinliang David Li
On Fri, Aug 5, 2011 at 3:24 PM, Jan Hubicka wrote: >> > >> > In a way I like the current scheme since it is simple and extending it >> > should IMO have some good reason. We could refine -Os behaviour without >> > changing current predicates to optimize for speed in >> > a) functions declared as "

Re: FDO and LTO on ARM

2011-08-11 Thread Xinliang David Li
On Thu, Aug 11, 2011 at 7:21 AM, Mike Hommey wrote: > On Thu, Aug 04, 2011 at 04:05:25PM +0200, Mike Hommey wrote: >> Hi, >> >> We (Mozilla) are trying to get the best of the ARM toolchain for our >> Android build. I recently built an Android Native-code Development Kit >> with GCC 4.6.1 and binut

Re: Function Multiversioning Usability.

2011-08-16 Thread Xinliang David Li
> * Case II - User Guided Versioning where the function bodies for each > version differ and is provided by the user. > > This case pertains to multi-versioning when the source bodies of the > two or more versions are different and are provided by the user. Here > too, I want to use a new attribute

Re: Function Multiversioning Usability.

2011-08-16 Thread Xinliang David Li
The specifications should apply to virtual member functions too -- though the underlying implementation for MVed virtual functions and virtual calls can be quite different. David On Tue, Aug 16, 2011 at 1:37 PM, Sriraman Tallam wrote: > Hi, > >  I am working on supporting function multi-versioni

Re: Function Multiversioning Usability.

2011-08-17 Thread Xinliang David Li
The gist of previous discussion is to use function overloading instead of exposing underlying implementation such as builtin_dispatch to the user. This new refined proposal has not changed in that, but is more elaborate on various use cases which has been carefully thought out. Please be specific o

Re: Function Multiversioning Usability.

2011-08-17 Thread Xinliang David Li
On Wed, Aug 17, 2011 at 8:12 AM, Richard Guenther wrote: > On Wed, Aug 17, 2011 at 4:52 PM, Xinliang David Li wrote: >> The gist of previous discussion is to use function overloading instead >> of exposing underlying implementation such as builtin_dispatch to the >> u

Re: FDO and LTO on ARM

2011-08-17 Thread Xinliang David Li
On Wed, Aug 17, 2011 at 8:35 AM, Mike Hommey wrote: > On Thu, Aug 11, 2011 at 09:27:23AM -0700, Xinliang David Li wrote: >> > Maybe I have an idea as to why FDO doesn't work so well. Does the >> > instrumentation code support running several times in parallel (as in,

Re: Function Multiversioning Usability.

2011-08-18 Thread Xinliang David Li
On Thu, Aug 18, 2011 at 12:51 AM, Richard Guenther wrote: > On Wed, Aug 17, 2011 at 6:37 PM, Xinliang David Li wrote: >> On Wed, Aug 17, 2011 at 8:12 AM, Richard Guenther >> wrote: >>> On Wed, Aug 17, 2011 at 4:52 PM, Xinliang David Li >>> wrote: >>>

Re: Performance degradation on g++ 4.6

2011-08-23 Thread Xinliang David Li
A partial register stall happens when there is a 32bit register read followed by a partial register write. In your case, the stall probably happens in the next iteration when 'add eax, 0Ah' executes, so your manual patch does not work. Try changing add al, [dx] into two instructions (assuming esi is

Re: Performance degradation on g++ 4.6

2011-08-24 Thread Xinliang David Li
On Wed, Aug 24, 2011 at 12:50 PM, Oleg Smolsky wrote: > On 2011/8/23 11:38, Xinliang David Li wrote: >> >> Partial register stall happens when there is a 32bit register read >> followed by a partial register write. In your case, the stall probably >> happens in the n

Re: Performance degradation on g++ 4.6

2011-08-24 Thread Xinliang David Li
Thanks. Can you make the test case a standalone preprocessed file (using -E)? David On Wed, Aug 24, 2011 at 2:26 PM, Oleg Smolsky wrote: > On 2011/8/24 13:02, Xinliang David Li wrote: >>> >>> On 2011/8/23 11:38, Xinliang David Li wrote: >>>> >>>> P

Re: Comparison of GCC-4.6.1 and LLVM-2.9 on x86/x86-64 targets

2011-09-07 Thread Xinliang David Li
Why is lto/whole program mode not used in LLVM for peak performance comparison? (of course, peak performance should really use FDO..) thanks, David On Wed, Sep 7, 2011 at 8:15 AM, Vladimir Makarov wrote: >  Some people asked me to do comparison of  GCC-4.6 and LLVM-2.9 (both > released this spr

Re: a nifty feature for c preprocessor

2011-12-29 Thread Xinliang David Li
The idea sounds useful to me .. Or perhaps introduce template into C :) David On Thu, Dec 29, 2011 at 1:12 PM, Ian Lance Taylor wrote: > R A writes: > >>> The gcc developers, and everyone else involved in the development of C >>> as a language, are perhaps not superhuman - but I suspect their

Re: C Compiler benchmark: gcc 4.6.3 vs. Intel v11 and others

2012-01-19 Thread Xinliang David Li
libacml from AMD is also a good candidate to try: http://www.ualberta.ca/AICT/RESEARCH/LinuxClusters/doc/acml350/Linking_002fWindows.html David On Thu, Jan 19, 2012 at 2:59 AM, Richard Guenther wrote: > On Thu, Jan 19, 2012 at 7:37 AM, Marc Glisse wrote: >> On Wed, 18 Jan 2012, willus.com wrote

Re: Unnecessary PRE optimization

2009-12-23 Thread Xinliang David Li
A similar situation happens in non-loop context as well. PRE commoned the address computation without knowing about the existence of advanced addressing modes, which results in unnecessary address computation instructions.  The forward substitution code makes local heuristics and looks at each use individually --

Re: GCC aliasing rules: more aggressive than C99?

2010-01-03 Thread Xinliang David Li
This optimization is usually done with whole program analysis (WPA) or with function cloning or inlining -- e.g., 'i' is an address of a local variable in inlined/cloned callsite ... David On Sun, Jan 3, 2010 at 3:19 PM, Joshua Haberman wrote: > By the way, here is one case I tested where I was

Re: WHOPR bootstrap, when/how?

2010-04-08 Thread Xinliang David Li
Diego, thanks for bringing LIPO into discussion. There is a common misunderstanding of LIPO. It is not about partitioning, but about extending single module compilation scope to multiple/cross module. For instance for a build with a.c, b.c, c.c, and d.c, LIPO does not partition them into {a.c b.c}

Re: g++ 4.5.0, end-user disappointment and interrogations

2010-04-21 Thread Xinliang David Li
The dead store problem seems to be a regression in SRA. In 4.4, the struct with array is properly expanded into scalars allowing copy prop and dead code elimination -- in 4.5, this does not happen. You should file a bug. David On Wed, Apr 21, 2010 at 7:30 PM, tbp wrote: > Hello, > > having fin

Re: g++ 4.5.0, end-user disappointment and interrogations

2010-04-22 Thread Xinliang David Li
On Thu, Apr 22, 2010 at 12:44 AM, Dave Korn wrote: > On 22/04/2010 03:30, tbp wrote: > >> What's the deal with constexpr (or what can i reasonably expect)? > >  You can *reasonably* expect the documented behaviour from the compiler.  Or > you can *un*reasonably ignore the documentation, make ill-i

Re: Fwd: conditional assigments vs. "may be used uninitialized"

2008-10-22 Thread Xinliang David Li
Seongbae Park wrote: David, Just in case you haven't noticed this thread - I figured you may want to comment on it. Seongbae -- Forwarded message -- From: Manuel López-Ibáñez <[EMAIL PROTECTED]> Date: Wed, Oct 22, 2008 at 2:00 PM Subject: Re: conditional assigments vs.

[Announcement] Creating lightweight IPO branch

2009-05-04 Thread Xinliang David Li
Hi, I am going to create a gcc branch for the functionality of lightweight IPO. The description of the project and current status can be found in http://gcc.gnu.org/wiki/LightweightIpo. Some highlights: 1) If you already use FDO in your build, you also get IPO almost for free; 2) It is an IPO sol

Re: [Announcement] Creating lightweight IPO branch

2009-05-05 Thread Xinliang David Li
Andi, On Tue, May 5, 2009 at 1:49 AM, Andi Kleen wrote: > Xinliang David Li writes: >> >> If the idea is generally accepted, I will prepare a series of patches >> and submit them to gcc trunk. > > I was reading your wiki page. Interesting idea. > > One aspect t

Re: [Announcement] Creating lightweight IPO branch

2009-05-05 Thread Xinliang David Li
On Tue, May 5, 2009 at 2:47 AM, Richard Guenther wrote: > On Tue, May 5, 2009 at 7:00 AM, Xinliang David Li wrote: >> Hi, I am going to create a gcc branch for the functionality of >> lightweight IPO. The description of the project and current status can >> be found in ht

Re: [Announcement] Creating lightweight IPO branch

2009-05-05 Thread Xinliang David Li
On Tue, May 5, 2009 at 10:38 AM, Andi Kleen wrote: > On Tue, May 05, 2009 at 10:25:13AM -0700, Xinliang David Li wrote: >> Andi, >> >> On Tue, May 5, 2009 at 1:49 AM, Andi Kleen wrote: >> > Xinliang David Li writes: >> >> >> >> If the

Fwd: [Announcement] Creating lightweight IPO branch

2009-05-07 Thread Xinliang David Li
Forgot to copy the reply to the mailing list. David -- Forwarded message -- From: Xinliang David Li Date: Wed, May 6, 2009 at 10:08 AM Subject: Re: [Announcement] Creating lightweight IPO branch To: Richard Guenther On Wed, May 6, 2009 at 2:00 AM, Richard Guenther wrote

Re: LTO question

2010-04-29 Thread Xinliang David Li
Just curious, what is the base line size of your comparison? Did you turn on GC (-ffunction-sections -fdata-sections -Wl,--gc-sections)? David On Wed, Apr 28, 2010 at 2:44 AM, Bingfeng Mei wrote: > Thanks, I will check what I can do with collect2. LTO > seems to save 6-9% code size for applicati

Re: LTO question

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 9:28 AM, Bingfeng Mei wrote: > I turned on -ffunction-sections and compiled with -Os. > The size gain at -O2 is less though. Interesting. Thanks, David > > Bingfeng > >> -----Original Message----- >> From: Xinliang David Li [mailto:davi...@go

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov wrote: >  GCC-4.5.0 and LLVM-2.7 were released recently.  To understand > where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000 > for x86/x86-64 and posted the comparison of it with the > previous GCC releases and LLVM-2.7. > >  Eve

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
>> > > Thanks for the comments.  FDO will probably improve SPEC2000 score. >  Although it is not obvious for some tests because the train data sets for > them are different from the reference data sets and it might actually > mislead the  compiler. > > FDO is important for optimizations where all p

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
Point well put. The benchmark suite should have good mixture of programs with different sizes. SPEC2k programs cluster at the lower end of the spectrum though. David On Thu, Apr 29, 2010 at 12:43 PM, Vladimir Makarov wrote: > Xinliang David Li wrote: >>> >>> Thanks for th

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka wrote: >> Thanks for the comments.  FDO will probably improve SPEC2000 score. >> Although it is not obvious for some tests because the train data sets >> for them are different from the reference data sets and it might >> actually mislead the  compiler.

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
I noticed eon's peak options do not include FDO, is that intended? David On Thu, Apr 29, 2010 at 2:27 PM, Jan Hubicka wrote: >> Thanks for the comments.  FDO will probably improve SPEC2000 score. >> Although it is not obvious for some tests because the train data sets >> for them are different

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
Thanks for the suggestion. Raksit currently is busy with merging trunk changes back to lw-ipo branch which can be a daunting task. After that this can be done. (Our internal release is based on 4.4). David On Thu, Apr 29, 2010 at 2:38 PM, Steven Bosscher wrote: > On Thu, Apr 29, 2010 at 11:27 P

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-29 Thread Xinliang David Li
On Thu, Apr 29, 2010 at 4:03 PM, Jan Hubicka wrote: >> 2010/4/30 Jan Hubicka : >> >> Thanks for the suggestion. Raksit currently is busy with merging trunk >> >> changes back to lw-ipo branch which can be a daunting task. After that >> >> this can be done.  (Our internal release is based on 4.4).

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-30 Thread Xinliang David Li
On Fri, Apr 30, 2010 at 1:37 AM, Jan Hubicka wrote: >> In theory, LIPO should not generate better results than LTO+FDO. What >> makes LIPO attractive is that it allows distributed build from the >> beginning. Its integration with large distributed build system is also >> easy.  Another point is th

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-04-30 Thread Xinliang David Li
On Fri, Apr 30, 2010 at 11:12 AM, Jan Hubicka wrote: >> > >> > Interesting.  My plan for profiling with LTO is to ultimately make it >> > linktime >> > transform.  This will be more difficult with WHOPR (i.e. instrumenting need >> > function bodies that are not available at WPA time), but I belie

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-05-02 Thread Xinliang David Li
On Sat, May 1, 2010 at 2:36 AM, Jan Hubicka wrote: >> >> Vortex needs -fno-strict-aliasing.  It casts between two record types >> with one record being a 'prefix' of another. > > So today runs are complete.  Thanks to Richi who fixed ICE in symtab merging > that affected perl and GCC.  With vorte

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-05-02 Thread Xinliang David Li
On Sun, May 2, 2010 at 6:45 AM, Jan Hubicka wrote: >> On Sat, May 1, 2010 at 2:36 AM, Jan Hubicka wrote: >> >> >> >> Vortex needs -fno-strict-aliasing.  It casts between two record types >> >> with one record being a 'prefix' of another. >> > >> > So today runs are complete.  Thanks to Richi who

Re: Where does the time go?

2010-05-20 Thread Xinliang David Li
On Thu, May 20, 2010 at 2:09 PM, Ian Lance Taylor wrote: > Steven Bosscher writes: > >> And finally: expand. This should be just a change of IR format, from >> GIMPLE to RTL. I have no idea why this pass always shows up in the top >> 10 of slowest parts of GCC.  Lowering passes on e.g. WHIRL, or

Re: Where does the time go?

2010-05-20 Thread Xinliang David Li
On Thu, May 20, 2010 at 2:18 PM, Steven Bosscher wrote: > On Thu, May 20, 2010 at 11:14 PM, Xinliang David Li > wrote: >> stack variable overlay and stack slot assignments is here too. > > Yes, and for these I would like to add a separate timevar. Agree? Yes. (By the way

Re: Where does the time go?

2010-05-21 Thread Xinliang David Li
On Fri, May 21, 2010 at 2:24 AM, Richard Guenther wrote: > On Thu, May 20, 2010 at 11:21 PM, Xinliang David Li > wrote: >> On Thu, May 20, 2010 at 2:18 PM, Steven Bosscher >> wrote: >>> On Thu, May 20, 2010 at 11:14 PM, Xinliang David Li >>> wrote: >

stack slot reuse

2010-05-21 Thread Xinliang David Li
On Fri, May 21, 2010 at 2:24 AM, Richard Guenther wrote: > On Thu, May 20, 2010 at 11:21 PM, Xinliang David Li > wrote: >> On Thu, May 20, 2010 at 2:18 PM, Steven Bosscher >> wrote: >>> On Thu, May 20, 2010 at 11:14 PM, Xinliang David Li >>> wrote: >

Re: stack slot reuse

2010-05-21 Thread Xinliang David Li
On Fri, May 21, 2010 at 10:35 AM, Richard Guenther wrote: > On Fri, May 21, 2010 at 7:30 PM, Xinliang David Li wrote: >> On Fri, May 21, 2010 at 2:24 AM, Richard Guenther >> wrote: >>> On Thu, May 20, 2010 at 11:21 PM, Xinliang David Li >>> wrote: >>>

Re: stack slot reuse

2010-05-26 Thread Xinliang David Li
On Wed, May 26, 2010 at 2:58 AM, Richard Guenther wrote: > On Tue, May 25, 2010 at 10:02 PM, Easwaran Raman wrote: >> On Fri, May 21, 2010 at 10:30 AM, Xinliang David Li >> wrote: >>> >>> On Fri, May 21, 2010 at 2:24 AM, Richard Guenther >>> wro

Re: stack slot reuse

2010-05-27 Thread Xinliang David Li
On Thu, May 27, 2010 at 2:38 AM, Richard Guenther wrote: > On Wed, May 26, 2010 at 6:05 PM, Richard Guenther > wrote: >> On Wed, May 26, 2010 at 5:42 PM, Xinliang David Li >> wrote: >>> On Wed, May 26, 2010 at 2:58 AM, Richard Guenther >>> wrote:

Re: The impact of the IVOPT changes for our code.

2010-07-29 Thread Xinliang David Li
Thanks for testing it out. There are probably more tuning opportunities for fortran (e.g. larger solution search space, more aggressive pruning, and more advanced loop invariants and register pressure estimation), which I hope someone can continue working on (or me if I find more time). David On

Re: Clustering switch cases

2010-08-27 Thread Xinliang David Li
Another main thing missing is to consider profile information (if available) so that most frequent cases can be peeled out. David On Fri, Aug 27, 2010 at 8:03 AM, Richard Guenther wrote: > On Fri, Aug 27, 2010 at 4:47 PM, Ian Lance Taylor wrote: >> "Paulo J. Matos" writes: >> >>> In the first

Re: Better performance on older version of GCC

2010-08-27 Thread Xinliang David Li
Briefly looked at it -- the trunk gcc also regresses a lot compared to the binary you attached. (To match your binary, also added -mfpmath=387 -m32 options) Two problems: 1) more register spills in the trunk version -- the old compiler seems more effective in using fp stack registers; 2) the comp

Re: Better performance on older version of GCC

2010-08-27 Thread Xinliang David Li
Right -- I missed Richard's previous email regarding the options. Thanks, David On Fri, Aug 27, 2010 at 5:21 PM, Andrew Pinski wrote: > On Fri, Aug 27, 2010 at 5:12 PM, Xinliang David Li wrote: >> Briefly looked at it -- the trunk gcc also regresses a lot compared to >

Re: Worse code generated by PRE

2010-09-29 Thread Xinliang David Li
The optimization does look bad -- splitting backedge to allow expression hoisting rarely removes any redundancy -- unless the loop is really short trip counted. Besides it introduces extra copy, jump instruction and increases register pressure. David On Wed, Sep 29, 2010 at 5:55 AM, Bingfeng Mei

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-13 Thread Xinliang David Li
I re-measured the performance difference using trunk gcc and trunk clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc because clang/llvm's type based aliasing is not complete and not enabled by default. I also added -fomit-frame-pointer to clang/llvm as this is gcc's default. The b

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-13 Thread Xinliang David Li
On Sat, Nov 13, 2010 at 2:39 PM, Paolo Bonzini wrote: > On 11/13/2010 10:08 PM, Xinliang David Li wrote: >> >> Though gcc leads LLVM in performance overall, there are a couple of >> benchmarks gcc is worse: vpr and crafty (64bit and 32bit), parser and >> twolf (32

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-14 Thread Xinliang David Li
Thanks, this works. gcc vs llvm 176.gcc: +3.7% 252.eon: +6.1% David On Sat, Nov 13, 2010 at 3:14 PM, H.J. Lu wrote: > On Sat, Nov 13, 2010 at 1:08 PM, Xinliang David Li wrote: >> >> Though gcc leads LLVM in performance overall, there are a couple of >> benchmarks

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
wrote: > Hello, > > On 14.11.2010 0:08, Xinliang David Li wrote: >> >> I re-measured the performance difference using trunk gcc and trunk >> clang/llvm on a core-2 box.  -fno-strict-aliasing is added to gcc >> because clang/llvm's type based aliasing is not i

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
t be the case for gcc for now. > > Function inlining definitely helps. -O3 also implies vectorization and other > stuff. > > Honza >> >> Thanks, >> >> David >> >> On Mon, Nov 15, 2010 at 4:29 AM, Andrey Belevantsev wrote: >> > Hello, >> >

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
Just measured: lto +O3 improves over O2 by a decent 4.8% geomean. More data come later.
164.gzip  1324  1322  -0.10%
175.vpr   1694  1703   0.51%
176.gcc   2293  2347

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
This means O3 level inlining should be turned on also for lto build by default -- as -O2 lto performance is too unimpressive. David On Mon, Nov 15, 2010 at 3:36 PM, Xinliang David Li wrote: > Just measured: lto +O3 improves over O2 by a decent 4.8% geomean. More > data come

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
On Mon, Nov 15, 2010 at 4:25 PM, Jan Hubicka wrote: >> This means O3 level inlining should be turned on also for lto build by >> default -- as -O2 lto performance is too unimpressive. > > I am just re-tuning the inliner and hope to get more speedups for smaller > costs than we get right now.  I h

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
> Fortunately linker plugin solves the problem here and this is why I want to > have it by default.  GCC then can do effectively -fwhole-program for binaries > (since linker knows what will be bound elsewhere) and take advantage of > visibility((hidden)) hints for shared libraries same way.  Most o

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote: >> > Fortunately linker plugin solves the problem here and this is why I want to >> > have it by default.  GCC then can do effectively -fwhole-program for >> > binaries >> > (since linker knows what will be bound elsewhere) and take advantage of

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-15 Thread Xinliang David Li
More performance data: -O2 -funroll-all-loops vs O2: +1.1% geomean
           O2     O2 unroll-all-loops
164.gzip   1324   1336    0.94%
175.vpr    1694   1670   -1.44%

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-16 Thread Xinliang David Li
           2392   3294   37.69%
256.bzip2  1719   1956   13.77%
300.twolf  2288   2404    5.07%
David On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li wrote: > More performance data: >

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-16 Thread Xinliang David Li
On Tue, Nov 16, 2010 at 6:35 AM, Jan Hubicka wrote: >> More FDO related performance numbers >> >> Experiment 1:  trunk gcc O2 + FDO vs O2:      FDO improves performance >> by 5% geomean >> Experiment 2: our internal gcc compiler (4.4.3 based with many local >> patches) O2 + FDO vs O2 (trunk gcc):

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-17 Thread Xinliang David Li
63927 590687 4.75%
175.vpr/   139321    135276   -2.90%
252.eon/   607704    585954   -3.58%
254.gap/   496262    487289   -1.81%
size_sum   4227793   4243308   0.37%
On Tue, Nov 16, 2010 at 12:26 AM, Xinliang David Li wrote: >

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-18 Thread Xinliang David Li
On Thu, Nov 18, 2010 at 3:58 AM, Jan Hubicka wrote: >> Some text size measurement. >> >> Summary: >> 1) LTO with -O3 bloats up code considerably; > Yes, you need either -fwhole-program or -fuse-linker-plugin to make it behave > sanely. > > For Mozilla I have best experience with -fuse-linker-plugi

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-18 Thread Xinliang David Li
I found an error in my size experiment setup -- (libstdc++ shared vs non shared) -- please discard the size numbers -- will remeasure. Thanks, David On Thu, Nov 18, 2010 at 4:02 AM, Jan Hubicka wrote: > Hi, > and for size, could you please also do -Os comparisons?  I am aware that -O2 > inline

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-18 Thread Xinliang David Li
On Thu, Nov 18, 2010 at 4:12 PM, Jan Hubicka wrote: > Hi, >> I'll get back to you with our local inlining changes.  We're looking to move >> development closer to trunk to reduce this divergence in the future. >> >> Our tuning was done primarily on big c++ programs.  A significant size >> improvem

Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-18 Thread Xinliang David Li
New size data -- hopefully it is sane this time. Changes in experiment 1) shared libstdc++ is used with trunk gcc 2) bfd linker is used in both trunk and patched 4.4.3 compiler (which used gold). The size comparison for all C benchmarks in previous report is still valid. The following is the corr

Re: Should "restrictness" be preserved over function inlining and casting?

2010-11-19 Thread Xinliang David Li
A good optimizing compiler tries hard to preserve the restrict aliasing of a callee function in its inline instance, and this is usually a hard problem because the uses of restrict-qualified pointers are now mixed with the caller context. In many cases, the compiler may choose not to inline the functio
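To make the situation concrete, here is a minimal sketch of a callee with restrict-qualified parameters and a caller whose pointers carry no such qualifier. The function names `scale_add` and `update` are hypothetical and not taken from the thread. Once `scale_add` is inlined into `update`, the no-alias promise applies only to the accesses that came from the callee body, which is the information the compiler has to track.

```c
#include <stddef.h>

/* Callee: the restrict qualifiers promise that dst and src do not
   overlap for the duration of this call. */
static inline void scale_add(float *restrict dst,
                             const float *restrict src,
                             float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* Caller: a and b are plain pointers. After inlining, the loop above
   sits next to other accesses through a and b that carry no promise. */
void update(float *a, const float *b, size_t n)
{
    if (n > 0)
        a[0] += 1.0f;          /* caller-context access, not restrict */
    scale_add(a, b, 2.0f, n);  /* restrict applies only inside here */
}
```

The caller is still responsible for passing non-overlapping buffers; the qualifier is a promise to the compiler, not a runtime check.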

Re: Method to disable code SSE2 generation but still use -msse2

2010-11-22 Thread Xinliang David Li
As Ian said, you want to make your emulation inline functions available when the __SSE2__ macro is not defined, so that you get the definitions when -msse2 is not specified but do not get them when -msse2 is specified. In the future, gcc may be enhanced to expose those mm intrinsics unconditionally (
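The arrangement described above can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the wrapper names `my_vec4i`, `my_load4i`, `my_store4i`, `my_add4i` and `add4` are hypothetical. The real pieces are the predefined `__SSE2__` macro and the `<emmintrin.h>` intrinsics `_mm_loadu_si128`, `_mm_storeu_si128` and `_mm_add_epi32`.

```c
#include <stdint.h>

#ifdef __SSE2__
/* -msse2 in effect: use the real intrinsics. */
#include <emmintrin.h>

typedef __m128i my_vec4i;

static inline my_vec4i my_load4i(const int32_t *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}

static inline void my_store4i(int32_t *p, my_vec4i v)
{
    _mm_storeu_si128((__m128i *)p, v);
}

static inline my_vec4i my_add4i(my_vec4i a, my_vec4i b)
{
    return _mm_add_epi32(a, b);
}
#else
/* No SSE2: scalar emulation with the same interface. */
typedef struct { int32_t lane[4]; } my_vec4i;

static inline my_vec4i my_load4i(const int32_t *p)
{
    my_vec4i v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = p[i];
    return v;
}

static inline void my_store4i(int32_t *p, my_vec4i v)
{
    for (int i = 0; i < 4; i++)
        p[i] = v.lane[i];
}

static inline my_vec4i my_add4i(my_vec4i a, my_vec4i b)
{
    my_vec4i r;
    for (int i = 0; i < 4; i++)  /* unsigned add mirrors the wraparound of the SSE2 add */
        r.lane[i] = (int32_t)((uint32_t)a.lane[i] + (uint32_t)b.lane[i]);
    return r;
}
#endif

/* Client code is identical on both paths. */
void add4(const int32_t *a, const int32_t *b, int32_t *out)
{
    my_store4i(out, my_add4i(my_load4i(a), my_load4i(b)));
}
```

Using wrapper names rather than redefining the `_mm_*` names keeps the two paths from colliding if a later compiler exposes the intrinsics unconditionally.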

Re: "ld -r" on mixed IR/non-IR objects (

2010-12-08 Thread Xinliang David Li
On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote: >> On 12/07/2010 04:20 PM, Andi Kleen wrote: >>> >>> The only problem left is mixing of lto and non lto objects. this right >>> now is not handled. IMHO still the best way to handle it is to use >>> slim lto and then simply separate link the "left

performance comparison post

2010-12-13 Thread Xinliang David Li
Any comment on the following? http://blog.regehr.org/archives/320 1) is due to lack of non-linear induction variable support 5) is the same problem mentioned in pr35363 I have not looked at the details of others -- there are probably related missed-optimization bugs already filed. Thanks, Davi
