Re: LRA handling of subreg (on AARCH64 with ILP32)

2015-01-15 Thread Richard Biener
On Thu, Jan 15, 2015 at 5:11 AM, Andrew Pinski  wrote:
> Hi,
>   I have some code where we generate some weird code that has stores
> followed by a load from the same location.
> For example, we get:
> add x14, sp, 240
> add x15, sp, 232
> str x14, [sp, 136]
> mov w2, w27
> ldr w1, [sp, 136]
> str x15, [sp, 136]
> ldr w0, [sp, 136]
>
> The RTL originally uses an offset from the frame pointer in DImode,
> and then we use it in SImode because pointers are 32-bit in ILP32.
> Can you explain how LRA decides to create this code and ways of improving it?

I also wonder why postreload-cse doesn't fix this...?

> This is in perlbench in SPEC CPU 2006. I can provide the preprocessed
> sources (since I am using LTO) if needed.
>
> Thanks,
> Andrew Pinski


Re: looking for support

2015-01-15 Thread Joern Rennecke
On 14 January 2015 at 18:03, vgol...@innovasic.de  wrote:
> Hello out there,
>
> I am looking for some support maintaining the m68k target toolchain (incl. GDB)
> for the fido1100 (basically a CPU32; the real changes are in GDB). Some
> experience with the m68k target would be helpful.
> Is there someone around who might be interested in that? Or does someone know
> a person who would be interested?
>
> Just let me know!

Embecosm provides GNU tool chain development and support services.
One of my colleagues will contact you privately to see if we are able to
help you.


Re: organization of optimization options in manual

2015-01-15 Thread Richard Biener
On Thu, Jan 15, 2015 at 12:48 AM, Sandra Loosemore
 wrote:
> The "Options That Control Optimization" section of the manual is
> currently divided into three parts (not subsections, just separate option
> lists):
>
> (1) General options like -O[n]
>
> (2) Options that individually control optimizations enabled by default at some
> -O[n] setting
>
> (3) Options controlling optimizations that aren't enabled by default, or
> that are experimental
>
> I've noted that a lot of options that belong in group (3) have been added to
> group (2), and at least one from (2) into group (3).  I'm thinking that the
> distinction between (2) and (3) is not really useful anyway; there are
> already both lists of which options are enabled at each -O level, and info
> in the descriptions of individual options to say what -O levels they're
> enabled at.
>
> What would you think about reorganizing this section to add some subsections
> grouping options by purpose, instead?  E.g., loop optimizations,
> floating-point optimizations, inlining, LTO, profiling options, etc?  The
> section is almost 60 pages long in the printed manual and adding some
> structure would probably make it easier for users to find things.
>
> The other option would be just to list the options alphabetically, but the
> index already does that if readers know the name of what they're looking for
> (or they can search for it in their browser).

Just to chime in late ...

I think users are interested in (1) (of course) and in additional
things they can enable on top of them, thus (3).  (2) is not as
interesting, and especially disabling passes enabled by (1) should
be done with care.

I think what we should do is merge (2) and (3) (because, as you have
seen, it gets disordered quite easily) and add a section after (1)
explaining what to do to get "more" optimization, or what to do when
trying to debug a problem with the compiler.  Stuff like:

 Generally, enabling or disabling individual options listed in (2)/(3)
 is discouraged, as only the basic options in (1) receive much testing,
 and thus enabling or disabling individual options may uncover bugs
 in the compiler.

 If you are compiling a scientific application, try using -ffast-math
 (also enabled by -Ofast) and -funroll-loops.

 If you are concerned about both binary size and runtime performance,
 profiling will help you a lot.

 If you run into issues with your application and blame the compiler,
 first try -fsanitize=... and -fno-strict-aliasing (also see bugs.html).
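
To make the -ffast-math item concrete, here is a small made-up example
(mine, not from the manual) of the kind of code that advice targets: a
floating-point sum reduction, which GCC will only vectorize when it is
allowed to reassociate the additions (-ffast-math or -fassociative-math),
because reassociation can change the rounding of the result.

/* Only vectorized with -O3 -ffast-math (or -fassociative-math),
   since the vector version sums the elements in a different order.  */
double
sum (const double *a, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}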

After all, the documentation of most of the options in (2) and (3)
doesn't mean much to a user of the compiler.

I agree that some options should receive separate sections,
LTO for example, but that has one already.  The individual
options should still be documented in (2)/(3), possibly referring
to the separate section.

Keep in mind that invoke.texi is supposed to be user documentation
and users are generally not compiler developers ;)

Richard.

> -Sandra
>


Re: LRA handling of subreg (on AARCH64 with ILP32)

2015-01-15 Thread Richard Earnshaw
On 15/01/15 04:11, Andrew Pinski wrote:
> Hi,
>   I have some code where we generate some weird code that has stores
> followed by a load from the same location.
> For example, we get:
> add x14, sp, 240
> add x15, sp, 232
> str x14, [sp, 136]
> mov w2, w27
> ldr w1, [sp, 136]
> str x15, [sp, 136]
> ldr w0, [sp, 136]
> 
> The RTL originally uses an offset from the frame pointer in DImode,
> and then we use it in SImode because pointers are 32-bit in ILP32.
> Can you explain how LRA decides to create this code and ways of improving it?
> 
> This is in perlbench in SPEC CPU 2006. I can provide the preprocessed
> sources (since I am using LTO) if needed.
> 
> Thanks,
> Andrew Pinski
> 

So it looks like it's spilling a 64-bit value, but then reloading it as
a 32-bit value.  That seems quite strange and I can see why the reload
cse pass would miss the connection.

Does this even produce the right code in big-endian, where the stack
slot offset changes slightly?
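
For reference, a minimal ILP32 sketch of the shape I would guess is
involved (an assumption on my part, not the original perlbench code):
the addresses of two stack objects are computed in 64-bit registers
(add xN, sp, #off), but since pointers are only 32 bits wide the values
are ultimately consumed in SImode, so a DImode spill followed by an
SImode reload of the same slot is plausible.

extern void use (unsigned int k, int *p, int *q);

void
caller (unsigned int k)
{
  int x, y;                   /* stack objects; addresses escape below */
  use (k, &x, &y);
}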

R.



Re: Unconfirmed boehm-gc test failure

2015-01-15 Thread Kai Tietz
Hi Tom,

- Original Message -
> Hi Kai,
> 
> I encountered a test failure in boehm-gc (
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64042 'FAIL: boehm-gc.c/gctest.c
> -O2 execution test' ).
> 
> I would like to ask somebody to confirm the PR,  which hopefully should be as
> simple as patching a .exp for iterated running of a single test (see comment
> 5),
> and running the boehm-gc test suite.
> 
> But I'm not sure who to ask, since there's no maintainer listed in
> MAINTAINERS.
> Any idea?
> 
> Thanks,
> - Tom
> 

Sorry for answering your question so late.  I will take a look at the PR
soon.

The boehm-gc stuff was maintained in the past by the java maintainers.  But
well, java is a pretty dead piece of code in gcc's repository, so you might
want to ping the Objective-C(++) maintainers here instead.  The latter use
boehm-gc for some scenarios, too.  So they might be able to take ownership
of it ...
The boehm-gc code in gcc is actually just a fork of the upstream project.  So
you might be able to ask there, too.

Regards,
Kai


Question about vectorization

2015-01-15 Thread Konstantin Vladimirov
Hi,

Consider simple test:

#include <stddef.h>

#ifdef FAILV

unsigned short* get_aa(void);
double* get_bb(void);

#else

extern unsigned short a[1024];
extern double b[1024];

#endif


unsigned short *foo()
{
  size_t i;

#ifdef FAILV
  unsigned short * restrict aa = get_aa();
  double * restrict bb = get_bb();
#else
  unsigned short * restrict aa = a;
  double * restrict bb = b;
#endif

  for (i = 0; i < 1024; ++i)
    {
      *bb = *aa;
      ++bb; ++aa;
    }

  return aa;
}

Compile it with the latest gcc, 4.8 or 4.9:

gcc -O3 -ftree-vectorizer-verbose=2 --std=c99 -S test.c
gcc -O3 -ftree-vectorizer-verbose=2 --std=c99 -S test.c -DFAILV

In the second case it outputs:

test.c:28: note: versioning for alias required: can't determine
dependence between *aa_22 and *bb_23

So the loop is vectorized, but there is an unnecessary aliasing check inside.
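
(To illustrate what that check is: the vectorizer duplicates the loop and
guards the vector copy with a runtime overlap test. The following is a
rough sketch of my own, not GCC's actual generated code.)

#include <stddef.h>

void
copy_versioned (unsigned short *aa, double *bb)
{
  size_t i;
  /* "Versioning for alias": enter the vectorizable copy only when the
     two accessed ranges provably do not overlap at run time.  */
  if ((char *) (bb + 1024) <= (char *) aa
      || (char *) (aa + 1024) <= (char *) bb)
    {
      for (i = 0; i < 1024; ++i)      /* vectorized version */
        bb[i] = aa[i];
    }
  else
    {
      for (i = 0; i < 1024; ++i)      /* scalar fallback */
        bb[i] = aa[i];
    }
}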

But AFAICT, due to strict aliasing rules, the compiler should statically
know that aliasing is not possible in that case.

Is it a bug?

---
With best regards, Konstantin


Re: Question about vectorization

2015-01-15 Thread Richard Biener
On Thu, Jan 15, 2015 at 2:22 PM, Konstantin Vladimirov
 wrote:
> Hi,
>
> Consider simple test:
>
> #include <stddef.h>
>
> #ifdef FAILV
>
> unsigned short* get_aa(void);
> double* get_bb(void);
>
> #else
>
> extern unsigned short a[1024];
> extern double b[1024];
>
> #endif
>
>
> unsigned short *foo()
> {
>   size_t i;
>
> #ifdef FAILV
>   unsigned short * restrict aa = get_aa();
>   double * restrict bb = get_bb();
> #else
>   unsigned short * restrict aa = a;
>   double * restrict bb = b;
> #endif
>
>   for (i = 0; i < 1024; ++i)
>     {
>       *bb = *aa;
>       ++bb; ++aa;
>     }
>
>   return aa;
> }
>
> Compile it with the latest gcc, 4.8 or 4.9:
>
> gcc -O3 -ftree-vectorizer-verbose=2 --std=c99 -S test.c
> gcc -O3 -ftree-vectorizer-verbose=2 --std=c99 -S test.c -DFAILV
>
> In the second case it outputs:
>
> test.c:28: note: versioning for alias required: can't determine
> dependence between *aa_22 and *bb_23
>
> So the loop is vectorized, but there is an unnecessary aliasing check inside.
>
> But AFAICT, due to strict aliasing rules, the compiler should statically
> know that aliasing is not possible in that case.

It can know, but only because it is in a loop.  I believe this is fixed
already for GCC 5.

Richard.



> Is it a bug?
>
> ---
> With best regards, Konstantin


Re: organization of optimization options in manual

2015-01-15 Thread Joel Sherrill

On 1/15/2015 12:15 AM, Jeff Law wrote:
> On 01/14/15 23:12, Sandra Loosemore wrote:
>> On 01/14/2015 08:41 PM, Jeff Law wrote:
>>> With the section being ~60 pages, my first thought is we have way too
>>> many options!
>> Heh, at least we have documentation for all those options.  :-)
>>
>>> But that's not likely to change.  Though perhaps the
>>> process will encourage some culling of options that really don't make
>>> sense anymore.
>> Would we want to remove useless options outright, or deprecate them for
>> a while with removal to happen at some future time, or just deprecate
>> them and/or document that they are not useful?
> We typically deprecate and leave it as a nop for a major release cycle, 
> then do final removal the next major release.
>
>> I guess it can't be any worse than it is now, though, where the whole 60
>> pages is essentially a "misc bucket".  I'll see if I can put together a
>> plan for splitting things up; if there are too many leftovers, maybe
>> others can help by suggesting different/additional categories.
> Sounds good.  I think just start with the list & create the buckets
> from the list.  Then post here and we'll iterate and try to nail that
> down before you start moving everything in the .texi file.
I think this is a great idea.

It may make sense for some options to end up with details in one section
and a reference in another. I am wondering if there are some common
questions users ask about options which could be addressed like this.
Disabling C++ exceptions and RTTI, plus the floating-point options for
performance that usually come up in Intel C vs. GCC benchmarks,
come to mind.
> jeff

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985



Re: Extending -flto parallel feature to the rest of the build

2015-01-15 Thread Lewis Hyatt
Well, I guess it's safe to say this did not generate resounding
interest :-). Just thought I would check once more if anyone thought
it was a worthwhile thing to pursue, and/or had any feedback on the
attempt at implementing it. FWIW I have been using this myself for a
while now and enjoy it. Thanks!
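
One clarification that may help anyone reading the quoted mail below: the
"jobserver" in -flto=jobserver (and in the proposed -fparallel=jobserver)
refers to GNU make's job-token pipe. Roughly, and from memory rather than
from the patch itself, a client coordinates with it like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int js_read_fd = -1, js_write_fd = -1;

/* Find the jobserver pipe that a parent make advertises in MAKEFLAGS
   (--jobserver-auth=R,W on newer makes, --jobserver-fds=R,W on older).  */
static int
jobserver_parse (void)
{
  const char *flags = getenv ("MAKEFLAGS");
  const char *p;
  if (!flags)
    return 0;
  p = strstr (flags, "--jobserver-auth=");
  if (!p)
    p = strstr (flags, "--jobserver-fds=");
  if (!p)
    return 0;
  return sscanf (strchr (p, '=') + 1, "%d,%d", &js_read_fd, &js_write_fd) == 2;
}

/* Block until another job slot is free (one token per extra job).  */
static void
jobserver_acquire (void)
{
  char token;
  while (read (js_read_fd, &token, 1) != 1)
    ;
}

/* Give the token back once the job has finished.  */
static void
jobserver_release (void)
{
  char token = '+';
  if (write (js_write_fd, &token, 1) != 1)
    perror ("jobserver_release");
}

int
main (void)
{
  if (jobserver_parse ())
    {
      jobserver_acquire ();
      /* ... run one compilation job here ... */
      jobserver_release ();
    }
  return 0;
}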

-Lewis

On Wed, Dec 17, 2014 at 1:12 PM, Lewis Hyatt  wrote:
> Hello-
>
> I recently started using -flto in my builds, it's a very impressive
> feature, thanks very much for adding it. One thing that occurred to me
> while switching over to using it: In an LTO world, the object files,
> it seems to me, are becoming increasingly less relevant, at least for
> some applications. Since you are already committing to the build
> taking a long time, in return for the run-time performance benefit, it
> makes sense in a lot of cases to go whole-hog and just compile
> everything every time anyway. This comes with a lot of advantages:
> besides fewer large files lying around, it simplifies things a lot;
> say, I don't need to worry about accidentally linking in an object file
> compiled differently vs. the rest (different -march, different
> compiler, etc.), since I am just rebuilding from scratch every time.
> In my use case, I do such things a lot, and find it very freeing to
> know I don't need to worry about any state from a previous build.
>
> In any case, the above was some justification for why I think the
> following feature would be appreciated and used by others as well.
> It's perhaps a little surprising, or at least disappointing, that
> this:
>
> g++ -flto=jobserver *.o
>
> will be parallelized, but this:
>
> g++ -flto=jobserver *.cpp
>
> will effectively not be; each .cpp is compiled serially, then the LTO
> runs in parallel, but in many cases the first step dominates the build
> time. Now it's clear why things are done this way, if the user wants
> to parallelize the compile, they are free to do so by just naming each
> object as a separate target in their Makefile and running a parallel
> make. But this takes some effort to set up, especially if you want to
> take care to remove the intermediate .o files automatically, and since
> -flto has already opened the door to gcc providing parallelization
> features, it seems like it would be nice to enable parallelizing more
> generally, for all parts of the build that could benefit from it.
>
> I took a stab at implementing this. The below patch adds an option
> -fparallel=(jobserver|N) that works analogously to -flto=, but applies
> to the whole build. It generates a Makefile from each spec, with
> appropriate dependencies, and then runs make to execute it. The
> combination -fparallel=X -flto will also be parallelized on the LTO
> side, as if -flto=jobserver were specified; the idea would be
> any downstream tool that could naturally offer parallel features would
> do so in the presence of the -fparallel switch.
>
> I am sure this must be very rough around the edges; it's my first-ever
> look at the gcc codebase, but I tried not to make it overly
> restrictive. I only really have experience with Linux and C++ so I may
> have inadvertently specialized something to these cases, but I did try
> to keep it general. Here is a list of potential issues that could be
> addressed:
>
> -For some jobs there are environment variables set on a per-job basis.
> I attempted to identify all of them and came up with COMPILER_PATH,
> LIBRARY_PATH, and COLLECT_GCC_OPTIONS. This would need to be kept up
> to date if others are added.
>
> -The mechanism I used to propagate environment variables (export +
> unset) is probably specific to the Bourne shell and wouldn't work on
> other platforms, but there would be some simple platform-specific code
> to do it right for Windows and others.
>
> -Similarly for -pipe mode, I put pipes into the Makefile recipe, so
> there may be platforms where this is not the correct syntax.
>
> Anyway, here it is, in case there is any interest in pursuing it
> further. Thanks for listening...
>
> -Lewis
>
> =
>
> diff --git gcc/common.opt gcc/common.opt
> index 3b8b14d..4417847 100644
> --- gcc/common.opt
> +++ gcc/common.opt
> @@ -1575,6 +1575,10 @@ flto=
>  Common RejectNegative Joined Var(flag_lto)
>  Link-time optimization with number of parallel jobs or jobserver.
>
> +fparallel=
> +Common Driver RejectNegative Joined Var(flag_parallel)
> +Enable parallel build with number of parallel jobs or jobserver.
> +
>  Enum
>  Name(lto_partition_model) Type(enum lto_partition_model)
> UnknownError(unknown LTO partitioning model %qs)
>
> diff --git gcc/gcc.c gcc/gcc.c
> index a5408a4..6f9c1cd 100644
> --- gcc/gcc.c
> +++ gcc/gcc.c
> @@ -1716,6 +1716,73 @@ static int have_c = 0;
>  /* Was the option -o passed.  */
>  static int have_o = 0;
>
> +/* Parallel mode  */
> +static int parallel = 0;
> +static int parallel_ctr = 0;
> +static int parallel_sctr = 0;
> +static enum {
> +  parallel_mode_off,
> +  parallel_mode_first_job_in_spec,
> +  pa

Re: organization of optimization options in manual

2015-01-15 Thread Joseph Myers
On Wed, 14 Jan 2015, Sandra Loosemore wrote:

> What would you think about reorganizing this section to add some subsections
> grouping options by purpose, instead?  E.g., loop optimizations,
> floating-point optimizations, inlining, LTO, profiling options, etc?  The
> section is almost 60 pages long in the printed manual and adding some
> structure would probably make it easier for users to find things.

For the floating-point options the classification as optimization options 
is a bit questionable anyway; they're more like language dialect options 
(with some dialects being more optimizable than others).  But while you 
might have divisions such as (language dialect options, options such as 
LTO or profiling that may affect your build system, non-semantic options 
that are quite likely to be useful in normal use, miscellaneous options 
enabled by -On that it probably only makes sense to turn on and off 
individually when debugging the compiler), organizing the manual that way 
could have issues with closely related options falling in different 
categories.

(E.g. -Ofast is like the other -O options, but also changes the language 
dialect by enabling -ffast-math.  -fno-strict-aliasing can be used as a 
language dialect option, but unless you're doing that sort of aliasing 
it's an option it doesn't make much sense to turn on or off on its own 
rather than through -O2.  -fmerge-all-constants is a language dialect 
option, but is naturally described next to the non-semantic option 
-fmerge-constants.)
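
As a made-up illustration of the -fno-strict-aliasing point (not from the
manual): reading an unsigned int object through a float lvalue violates
the ISO C aliasing rules, but code like this is common enough that
treating the option as a dialect choice makes sense.

float
bits_to_float (unsigned int u)
{
  return *(float *) &u;   /* undefined under the strict aliasing rules */
}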

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: will openacc 2.0 be merged into trunk?

2015-01-15 Thread Mark Farnell
It is already January 16, and I have just seen the notice that OpenMP
offloading has already been merged into the trunk.  However, from SVN,
the OpenACC stuff is still not merged.

Will it be able to be merged today?  Otherwise OpenACC will not make
it into GCC 5.0.

Are there any issues that prevent OpenACC 2.0 from reaching the
trunk?  And does the GCC core team intend to include OpenACC 2.0 as a
feature of GCC 5.0?



On Fri, Jan 9, 2015 at 9:04 AM, Mark Farnell  wrote:
> Currently, OpenACC 2.0 is in gomp-4_0-branch, but this email:
>
> https://gcc.gnu.org/ml/gcc/2015-01/msg00032.html
>
> says that gcc 5.0 will enter stage 4 on Friday 16th January, and from
> that point onward, only bug fixing patches will be accepted.
>
> So will gomp-4_0-branch be able to be merged into the trunk before
> Friday 16 January?


gcc-4.8-20150115 is now available

2015-01-15 Thread gccadmin
Snapshot gcc-4.8-20150115 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20150115/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 219694

You'll find:

 gcc-4.8-20150115.tar.bz2 Complete GCC

  MD5=f5d42978427b107260a94006b3aef8d3
  SHA1=3313e4d007207a63d568a0128a7f3de140f6fea4

Diffs from 4.8-20150108 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.