Re: [cfe-dev] [RFC] Reliable compiler specification setting (at least include/lib dirs) through the process environment

2016-10-18 Thread Nathan Froyd
On Tue, Oct 18, 2016 at 8:59 AM, Ludovic Courtès via cfe-dev
 wrote:
> Shea Levy  skribis:
>
>> Your patches look good! My biggest concern is how the ld wrapper behaves
>> in the presence of response files. Have you tested that?
>
> It surely doesn’t (yet?).
>
> However, GCC does not pass “@file” arguments when it invokes ‘ld’, and
> the bug report you mentioned¹ talks about GHC invoking ‘gcc’, not ‘ld’,
> so I guess it’s fine to ignore response files in the ld wrapper.

GCC will pass response files to ld when response files were used in
the invocation of GCC.

-Nathan


Re: Target deprecations for 4.6

2011-02-11 Thread Nathan Froyd
On Fri, Jan 28, 2011 at 01:11:10AM +0000, Joseph S. Myers wrote:
> Here is a concrete list I propose for deprecation in 4.6; please send
> any other suggestions...

score-* doesn't have a maintainer and score-elf couldn't build libgcc
last I checked (it was also mentioned in your previous message).

crx-*?  crx-elf can't build libgcc, and hasn't been able to for a while.

-Nathan


Re: Target deprecations for 4.6

2011-02-14 Thread Nathan Froyd
On Sat, Feb 12, 2011 at 08:11:07AM -0500, David Edelsohn wrote:
> On Fri, Feb 11, 2011 at 9:15 PM, Joseph S. Myers
>  wrote:
> > appear to involve a simultaneously maintained set of upstream components
> > that are usable together in their current upstream forms; they got Linux
> > kernel support upstream in 2009 (and don't seem to have maintained it much
> > since then), some time after they got GCC support upstream (and then
> > stopped maintaining it).
> 
> The SCORE port was accepted and maintainers appointed with the
> understanding that lack of maintenance would lead to rapid deprecation
> and removal.
> 
> I would suggest directly sending a message to the last contacts and
> any other contact email address for SCORE that the port must function
> and test results posted or it will be deprecated and removed.  That
> the GCC community needs to see action, not future promises.

Patch for adding score-* and crx-* to obsolete ports below.  Last
contact for SCORE and current crx maintainer CC'd.

OK to commit?

-Nathan

* config.gcc: Declare score-* and crx-* obsolete.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 54b822e..0f7050d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -237,6 +237,7 @@ case ${target} in
  | alpha*-*-gnu*   \
  | arm*-*-netbsd*  \
  | arm-*-pe*   \
+ | crx-*   \
  | i[34567]86-*-interix3*  \
  | i[34567]86-*-netbsd*\
  | i[34567]86-*-pe \
@@ -247,6 +248,7 @@ case ${target} in
  | m68k-*-uclinuxoldabi*   \
  | mcore-*-pe* \
  | powerpc*-*-gnu* \
+ | score-* \
  | sh*-*-symbianelf*   \
  | vax-*-netbsd*   \
  )


Re: Target deprecations for 4.6

2011-02-22 Thread Nathan Froyd
On Mon, Feb 14, 2011 at 09:41:50AM -0800, Nathan Froyd wrote:
> Patch for adding score-* and crx-* to obsolete ports below.  Last
> contact for SCORE and current crx maintainer CC'd.

I have committed this patch.  The crx maintainer (Pompapathi Gadad)
contacted me via private mail and indicated it would be OK to obsolete
the crx port.  Following David and Joseph's comments, along with the
lack of activity for SCORE targets for the past several years[*], I think
obsoleting the SCORE port is overdue.

I will commit a patch to the website shortly.

-Nathan

[*] Though I think some of the other backends might qualify under the
inactivity criterion.  We can save those battles for 4.7, though...


Re: how can I write a right V32QI Unpack Low Data insn pattern?

2011-03-02 Thread Nathan Froyd
On Wed, Mar 02, 2011 at 07:14:53AM -0800, Ian Lance Taylor wrote:
> This patch should at least cause genrecog to crash for you rather than
> generating bogus output.  I've verified that this patch bootstraps on
> x86_64 and makes no difference in the generated insn-recog.c.  Can you
> see whether this gives you a crash?  Any opinion on whether I should
> commit this to mainline?
>  
> +   gcc_assert (i < 26);
> + gcc_assert (j < 26);
> +  gcc_assert (j - 1 < 26);

Is it worthwhile pulling out the 26 into a #define somewhere?  (Maybe
not, as there are pre-existing 26-esque constants elsewhere?)

-Nathan


Handling strictness in {predicates,constraints}.md [was: Re: Converting CRIS to constraints.md]

2011-03-10 Thread Nathan Froyd
[moving to gcc@ to get input from a wider audience]

On Thu, Mar 10, 2011 at 06:47:20AM +0100, Hans-Peter Nilsson wrote:
> > From: Nathan Froyd 
> > On Thu, Mar 10, 2011 at 04:02:27AM +0100, Hans-Peter Nilsson wrote:
> > > PS. If you really feel for it, I won't stop you converting MMIX. :)
> > 
> > Heh.  I looked at doing MMIX; I think the only tricky thing might be
> > dealing with the 'U' constraint.
> 
> Hm.  Speaking of macros with semantics different depending on
> REG_OK_STRICT being defined (should be just register and address
> constraints), how do you do that in constraints.md?  I looked
> around but haven't found the answer.  I guess you've bumped into
> that problem a few times now, or some converted target will find
> out the hard way?
> 
> (Please CC any reply to gcc@)

I haven't run into that problem; all the targets I've converted to
constraints.md haven't had constraints that changed based on strictness.
I think the right thing to do is depend on
reload_{in_progress,completed}
(cf. rs6000/predicates.md:volatile_memory_operand), but I freely admit
this is a part of the compiler that I'm not familiar with.

In fact, I think the only targets remaining for constraints.md
(besides h8300, which I haven't received an ack on yet) are:

- mmix ('U' dependent on strictness);
- cris (all interesting constraints based on strictness, I think);
- m32c (tedious to convert due to m32c-specific encode_pattern).

so knowing how to deal with strictness would help a great deal.

-Nathan


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Nathan Froyd
On Fri, Apr 29, 2011 at 09:18:56AM +0200, Paolo Bonzini wrote:
> * Get rid of EXPR_LIST and INSN_LIST

This is reasonably difficult, though particular subprojects may be easy
enough.  Notable uses of EXPR_LIST:

- loop-iv.c

- the interface to TARGET_FUNCTION_VALUE

- the scheduler

- REG_NOTES

- var-tracking.c

- reload

Notable uses of INSN_LIST:

- the scheduler

- reload

- gcse.c

The biggest uses of each in the scheduler ought to be easy to deal with,
but the scheduler manipulates the lists in peculiar ways.

> * cxx_binding should be 16 bytes, not 20.

Not your fault, but comments like this on SpeedupAreas are so opaque as
to be useless.  *Why* should cxx_binding be 16 bytes?  Should we take
the next member out and have a VEC someplace instead of chaining?  Are
we duplicating information in the members themselves?  Etc.

-Nathan


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Nathan Froyd
On Fri, Apr 29, 2011 at 04:20:15PM +0200, Paolo Bonzini wrote:
> On 04/29/2011 04:15 PM, Nathan Froyd wrote:
>>> >  * cxx_binding should be 16 bytes, not 20.
>>
>> Not your fault, but comments like this on SpeedupAreas are so opaque as
>> to be useless. *Why* should cxx_binding be 16 bytes?  Should we take
>> the next member out and have a VEC someplace instead of chaining?  Are
>> we duplicating information in the members themselves?  Etc.
>
> Sorry, you're right.  It's about cache lines I guess, and moving the  
> bitfields into one of the pointers.

Gross. :)

-Nathan


Re: how to distinguish patched GCCs

2011-05-27 Thread Nathan Froyd
On Fri, May 27, 2011 at 06:30:21AM -0700, Ian Lance Taylor wrote:
> Jonathan Wakely  writes:
> > It's an additional maintenance burden.
>
> It's not a maintenance burden on gcc, though.
>
> I think we should have the gcc configure script provide a way to add a
> preprocessor macro.

FWIW, we decided we needed similar capabilities in our compilers and
decided to add a --with-specs option, which enables additional
flexibility, particularly WRT optimization capabilities.

Below is the patch against our 4.5 tree, written by Nathan Sidwell.

-Nathan

* configure.ac (--with-specs): New option.
* configure: Regenerated.
* gcc.c (driver_self_specs): Include CONFIGURE_SPECS.
* Makefile.in (DRIVER_DEFINES): Add -DCONFIGURE_SPECS.

Index: gcc/gcc.c
===
--- gcc/gcc.c   (revision 271022)
+++ gcc/gcc.c   (revision 271023)
@@ -955,7 +955,7 @@ static const char *const multilib_defaul

 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %

Re: Gimple Pass

2009-07-24 Thread Nathan Froyd
On Fri, Jul 24, 2009 at 05:20:16AM -0700, pms wrote:
>   But I want to know what are the TREE_CODEs for other remaining constructs
> viz declaration stmt, conditions, count for constants, and how to use them in
> the gimple pass. Can anybody help in this regard

The names are defined in tree.def.
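
As a rough illustration (my sketch, not part of the original answer; the
counter names and the surrounding pass are assumptions), inside a gimple
pass you test the codes of statement operands like this:

  /* stmt is a gimple statement obtained from a gimple_stmt_iterator.  */
  if (is_gimple_assign (stmt))
    {
      tree rhs = gimple_assign_rhs1 (stmt);
      if (TREE_CODE (rhs) == INTEGER_CST)
        num_constants++;        /* a constant operand */
      else if (TREE_CODE (rhs) == VAR_DECL)
        num_vars++;             /* a variable reference */
    }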

-Nathan


Re: Preserving the argument spills for GDB

2009-11-04 Thread Nathan Froyd
On Wed, Nov 04, 2009 at 11:24:34AM -0500, Jean Christophe Beyler wrote:
> However, I've been going through the first step : running GDB, setting
> a break-point and doing a continue to see what I get and try to get
> the information right for O3 too.
> 
> In O0, I get:
> Breakpoint 1, foo (a=4, b=3, c=2, d=1) at hello.c:10
> 
> In O3, I get:
> Breakpoint 1, foo (a=Variable "a" is not available.) at hello.c:11
> 
> It seems that, in the O0 case, the Dwarf information is automatically
> propagated to say "The input register is now here", but when I do it
> in O3, I'm issuing the information in the same way.
> 
> What am I exactly missing? Any ideas why GDB would not have enough
> information in this case?

You should look at the DWARF information (readelf -wi) and see if the
function parameters have DW_AT_location attributes.  If they don't, then
you need to ensure that they get generated.  If they do, then perhaps
they are wrong or GDB is not interpreting them correctly.  (They get
generated with optimization and interpreted correctly on other platforms
that pass args in registers.)

-Nathan


Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Thread Nathan Froyd
On Mon, Jan 04, 2010 at 04:08:17PM +0000, Andrew Haley wrote:
> On 01/04/2010 12:07 PM, Jakub Jelinek wrote:
> > IMHO we really should have some late tree pass that converts adjacent
> > bitfield operations into integral operations on non-bitfields (likely with
> > alias set of the whole containing aggregate), as at the RTL level many cases
> > are simply too many instructions for combine etc. to optimize them properly,
> > while at the tree level it could be simpler.
> 
> Yabbut, how come RTL cse can handle it in x86_64, but PPC not?

Probably because the RTL on x86_64 uses and's and ior's, but PPC uses
set's of zero_extract's (insvsi).

-Nathan


Re: RTL question for I64

2010-02-11 Thread Nathan Froyd
On Thu, Feb 11, 2010 at 09:43:31AM -0800, Douglas B Rupp wrote:
> A pointer would be much appreciated!
>
> In ia64.md for *cmpdi_normal this is found:
> "cmp.%C1 %0, %I0 = %3, %r2"
>
> Where are %C, %I, %r described?

Above gcc/config/ia64/ia64.c:ia64_print_operand.

-Nathan


Re: Gprof can account for less than 1/3 of execution time?!?!

2010-02-22 Thread Nathan Froyd
On Mon, Feb 22, 2010 at 03:23:52PM -0600, Jon Turner wrote:
> In it, you will find a directory with all the source code
> needed to observe the problem for yourself.
> The top level directory contains a linux executable called
> maxFlo, which you should be able to run on a linux box
> as is. But if you want/need to compile things yourself,
> type "make clean" and "make all" in the top level
> directory and you should get a fresh copy of maxFlo.

So, compiling maxFlo with no -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo 

real0m3.465s
user0m3.460s
sys 0m0.000s

Compiling maxFlo with -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo 

real0m9.780s
user0m9.760s
sys 0m0.010s

Notice that ~60% of the running time with gprof enabled is simply
overhead from call counting and the like.  That time isn't recorded by
gprof.  That alone accounts for your report about gprof ignoring 2/3 of
the execution time.

Checking to see whether maxFlo is a dynamic executable (since you
claimed earlier that you were statically linking your program):

@nightcrawler:~/src/gprof-trouble-case$ ldd ./maxFlo 
  linux-vdso.so.1 =>  (0x7fff2977f000)
  libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fb422c21000)
  libm.so.6 => /lib/libm.so.6 (0x7fb42299d000)
  libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7fb422786000)
  libc.so.6 => /lib/libc.so.6 (0x7fb422417000)
  /lib64/ld-linux-x86-64.so.2 (0x7fb422f31000)

So calls to shared library functions (such as functions in libm) will
not be caught by gprof.  Those calls could account for a significant
amount of the running time of your program, and gprof can't tell you about
them.

Inspecting the gmon.out file:

@nightcrawler:~/src/gprof-trouble-case$ gprof maxFlo gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 16.09  0.37 0.3727649 0.00 0.00  shortPath::findPath()
 12.61  0.66 0.29 55889952 0.00 0.00  graph::next(int,int) const
 11.96  0.94 0.28 61391904 0.00 0.00  graph::mate(int,int) const
 10.87  1.19 0.25 58654752 0.00 0.00  flograph::res(int,int) const
 10.44  1.43 0.24 _fini
  6.96  1.59 0.16 65055289 0.00 0.00  graph::term(int) const
  6.96  1.75 0.16 61391904 0.00 0.00  digraph::tail(int) const
[...lots of stuff elided...]
  0.00  2.30 0.001 0.00 0.00  graph

gprof is telling you about 2.3 seconds of your execution time.  With the
factors above accounted for, that doesn't seem unreasonable.

-Nathan


Re: Compiling OpenCV 2.0 on Solaris 10 with GCC 4.4.3

2010-03-10 Thread Nathan Froyd
On Wed, Mar 10, 2010 at 01:30:36PM +0000, Nick Fielding wrote:
> The first syntax error the compiler complains about I think is the main 
> problem:
> /tmp/OpenCV-2.0.0/src/cxcore/cxmatmul.cpp:673: error: expected ',' or '...' 
> before numeric constant
> 
> However, when I look at the line of code it's complaining about I can see 
> absolutely nothing wrong with it:
> 
>...
>673  void gemm( const Mat& _A, const Mat& _B double alpha,
   ^

Looks like a missing comma between `_B' and `double alpha'.

-Nathan


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Nathan Froyd
On Thu, Mar 18, 2010 at 04:34:56PM +0100, Vincent Lefevre wrote:
> On 2010-03-18 15:32:04 +0100, Michael Matz wrote:
> > But unfortunately you are right, this expansion can only be done for
> > -fno-signed-zeros. (FWIW the general expandsion of pow(x,N/2) where
> > N!=1 is already guarded by unsafe_math, but for N==1 we do it
> > unconditionally).
> 
> If GCC is able to track range of values, the transformation could
> be allowed when it can be proved that the value -0 is not possible
> (e.g. when x > 0).

That would be useful, and other language implementations do similar
things.  GCC's VRP (value range propagation pass) does not handle
floating-point values at the moment, though.

-Nathan


Re: Vanilla cross compiling and libstdc++v3

2010-04-05 Thread Nathan Froyd
On Mon, Apr 05, 2010 at 10:29:07AM -0430, Kofi Doku Atuah wrote:
> The process of building a simply, plain vanilla cross compiler for
> arch-fmt-no_os is really probably overdone. To build, for example, a
> GCC cross compiler for an i586-elf target, the build process requires
> you to have a libc for the target, and then from there, the build
> process uses the features in your vanilla target's libc to decide how
> to configure libstdc++v3.
> 
> However, anyone building a vanilla cross compiler either doesn't yet
> *have* a standard lib for his kernel project as yet, or isn't yet
> interested in building an os-specific toolchain for arch-fmt-his_os as
> yet. Therefore the assumption that there would be a standard library,
> or libc, or even that the person even *wants* a libstdc++ with his
> vanilla build is incorrect.

Have you tried configuring with --enable-languages=c?  Doing so should
ensure that libstdc++ is not configured for your target.

-Nathan


Re: lower subreg optimization

2010-04-06 Thread Nathan Froyd
On Tue, Apr 06, 2010 at 09:58:23AM -0700, Ian Lance Taylor wrote:
> In the code the register is always accessed via a subreg, so the
> lower-subregs pass thinks that it is OK to decompose the register.
> Once it is decomposed, nothing is expected to put it back together.
> 
> To fix this, you should probably look at simple_move in
> lower-subreg.c.  You will want it to return NULL_RTX for a vector load
> or store.  Perhaps it should check costs, or perhaps it should never
> decompose explicit vector modes.

Compiling anything that uses doubles on powerpc e500v2 produces awful
code due in part to lower-subregs (the register allocator doesn't help,
either, but that's a different story).  Code that looks like:

  rY:DI = r:DI
  rX:DI = rY:DI
  (subreg:DF rZ:DI 0) = rX:DI

where the source register is a hard register for argument passing; the code looks equally
awful inside of a function, too.  The above gets lowered to:

1:  r:SI = r:SI
2:  r:SI = r:SI
3:  (subreg:SI rX:DI 0) = r:SI
4:  (subreg:SI rX:DI 4) = r:SI
5:  (subreg:DF rZ:DI 0) = rX:DI

which usually results in two stores and a load against the stack, rather
than a single instruction dealing entirely in registers.  I realize
e500v2 is not exactly a mainstream target, but perhaps a target hook is
appropriate here?  I suppose checking costs might achieve the same
thing.

-Nathan


Re: lower subreg optimization

2010-04-06 Thread Nathan Froyd
On Tue, Apr 06, 2010 at 11:55:01AM -0700, Ian Lance Taylor wrote:
> Nathan Froyd  writes:
> > Compiling anything that uses doubles on powerpc e500v2 produces awful
> > code due in part to lower-subregs (the register allocator doesn't help,
> > either, but that's a different story).
> 
> I doubt that a target hook is required to avoid this.  Perhaps
> simple_move_operand should reject a mode changing subreg when the two
> modes are !MODE_TIEABLE_P.

Ah, thanks for the pointer. I'll try poking at that.

> This code is sort of weird, though; why the conversion from DImode to
> DFmode?

Welcome to the wonderful world of e500, which has floating-point
instructions operating on the general purpose registers.

-Nathan


Re: i386 SSE Test Question

2010-04-12 Thread Nathan Froyd
On Mon, Apr 12, 2010 at 09:47:04AM -0500, Joel Sherrill wrote:
> qemu with no cpu argument specified.  So qemu32.
> It does run OK when I change the cpu model to 486
> or pentium.
>
> Is qemu reporting that it supports SSE and not doing a good
> enough job to make gcc happy?

I think that's quite likely.

-Nathan


Re: Notes from the GROW'10 workshop panel (GCC research opportunities workshop)

2010-04-14 Thread Nathan Froyd
On Wed, Apr 14, 2010 at 11:30:44AM -0400, Diego Novillo wrote:
> On Wed, Apr 14, 2010 at 11:18, Manuel López-Ibáñez
>  wrote:
> > Otherwise, as Ian said in another topic [2]: "I have a different fear:
> > that gcc will become increasing irrelevant".
> 
> That's my impression, as well.  It is true of just about every code
> base: if it cannot attract new developers, it stagnates and eventually
> withers away.
> 
> To attract new developers, GCC needs to modernize its internal
> structure.  I have some thoughts on that, but progress has been slow,
> due mostly to resource constraints.

Would you mind expanding--even just a little bit--on what bits need
modernizing?  There are things like:

http://gcc.gnu.org/wiki/Speedup_areas

and perhaps:

http://gcc.gnu.org/wiki/general_backend_cleanup

But neither of those really touches the middle-end, which is where I
presume the grousing vis-a-vis GCC vs. LLVM really comes from.
Or it's the front-end support.  I don't know.

I know there are ugly parts still remaining in GCC.  But my experience
(extending/parameterizing an LLVM optimization pass, writing/modifying
GCC middle-end optimization passes, some GCC backend hacking) suggests
that the complexity is similar.  I think concrete "I tried X and it
sucked" or "these are the areas of suckage" would be helpful.

-Nathan


Re: Notes from the GROW'10 workshop panel (GCC research opportunities workshop)

2010-04-14 Thread Nathan Froyd
On Wed, Apr 14, 2010 at 09:49:08PM +0200, Basile Starynkevitch wrote:
> Toon Moene wrote:
>>
>> Mutatis mutandis, the same goes for GCC: There might be too many 
>> hurdles to use GCC in academia.  
>
> This is probably true, however, the plugin ability of the just released  
> GCC 4.5 (or is it released tomorrow) helps probably significantly.
>
> Academics (even people working in technological research institutes like  
> me) will probably be more able to practically contribute to GCC thru the  
> plugin interface. It brings two minor points: a somehow defined plugin  
> API (which is a sane "bottleneck" to the enormity of GCC code), and the  
> ability to practically publish code without transferring copyright to FSF  
> (in the previous situation, the only way to avoid that was to create a  
> specific GPLv3 fork of GCC; in practice it is too expensive in labor for  
> academia).

I appreciate the point about the difficulty of copyright transference in
an academic environment, having gone through such difficulties myself.
But I think you are confusing "using GCC as a base for your research
activities" and "getting the results of that research accepted
upstream".  I think plugins help in the first category insofar as they
force GCC to clearly define interface boundaries.  But they have little
effect concerning the second category.

Perhaps people will be able to make their code more widely available:
the plugin interface will likely be relatively stable (I realize this is
not guaranteed) and people can therefore release easily compilable
packages.  Before, you would be forced to distribute (and maintain!)
patch files that may need significant changes from release to release.

-Nathan


Re: GCC 4.5.0 Released

2010-04-20 Thread Nathan Froyd
On Mon, Apr 19, 2010 at 09:35:44AM -0400, Jack Howarth wrote:
>The announcement should probably note that targets which lack
> objdump currently can't build plugins. I've had about as much
> luck getting the patch to fix this...
> 
> http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00610.html
> 
> ...reviewed as you are likely to get this feature announced.
> From the responses in the "dragonegg in FSF gcc?" thread, it
> is clear that the plugin feature is viewed more as a necessary
> evil than a clear win for FSF gcc in some quarters.

Perhaps you should ping the original patch, with a CC to the build
maintainers in case they missed the patch, or are/were on vacation, or
whatever.  A simple ping on the original thread would do more good than
sniping in an unrelated thread.

Speaking only for myself, I also think it's unfortunate that the above
impression is the impression that you took away from the "dragonegg in
FSF gcc?" thread; I don't think that's a proper reading of the thread at
all.

-Nathan


Re: Passing options down to assembler and linker

2010-04-23 Thread Nathan Froyd
On Fri, Apr 23, 2010 at 01:55:48PM -0400, Jean Christophe Beyler wrote:
> I know we can pass -Wl,option, -Wa,option from gcc down to as and ld
> however if I have to write :
> 
> gcc -mArch2 -Wl,--arch2 -Wa,--arch2 hello.c
> 
> it gets a bit redundant, I must be blind because I can't seem to find
> how to do it internally.

You want to get comfortable with specs:

http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Spec-Files.html#Spec-Files

and building in what specs should be handled by default:

http://gcc.gnu.org/onlinedocs/gccint/Driver.html#Driver

The ones you're interested in are ASM_SPEC and LINK_SPEC.
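
As a minimal sketch (the -mArch2 option and --arch2 flags are taken from
your example, not from a real port), the target's .h file would override
the specs along these lines, merging with whatever the port already puts
in them:

  #undef  ASM_SPEC
  #define ASM_SPEC  "%{mArch2:--arch2}"

  #undef  LINK_SPEC
  #define LINK_SPEC "%{mArch2:--arch2}"

With that in place, "gcc -mArch2 hello.c" passes --arch2 to both as and
ld without the -Wa,/-Wl, repetition.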

-Nathan


Re: gcc-4.3.2 generated "vmhraddshs" instruction when I compiled with -mcpu=8540

2008-12-16 Thread Nathan Froyd
On Tue, Dec 16, 2008 at 12:28:10PM +0700, Ha Luong wrote:
> I used gcc-4.3.2 to compile the c source(*) and it generated
> "vmhraddshs" instruction when I compiled with -mcpu=8540.
> 000103A4: 11EB0321  vmhraddshs vr15,vr11,vr0,vr12

You are running into the problem that the Altivec and SPE opcode spaces
overlap, so the instructions that you see generated depend on what
disassembler you use.  For example, compiling the assembler file test.s:

.text
.long 0x10074B20
.long 0x11eb0321

with:

  powerpc-linux-gnu-gcc -c -o test.o test.s

and disassembling:

  powerpc-linux-gnu-objdump -d test.o

gives you:

Disassembly of section .text:

 <.text>:
   0:   10 07 4b 20 vmhaddshs v0,v7,v9,v12
   4:   11 eb 03 21 vmhraddshs v15,v11,v0,v12

but if you tell objdump to disassemble e500 instructions instead (use -M
e500x2 to include e500v2 instructions):

  powerpc-linux-gnu-objdump -d -M e500 test.o

you see:

Disassembly of section .text:

 <.text>:
   0:   10 07 4b 20 evstddx r0,r7,r9
   4:   11 eb 03 21 evstdd  r15,0(r11)

The compiler is generating evstdd{,x} instructions, not vmhraddshs
instructions.

-Nathan


Re: gcc-4.3.2 generated "vmhraddshs" instruction when I compiled with -mcpu=8540

2008-12-17 Thread Nathan Froyd
On Wed, Dec 17, 2008 at 01:33:38PM +0700, Ha Luong wrote:
> Thanks for your guide. When I debugged the exe file or make it ran on
> 8548 board, the vmhaddshs makes the exe file hang out. Could I compile
> the source for 8540 (e500v1 instructions) only?

Sure.  But evstdd{,x} are e500v1 instructions too, so you'll still see
them in the generated exe file.

My guess is that you're taking alignment traps on those instructions.
You need to compile with -mabi=spe in addition to -mcpu=8540 to ensure
that the stack and register save areas are properly aligned.

-Nathan


Re: Binary Autovectorization

2009-01-29 Thread Nathan Froyd
On Thu, Jan 29, 2009 at 04:46:37PM -0500, Rodrigo Dominguez wrote:
> I am looking at binary auto-vectorization or taking a binary and rewriting
> it to use SIMD instructions (either statically or dynamically). I was
> wondering if anyone knew of similar work and could help me with some links.

Anshuman Dasgupta did some work at Rice University on binary
autovectorization of x86 binaries.  See:

  http://www.cs.rice.edu/~keith/pubs/LACSI02.pdf

His master's thesis might also be available online.

-Nathan


Re: Question about type conversion when GCC compile the file?

2009-02-18 Thread Nathan Froyd
On Wed, Feb 18, 2009 at 06:03:58PM +0800, JCX wrote:
> Hello,
>  After I compile the following file for testing, I check the dump
> file called "129t.final_cleanup". I doubt about why the type "short
> int" changes into "short unsigned int" during the array operations,
> and at last changes back to "short int" when it stores the result into
> memory.
> [...]
>  I don't know why GCC do such a type conversion.  Can anyone tell me?
> This makes me more difficult to do the optimization in GCC.

Because GCC converts to short unsigned int to avoid arithmetic 


Re: Question about type conversion when GCC compile the file?

2009-02-18 Thread Nathan Froyd
On Wed, Feb 18, 2009 at 05:55:43AM -0800, Nathan Froyd wrote:
> On Wed, Feb 18, 2009 at 06:03:58PM +0800, JCX wrote:
> > Hello,
> >  After I compile the following file for testing, I check the dump
> > file called "129t.final_cleanup". I doubt about why the type "short
> > int" changes into "short unsigned int" during the array operations,
> > and at last changes back to "short int" when it stores the result into
> > memory.
> > [...]
> >  I don't know why GCC do such a type conversion.  Can anyone tell me?
> > This makes me more difficult to do the optimization in GCC.
> 
> Because GCC converts to short unsigned int to avoid arithmetic 

Bah, sent this before it was complete.  GCC converts to short unsigned
int to avoid signed arithmetic overflow, which is undefined.  See
convert.c:convert_to_integer.
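
As a small illustration (my example, not your testcase), what the
final_cleanup dump reflects is roughly:

  /* The narrow arithmetic is carried out in the unsigned type and
     converted back on the store, i.e. the addition below is internally
     (short) ((unsigned short) a[i] + (unsigned short) b[i]).  */
  void
  add_arrays (short *a, short *b, short *c, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }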

-Nathan


Re: constant propagation optimization

2009-03-05 Thread Nathan Froyd
On Thu, Mar 05, 2009 at 11:39:45AM +0000, charfi asma wrote:
> int c;

> int main()
> 
> {
> 
> Calcul ca;
> 
> c=3;
> 
> ca.affich();
> 
> ca.inc(c);
> 
> cout << "the value of c is" << c << endl;
> 
> return 0;
> 
> }
[...]
> int main()
> 
> {
> 
> Calcul ca;
> 
> ca.affich();
> 
> c=3;
> 
> ca.inc(c);
> 
> cout << "the value of c is" << c << endl;
> 
> return 0;
> 
> }
> 
> Why in the first code, c is not considered as a constant (in spite that
> affich() does not change c)

Because GCC does not currently do the necessary analysis to know that
affich() does not change c; it therefore makes the conservative
assumption that it does.
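
For illustration only (a plain C sketch, not your C++ code, and the
attribute is an assumption about what affich() really does): if the
callee is declared so that GCC knows it cannot write global state, the
constant survives the call:

  extern int c;
  /* "pure" promises the function performs no stores.  */
  extern int affich (void) __attribute__ ((pure));

  int
  f (void)
  {
    c = 3;
    (void) affich ();   /* cannot clobber c ...                   */
    return c;           /* ... so GCC may fold this to return 3.  */
  }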

-Nathan


Re: generating functions and eh region

2009-04-03 Thread Nathan Froyd
On Fri, Apr 03, 2009 at 08:05:47PM +0100, Dave Korn wrote:
> Ian Lance Taylor wrote:
> > No fundamental difficulty that I know of.  Lots of tedious work for
> > every backend setting RTX_FRAME_RELATED_P and adding
> > REG_FRAME_RELATED_EXPR notes to the manually constructed epilogue insns.
> 
>   I think we're only proposing to add SEH support for platforms where the OS
> supports it.  Would it be an impediment to a patch if it only fixed -fa-u-t
> for platforms where we needed epilogue handling to work correctly, and left it
> up to other target maintainers to decide for themselves if they needed this
> functionality on their platforms?

If you're going to seriously consider doing this, you may want to take:

http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01091.html

as a starting point.

-Nathan


Re: powerpc-eabi-gcc no implicit FPU usage

2010-05-21 Thread Nathan Froyd
On Thu, May 20, 2010 at 08:59:32PM -0700, Mark Mitchell wrote:
> David Edelsohn wrote:
> > No one disagrees with the potential benefit of the feature.
> 
> OK; I must have misremembered.
> 
> I believe our current implementation keeps track of FP usage through the
> front-end, and then disables any floating-point registers by futzing
> with fixed_regs and such when compiling each function.  There appear to
> be no back-end specific patches at all.  If that sounds like a
> reasonable approach, we might be able to get that into 4.6.

The primary backend specific bit was the addition of a predicate to
indicate which registers are FP registers so that they can be marked
appropriately.  The SH backend changes were more invasive, but not out
of the ordinary given what I understand of floating-point and the SH
family.  Many backends already had such a macro; the only change was
exposing that macro via the new target macro.

If there's already a way to do this that doesn't involve the addition of
a new predicate, that would of course be the better route.

-Nathan


Re: Bootstrap failed for i386-pc-solaris2.10 and sparc-sun-solaris2.10

2010-06-04 Thread Nathan Froyd
On Fri, Jun 04, 2010 at 01:44:02PM +0000, Art Haas wrote:
> This morning's i386 build fails with the following error:
> 
> libbackend.a(sol2.o): In function `solaris_output_init_fini':
> /home/ahaas/gnu/gcc.git/gcc/config/sol2.c:109: undefined reference to 
> `print_operand'
> /home/ahaas/gnu/gcc.git/gcc/config/sol2.c:116: undefined reference to 
> `print_operand'
> collect2: ld returned 1 exit status
> make[3]: *** [cc1-dummy] Error 1
> 
> The sparc build fails like so:
> 
> libbackend.a(targhooks.o): In function `default_print_operand_address':
> /export/home/arth/src/gcc.git/gcc/targhooks.c:349: undefined reference to 
> `output_operand'
> /export/home/arth/src/gcc.git/gcc/targhooks.c:349: undefined reference to 
> `output_operand'
> collect2: ld returned 1 exit status
> gmake[3]: *** [cc1-dummy] Error 1

Whoops, sorry about that.  This patch fixes at least the sparc error
(successfully built a cross to sparc-solaris) and I'm in the process of
building a cross to i686-solaris.

OK to commit if compilation succeeds?

-Nathan


* config/i386/i386-protos.h (ix86_print_operand): Declare.
* config/i386/i386.c (ix86_print_operand): Make non-static.
* config/i386/sol2.h (ASM_OUTPUT_CALL): Call ix86_print_operand.
* rtl.h (output_operand): Declare.
* final.c (output_operand): Make non-static.

Index: final.c
===
--- final.c (revision 160266)
+++ final.c (working copy)
@@ -220,7 +220,6 @@ static void output_asm_name (void);
 static void output_alternate_entry_point (FILE *, rtx);
 static tree get_mem_expr_from_op (rtx, int *);
 static void output_asm_operand_names (rtx *, int *, int);
-static void output_operand (rtx, int);
 #ifdef LEAF_REGISTERS
 static void leaf_renumber_regs (rtx);
 #endif
@@ -3478,7 +3477,7 @@ mark_symbol_refs_as_used (rtx x)
The meanings of the letters are machine-dependent and controlled
by TARGET_PRINT_OPERAND.  */
 
-static void
+void
 output_operand (rtx x, int code ATTRIBUTE_UNUSED)
 {
   if (x && GET_CODE (x) == SUBREG)
Index: ChangeLog
===
Index: rtl.h
===
--- rtl.h   (revision 160266)
+++ rtl.h   (working copy)
@@ -2417,6 +2417,7 @@ extern void simplify_using_condition (rt
 /* In final.c  */
 extern unsigned int compute_alignments (void);
 extern int asm_str_count (const char *templ);
+extern void output_operand (rtx, int);
 
 struct rtl_hooks
 {
Index: config/i386/sol2.h
===
--- config/i386/sol2.h  (revision 160266)
+++ config/i386/sol2.h  (working copy)
@@ -145,7 +145,7 @@ along with GCC; see the file COPYING3.  
   do   \
 {  \
   fprintf (FILE, "\tcall\t");  \
-  print_operand (FILE, XEXP (DECL_RTL (FN), 0), 'P');  \
+  ix86_print_operand (FILE, XEXP (DECL_RTL (FN), 0), 'P'); \
   fprintf (FILE, "\n");\
 }  \
   while (0)
Index: config/i386/i386-protos.h
===
--- config/i386/i386-protos.h   (revision 160266)
+++ config/i386/i386-protos.h   (working copy)
@@ -60,6 +60,7 @@ extern bool legitimate_pic_operand_p (rt
 extern int legitimate_pic_address_disp_p (rtx);
 
 extern void print_reg (rtx, int, FILE*);
+extern void ix86_print_operand (FILE *, rtx, int);
 extern bool output_addr_const_extra (FILE*, rtx);
 
 extern void split_di (rtx[], int, rtx[], rtx[]);
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 160266)
+++ config/i386/i386.c  (working copy)
@@ -11579,7 +11579,7 @@ get_some_local_dynamic_name (void)
; -- print a semicolon (after prefixes due to bug in older gas).
  */
 
-static void
+void
 ix86_print_operand (FILE *file, rtx x, int code)
 {
   if (code)


Re: Bootstrap failed for i386-pc-solaris2.10 and sparc-sun-solaris2.10

2010-06-04 Thread Nathan Froyd
On Fri, Jun 04, 2010 at 07:45:20AM -0700, Ian Lance Taylor wrote:
> Nathan Froyd  writes:
> > * config/i386/i386-protos.h (ix86_print_operand): Declare.
> > * config/i386/i386.c (ix86_print_operand): Make non-static.
> > * config/i386/sol2.h (ASM_OUTPUT_CALL): Call ix86_print_operand.
> > * rtl.h (output_operand): Declare.
> > * final.c (output_operand): Make non-static.
> 
> The changes in config/i386 are OK.
> 
> I don't understand the point of the changes to rtl.h and final.c.

The changes to rtl.h and final.c are necessary because PRINT_OPERAND* on
some platforms freely calls output_operand.  Now that PRINT_OPERAND* has
been hookized, that call to output_operand no longer appears textually in
final.c but somewhere else (targhooks.c if the port has not done the
PRINT_OPERAND* -> TARGET_PRINT_OPERAND* conversion, config/$PORT/$PORT.c
if it has), so output_operand needs to be exported.

Looking at things a little more closely, output_address is exported in
output.h.  I suppose output_operand should be exported there as well?

-Nathan


Re: Bootstrap failed for i386-pc-solaris2.10 and sparc-sun-solaris2.10

2010-06-04 Thread Nathan Froyd
On Fri, Jun 04, 2010 at 08:32:26AM -0700, Ian Lance Taylor wrote:
> Nathan Froyd  writes:
> > Looking at things a little more closely, output_address is exported in
> > output.h.  I suppose output_operand should be exported there as well?
> 
> Yes, put the declaration there, by output_operand_lossage.

This is what I committed.

-Nathan

* config/i386/i386-protos.h (ix86_print_operand): Declare.
* config/i386/i386.c (ix86_print_operand): Make non-static.
* config/i386/sol2.h (ASM_OUTPUT_CALL): Call ix86_print_operand.
* output.h (output_operand): Declare.
* final.c (output_operand): Make non-static.

Index: final.c
===
--- final.c (revision 160285)
+++ final.c (working copy)
@@ -220,7 +220,6 @@ static void output_asm_name (void);
 static void output_alternate_entry_point (FILE *, rtx);
 static tree get_mem_expr_from_op (rtx, int *);
 static void output_asm_operand_names (rtx *, int *, int);
-static void output_operand (rtx, int);
 #ifdef LEAF_REGISTERS
 static void leaf_renumber_regs (rtx);
 #endif
@@ -3478,7 +3477,7 @@ mark_symbol_refs_as_used (rtx x)
The meanings of the letters are machine-dependent and controlled
by TARGET_PRINT_OPERAND.  */
 
-static void
+void
 output_operand (rtx x, int code ATTRIBUTE_UNUSED)
 {
   if (x && GET_CODE (x) == SUBREG)
Index: ChangeLog
===
Index: output.h
===
--- output.h(revision 160285)
+++ output.h(working copy)
@@ -77,6 +77,9 @@ extern rtx final_scan_insn (rtx, FILE *,
subreg of.  */
 extern rtx alter_subreg (rtx *);
 
+/* Print an operand using machine-dependent assembler syntax.  */
+extern void output_operand (rtx, int);
+
 /* Report inconsistency between the assembler template and the operands.
In an `asm', it's the user's fault; otherwise, the compiler's fault.  */
 extern void output_operand_lossage (const char *, ...) ATTRIBUTE_PRINTF_1;
Index: config/i386/sol2.h
===
--- config/i386/sol2.h  (revision 160285)
+++ config/i386/sol2.h  (working copy)
@@ -145,7 +145,7 @@ along with GCC; see the file COPYING3.  
   do   \
 {  \
   fprintf (FILE, "\tcall\t");  \
-  print_operand (FILE, XEXP (DECL_RTL (FN), 0), 'P');  \
+  ix86_print_operand (FILE, XEXP (DECL_RTL (FN), 0), 'P'); \
   fprintf (FILE, "\n");\
 }  \
   while (0)
Index: config/i386/i386-protos.h
===
--- config/i386/i386-protos.h   (revision 160285)
+++ config/i386/i386-protos.h   (working copy)
@@ -60,6 +60,7 @@ extern bool legitimate_pic_operand_p (rt
 extern int legitimate_pic_address_disp_p (rtx);
 
 extern void print_reg (rtx, int, FILE*);
+extern void ix86_print_operand (FILE *, rtx, int);
 extern bool output_addr_const_extra (FILE*, rtx);
 
 extern void split_di (rtx[], int, rtx[], rtx[]);
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 160285)
+++ config/i386/i386.c  (working copy)
@@ -11579,7 +11579,7 @@ get_some_local_dynamic_name (void)
; -- print a semicolon (after prefixes due to bug in older gas).
  */
 
-static void
+void
 ix86_print_operand (FILE *file, rtx x, int code)
 {
   if (code)


Re: Better performance on older version of GCC

2010-08-27 Thread Nathan Froyd
On Fri, Aug 27, 2010 at 09:44:25AM -0400, Corey Kasten wrote:
> I find that the executable compiled on system A runs faster (on both
> systems) than the executable compiled on system B (on both systems), by a
> factor of approximately 4. I have attempted to play with the
> GCC optimizer flags and have not been able to get System B (with the
> later GCC version) to compile code with any better performance. Could
> someone please help figure this out?

It's almost impossible to tell what's going on without an actual
testcase.  You might not be able to provide the actual code, but you
could try distilling it down to something you could release.

-Nathan


Re: passing #define-d values to #define-d macros

2010-09-26 Thread Nathan Froyd
On Sun, Sep 26, 2010 at 06:09:34PM -0700, ir_idjit wrote:
> i can seem to get this to work:
> 
> #define PREFIX "p_"
> #define HIGHER_INTERFACE(id) LOWER_INTERFACE(PREFIX, id)
> 
> #define LOWER_INTERFACE(prefix, id) struct prefix##id \
> { \
> int i; \
> }
> 
> int main(void)
> {
> HIGHER_INTERFACE(0);
> 
> /* test if struct declaration went well: */
> struct p_0 var;
> return 0;
> }

This question is not appropriate for the mailing list gcc@gcc.gnu.org,
which is for gcc development.  It would be appropriate for a forum about
using the C language, such as the newsgroup comp.lang.c or
gcc-help@gcc.gnu.org.  Please take any followups to gcc-help.  Thanks.

Your problem can be solved by using another layer of indirection (so that
PREFIX is macro-expanded before ## pastes it onto id) and by making PREFIX
not a string:

#define PREFIX p_
#define HIGHER_INTERFACE(id) L2(PREFIX, id)

#define L2(prefix,id) LOWER_INTERFACE(prefix,id)
#define LOWER_INTERFACE(prefix, id) struct prefix##id \
{ \
int i; \
}

int main(void)
{
HIGHER_INTERFACE(0);

/* test if struct declaration went well: */
struct p_0 var;
return 0;
}

-Nathan


Re: Questions about selective scheduler and PowerPC

2010-10-18 Thread Nathan Froyd
On Mon, Oct 18, 2010 at 02:49:21PM +0800, Jie Zhang wrote:
> 3. The aforementioned rs6000 hack rs6000_issue_rate was added by
> 
> 2003-03-03  David Edelsohn  
> 
> * config/rs6000/rs6000.c (rs6000_multipass_dfa_lookahead): Delete.
> (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD): Delete.
> (rs6000_variable_issue): Do not return negative value.
> (rs6000_issue_rate): Uniformly set issue rate to 1 for first
> scheduling pass.
> 
> , which was more than 7 years ago. Is this still needed now?

I asked David about this on IRC several days ago.  He indicated that it
was necessary to prevent the first scheduling pass from unnecessarily
increasing register pressure.  I don't know whether anybody has actually
tested it with recent GCC, though presumably it did help when it was
installed.

-Nathan


Re: Bug in expand_builtin_setjmp_receiver ?

2010-10-21 Thread Nathan Froyd
On Thu, Oct 21, 2010 at 02:14:15PM +0200, Frederic Riss wrote:
> On 19 October 2010 15:31, Ian Lance Taylor  wrote:
> > However, I agree that it does seem that it should be added to or
> > subtracted from hard_frame_pointer_rtx before setting
> > virtual_stack_vars_rtx, or something.  I only see one existing target
> > which sets STARTING_FRAME_OFFSET to a non-zero value and does not have a
> > nonlocal_goto expander: lm32.  It would be interesting to know whether
> > that target works here.
> 
> Is it easy to test lm32 on some simulator?

lm32 has a gdb simulator available, so it should be fairly easy to write
a board file for it if one doesn't already exist.

Unfortunately, building lm32-elf is broken in several different ways
right now.

-Nathan


Re: Bug in expand_builtin_setjmp_receiver ?

2010-10-27 Thread Nathan Froyd
On Tue, Oct 26, 2010 at 01:07:26PM +0100, Jon Beniston wrote:
> > lm32 has a gdb simulator available, so it should be fairly easy to write
> > a board file for it if one doesn't already exist.
> > 
> > Unfortunately, building lm32-elf is broken in several different ways
> > right now.
> 
> What problems do you have building lm32-elf? If you let me know, I can try
> to look in to them.

At least INCOMING_RETURN_ADDR_RTX and TARGET_EXCEPT_UNWIND_INFO need to
be defined, as in the below patch (not sure about the definition of
INCOMING_RETURN_ADDR_RTX).  I think even with those defined, compiling
libgcc ICEs, though I don't remember the details.

-Nathan

diff --git a/gcc/config/lm32/lm32.c b/gcc/config/lm32/lm32.c
index 671f0e1..b355309 100644
--- a/gcc/config/lm32/lm32.c
+++ b/gcc/config/lm32/lm32.c
@@ -100,6 +100,9 @@ static void lm32_option_override (void);
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P lm32_legitimate_address_p
 
+#undef TARGET_EXCEPT_UNWIND_INFO
+#define TARGET_EXCEPT_UNWIND_INFO sjlj_except_unwind_info
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Current frame information calculated by lm32_compute_frame_size.  */
diff --git a/gcc/config/lm32/lm32.h b/gcc/config/lm32/lm32.h
index b0c2d59..4c63e94 100644
--- a/gcc/config/lm32/lm32.h
+++ b/gcc/config/lm32/lm32.h
@@ -249,6 +249,8 @@ enum reg_class
 
 #define ARG_POINTER_REGNUM FRAME_POINTER_REGNUM
 
+#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (SImode, RA_REGNUM)
+
 #define RETURN_ADDR_RTX(count, frame)   \
   lm32_return_addr_rtx (count, frame)
 


decimal float, LIBGCC2_FLOAT_WORDS_BIG_ENDIAN, and ARM ABI issues

2010-11-16 Thread Nathan Froyd
The easiest way to deal with the use of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN
in libgcc is to define a preprocessor macro __FLOAT_WORD_ORDER__ similar
to how WORDS_BIG_ENDIAN was converted.  That is, cppbuiltin.c will do:

  cpp_define_formatted (FOO, "__FLOAT_WORD_ORDER__=%s",
(FLOAT_WORDS_BIG_ENDIAN
 ? "__ORDER_BIG_ENDIAN__"
 : "__ORDER_LITTLE_ENDIAN__"));

and change any uses of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN to consult
__FLOAT_WORD_ORDER__ instead.
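
A use site would then look roughly like this (a sketch of the intended
conversion, not an existing patch):

  #if __FLOAT_WORD_ORDER__ == __ORDER_BIG_ENDIAN__
    /* the word holding the sign and exponent comes first in memory */
  #else
    /* little-endian float word order */
  #endif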

A grep reveals that there are no target definitions of
LIBGCC2_FLOAT_WORDS_BIG_ENDIAN, so we should be OK with the
straightforward conversion, right?

This runs into a curious case in the arm backend, though, which has:

#define FLOAT_WORDS_BIG_ENDIAN (arm_float_words_big_endian ())

with no corresponding LIBGCC2_FLOAT_WORDS_BIG_ENDIAN.  I think what this
means is that the places that care about the order of float words
(currently libdecnumber, libbid, and dfp-bit.h) will always use the
order indicated by __BYTE_ORDER__/WORDS_BIG_ENDIAN, even when the
backend is secretly using a different order.

ARM has probably gotten lucky wrt dfp-bit.h because it has its own
assembler fp routines that presumably DTRT for unusual float word
orderings.  (dfp-bit.h also does not *use* the setting of
LIBGCC2_FLOAT_WORDS_BIG_ENDIAN, so that helps.)  But IIUC, using
__FLOAT_WORD_ORDER__ in the relevant libraries will break pre-existing
code that used libdecnumber and/or libbid.  I am not conversant enough
with ARM ABIs and/or targets to know which ones would break.

The saving grace here is that decimal float is not enabled by default
for arm platforms, so there are likely very few, if any, users of
decimal float on ARM; it might be worthwhile to go ahead and fix things,
ignoring the fallout from earlier versions.

What do the ARM maintainers think?  Should I prepare a patch for getting
rid of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN and we'll declare decimal float
horribly broken pre-4.6?  Or is there a better way forward?

-Nathan


Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)

2010-11-16 Thread Nathan Froyd
On Wed, Nov 17, 2010 at 03:40:39AM +0100, Paolo Bonzini wrote:
> True, but you can hide that cast in a base class.  For example you
> can use a hierarchy
> 
> Target   // abstract base
> TargetImplBase   // provides strong typing
> TargetI386   // actual implementation
> 
> The Target class would indeed take a void *, but the middle class
> would let TargetI386 think in terms of TargetI386::CumulativeArgs
> with something like
> 
> void f(void *x) {
> // T needs to provide void T::f(T::CumulativeArgs *)
> f(static_cast<typename T::CumulativeArgs *> (x));
> }
> 
> The most similar thing in C (though not suitable for multitarget) is
> a struct, which is why I suggest using that now rather than void *
> (which would be an implementation detail).

I am admittedly a C++ newbie; the first thing I thought of was:

class gcc::cumulative_args {
  virtual void advance (...) = 0;
  virtual rtx arg (...) = 0;
  virtual rtx incoming_arg (...) { return this->arg (...); };
  virtual int arg_partial_bytes (...) = 0;
  // ...and so on for many of the hooks that take CUMULATIVE_ARGS *
  // possibly with default implementations instead of pure virtual
  // functions.
};

class i386::cumulative_args : gcc::cumulative_args {
  // concrete implementations of virtual functions
};

// the hook interface is then solely for the backend to return
// `cumulative_args *' things (the current INIT_*_ARGS macros), which
// are then manipulated via the virtual functions above.

AFAICS, this eliminates the casting issues Joern described.  What are
the advantages of the scheme you describe above?  (Honest question.)  Or
are we talking about the same thing in slightly different terms?

-Nathan


Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)

2010-11-17 Thread Nathan Froyd
On Tue, Nov 16, 2010 at 10:22:00PM -0500, Joern Rennecke wrote:
> Quoting Nathan Froyd :
> >I am admittedly a C++ newbie; the first thing I thought of was:
> >
> >class gcc::cumulative_args {
> >  virtual void advance (...) = 0;
> >  virtual rtx arg (...) = 0;
> >  virtual rtx incoming_arg (...) { return this->arg (...); };
> >  virtual int arg_partial_bytes (...) = 0;
> >  // ...and so on for many of the hooks that take CUMULATIVE_ARGS *
> >  // possibly with default implementations instead of pure virtual
> >  // functions.
> >};
> 
> Trying to put a target-derived object of that into struct rtl_data would
> be nonsentical.  You might store a pointer, of course.

Yes, of course.  I thought that might have been clear from context.

> Does that mean you acknowledge that we shouldn't have CUMULATIVE_ARGS
> taking hooks in the global target vector?

Maybe?  I think the methods discussed in this thread would be better for
when we do move to C++.  I don't think your original proposal or
anything that sacrifices the type-safety of the current interface is the
way forward.

-Nathan


Re: RFC: semi-automatic hookization

2010-11-19 Thread Nathan Froyd
On Tue, Nov 16, 2010 at 06:23:32AM -0800, Ian Lance Taylor wrote:
> Joern Rennecke  writes:
> > Before I go and make all these target changes & test them, is there at
> > least agreemwent that this is the right approach, i.e replacing
> > CUMULATIVE_ARG *
> > with void *, and splitting up x_rtl into two variables.
> 
> I don't know how we want to get there, but it seems to me that the place
> we want to end up is with the target hooks defined to take an argument
> of type struct cumulative_args * (or a better name if we can think of
> one).  We could consider moving the struct definition into CPU.c, and
> having the target structure just report the size, or perhaps a combined
> allocation/INIT_CUMULATIVE_ARGS function.

FWIW, this is basically what I proposed here:

http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02527.html

I have gotten stalled on the INIT_* macros because the documentation and
the practices of individual backends do not seem to agree and I have not
taken the time to sit down and hammer out agreement.  (I don't think
attempting hookization of those macros would be appropriate for stage
3.)  I was a little uncertain how to handle the allocation issues; I
think specifying that the INIT_* hooks allocate them from a known place
(heap, obstack, or alloc_pool) is probably sufficient.

-Nathan


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Nathan Froyd
On Wed, Nov 24, 2010 at 02:48:01PM +0000, Pedro Alves wrote:
> On Wednesday 24 November 2010 13:45:40, Joern Rennecke wrote:
> > Quoting Pedro Alves :
> > Also, these separate hooks for common operations can make the code more
> > readable, particularly in the bits_in_units_ceil case.
> > I.e.
> >  foo_var = ((bitsize + targetm.bits_per_unit () - 1)
> > / targetm.bits_per_unit ());
> > vs.
> >  foo_var = targetm.bits_in_units_ceil (bitsize);
> > 
> 
> bits_in_units_ceil could well be a macro or helper function
> implemented on top of targetm.bits_per_unit (which itself could
> be a data field instead of a function call), that only accessed
> bits_per_unit once.  It could even be implemented as a helper
> macro / function today, on top of BITS_PER_UNIT.

I think adding the functions as inline functions somewhere and using
them in the appropriate places would be a reasonable standalone
cleanup.  It'd be easy to move towards something more general later.
Writing:

  int bits = ...;
  ... (X + bits - 1) / bits;

also generates ever-so-slightly smaller code than:

  ... (X + BITS_PER_UNIT - 1) / BITS_PER_UNIT;

on targets where BITS_PER_UNIT is not constant.

I personally am not a fan of the X_in_Y naming, though; I think X_to_Y
is a little clearer.
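
A minimal sketch of such a helper (the name and placement are mine, not
existing GCC code), reading BITS_PER_UNIT only once:

  static inline unsigned int
  bits_to_units_ceil (unsigned int bits)
  {
    unsigned int unit = BITS_PER_UNIT;
    return (bits + unit - 1) / unit;
  }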

-Nathan


Re: ppc: const data not in RO section

2010-12-01 Thread Nathan Froyd
On Tue, Nov 30, 2010 at 08:04:06PM +0100, Joakim Tjernlund wrote:
> Why is not
>   const char cstr[] = "mystr";
>   const int myint = 3;
> added to a read only section?
> Especially since
>   const int myarr[]={1,2,3};
> is placed in .rodata.
> 
> hmm, -G 0 does place these in .rodata but why do I have to specify that?

It would help if you specified the target and the compiler version that
you used.

The compiler I have (~4.5) places myint and mystr in .sdata; since
they're so small, GCC thinks that placing myint and mystr in .sdata is
beneficial.  Why do you think -G 0 should be the default?

It does seem kind of odd that "mystr" is placed in .sdata, since
rs6000_elf_in_small_data_p indicates that string constants shouldn't be
in .sdata.  You could investigate and submit a patch or file a bug.

-Nathan


Re: [ARM] Implementing doloop pattern

2011-01-13 Thread Nathan Froyd
On Thu, Jan 06, 2011 at 09:59:08AM +0200, Revital1 Eres wrote:
> Index: loop-doloop.c
> + Some targets (ARM) do the comparison before the branch, as in the
> + folloring form:
^

"following"

> +  /* In case the pattern is not PARALLEL we expect two forms
> +  of doloop which are cases 2) and 3) above: in case 2) the
> +  decrement is immediately precedes the branch. while in case
   ^^

Take out the "is".

> +  3) the compre and decrement instructions immediately precede
^^

"compare"

> Index: config/arm/thumb2.md
> +   /* Currently SMS relies on the do-loop pattern to recognize loops
> +  where (1) the control part comprises of all insns defining and/or
^

I think "consists" would be more idiomatic here, even if it's still a
little awkward.

-Nathan



TREE_LIST removals and cleanups for 4.7

2011-01-22 Thread Nathan Froyd
Since people are starting to post interesting patches for 4.7, I thought
it would be good to talk about bits I plan to cleanup in 4.7.  Comments
on other ugly things would also be welcome.

TREE_LIST related things:

- TREE_VECTOR_CST_ELTS.  I have a patch for this.

- ASM_EXPR operands, clobbers, and labels.  I've started on this.

- Attributes.  This looks reasonably straightforward, though updating
  all the attribute handlers will be tedious.  Geoff Keating tackled
  this several years ago, but small differences of style prevented his
  patches from getting merged.

- TYPE_VALUES of ENUMERAL_TYPEs.  Again, Geoff did this conversion
  already; I hope to reuse some of the bits that he did.

- TYPE_ARG_TYPES.  The major blocker here is the C++ front-end.  I don't
  know whether this is feasible to tackle in the 4.7 timeframe.

- Moving default arguments out of TYPE_ARG_TYPES into PARM_DECLs is a
  prerequisite for redoing TYPE_ARG_TYPES.  This is a much smaller
  undertaking.

- Converting the last few uses of build_function_type that could not be
  easily replaced with build_function_type_list.  This will be done by
  introducing build_function_type_vec or similar.  I also have come to
  think that the current build_function_type_list interface is a
  mistake, but that's a discussion to have later.

- Continuing to get rid of TREE_LISTs in other places, replacing them
  with VECs or similar.

Other things:

- Similarly to the work I did for s/TREE_CHAIN/DECL_CHAIN/, I'd like to
  replace TREE_TYPE for things like {POINTER,FUNCTION,ARRAY}_TYPE, etc.
  This work would be a good step towards both staticizing trees and
  tuplification of types.

- Looking at TYPE_FIELDS usage in the compiler, there are a couple of
  places that didn't get DECL_CHAIN-ified due to suboptimal test
  coverage.  Those should be fixed.

- We ought to be using the VEC(stack,X) support added in 4.6 for a
  number of temporary VECs in the front-ends (e.g. constructing
  CALL_EXPRs, function types, etc.).  I don't know how that will
  interact with any C++ usage; perhaps a template class for
  mostly-stack-allocated vectors is in order.  Any typing issues should
  be resolved by using VEC_base or with C++ auto-conversion cleverness.

- Hookizing INIT_CUMULATIVE_ARGS and friends.  Joern's patch may be a
  good starting point, though depending on what we do with C++ for 4.7,
  moving most of the CUMULATIVE_ARGS bits out of hooks and into a
  separate class is probably better.  Cleanups related to C++ usage is
  probably worth a separate discussion.

Comments?  Concerns?

-Nathan


Re: TREE_LIST removals and cleanups for 4.7

2011-01-24 Thread Nathan Froyd
On Sat, Jan 22, 2011 at 08:02:33PM +0100, Michael Matz wrote:
> On Sat, 22 Jan 2011, Nathan Froyd wrote:
> > - Similarly to the work I did for s/TREE_CHAIN/DECL_CHAIN/, I'd like to
> >   replace TREE_TYPE for things like {POINTER,FUNCTION,ARRAY}_TYPE, etc.
> >   This work would be a good step towards both staticizing trees and
> >   tuplification of types.
> 
> I don't see the advantage in the accessors to that type be named
> differently according to context compared to simply TREE_TYPE.

Well, documentation, for one.  TREE_TYPE (TREE_TYPE (t)) reads better
when written as RETURN_TYPE (DECL_TYPE (t)).  Maybe the meaning would be
slightly more obvious if the variable weren't named `t' and there were
more surrounding context; judging from the RETURN_TYPE conversion,
though, I don't think it is.  And triply-nested TREE_TYPEs are confusing
regardless. :) I admit that this introduces unnecessary tests to satisfy
--enable-checking builds; I haven't looked at whether GCC will optimize
out the extra checks for --disable-checking.
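
To make that concrete, the new accessor is just a checked wrapper; a
sketch (the exact checking macro is a guess on my part, by analogy with
how DECL_CHAIN wraps TREE_CHAIN):

  /* The return type of a FUNCTION_TYPE or METHOD_TYPE node.  */
  #define RETURN_TYPE(FNTYPE) \
    (TREE_TYPE (TREE_CHECK2 (FNTYPE, FUNCTION_TYPE, METHOD_TYPE)))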

Not all types have a {sub,element}type either.  You'd like to be able to
split out those types to make them smaller (tree_type is huge), and
that's hard to do otherwise--you can't use the tree_typed .type member.
(This is the tuplification part.)

If you have statically typed trees, you're also going to have separate
accessors for type of types (see above), type of exprs, type of decls,
etc. even if they share a common base class (tree_base) for lightweight
RTTI.  This goal is farther off, even if the proposal is eight years old
at this point.

> If your goal is to make tree_common smaller, introduce a tree_typed 
> structure (consisting of tree_base + type member), and use that instead of 
> tree_common in all tree structures needing to have a type.

I think that's a good idea, too.  But orthogonal to the above.
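
For concreteness, I take it you mean something like this (a sketch, not
a patch):

  /* tree_base plus the type member, for every tree structure that
     carries a type.  */
  struct GTY(()) tree_typed {
    struct tree_base base;
    tree type;
  };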

-Nathan


Re: Is anyone testing for a (cross-) target (board) with dynlinking?

2008-02-11 Thread Nathan Froyd
On Tue, Feb 12, 2008 at 02:47:39AM +0100, Hans-Peter Nilsson wrote:
> Is it as simple as nobody having tested cross-gcc setups for
> targets with dynamic linking, or are they incorrectly using the
> wrong (the installed, not the newly compiled) libgcc_s.so.1?
> 
> Or how did you do it?  NFS mounts on target and
> "env LD_LIBRARY_PATH=... make check"?

One way to do it is with NFS mounts and setting -Wl,-dynamic-linker
-Wl,-rpath for your ldflags.  You could skip -Wl,-dynamic-linker if you
weren't testing a newly compiled libc.
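
For example (the paths here are invented for illustration, not from an
actual setup), the board's ldflags would contain something like:

  -Wl,-dynamic-linker,/path/to/new-libc/lib/ld.so.1 \
  -Wl,-rpath,/path/to/new-libc/lib -Wl,-rpath,/path/to/gcc-build/gcc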

-Nathan


Re: Is anyone testing for a (cross-) target (board) with dynlinking?

2008-02-12 Thread Nathan Froyd
On Tue, Feb 12, 2008 at 05:13:45AM +0100, Hans-Peter Nilsson wrote:
> > From: Nathan Froyd <[EMAIL PROTECTED]>
> > One way to do it is with NFS mounts and setting -Wl,-dynamic-linker
> > -Wl,-rpath for your ldflags.
> 
> Thanks to you and David Daney.  Have you used it yourself?
> Apparently tricks are needed as the -rpath is used both at
> run-time and at link-time, ld complains about "No such file or
> directory" if the path doesn't exist on the host side.

I do use it, but I forgot to mention one other piece of the setup I use:
identical paths on the host and the target.  e.g.:

host:/path/to/gcc   gets mounted at
target:/mount/path/to/gcc   gets symlinked to
target:/path/to/gcc
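
In shell terms (illustrative commands; adjust the paths and NFS options
for your setup), that amounts to running the following on the target:

  mount -t nfs host:/path/to/gcc /mount/path/to/gcc
  ln -s /mount/path/to/gcc /path/to/gcc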

-Nathan


Re: GCC 4.3.0-20080228 (powerpc-linux-gnuspe) ICE on 20000718.c test

2008-03-20 Thread Nathan Froyd
On Mon, Mar 10, 2008 at 03:22:13PM +0300, Sergei Poselenov wrote:
> I've got the ICE on the gcc.c-torture/compile/20000718.c test:
> powerpc-linux-gnuspe-gcc -c -O3 -funroll-loops 20000718.c
> 20000718.c: In function 'baz':
> 20000718.c:14: internal compiler error: Segmentation fault
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.

I run the testsuite with --enable-e500_double all the time and have not
seen this bug before.  It's possible that the bug you're seeing would be
fixed by this patch:

http://gcc.gnu.org/ml/gcc-patches/2008-02/msg01045.html

which was committed on the 6th of March.  The bug is very sensitive to
the environment in which the compiler is run, which would explain why
nobody else is seeing it.

Please update your checkout and rebuild the compiler.  If it still
fails, then please file a bug report.  And if the bug report could
include a backtrace (obtained with GDB), that'd be even better.
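
In case it helps, one way to get such a backtrace (these commands are
generic, not specific to this testcase) is to grab the cc1 command line
that -v prints and re-run it under GDB:

  powerpc-linux-gnuspe-gcc -v -c -O3 -funroll-loops 20000718.c
  gdb --args /path/to/cc1 <options copied from the -v output>
  (gdb) run
  (gdb) backtrace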

Thanks,
-Nathan


Re: [PATCH,rs6000] split up crtsavres into individual files

2008-06-24 Thread Nathan Froyd
On Tue, Jun 24, 2008 at 10:42:57AM +1000, Ben Elliston wrote:
> On Mon, 2008-06-23 at 15:52 -0700, Andrew Pinski wrote:
> > This introduced a few warnings while building libgcc for 
> > powerpc64-linux-gnu:
> 
> I see lots and lots of these myself:
>
> Please fix! :-)

I believe this patch fixes things; there are Makefile rules
automagically generated for files in LIB2ADD{,_ST} and these rules were
conflicting with the manually specified ones in t-ppccomm.  With the
patch, no warnings are emitted when make'ing in the libgcc/ directory
and the resulting object/library files are identical to what they were
previously.

OK to commit as is?  Or should I do a testing run as well?

-Nathan

libgcc/
2008-06-24  Nathan Froyd  <[EMAIL PROTECTED]>

* config/rs6000/t-ppccomm: Remove rules that conflict with
auto-generated rules.

Index: config/rs6000/t-ppccomm
===================================================================
--- config/rs6000/t-ppccomm (revision 136762)
+++ config/rs6000/t-ppccomm (working copy)
@@ -101,63 +101,3 @@ ncrti$(objext): ncrti.S
 
 ncrtn$(objext): ncrtn.S
$(crt_compile) -c ncrtn.S
-
-crtsavres$(objext): crtsavres.S
-   $(crt_compile) -c crtsavres.S
-
-crtsavfpr$(objext): crtsavfpr.S
-   $(crt_compile) -c crtsavfpr.S
-
-crtresfpr$(objext): crtresfpr.S
-   $(crt_compile) -c crtresfpr.S
-
-crtsavgpr$(objext): crtsavgpr.S
-   $(crt_compile) -c crtsavgpr.S
-
-crtresgpr$(objext): crtresgpr.S
-   $(crt_compile) -c crtresgpr.S
-
-crtresxfpr$(objext): crtresxfpr.S
-   $(crt_compile) -c crtresxfpr.S
-
-crtresxgpr$(objext): crtresxgpr.S
-   $(crt_compile) -c crtresxgpr.S
-
-e500crtres32gpr$(objext): e500crtres32gpr.S
-   $(crt_compile) -c e500crtres32gpr.S
-
-e500crtres64gpr$(objext): e500crtres64gpr.S
-   $(crt_compile) -c e500crtres64gpr.S
-
-e500crtres64gprctr$(objext): e500crtres64gprctr.S
-   $(crt_compile) -c e500crtres64gprctr.S
-
-e500crtrest32gpr$(objext): e500crtrest32gpr.S
-   $(crt_compile) -c e500crtrest32gpr.S
-
-e500crtrest64gpr$(objext): e500crtrest64gpr.S
-   $(crt_compile) -c e500crtrest64gpr.S
-
-e500crtresx32gpr$(objext): e500crtresx32gpr.S
-   $(crt_compile) -c e500crtresx32gpr.S
-
-e500crtresx64gpr$(objext): e500crtresx64gpr.S
-   $(crt_compile) -c e500crtresx64gpr.S
-
-e500crtsav32gpr$(objext): e500crtsav32gpr.S
-   $(crt_compile) -c e500crtsav32gpr.S
-
-e500crtsav64gpr$(objext): e500crtsav64gpr.S
-   $(crt_compile) -c e500crtsav64gpr.S
-
-e500crtsav64gprctr$(objext): e500crtsav64gprctr.S
-   $(crt_compile) -c e500crtsav64gprctr.S
-
-e500crtsavg32gpr$(objext): e500crtsavg32gpr.S
-   $(crt_compile) -c e500crtsavg32gpr.S
-
-e500crtsavg64gpr$(objext): e500crtsavg64gpr.S
-   $(crt_compile) -c e500crtsavg64gpr.S
-
-e500crtsavg64gprctr$(objext): e500crtsavg64gprctr.S
-   $(crt_compile) -c e500crtsavg64gprctr.S


Re: ARM constant folding bug?

2007-08-03 Thread Nathan Froyd
On Fri, Aug 03, 2007 at 06:24:06PM +0100, Paul Brook wrote:
> On Friday 03 August 2007, Jonathan S. Shapiro wrote:
> > Then it seems very curious that the constant folding should fail on this
> > platform. Any idea what may be going on here?
> 
> You're exploiting a hole in the C aliasing rules by accessing a 32-bit int as 
> type char. I tested several compilers (4.2, 4.1 and 3.4 x86, 4.2 m68k and 4.2 
> arm) and the only one that eliminated the comparison was 3.4-x86.

FWIW, rewriting it with the "obvious" union approach seems to give the
desired results on 4.2 arm with and without -mbig-endian.
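
For reference, a minimal sketch of what I mean by the union approach
(this is not the original testcase, just the same idea applied to an
endianness check):

  #include <stdint.h>

  static int
  is_little_endian (void)
  {
    /* Type-pun through a union rather than through a char pointer;
       the initializer sets u.u32, and u.c[0] reads its
       lowest-addressed byte.  */
    union { uint32_t u32; unsigned char c[4]; } u = { 0x03020100 };
    return u.c[0] == 0;
  }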

-Nathan


[lto] project: adding --with-libelf to configure

2007-09-06 Thread Nathan Froyd
The LTO driver requires libelf and currently grovels around in the
system directories looking for it, which may not always be the right
place to find it.  (This bit me when building LTO on our new Linux
machines, which do not have libelf installed.)  The Right Thing would be
to add a --with-libelf flag to configure so we wouldn't have to assume
it's always installed in one particular place.  A small LTO project
would be for someone to do exactly that.

I'm not an expert at configury; would somebody else like to take a whack
at this?
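
Something along these lines, maybe (a very rough, untested sketch; the
variable names here are invented):

  AC_ARG_WITH(libelf,
    [  --with-libelf=PATH      prefix of the installed libelf],
    [LIBELFINC="-I$with_libelf/include"
     LIBELFLIBS="-L$with_libelf/lib -lelf"])
  AC_SUBST(LIBELFINC)
  AC_SUBST(LIBELFLIBS)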

-Nathan


[lto] preliminary SPECint benchmark numbers

2007-12-24 Thread Nathan Froyd
In one of my recent messages about a patch to the LTO branch, I
mentioned that we could compile and successfully run all of the C
SPECint benchmarks except 176.gcc.  Chris Lattner asked if I had done
any benchmarking now that real programs could be run; I said that I
hadn't but would try to do some soon.  This is the result of that.

I don't have numbers on what compile times look like, but I don't think
they're good.  176.gcc takes several minutes to compile (basically -flto
*.o, not counting the time to compile individual .o files); the other
benchmarks are all a minute or more apiece.

Executive summary: LTO is currently *not* a win.

In the table below, runtimes are in seconds.  I ran the tests on an
8-core 1.6GHz machine with 8 GB RAM.  I believe the machine was
relatively idle; I ran the tests over a weekend evening.  The last merge
from mainline to the LTO branch was mainline r130155, so that's about
what the -O2 numbers correspond to--I don't think we've changed too much
core code on the branch.  The % change are just in-my-head estimates,
using -O2 as a baseline.

             -O2     -flto   % change
164.gzip     174     176     + 1
175.vpr      139     143     + 3
181.mcf      162     166     + 3
186.crafty   65.2    66.6    + < 1
197.parser   240     261     + 9
253.perlbmk  119     133     + 13
254.gap      84.4    87      + 4
256.bzip2    131     145     + 11
300.twolf    202     193     - 4 (!)

176.gcc doesn't run correctly with LTO yet; 255.vortex didn't run
correctly with "mainline", but it did with -flto, which is curious.  We
don't do C++ yet, so 252.eon is not included.

In general, things get worse with LTO, sometimes much worse.  I can
think of at least three possible reasons off the top of my head:

- Alias information.  We don't have any type-based alias information in
  -flto, which hurts.

- We don't merge types between compilation units, which could account
  for poor optimization behavior.

- I believe we lose some information in the LTO write/read process; edge
  probabilities, estimated # instructions in functions, etc. get lost.
  This hurts inlining decisions, block layout, alignment of jump
  targets, etc.  So there's information we need to write out or
  recompute.

-Nathan