Re: Performance analysis of Polyhedron/gas_dyn

2007-04-27 Thread Richard Guenther

On 4/27/07, Janne Blomqvist <[EMAIL PROTECTED]> wrote:

Hi,

I spent some time with oprofile, trying to figure out why we suck at the
gas_dyn benchmark in polyhedron. It turns out that there are two lines
that account for ~54% of the total runtime.

In subroutine CHOZDT we have the line

 DTEMP = DX/(ABS(VEL) + SOUND)

and in subroutine EOS the line

 CS(:NODES) = SQRT(CGAMMA*PRES(:NODES)/DENS(:NODES))


See also http://www.suse.de/~gcctest/c++bench/polyhedron/analysis.html
(same conclusion for gas_dyn).


Both of these lines are array expressions, but they are quite simple and
gfortran manages to scalarize both of them without creating temporaries.
Both loops also vectorize nicely, which is important since gas_dyn is a
single precision program so vectorization is a real benefit on current
CPUs (vectorization alone reduces runtime from 30s to 24s on my Athlon 64).

You can find both subroutines simplified, with comments showing the
oprofile data for the CPU_CLK_UNHALTED (basically, runtime) and
L2_CACHE_MISS events for the critical lines, attached. For ifort, I had
to disable -ipo to get any results for CHOZDT (probably inlined), but
without -ipo I didn't get sensible results for EOS (seems like the line
numbers got messed up somehow for opannotate), so the results are not
entirely comparable. Nonetheless, the ifort timings change only
marginally due to -ipo, so it shouldn't make a big difference.

Ifort and other commercial compilers (I haven't tested others) still
manage to beat gfortran quite badly, see e.g.

http://www.polyhedron.com/

http://physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/

The reason, it seems, is that ifort (and presumably other commercial
compilers with competitive scores in gas_dyn) avoids calculating
divisions and square roots, replacing them with reciprocals and
reciprocal square roots. E.g. in EOS sqrt(a/b) can be calculated as
1/sqrt(b*(1/a)). This has a big impact on performance, since the SSE
instruction set contains very fast instructions for this, rcpps, rcpss,
rsqrtps, rsqrtss (PPC/Altivec also has equivalent instructions). These
instructions have latencies of 1-2 cycles vs. dozens or even hundreds of
cycles for normal division and square root.  The price to be paid for
this speed is that these reciprocal instructions have an accuracy of
only 12 bits, so clearly they can be enabled only for -ffast-math. And
they are available only for single precision. I'll file a
missed-optimization PR about this.


I think that even with -ffast-math 12 bits accuracy is not ok.  There is
the possibility of doing another Newton iteration step to improve
accuracy, that would be ok for -ffast-math.  We can, though, add an
extra flag -msserecip or however you'd call it to enable use of the
instructions with less accuracy.

Richard.


Re: Performance analysis of Polyhedron/gas_dyn

2007-04-27 Thread Andrew Pinski

On 4/27/07, Richard Guenther <[EMAIL PROTECTED]> wrote:

I think that even with -ffast-math 12 bits accuracy is not ok.  There is
the possibility of doing another Newton iteration step to improve
accuracy, that would be ok for -ffast-math.  We can, though, add an
extra flag -msserecip or however you'd call it to enable use of the
instructions with less accuracy.


Which is already done for PPC at least the scalar code, see -mswdiv
option.  We don't do this for sqrt reciprocal yet but it is easy to
special case for the reciprocal case :).

-- Pinski


Re: DR#314 update

2007-04-27 Thread Joseph S. Myers
On Fri, 26 Apr 2007, Geoffrey Keating wrote:

> This seems reasonable to me, but maybe it would be simpler to write
> 
> If there are one or more incomplete structure or union types which
> cannot all be completed without producing undefined behaviour, the
> behaviour is undefined.
> 
> if that gives the same effect (which I think it does)?

That suffers somewhat from the vagueness that afflicts this area (what 
rearrangements of translation units are permitted in completing the 
types?).  I considered some other examples and decided that what I wanted 
was unifiability even in some cases that don't involve incomplete types.  
For example:

// TU 1
void
f (void)
{
  struct s { int a; };
  extern struct s a, b;
}

void
g (void)
{
  struct s { int a; };
  extern struct s c, d;
}

// TU 2
void
h (void)
{
  struct s { int a; };
  extern struct s a, c;
}

void
i (void)
{
  struct s { int a; };
  extern struct s b, d;
}


Here, each object individually has compatible types in the two translation 
units - but "a" and "b" have compatible complete types within TU 1, but 
incompatible complete types within TU 2.

I didn't feel it should be necessary to unify two incompatible types from 
the same translation unit (even if they'd be compatible in different 
translation units), nor to split the uses of a single type within a 
translation unit into two or more distinct and incompatible types.

-- 
Joseph S. Myers
[EMAIL PROTECTED]


Re: Performance analysis of Polyhedron/gas_dyn

2007-04-27 Thread Janne Blomqvist

Richard Guenther wrote:

See also http://www.suse.de/~gcctest/c++bench/polyhedron/analysis.html
(same conclusion for gas_dyn).


Thanks, I seem to have completely missed that page (though I was aware 
of your polyhedron tester).


On 4/27/07, Janne Blomqvist <[EMAIL PROTECTED]> wrote:

The reason, it seems, is that ifort (and presumably other commercial
compilers with competitive scores in gas_dyn) avoids calculating
divisions and square roots, replacing them with reciprocals and
reciprocal square roots. E.g. in EOS sqrt(a/b) can be calculated as
1/sqrt(b*(1/a)). This has a big impact on performance, since the SSE
instruction set contains very fast instructions for this, rcpps, rcpss,
rsqrtps, rsqrtss (PPC/Altivec also has equivalent instructions). These
instructions have latencies of 1-2 cycles vs. dozens or even hundreds of
cycles for normal division and square root.  The price to be paid for
this speed is that these reciprocal instructions have an accuracy of
only 12 bits, so clearly they can be enabled only for -ffast-math. And
they are available only for single precision. I'll file a
missed-optimization PR about this.


I think that even with -ffast-math 12 bits accuracy is not ok.  There is
the possibility of doing another Newton iteration step to improve
accuracy, that would be ok for -ffast-math.  We can, though, add an
extra flag -msserecip or however you'd call it to enable use of the
instructions with less accuracy.


I agree it can be an issue, but OTOH people who care about precision 
probably 1. avoid -ffast-math 2. use double precision (where these 
reciprocal instrs are not available). Intel calls it -no-prec-div, but 
it's enabled for the "-fast" catch-all option.


On a related note, our beloved competitors generally have some high 
level flag for combining all these fancy and potentially unsafe 
optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc, at 
least FP benchmarks seem to do generally well with something like "-O3 
-funroll-loops -ftree-vectorize -ffast-math -march=native -mfpmath=sse", 
but it's quite a mouthful.


--
Janne Blomqvist


Re: assign numbers to warnings; treat selected warnings as errors

2007-04-27 Thread Gabriel Dos Reis
Thomas Koenig <[EMAIL PROTECTED]> writes:

| [adjusting Subject and also forwarding to [EMAIL PROTECTED]
| 
| On Wed, 2007-04-18 at 12:12 -0700, Vivek Rao wrote:
| > Here is a feature of g95 that I would like to see in
| > gfortran. G95 assigns numbers to warnings and allows
| > selected warnings to be treated as errors. 
| 
| [...]
| 
| > g95 -Wall -Wextra -Werror=113,115,137 xunused.f90
| > 
| > turns those warnings into errors. 
| > 
| > Gfortran does not assign numbers to warnings, and the
| > option -Werror turns ALL warnings into errors. I'd
| > like finer control.
| 
| This does sound like a useful feature, not only for
| gfortran, but for all of gcc.
| 
| Thoughts, comments?

There is a front-end-independent infrastructure in place to name
diagnostics and filter them -- used by most GCC front ends.  Only
Gfortran seems to build its own ghetto.

-- Gaby


Re: assign numbers to warnings; treat selected warnings as errors

2007-04-27 Thread Steven Bosscher

On 27 Apr 2007 08:50:57 -0500, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:

Thomas Koenig <[EMAIL PROTECTED]> writes:

| [adjusting Subject and also forwarding to [EMAIL PROTECTED]
|
| On Wed, 2007-04-18 at 12:12 -0700, Vivek Rao wrote:
| > Here is a feature of g95 that I would like to see in
| > gfortran. G95 assigns numbers to warnings and allows
| > selected warnings to be treated as errors.
|
| [...]
|
| > g95 -Wall -Wextra -Werror=113,115,137 xunused.f90
| >
| > turns those warnings into errors.
| >
| > Gfortran does not assign numbers to warnings, and the
| > option -Werror turns ALL warnings into errors. I'd
| > like finer control.
|
| This does sound like a useful feature, not only for
| gfortran, but for all of gcc.
|
| Thoughts, comments?

There is a front-end-independent infrastructure in place to name
diagnostics and filter them -- used by most GCC front ends.  Only
Gfortran seems to build its own ghetto.


[ Please don't use such offensive wording, there is no
"ghetto"-building going on here. ]

The front-end-independent infrastructure is not independent enough to
support the format of errors/warnings that gfortran writes out.
Gfortran writes out the line in the source file that has the issue,
and uses carets to pinpoint the location of the issue.  The
language-independent infrastructure unfortunately still cannot do this.

Gr.
Steven


Re: general_operand() not accepting CONCAT?

2007-04-27 Thread Rask Ingemann Lambertsen
On Thu, Apr 26, 2007 at 01:52:37PM -0700, Richard Henderson wrote:
> On Thu, Apr 26, 2007 at 09:49:16PM +0200, Rask Ingemann Lambertsen wrote:
> >Unfortunately, the fallback code isn't exactly optimum, as it produces
> > something like
> > 
> > addw$-N*2,  %sp
> > movw%sp,%basereg
> > movw%wordN, N*2(%basereg)
> > ...
> > movw%word0, (%basereg)
> > 
> > which compared with
> > 
> > pushw   %wordN
> > ...
> > pushw   %word0
> 
> It's not supposed to.  Please debug emit_move_complex_push
> and find out why.  I suspect PUSH_ROUNDING is larger than
> it's supposed to be.

#define PUSH_ROUNDING(BYTES) (((BYTES) + 1) & ~1)

   I don't see how emit_move_complex_push() can ever generate a push
instruction. Here's a backtrace:

(gdb) fin
Run till exit from #0  push_operand (op=0xb7f7b118, mode=SFmode) at 
../../../cvssrc/gcc/gcc/recog.c:1299
0x0828f941 in emit_move_multi_word (mode=SFmode, x=0xb7f78bc4, y=0xb7f793d0) at 
../../../cvssrc/gcc/gcc/expr.c:3182
Value returned is $44 = 1
(gdb) bt
#0  0x0828f941 in emit_move_multi_word (mode=SFmode, x=0xb7f78bc4, 
y=0xb7f793d0) at ../../../cvssrc/gcc/gcc/expr.c:3182
#1  0x0829016e in emit_move_insn_1 (x=0xb7f78bc4, y=0xb7f793d0) at 
../../../cvssrc/gcc/gcc/expr.c:3291
#2  0x0829074d in emit_move_insn (x=0xb7f78bc4, y=0xb7f793d0) at 
../../../cvssrc/gcc/gcc/expr.c:3351
#3  0x0828f236 in emit_move_complex_push (mode=SCmode, x=0xb7f78bb8, 
y=0xb7f78078) at ../../../cvssrc/gcc/gcc/expr.c:3025
#4  0x0828f45d in emit_move_complex (mode=SCmode, x=0xb7f78bb8, y=0xb7f78078) 
at ../../../cvssrc/gcc/gcc/expr.c:3061
#5  0x0829003f in emit_move_insn_1 (x=0xb7f78bb8, y=0xb7f78078) at 
../../../cvssrc/gcc/gcc/expr.c:3264
#6  0x0829074d in emit_move_insn (x=0xb7f78bb8, y=0xb7f78078) at 
../../../cvssrc/gcc/gcc/expr.c:3351
#7  0x0829120e in emit_single_push_insn (mode=SCmode, x=0xb7f78078, 
type=0xb7edcbd0) at ../../../cvssrc/gcc/gcc/expr.c:3582
#8  0x08291d43 in emit_push_insn (x=0xb7f78078, mode=SCmode, type=0xb7edcbd0, 
size=0x0, align=16, partial=0, reg=0x0, extra=0, args_addr=0x0,
args_so_far=0xb7ecb210, reg_parm_stack_space=0, alignment_pad=0xb7ecb210) 
at ../../../cvssrc/gcc/gcc/expr.c:3852

(gdb) call debug_rtx(x)
(mem:SF (pre_dec:HI (reg/f:HI 12 sp)) [0 S4 A8])
(gdb) call debug_rtx(y)
(reg/v:SF 27 [ i+4 ])

   The only place where push_optab is consulted is at the beginning of
emit_single_push_insn(), which is only called from move_by_pieces() and
emit_push_insn(). emit_push_insn() isn't called from anywhere in expr.c.
I don't see how move_by_pieces() can be called by emit_move_insn(). There
seems to be no way that it could ever work.

> > (define_insn_and_split "*push1_concat"
> >   [(set (mem:COMPLEX (pre_dec:HI (reg:HI SP_REG)))
> > (concat:COMPLEX (match_operand: 0 "general_operand" "RmIpu")
> > (match_operand: 1 "general_operand" 
> > "RmIpu")))]
> 
> This is horrible.  At minimum you should expand this to
> two separate pushes immediately.

   Usually, doing so will fool reload's frame pointer elimination if the
operand is a pseudo which ends up on the stack. Diffing the output between
the two implementations confirms it:

--- /tmp/complex-3.s_expand 2007-04-27 15:50:49.0 +0200
+++ /tmp/complex-3.s_postsplit  2007-04-27 15:50:26.0 +0200
@@ -53,8 +53,8 @@
movw16(%di),%ax 
pushw   %cx 
pushw   %ax 
-   pushw   14(%di) 
-   pushw   12(%di) 
+   pushw   10(%di) 
+   pushw   8(%di)  
callg   
movw24(%di),%dx 
movw%dx,32(%di) 

-- 
Rask Ingemann Lambertsen


mismatch in parameter of builtin_ffs?

2007-04-27 Thread Erven ROHOU

Hello,

Looking at builtins, I think I have found something inconsistent.
__builtin_ffs is defined in the documentation as taking an unsigned int 
parameter:


 Built-in Function: int __builtin_ffs (unsigned int x)

However in the file builtins.def, it is defined as:

DEF_EXT_LIB_BUILTIN (BUILT_IN_FFS, "ffs", BT_FN_INT_INT, 
ATTR_CONST_NOTHROW_LIST)


that is: it takes an int.

I think it should be BT_FN_INT_UINT. (Other functions like clz, parity, 
popcount are defined with unsigned int.)

Unless I am missing something...

--
Erven.


Re: Performance analysis of Polyhedron/gas_dyn

2007-04-27 Thread Geert Bosch


On Apr 27, 2007, at 06:12, Janne Blomqvist wrote:
I agree it can be an issue, but OTOH people who care about  
precision probably 1. avoid -ffast-math 2. use double precision  
(where these reciprocal instrs are not available). Intel calls it
-no-prec-div, but it's enabled for the "-fast" catch-all option.


On a related note, our beloved competitors generally have some high  
level flag for combining all these fancy and potentially unsafe  
optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc,  
at least FP benchmarks seem to do generally well with something  
like "-O3 -funroll-loops -ftree-vectorize -ffast-math -march=native  
-mfpmath=sse", but it's quite a mouthful.


No, using only 12 bits of precision is just ridiculous and should
not be included in -ffast-math. You should always use a Newton-Raphson
step after getting the 12-bit approximation. When done correctly
this doubles the precision and gets you just about the 24 bits of
precision needed for float. Reciprocal approximations are meant
to be used that way, and it's no accident the lookup provides
exactly half the bits needed. For double precision you just do
two more iterations, which is why there is no need for double
precision variants of these instructions.

The cost for the extra step is small, and you get good results.
There are many variations possible, and using fused-multiply add
it's even possible to get correctly rounded results at low cost.
I truly doubt that any of the compilers you mention use these
instructions without NR iteration to get required precision.

  -Geert


Re: Performance analysis of Polyhedron/gas_dyn

2007-04-27 Thread Robert Dewar

Geert Bosch wrote:


I truly doubt that any of the compilers you mention use these
instructions without NR iteration to get required precision.


If they do then they are probably seriously broken, not just
because they give complete junk results in this case, but
such an implementation would indicate a complete lack of
knowledge of how to do fpt reasonably. As Geert says, this
instruction is intended ONLY as part of an NR implementation.


Re: general_operand() not accepting CONCAT?

2007-04-27 Thread Richard Henderson
On Fri, Apr 27, 2007 at 04:00:13PM +0200, Rask Ingemann Lambertsen wrote:
>I don't see how emit_move_complex_push() can ever generate a push
> instruction. Here's a backtrace:

  emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
  read_complex_part (y, imag_first));
  return emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
 read_complex_part (y, !imag_first));

Note that we're replacing (pre_dec:CSI sp) with two
instances of (pre_dec:SI sp).

>Usually, doing so will fool reload's frame pointer elimination if the
> operand is a pseudo which ends up on the stack. Diffing the output between
> the two implementations confirms it:

This doesn't look like frame pointer elimination at all,
just different stack slots allocated.  But that said, if
there's a bug in elimination, it should be fixed, not
hacked around in one backend.


r~


Re: Performance analysis of Polyhedron/gas_dyn

2007-04-27 Thread Janne Blomqvist

Geert Bosch wrote:

On Apr 27, 2007, at 06:12, Janne Blomqvist wrote:
I agree it can be an issue, but OTOH people who care about precision 
probably 1. avoid -ffast-math 2. use double precision (where these 
reciprocal instrs are not available). Intel calls it -no-prec-div, but 
it's enabled for the "-fast" catch-all option.


No, using only 12 bits of precision is just ridiculous and should
not be included in -ffast-math. You should always use a Newton-Raphson
step after getting the 12-bit approximation.


Yes, I realize that.


When done correctly
this doubles the precision and gets you just about the 24 bits of
precision needed for float. Reciprocal approximations are meant
to be used that way, and it's no accident the lookup provides
exactly half the bits needed.  For double precision you just do
two more iterations, which is why there is no need for double
precision variants of these instructions.


However, I didn't realize so few iterations were required to achieve 
(almost) full precision. That's pretty nice.



The cost for the extra step is small, and you get good results.
There are many variations possible, and using fused-multiply add
it's even possible to get correctly rounded results at low cost.
I truly doubt that any of the compilers you mention use these
instructions without NR iteration to get required precision.


I guess so. I haven't checked the others, but Intel does indeed do a 
single NR step. However, if I change the subroutine in question to 
double precision, it uses divpd and sqrtpd instead of two NR iterations. 
According to the benchmarks I linked to in PR 31723, it could actually 
be faster to use the reciprocal + 2 NR iters for double precision, 
though in my own testing it turned out to be a wash.



--
Janne Blomqvist


Re: Accessing signgam from the middle-end for builtin lgamma

2007-04-27 Thread Tom Tromey
> "Kaveh" == Kaveh R GHAZI <[EMAIL PROTECTED]> writes:

Kaveh> I'm doing this at the tree level, so AIUI I have to be mindful of type,
Kaveh> scope and conflicts.  I also have to decide what to do in non-C.

There's nothing to do here for Java -- Java code can't access lgamma.

Not to be too negative (I am curious about this), but does this sort
of optimization really carry its own weight?  Is this a common thing
in numeric code or something like that?

Tom


Re: Performance analysis of Polyhedron/gas_dyn

2007-04-27 Thread Robert Dewar

Janne Blomqvist wrote:

However, I didn't realize so few iterations were required to achieve 
(almost) full precision. That's pretty nice.


NR is a nice iteration, you double the number of bits of precision
on each iteration (approximately :-)



Re: mismatch in parameter of builtin_ffs?

2007-04-27 Thread Richard Henderson
On Fri, Apr 27, 2007 at 04:23:35PM +0200, Erven ROHOU wrote:
> I think it should be BT_FN_INT_UINT. (Other functions like clz, parity, 
> popcount are defined with unsigned int.)
> Unless I am missing something...

man 3 ffs.


r~


Gomp in mainline is broken

2007-04-27 Thread H. J. Lu
FYI, gomp in mainline is broken:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31722

Possible cause may be:

http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01965.html


H.J.


Re: RFC: obsolete __builtin_apply?

2007-04-27 Thread Steve Ellcey
Andrew,  are you still planning on applying the libobjc patch that
removes the use of __builtin_apply?

Steve Ellcey
[EMAIL PROTECTED]


RE: mismatch in parameter of builtin_ffs?

2007-04-27 Thread Dave Korn
On 27 April 2007 16:58, Richard Henderson wrote:

> On Fri, Apr 27, 2007 at 04:23:35PM +0200, Erven ROHOU wrote:
>> I think it should be BT_FN_INT_UINT. (Other functions like clz, parity,
>> popcount are defined with unsigned int.)
>> Unless I am missing something...
> 
> man 3 ffs.
> 
> 
> r~


  Then it's a doco bug!


cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: Accessing signgam from the middle-end for builtin lgamma

2007-04-27 Thread Kaveh R. GHAZI
On Fri, 27 Apr 2007, Tom Tromey wrote:

> Not to be too negative (I am curious about this), but does this sort of
> optimization really carry its own weight?  Is this a common thing in
> numeric code or something like that?
> Tom

I don't know that optimizing lgamma by itself makes a big difference.
However we're down to the last few C99 math functions and if I can get all
of them I think it's worthwhile to be complete.  For the record, the
remaining ones are lgamma/gamma and drem/remainder/remquo.  (Bessel
functions have been submitted but not approved yet.  Complex math however
still needs some TLC.)  If you can find something I've overlooked, please
let me know.

Taken as a whole, I do believe optimizing constant args helps numeric
code.  E.g. it's noted here that PI is often written as 4*atan(1) and that
this idiom appears in several SPEC benchmarks.
http://gcc.gnu.org/ml/gcc-patches/2003-05/msg02310.html

And of course there are many ways through macros, inlining, templates, and
various optimizations that a constant could be propagated into a math
function call.  When that happens, it is both a size and a speed win to
fold it.  And in the above PI case, folding atan also allows GCC to fold
the mult.

--Kaveh
--
Kaveh R. Ghazi  [EMAIL PROTECTED]


Re: GCC -On optimization passes: flag and doc issues

2007-04-27 Thread Joerg Wunsch
As Ian Lance Taylor wrote:

> > What's that test suite that has been mentioned here, and how to
> > run it?

> http://www.inf.u-szeged.hu/csibe/

Thanks for the pointer.  Got it.  Alas, that tool is completely
unportable, and requires Linux to run.  It suffers from bashomania
(like using $((I--)) when the POSIX way wouldn't require much more
work), and also uses non-portable options to other Unix tools (like
the option -f for time(1)).  I'm close to giving up on that :(,
partially because of not getting it to run on my FreeBSD host, and
obviously, it stands no chance to be run against an AVR target system
anyway.

The idea behind that tool is great, I only wish the authors had taken
a class in portable shell scripting before.  It's not that all the
world's a Vax these days...

-- 
cheers, J"org   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)


Re: GCC -On optimization passes: flag and doc issues

2007-04-27 Thread Steven Bosscher

On 4/27/07, Joerg Wunsch <[EMAIL PROTECTED]> wrote:

As Ian Lance Taylor wrote:

> > What's that test suite that has been mentioned here, and how to
> > run it?

> http://www.inf.u-szeged.hu/csibe/

Thanks for the pointer.  Got it.  Alas, that tool is completely
unportable, and requires Linux to run.  It suffers from bashomania
(like using $((I--)) when the POSIX way wouldn't require much more
work), and also uses non-portable options to other Unix tools (like
the option -f for time(1)).  I'm close to giving up on that :(,
partially because of not getting it to run on my FreeBSD host, and
obviously, it stands no chance to be run against an AVR target system
anyway.

The idea behind that tool is great, I only wish the authors had taken
a class in portable shell scripting before.  It's not that all the
world's a Vax these days...


Patches welcome, I guess.

Gr.
Steven


Re: general_operand() not accepting CONCAT?

2007-04-27 Thread Rask Ingemann Lambertsen
On Fri, Apr 27, 2007 at 08:24:11AM -0700, Richard Henderson wrote:
> On Fri, Apr 27, 2007 at 04:00:13PM +0200, Rask Ingemann Lambertsen wrote:
> >I don't see how emit_move_complex_push() can ever generate a push
> > instruction. Here's a backtrace:
> 
>   emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
>   read_complex_part (y, imag_first));
>   return emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
>  read_complex_part (y, !imag_first));
> 
> Note that we're replacing (pre_dec:CSI sp) with two
> instances of (pre_dec:SI sp).

   Yes. emit_move_insn() will call emit_move_insn_1(), which goes on to call
emit_move_multi_word(). Here, first emit_move_resolve_push() is called to
update the stack pointer. Then follows a loop to emit a sequence of move
insns, each moving one word, using emit_move_insn().

> >Usually, doing so will fool reload's frame pointer elimination if the
> > operand is a pseudo which ends up on the stack. Diffing the output between
> > the two implementations confirms it:
> 
> This doesn't look like frame pointer elimination at all,
> just different stack slots allocated.

   No, that was the only difference in the asm outputs.

> But that said, if
> there's a bug in elimination, it should be fixed, not
> hacked around in one backend.

   What happens when splitting during expand is that we get a sequence of
push insns:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 6))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 4))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 2))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 0))

   During register allocation, the pseudo obj is put on the stack, let's say
(mem:DI (plus:HI (reg:HI %bp) (const_int -16))). So the insns look like this:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -12)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -14)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -16)))

   Now, reload comes along and eliminates %bp to %sp, let's say with an
elimination offset of 20. We get:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 10)))

   Reload sees that we decremented %sp by two and increases the elimination
offset accordingly for the next insn:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 8+2)))

   And so on for the next two insns:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 6+4)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 4+6)))

   The stack pointer is not a valid base register, so reload fixes it up:

(set (reg:HI %di) (reg:HI %sp))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))

   It seems likely that reload inheritance contributes to the mess in some way.

-- 
Rask Ingemann Lambertsen


Re: DR#314 update

2007-04-27 Thread Geoffrey Keating


On 27/04/2007, at 2:50 AM, Joseph S. Myers wrote:


On Fri, 26 Apr 2007, Geoffrey Keating wrote:


This seems reasonable to me, but maybe it would be simpler to write

If there are one or more incomplete structure or union types which
cannot all be completed without producing undefined behaviour, the
behaviour is undefined.

if that gives the same effect (which I think it does)?


That suffers somewhat from the vagueness that afflicts this area (what
rearrangements of translation units are permitted in completing the
types?).


I wasn't thinking that the completion would necessarily be able to be  
written in the translation unit, just that there would be some  
possible completion.



 I considered some other examples and decided that what I wanted
was unifiability even in some cases that don't involve incomplete  
types.

For example:

// TU 1
void
f (void)
{
 struct s { int a; };
 extern struct s a, b;
}

void
g (void)
{
 struct s { int a; };
 extern struct s c, d;
}

// TU 2
void
h (void)
{
 struct s { int a; };
 extern struct s a, c;
}

void
i (void)
{
 struct s { int a; };
 extern struct s b, d;
}


Here, each object individually has compatible types in the two translation
units - but "a" and "b" have compatible complete types within TU 1, but
incompatible complete types within TU 2.


Hmm.  That makes sense to me, so I agree your wording is better; but  
please please make sure that this example (and the other one from the  
original DR) gets into a footnote or an example or at least the  
rationale.



Re: GCC -On optimization passes: flag and doc issues

2007-04-27 Thread Joerg Wunsch
As Steven Bosscher wrote:

> >The idea behind that tool is great, I only wish the authors had
> >taken a class in portable shell scripting before.  It's not that
> >all the world's a Vax these days...

> Patches welcome, I guess.

Well, quite an amount of work, alas.  There's no central template in
CSiBE where this could be changed; instead, they apparently manually
changed every one of the Makefiles etc. in the src/ subdirectories
there, so it's almost 50 files to make identical changes to.

I intended to spend my time trying the various possible GCC
configurations (including the really promising idea Richard Guenther
proposed), not patching the benchmark tool.  It's not that I'm a
university student anymore with almost indefinite time at hand to
spend...  That was 20 years ago.

Another thing is to extend CSiBE so it could be used to compile some
meaningful AVR code.  It's not that I'm lacking that kind of code, but
these manually hacked Makefiles for each tool make it kinda difficult
to adapt the benchmark suite to different sources that are more
appropriate to the AVR.

-- 
cheers, J"org   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)


gcc-4.3-20070427 is now available

2007-04-27 Thread gccadmin
Snapshot gcc-4.3-20070427 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20070427/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 124239

You'll find:

gcc-4.3-20070427.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.3-20070427.tar.bz2 C front end and core compiler

gcc-ada-4.3-20070427.tar.bz2  Ada front end and runtime

gcc-fortran-4.3-20070427.tar.bz2  Fortran front end and runtime

gcc-g++-4.3-20070427.tar.bz2  C++ front end and runtime

gcc-java-4.3-20070427.tar.bz2 Java front end and runtime

gcc-objc-4.3-20070427.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.3-20070427.tar.bz2The GCC testsuite

Diffs from 4.3-20070420 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


GCC 4.2.0: Still planning on RC1, etc.

2007-04-27 Thread Mark Mitchell
In case anyone here sends me an email, and gets my vacation auto-reply
for the next week: I do still plan to proceed with the 4.2.0 release
schedule in my last status report.

FYI,

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713