Re: Performance analysis of Polyhedron/gas_dyn
On 4/27/07, Janne Blomqvist <[EMAIL PROTECTED]> wrote:

> Hi, I spent some time with oprofile, trying to figure out why we suck at the gas_dyn benchmark in Polyhedron. It turns out that there are two lines that account for ~54% of the total runtime. In subroutine CHOZDT we have the line
>
>   DTEMP = DX/(ABS(VEL) + SOUND)
>
> and in subroutine EOS the line
>
>   CS(:NODES) = SQRT(CGAMMA*PRES(:NODES)/DENS(:NODES))
>
> See also http://www.suse.de/~gcctest/c++bench/polyhedron/analysis.html (same conclusion for gas_dyn).
>
> Both of these lines are array expressions, but they are quite simple and gfortran manages to scalarize both of them without creating temporaries. Both loops also vectorize nicely, which is important since gas_dyn is a single-precision program, so vectorization is a real benefit on current CPUs (vectorization alone reduces runtime from 30s to 24s on my Athlon 64). You can find both subroutines simplified, with comments showing the oprofile data for the CPU_CLK_UNHALTED (basically, runtime) and L2_CACHE_MISS events for the critical lines, attached.
>
> For ifort, I had to disable -ipo to get any results for CHOZDT (probably inlined), but without -ipo I didn't get sensible results for EOS (it seems the line numbers got messed up somehow for opannotate), so the results are not entirely comparable. Nonetheless, the ifort timings change only marginally due to -ipo, so it shouldn't make a big difference. Ifort and other commercial compilers (I haven't tested others) still manage to beat gfortran quite badly; see e.g. http://www.polyhedron.com/ and http://physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/
>
> The reason, it seems, is that ifort (and presumably other commercial compilers with competitive scores on gas_dyn) avoids calculating divisions and square roots, replacing them with reciprocals and reciprocal square roots. E.g. in EOS, sqrt(a/b) can be calculated as 1/sqrt(b*(1/a)). This has a big impact on performance, since the SSE instruction set contains very fast instructions for this: rcpps, rcpss, rsqrtps and rsqrtss (PPC/Altivec also has equivalent instructions). These instructions have latencies of 1-2 cycles vs. dozens or even hundreds of cycles for normal division and square root. The price to be paid for this speed is that these reciprocal instructions have an accuracy of only 12 bits, so clearly they can be enabled only for -ffast-math. And they are available only for single precision. I'll file a missed-optimization PR about this.

I think that even with -ffast-math, 12 bits of accuracy is not OK. There is the possibility of doing another Newton iteration step to improve accuracy; that would be OK for -ffast-math. We can, though, add an extra flag, -msserecip or whatever you'd call it, to enable use of the instructions with less accuracy.

Richard.
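As a minimal sketch of the rewrite Janne describes (illustrative only, not code any of these compilers actually emits; the function name is made up), the EOS pattern maps onto the SSE approximation intrinsics like this:

  #include <xmmintrin.h>

  /* sqrt(a/b) == 1/sqrt(b * (1/a)): one rcpps plus one rsqrtps replaces
     the slow divps + sqrtps pair.  Both intrinsics are only ~12-bit
     accurate, hence the -ffast-math caveat discussed above.  */
  static __m128
  sqrt_div_approx (__m128 a, __m128 b)
  {
    __m128 inv_a = _mm_rcp_ps (a);                  /* ~ 1/a (rcpps)          */
    return _mm_rsqrt_ps (_mm_mul_ps (b, inv_a));    /* ~ 1/sqrt(b/a)
                                                       == sqrt(a/b) (rsqrtps) */
  }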
Re: Performance analysis of Polyhedron/gas_dyn
On 4/27/07, Richard Guenther <[EMAIL PROTECTED]> wrote:
> I think that even with -ffast-math, 12 bits of accuracy is not OK. There is the possibility of doing another Newton iteration step to improve accuracy; that would be OK for -ffast-math. We can, though, add an extra flag, -msserecip or whatever you'd call it, to enable use of the instructions with less accuracy.

Which is already done for PPC, at least for the scalar code; see the -mswdiv option. We don't do this for the sqrt reciprocal yet, but it is easy to special-case the reciprocal :).

-- Pinski
Re: DR#314 update
On Fri, 26 Apr 2007, Geoffrey Keating wrote:
> This seems reasonable to me, but maybe it would be simpler to write
>
> If there are one or more incomplete structure or union types which
> cannot all be completed without producing undefined behaviour, the
> behaviour is undefined.
>
> if that gives the same effect (which I think it does)?

That suffers somewhat from the vagueness that afflicts this area (what rearrangements of translation units are permitted in completing the types?). I considered some other examples and decided that what I wanted was unifiability even in some cases that don't involve incomplete types. For example:

// TU 1
void f (void) { struct s { int a; }; extern struct s a, b; }
void g (void) { struct s { int a; }; extern struct s c, d; }

// TU 2
void h (void) { struct s { int a; }; extern struct s a, c; }
void i (void) { struct s { int a; }; extern struct s b, d; }

Here, each object individually has compatible types in the two translation units - but "a" and "b" have compatible complete types within TU 1, yet incompatible complete types within TU 2. I didn't feel it should be necessary to unify two incompatible types from the same translation unit (even if they'd be compatible in different translation units), nor to split the uses of a single type within a translation unit into two or more distinct and incompatible types.

-- Joseph S. Myers [EMAIL PROTECTED]
Re: Performance analysis of Polyhedron/gas_dyn
Richard Guenther wrote:
> See also http://www.suse.de/~gcctest/c++bench/polyhedron/analysis.html (same conclusion for gas_dyn).

Thanks, I seem to have completely missed that page (though I was aware of your Polyhedron tester).

> On 4/27/07, Janne Blomqvist <[EMAIL PROTECTED]> wrote:
>> The reason, it seems, is that ifort (and presumably other commercial compilers with competitive scores on gas_dyn) avoids calculating divisions and square roots, replacing them with reciprocals and reciprocal square roots. E.g. in EOS, sqrt(a/b) can be calculated as 1/sqrt(b*(1/a)). This has a big impact on performance, since the SSE instruction set contains very fast instructions for this: rcpps, rcpss, rsqrtps and rsqrtss (PPC/Altivec also has equivalent instructions). These instructions have latencies of 1-2 cycles vs. dozens or even hundreds of cycles for normal division and square root. The price to be paid for this speed is that these reciprocal instructions have an accuracy of only 12 bits, so clearly they can be enabled only for -ffast-math. And they are available only for single precision. I'll file a missed-optimization PR about this.
>
> I think that even with -ffast-math, 12 bits of accuracy is not OK. There is the possibility of doing another Newton iteration step to improve accuracy; that would be OK for -ffast-math. We can, though, add an extra flag, -msserecip or whatever you'd call it, to enable use of the instructions with less accuracy.

I agree it can be an issue, but OTOH people who care about precision probably 1. avoid -ffast-math and 2. use double precision (where these reciprocal instructions are not available). Intel calls it -no-prec-div, but it's enabled by the "-fast" catch-all option.

On a related note, our beloved competitors generally have some high-level flag that combines all these fancy and potentially unsafe optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc, at least FP benchmarks seem to do generally well with something like "-O3 -funroll-loops -ftree-vectorize -ffast-math -march=native -mfpmath=sse", but it's quite a mouthful.

-- Janne Blomqvist
Re: assign numbers to warnings; treat selected warnings as errors
Thomas Koenig <[EMAIL PROTECTED]> writes:

| [adjusting Subject and also forwarding to [EMAIL PROTECTED]
|
| On Wed, 2007-04-18 at 12:12 -0700, Vivek Rao wrote:
| > Here is a feature of g95 that I would like to see in
| > gfortran. G95 assigns numbers to warnings and allows
| > selected warnings to be treated as errors.
|
| [...]
|
| > g95 -Wall -Wextra -Werror=113,115,137 xunused.f90
| >
| > turns those warnings into errors.
| >
| > Gfortran does not assign numbers to warnings, and the
| > option -Werror turns ALL warnings into errors. I'd
| > like finer control.
|
| This does sound like a useful feature, not only for
| gfortran, but for all of gcc.
|
| Thoughts, comments?

There is front end-independent infrastructure in place to name diagnostics and filter them -- used by most GCC front ends. Only Gfortran seems to build its own ghetto.

-- Gaby
Re: assign numbers to warnings; treat selected warnings as errors
On 27 Apr 2007 08:50:57 -0500, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:
> There is front end-independent infrastructure in place to name diagnostics and filter them -- used by most GCC front ends. Only Gfortran seems to build its own ghetto.

[ Please don't use such offensive wording, there is no "ghetto-building" going on here. ]

The front end-independent infrastructure is not independent enough to support the format of errors/warnings that gfortran writes out. Gfortran writes out the line in the source file that has the issue, and uses carets to pinpoint the location of the issue. The language-independent infrastructure unfortunately still cannot do this.

Gr.
Steven
Re: general_operand() not accepting CONCAT?
On Thu, Apr 26, 2007 at 01:52:37PM -0700, Richard Henderson wrote:
> On Thu, Apr 26, 2007 at 09:49:16PM +0200, Rask Ingemann Lambertsen wrote:
> > Unfortunately, the fallback code isn't exactly optimal, as it produces something like
> >
> >   addw  $-N*2, %sp
> >   movw  %sp, %basereg
> >   movw  %wordN, N*2(%basereg)
> >   ...
> >   movw  %word0, (%basereg)
> >
> > which compares with
> >
> >   pushw %wordN
> >   ...
> >   pushw %word0
>
> It's not supposed to. Please debug emit_move_complex_push and find out why. I suspect PUSH_ROUNDING is larger than it's supposed to be.

#define PUSH_ROUNDING(BYTES) (((BYTES) + 1) & ~1)

I don't see how emit_move_complex_push() can ever generate a push instruction. Here's a backtrace:

(gdb) fin
Run till exit from
#0 push_operand (op=0xb7f7b118, mode=SFmode) at ../../../cvssrc/gcc/gcc/recog.c:1299
0x0828f941 in emit_move_multi_word (mode=SFmode, x=0xb7f78bc4, y=0xb7f793d0) at ../../../cvssrc/gcc/gcc/expr.c:3182
Value returned is $44 = 1
(gdb) bt
#0 0x0828f941 in emit_move_multi_word (mode=SFmode, x=0xb7f78bc4, y=0xb7f793d0) at ../../../cvssrc/gcc/gcc/expr.c:3182
#1 0x0829016e in emit_move_insn_1 (x=0xb7f78bc4, y=0xb7f793d0) at ../../../cvssrc/gcc/gcc/expr.c:3291
#2 0x0829074d in emit_move_insn (x=0xb7f78bc4, y=0xb7f793d0) at ../../../cvssrc/gcc/gcc/expr.c:3351
#3 0x0828f236 in emit_move_complex_push (mode=SCmode, x=0xb7f78bb8, y=0xb7f78078) at ../../../cvssrc/gcc/gcc/expr.c:3025
#4 0x0828f45d in emit_move_complex (mode=SCmode, x=0xb7f78bb8, y=0xb7f78078) at ../../../cvssrc/gcc/gcc/expr.c:3061
#5 0x0829003f in emit_move_insn_1 (x=0xb7f78bb8, y=0xb7f78078) at ../../../cvssrc/gcc/gcc/expr.c:3264
#6 0x0829074d in emit_move_insn (x=0xb7f78bb8, y=0xb7f78078) at ../../../cvssrc/gcc/gcc/expr.c:3351
#7 0x0829120e in emit_single_push_insn (mode=SCmode, x=0xb7f78078, type=0xb7edcbd0) at ../../../cvssrc/gcc/gcc/expr.c:3582
#8 0x08291d43 in emit_push_insn (x=0xb7f78078, mode=SCmode, type=0xb7edcbd0, size=0x0, align=16, partial=0, reg=0x0, extra=0, args_addr=0x0, args_so_far=0xb7ecb210, reg_parm_stack_space=0, alignment_pad=0xb7ecb210) at ../../../cvssrc/gcc/gcc/expr.c:3852
(gdb) call debug_rtx(x)
(mem:SF (pre_dec:HI (reg/f:HI 12 sp)) [0 S4 A8])
(gdb) call debug_rtx(y)
(reg/v:SF 27 [ i+4 ])

The only place where push_optab is consulted is at the beginning of emit_single_push_insn(), which is only called from move_by_pieces() and emit_push_insn(). emit_push_insn() isn't called from anywhere in expr.c, and I don't see how move_by_pieces() can be called by emit_move_insn(). There seems to be no way that it could ever work.

> > (define_insn_and_split "*push1_concat"
> >   [(set (mem:COMPLEX (pre_dec:HI (reg:HI SP_REG)))
> >         (concat:COMPLEX (match_operand: 0 "general_operand" "RmIpu")
> >                         (match_operand: 1 "general_operand" "RmIpu")))]
>
> This is horrible. At minimum you should expand this to two separate pushes immediately.

Usually, doing so will fool reload's frame pointer elimination if the operand is a pseudo which ends up on the stack. Diffing the output between the two implementations confirms it:

--- /tmp/complex-3.s_expand    2007-04-27 15:50:49.0 +0200
+++ /tmp/complex-3.s_postsplit 2007-04-27 15:50:26.0 +0200
@@ -53,8 +53,8 @@
 	movw	16(%di),%ax
 	pushw	%cx
 	pushw	%ax
-	pushw	14(%di)
-	pushw	12(%di)
+	pushw	10(%di)
+	pushw	8(%di)
 	call	g
 	movw	24(%di),%dx
 	movw	%dx,32(%di)

-- Rask Ingemann Lambertsen
mismatch in parameter of builtin_ffs?
Hello,

Looking at builtins, I think I have found something inconsistent. __builtin_ffs is defined in the documentation as taking an unsigned int parameter:

  Built-in Function: int __builtin_ffs (unsigned int x)

However, in the file builtins.def, it is defined as:

  DEF_EXT_LIB_BUILTIN (BUILT_IN_FFS, "ffs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LIST)

that is, it takes an int. I think it should be BT_FN_INT_UINT. (Other functions like clz, parity and popcount are defined with unsigned int.) Unless I am missing something...

-- Erven.
Re: Performance analysis of Polyhedron/gas_dyn
On Apr 27, 2007, at 06:12, Janne Blomqvist wrote:
> I agree it can be an issue, but OTOH people who care about precision probably 1. avoid -ffast-math and 2. use double precision (where these reciprocal instructions are not available). Intel calls it -no-prec-div, but it's enabled by the "-fast" catch-all option.
>
> On a related note, our beloved competitors generally have some high-level flag that combines all these fancy and potentially unsafe optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc, at least FP benchmarks seem to do generally well with something like "-O3 -funroll-loops -ftree-vectorize -ffast-math -march=native -mfpmath=sse", but it's quite a mouthful.

No, using only 12 bits of precision is just ridiculous and should not be included in -ffast-math. You should always use a Newton-Raphson step after getting the 12-bit approximation. When done correctly this doubles the precision and gets you just about the 24 bits of precision needed for float. Reciprocal approximations are meant to be used that way, and it's no accident the lookup provides exactly half the bits needed. For double precision you just do two more iterations, which is why there is no need for double-precision variants of these instructions.

The cost for the extra step is small, and you get good results. There are many variations possible, and using fused multiply-add it's even possible to get correctly rounded results at low cost. I truly doubt that any of the compilers you mention use these instructions without NR iteration to get the required precision.

-Geert
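To make the refinement step concrete, here is a hedged sketch of the single Newton-Raphson iteration on top of rsqrtps that Geert describes (the function name and structure are illustrative, not GCC internals):

  #include <xmmintrin.h>

  /* y0 = rsqrtps(x) is ~12-bit accurate; one Newton-Raphson step,
       y1 = y0 * (1.5 - 0.5 * x * y0 * y0),
     roughly doubles that, giving ~23-24 bits -- about full single
     precision, as described above.  */
  static __m128
  rsqrt_nr (__m128 x)
  {
    const __m128 half  = _mm_set1_ps (0.5f);
    const __m128 onep5 = _mm_set1_ps (1.5f);
    __m128 y = _mm_rsqrt_ps (x);                      /* 12-bit estimate */
    __m128 e = _mm_mul_ps (_mm_mul_ps (half, x),
                           _mm_mul_ps (y, y));        /* 0.5 * x * y * y */
    return _mm_mul_ps (y, _mm_sub_ps (onep5, e));     /* refined result  */
  }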
Re: Performance analysis of Polyhedron/gas_dyn
Geert Bosch wrote:
> I truly doubt that any of the compilers you mention use these instructions without NR iteration to get the required precision.

If they do, then they are probably seriously broken, not just because they would give complete junk results in this case, but because such an implementation would indicate a complete lack of knowledge of how to do floating point reasonably. As Geert says, this instruction is intended ONLY as part of an NR implementation.
Re: general_operand() not accepting CONCAT?
On Fri, Apr 27, 2007 at 04:00:13PM +0200, Rask Ingemann Lambertsen wrote:
> I don't see how emit_move_complex_push() can ever generate a push instruction. Here's a backtrace:

  emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
                  read_complex_part (y, imag_first));
  return emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
                         read_complex_part (y, !imag_first));

Note that we're replacing (pre_dec:CSI sp) with two instances of (pre_dec:SI sp).

> Usually, doing so will fool reload's frame pointer elimination if the operand is a pseudo which ends up on the stack. Diffing the output between the two implementations confirms it:

This doesn't look like frame pointer elimination at all, just different stack slots allocated. But that said, if there's a bug in elimination, it should be fixed, not hacked around in one backend.

r~
Re: Performance analysis of Polyhedron/gas_dyn
Geert Bosch wrote:
> On Apr 27, 2007, at 06:12, Janne Blomqvist wrote:
>> I agree it can be an issue, but OTOH people who care about precision probably 1. avoid -ffast-math and 2. use double precision (where these reciprocal instructions are not available). Intel calls it -no-prec-div, but it's enabled by the "-fast" catch-all option.
>
> No, using only 12 bits of precision is just ridiculous and should not be included in -ffast-math. You should always use a Newton-Raphson step after getting the 12-bit approximation.

Yes, I realize that.

> When done correctly this doubles the precision and gets you just about the 24 bits of precision needed for float. Reciprocal approximations are meant to be used that way, and it's no accident the lookup provides exactly half the bits needed. For double precision you just do two more iterations, which is why there is no need for double-precision variants of these instructions.

However, I didn't realize so few iterations were required to achieve (almost) full precision. That's pretty nice.

> The cost for the extra step is small, and you get good results. There are many variations possible, and using fused multiply-add it's even possible to get correctly rounded results at low cost. I truly doubt that any of the compilers you mention use these instructions without NR iteration to get the required precision.

I guess so. I haven't checked the others, but Intel does indeed do a single NR step. However, if I change the subroutine in question to double precision, it uses divpd and sqrtpd instead of two NR iterations. According to the benchmarks I linked to in PR 31723, it could actually be faster to use the reciprocal + 2 NR iterations for double precision, though in my own testing it turned out to be a wash.

-- Janne Blomqvist
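For the double-precision case, a sketch of the alternative Janne benchmarked (again illustrative only; whether it actually beats divpd/sqrtpd in practice is exactly what the PR 31723 numbers are about) might look like:

  #include <emmintrin.h>   /* SSE2 */

  /* Start from the single-precision 12-bit rsqrt estimate, widen to
     double, then run Newton-Raphson steps: 12 -> 24 -> 48 bits after
     two iterations (a third would be needed for full 53-bit results).  */
  static __m128d
  rsqrt_pd_2nr (__m128d x)
  {
    const __m128d half  = _mm_set1_pd (0.5);
    const __m128d onep5 = _mm_set1_pd (1.5);
    __m128d y = _mm_cvtps_pd (_mm_rsqrt_ps (_mm_cvtpd_ps (x)));
    int i;
    for (i = 0; i < 2; i++)   /* the two NR iterations mentioned above */
      y = _mm_mul_pd (y, _mm_sub_pd (onep5,
              _mm_mul_pd (_mm_mul_pd (half, x), _mm_mul_pd (y, y))));
    return y;
  }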
Re: Accessing signgam from the middle-end for builtin lgamma
> "Kaveh" == Kaveh R GHAZI <[EMAIL PROTECTED]> writes: Kaveh> I'm doing this at the tree level, so AIUI I have to be mindful of type, Kaveh> scope and conflicts. I also have to decide what to do in non-C. There's nothing to do here for Java -- Java code can't access lgamma. Not to be too negative (I am curious about this), but does this sort of optimization really carry its own weight? Is this a common thing in numeric code or something like that? Tom
Re: Performance analysis of Polyhedron/gas_dyn
Janne Blomqvist wrote:
> However, I didn't realize so few iterations were required to achieve (almost) full precision. That's pretty nice.

NR is a nice iteration: you double the number of bits of precision on each iteration (approximately :-).
Re: mismatch in parameter of builtin_ffs?
On Fri, Apr 27, 2007 at 04:23:35PM +0200, Erven ROHOU wrote:
> I think it should be BT_FN_INT_UINT. (Other functions like clz, parity and popcount are defined with unsigned int.) Unless I am missing something...

man 3 ffs.

r~
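For reference, the prototype Richard is pointing at: POSIX declares the library function ffs() with a signed int argument, so BT_FN_INT_INT in builtins.def matches the C library, and it is the documentation's "unsigned int" that is off (as the follow-up below concludes):

  /* man 3 ffs -- the POSIX prototype: */
  #include <strings.h>

  int ffs(int i);   /* index of the least significant set bit; 0 if i == 0 */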
Gomp in mainline is broken
FYI, gomp in mainline is broken:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31722

A possible cause may be:
http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01965.html

H.J.
Re: RFC: obsolete __builtin_apply?
Andrew, are you still planning on applying the libobjc patch that removes the use of __builtin_apply? Steve Ellcey [EMAIL PROTECTED]
RE: mismatch in parameter of builtin_ffs?
On 27 April 2007 16:58, Richard Henderson wrote:
> On Fri, Apr 27, 2007 at 04:23:35PM +0200, Erven ROHOU wrote:
>> I think it should be BT_FN_INT_UINT. (Other functions like clz, parity and popcount are defined with unsigned int.) Unless I am missing something...
>
> man 3 ffs.
>
> r~

Then it's a doco bug!

cheers,
DaveK
--
Can't think of a witty .sigline today
Re: Accessing signgam from the middle-end for builtin lgamma
On Fri, 27 Apr 2007, Tom Tromey wrote:
> Not to be too negative (I am curious about this), but does this sort of optimization really carry its own weight? Is this a common thing in numeric code or something like that?
> Tom

I don't know that optimizing lgamma by itself makes a big difference. However, we're down to the last few C99 math functions, and if I can get all of them I think it's worthwhile to be complete. For the record, the remaining ones are lgamma/gamma and drem/remainder/remquo. (Bessel functions have been submitted but not approved yet. Complex math, however, still needs some TLC.) If you can find something I've overlooked, please let me know.

Taken as a whole, I do believe optimizing constant args helps numeric code. E.g. it's noted here that PI is often written as 4*atan(1) and that this idiom appears in several SPEC benchmarks: http://gcc.gnu.org/ml/gcc-patches/2003-05/msg02310.html

And of course there are many ways, through macros, inlining, templates and various optimizations, that a constant could be propagated into a math function call. When that happens, it is both a size and a speed win to fold it. And in the above PI case, folding atan also allows GCC to fold the multiplication.

--Kaveh
--
Kaveh R. Ghazi [EMAIL PROTECTED]
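As a hedged illustration of the folding Kaveh describes (assuming GCC's constant folding of math builtins, e.g. via MPFR, is available and enabled), the 4*atan(1) idiom reduces entirely at compile time:

  #include <math.h>

  /* Once atan(1.0) is folded to a constant, the multiplication folds
     too, so no libm call remains at runtime.  */
  double
  pi (void)
  {
    return 4.0 * atan (1.0);
  }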
Re: GCC -On optimization passes: flag and doc issues
As Ian Lance Taylor wrote:

>> What's that test suite that has been mentioned here, and how to run it?
>
> http://www.inf.u-szeged.hu/csibe/

Thanks for the pointer. Got it.

Alas, that tool is completely unportable and requires Linux to run. It suffers from bashomania (like using $((I--)) when the POSIX way wouldn't require much more work), and also uses non-portable options to other Unix tools (like the option -f for time(1)). I'm close to giving up on that :(, partially because of not getting it to run on my FreeBSD host, and obviously it stands no chance of being run against an AVR target system anyway.

The idea behind that tool is great; I only wish the authors had taken a class in portable shell scripting before. It's not that all the world's a Vax these days...

--
cheers, J"org               .-.-.   --... ...--   -.. .  DL8DTL
http://www.sax.de/~joerg/   NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)
Re: GCC -On optimization passes: flag and doc issues
On 4/27/07, Joerg Wunsch <[EMAIL PROTECTED]> wrote:
> Alas, that tool is completely unportable and requires Linux to run. It suffers from bashomania (like using $((I--)) when the POSIX way wouldn't require much more work), and also uses non-portable options to other Unix tools (like the option -f for time(1)). I'm close to giving up on that :(, partially because of not getting it to run on my FreeBSD host, and obviously it stands no chance of being run against an AVR target system anyway.
>
> The idea behind that tool is great; I only wish the authors had taken a class in portable shell scripting before. It's not that all the world's a Vax these days...

Patches welcome, I guess.

Gr.
Steven
Re: general_operand() not accepting CONCAT?
On Fri, Apr 27, 2007 at 08:24:11AM -0700, Richard Henderson wrote:
> On Fri, Apr 27, 2007 at 04:00:13PM +0200, Rask Ingemann Lambertsen wrote:
>> I don't see how emit_move_complex_push() can ever generate a push instruction. Here's a backtrace:
>
>   emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
>                   read_complex_part (y, imag_first));
>   return emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
>                          read_complex_part (y, !imag_first));
>
> Note that we're replacing (pre_dec:CSI sp) with two instances of (pre_dec:SI sp).

Yes. emit_move_insn() will call emit_move_insn_1(), which goes on to call emit_move_multi_word(). Here, first emit_move_resolve_push() is called to update the stack pointer. Then follows a loop that emits a sequence of move insns, each moving one word, using emit_move_insn().

>> Usually, doing so will fool reload's frame pointer elimination if the operand is a pseudo which ends up on the stack. Diffing the output between the two implementations confirms it:
>
> This doesn't look like frame pointer elimination at all, just different stack slots allocated.

No, that was the only difference in the asm outputs.

> But that said, if there's a bug in elimination, it should be fixed, not hacked around in one backend.

What happens when splitting during expand is that we get a sequence of push insns:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 6))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 4))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 2))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (subreg:HI (reg:HI obj) 0))

During register allocation, the pseudo obj is put on the stack, let's say at (mem:DI (plus:HI (reg:HI %bp) (const_int -16))). So the insns look like this:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -12)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -14)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %bp) -16)))

Now, reload comes along and eliminates %bp to %sp, let's say with an elimination offset of 20. We get:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 10)))

Reload sees that we decremented %sp by two and increases the elimination offset accordingly for the next insn:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 8+2)))

And so on for the next two insns:

(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 6+4)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %sp) 4+6)))

The stack pointer is not a valid base register, so reload fixes it up:

(set (reg:HI %di) (reg:HI %sp))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))
(set (mem:HI (pre_dec:HI (reg:HI %sp))) (mem:HI (plus:HI (reg:HI %di) 10)))

It seems likely that reload inheritance contributes to the mess in some way.

-- Rask Ingemann Lambertsen
Re: DR#314 update
On 27/04/2007, at 2:50 AM, Joseph S. Myers wrote:
> On Fri, 26 Apr 2007, Geoffrey Keating wrote:
>> This seems reasonable to me, but maybe it would be simpler to write
>>
>> If there are one or more incomplete structure or union types which cannot all be completed without producing undefined behaviour, the behaviour is undefined.
>>
>> if that gives the same effect (which I think it does)?
>
> That suffers somewhat from the vagueness that afflicts this area (what rearrangements of translation units are permitted in completing the types?).

I wasn't thinking that the completion would necessarily be able to be written in the translation unit, just that there would be some possible completion.

> I considered some other examples and decided that what I wanted was unifiability even in some cases that don't involve incomplete types. For example:
>
> // TU 1
> void f (void) { struct s { int a; }; extern struct s a, b; }
> void g (void) { struct s { int a; }; extern struct s c, d; }
>
> // TU 2
> void h (void) { struct s { int a; }; extern struct s a, c; }
> void i (void) { struct s { int a; }; extern struct s b, d; }
>
> Here, each object individually has compatible types in the two translation units - but "a" and "b" have compatible complete types within TU 1, yet incompatible complete types within TU 2.

Hmm. That makes sense to me, so I agree your wording is better; but please, please make sure that this example (and the other one from the original DR) gets into a footnote or an example or at least the rationale.
Re: GCC -On optimization passes: flag and doc issues
As Steven Bosscher wrote:

>> The idea behind that tool is great; I only wish the authors had taken a class in portable shell scripting before. It's not that all the world's a Vax these days...
>
> Patches welcome, I guess.

Well, quite an amount of work, alas. There's no central template in CSiBE where this could be changed; instead, they apparently manually changed each and every one of the Makefiles etc. in the src/ subdirectories there, so it's almost 50 files to make identical changes to.

I intended to spend my time on trying the various possible GCC configurations (including the really promising idea Richard Guenther proposed), not on patching the benchmark tool. It's not that I'm a university student anymore who has almost indefinite time at hand to spend... That was 20 years ago.

Another thing would be to extend CSiBE so it could be used to compile some meaningful AVR code. It's not that I'm lacking that kind of code, but these manually hacked Makefiles for each tool make it kinda difficult to adapt the benchmark suite to different sources that are more appropriate to the AVR.

--
cheers, J"org               .-.-.   --... ...--   -.. .  DL8DTL
http://www.sax.de/~joerg/   NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)
gcc-4.3-20070427 is now available
Snapshot gcc-4.3-20070427 is now available on
ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20070427/
and on various mirrors; see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 124239

You'll find:

gcc-4.3-20070427.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.3-20070427.tar.bz2       C front end and core compiler
gcc-ada-4.3-20070427.tar.bz2        Ada front end and runtime
gcc-fortran-4.3-20070427.tar.bz2    Fortran front end and runtime
gcc-g++-4.3-20070427.tar.bz2        C++ front end and runtime
gcc-java-4.3-20070427.tar.bz2       Java front end and runtime
gcc-objc-4.3-20070427.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.3-20070427.tar.bz2  The GCC testsuite

Diffs from 4.3-20070420 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
GCC 4.2.0: Still planning on RC1, etc.
In case anyone here sends me an email, and gets my vacation auto-reply for the next week: I do still plan to proceed with the 4.2.0 release schedule in my last status report. FYI, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713