Re: Bad code generation on HPPA platform
Steve Ellcey wrote:
>> Steve Ellcey wrote:
>>> I am investigating a bad code generation bug on the 64 bit HPPA
>>> platform with GCC 4.3.0 and would like some help and/or ideas on how
>>> to analyze and fix it. The failing test is the SPEC 2000 GCC benchmark
>>> (version 2.7.2.2) and I have been unable to create a smaller test case
>>> so far.
>>>
>>> What I have found is that if I build GCC from version 127633 (after
>>> applying the patch for PR middle-end/33029 so that the build will work)
>>> I can build and run the SPEC GCC benchmark. If I update GCC to version
>>> 127634 then the benchmark will abort when it is run. The difference
>>> between those versions is this patch:
>>>
>>> 2007-08-19  Andrew Pinski  <[EMAIL PROTECTED]>
>>>
>>>     PR middle-end/32940
>>>     * cfgexpand.c (expand_one_register_var): Mark pointer
>>>     DECL_ARTIFICIAL as REG_POINTER also.
>>>     * stmt.c (expand_decl): Likewise.
>>>
>>> In the PR report for 32940 there is a pointer to:
>>>
>>> http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00020.html
>>>
>>> That patch fixed a bootstrap problem on HPPA and reverting it (which is
>>> what the patch for PR 32940 seems to do) seems to be reintroducing the
>>> problem that the earlier patch was intended to fix. While reverting
>>> that patch didn't seem to cause bootstrap problems it does seem to
>>> cause problems when building the older GCC version that is in SPEC2000.
>>>
>>> Any advice on how to proceed from here?
>>>
>>> Steve Ellcey
>>> [EMAIL PROTECTED]
>>
>> I'm well versed with the problems in this area and I'd be very leery of
>> Andrew's patch. There are some thorny issues in this space and I'm far
>> from 100% sure that blindly propagating a pointer type into REG_POINTER
>> is always valid, particularly for compiler generated temporaries.
>>
>> Can you describe better what you're seeing in the 2.7.2.2 build that's
>> causing problems?
>>
>> Jeff
>
> I am having trouble figuring out exactly what change in the generated
> code is causing the failure. The code still compiles but it aborts when
> I run the compiled code. In talking to David Anglin I do think it is a
> problem with a base reg and index reg getting mixed up but I can't put
> my finger on exactly where.

Hmmm, I missed the fact that this was pa64, not pa32. It's been so long, I can't remember how PA64 HPUX sets up its spaces and whether or not that's likely to affect things in relevant ways.

If it aborts, as in calling abort, rather than segfaulting, then it's not a flipped base/index in a memory reference -- those almost always segfault. This is the case that most worries me about Andrew's patch.

Virtual origins in Ada in particular can break with Andrew's patch. Triggering the problem might be hard, but there's little doubt in my mind that setting REG_POINTER on every compiler temporary with a pointer type is a bad idea. The cases I've seen in old source code effectively mimicked Ada's virtual origins in C (which result in undefined behavior as you get pointers outside the object they're supposed to point to).

The aliasing code queries REG_POINTER to determine how to analyze an address. You could be getting REG_POINTER on an index and mucking up the analysis that way. Or we could simply be getting better information into the aliasing code, which in turn may be exposing aliasing problems in the older code you're compiling.

Jeff
Re: IRA for GCC 4.4
> With the compiler from the ira branch on x86_64-linux, here are the
> timings reported by "gfortran -c -time -save-temps" with and without
> IRA (two timings provided for each set of option, to check
> reproducibility)

OK, I come back with fresh numbers from the current IRA branch, rev. 135035, which I believe includes the fix for -O0 compilation time (thanks, by the way!). I'm still compiling the same huge testcase (from CP2K), which is a good example of relatively heavy use of Fortran 95 features. Memory used during compilation was up to 3 GB when optimization is turned on (this is an 8 GB system, and I checked that disk swap didn't come into play). This is on x86_64-linux.

Compile time with -fira, relative to the current allocator:

  At -O0: 3% decrease wrt current, no further effect for -fira-algorithm=CB
  At -O0 -g: 3% decrease wrt current, slightly smaller (-1.5%) with -fira-algorithm=CB
  At -O1: 7% increase wrt current; -fira-algorithm=CB turns this into only a 2% increase
  At -O2: 5% increase for -fira; only 1.5% increase when -fira-algorithm=CB is used
  At -O2 -ffast-math, -O3 and -O3 -ffast-math: roughly same as -O2, 3% to 5% increase for -fira, down to a 1%-2% increase when -fira-algorithm=CB is used.
  With -funroll-loops, -ftree-vectorize or both: again, roughly the same.

I've also tried gfortran's -fbounds-check option, which greatly increases the amount of code emitted by the front-end for a given source, and haven't seen any significant difference from the results reported above (in particular, no performance degradation). I've also played with -m32 at various optimization levels, and the results are again in the same range as above for -m64.

*Conclusions*

All in all, the -O0 performance is now on par with the old allocator, and at higher optimisation levels, we see a 3% to 5% regression. The CB algorithm is faster, with a regression of only 1.5% to 2%. I'll now turn to benchmarking of generated code (I'll run the Polyhedron benchmark, which is widely known and referred to in the Fortran community).
I don't have the guts to do a systematic check of memory consumption of the compiler, but I think it'd be nice if someone could do that.

FX

PS: I attach the file containing all timings. For each set of option, I ran the compiler twice; when timings differ significantly, that's because of other users using the machine (which is a rather underused dual-core biprocessor, with an average load during my tests of 1.09), and I thus take the smallest number for calculations.

--
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/

-O0
# f951 135.59 6.88
# f951 135.91 9.86
-O0 -fira
# f951 131.26 6.41
# f951 131.19 6.49
-O0 -fira -fira-algorithm=CB
# f951 131.20 6.76
# f951 130.84 6.80
--
-O1
# f951 477.87 14.74
# f951 478.26 14.46
-O1 -fira
# f951 511.43 14.69
# f951 510.64 13.56
-O1 -fira -fira-algorithm=CB
# f951 488.57 14.45
# f951 488.54 13.67
--
-O2
# f951 670.03 16.17
# f951 669.36 14.80
-O2 -fira
# f951 701.83 14.23
# f951 703.29 15.17
-O2 -fira -fira-algorithm=CB
# f951 682.19 15.01
# f951 678.86 15.06
--
-O2 -ffast-math
# f951 675.44 16.60
# f951 673.41 16.63
-O2 -ffast-math -fira
# f951 706.19 14.39
# f951 706.00 13.76
-O2 -ffast-math -fira -fira-algorithm=CB
# f951 688.10 14.68
# f951 736.99 18.26
--
-O3
# f951 844.27 15.13
# f951 845.93 14.35
-O3 -fira
# f951 872.07 16.54
# f951 873.54 13.72
-O3 -fira -fira-algorithm=CB
# f951 854.09 14.85
# f951 847.93 16.90
--
-O3 -ffast-math
# f951 846.92 14.47
# f951 846.12 16.58
-O3 -ffast-math -fira
# f951 877.64 14.22
# f951 883.09 13.62
-O3 -ffast-math -fira -fira-algorithm=CB
# f951 865.35 13.44
# f951 891.76 16.52
--
-O3 -ffast-math -funroll-loops
# f951 1112.40 15.43
# f951 1091.32 15.83
-O3 -ffast-math -funroll-loops -fira
# f951 1123.51 13.97
# f951 1126.89 15.50
-O3 -ffast-math -funroll-loops -fira -fira-algorithm=CB
# f951 1106.21 15.21
# f951 1108.12 15.91
--
-O3 -ffast-math -funroll-loops -ftree-vectorize
# f951 1093.59 14.93
# f951 1092.91 15.98
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira
# f951 1149.13 15.80
# f951 1134.78 14.84
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira -fira-algorithm=CB
# f951 1107.87 14.71
# f951 1092.80 13.97
--
-O0 -m32
# f951 133.29 6.63
# f951 133.38 6.97
-O0 -m32 -fira
# f951 132.86 7.68
# f951 134.41 7.03
-O0 -m32 -fira -fira-algorithm=CB
# f951 133.95 6.98
# f951 132.94 5.96
--
-O2 -m32
# f951 654.35 14.56
# f951 652.43 13.62
-O2 -m32 -fira
# f951 675.74 14.10
# f951 686.01 13.97
-O2 -m32 -fira -fira-algorithm=CB
# f951 659.19 14.44
# f951 666.36 14.48
--
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32
# f951 974.28 15.45
# f951 1024.43 15.94
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 -fira
# f951 1028.05 13.84
# f951 1029.07 14.03
-O3 -ff
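For readers who want to check the percentages against the raw timings, here is a minimal sketch in C of the calculation described in the PS: take the smaller of the two runs for each option set, then compute the change relative to the run without -fira. The function names are made up for illustration, and only the first number on each "# f951" line is used.

```c
/* Smaller of two repeated timings; the PS above takes the minimum to
   discount noise from other users of the machine. */
double best_of(double run1, double run2)
{
    return run1 < run2 ? run1 : run2;
}

/* Percentage change of `candidate` relative to `baseline`.
   Positive means the candidate compiled more slowly. */
double percent_change(double baseline, double candidate)
{
    return 100.0 * (candidate - baseline) / baseline;
}
```

With the -O1 rows, best_of(477.87, 478.26) against best_of(511.43, 510.64) comes out a little under 7%, and against best_of(488.57, 488.54) a little over 2%, which agrees with the summary in the message.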
Re: IRA for GCC 4.4
FX wrote:
> PS: I attach the file containing all timings. For each set of option, I
> ran the compiler twice; when timings differ significantly, that's
> because of other users using the machine (which is a rather underused
> dual-core biprocessor, with an average load during my tests of 1.09),
> and I thus take the smallest number for calculations.

Thanks for testing IRA. As I understand, in

# f951 135.59 6.88

the first number is wall compilation time. Could you tell me what the second one is? Is it system time?

I am trying to analyze the results and it would be useful to know what kind of processor you used (AMD or Intel, what model).

As for -O0, IRA does exactly the same thing whether -fira-algorithm=CB is used or not.

Thanks again for the testing. I look forward to your results for the Polyhedron benchmark. I should acknowledge that I never tested it. In my observation of SPECFP2000 results, IRA gives less improvement than for SPECINT2000. As I understand, floating point benchmarks are more memory bound and RA cannot help a lot. It is more important to have memory hierarchy optimizations for them.
Re: IRA for GCC 4.4
> Thanks for testing IRA. As I understand, in
>
> # f951 135.59 6.88
>
> the first number is wall compilation time. Could you tell me what is the
> second one? Is it system time?

I suppose so. The two times are the output from "gfortran -time -S".

> I am trying to analyze the results and it would be useful to know what kind
> of processor did you use (AMD or Intel, what model).

vendor_id  : AuthenticAMD
cpu family : 15
model      : 65
model name : Dual-Core AMD Opteron(tm) Processor 2220
stepping   : 3
cpu MHz    : 2814.508
cache size : 1024 KB

FX

--
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/
Re: Bad code generation on HPPA platform
> If it aborts, as in calling abort, rather than segfaulting, then it's
> not a flipped base/index in a memory reference -- those almost always
> segfault. This is the case that most worries me about Andrew's patch.

Sorry I wasn't clearer, it is a segfault. Running under gdb:

Program received signal SIGSEGV, Segmentation fault
  si_code: 0 - SEGV_UNKNOWN - Unknown Error.
0x400e61e4 in life_analysis (f=, nregs=1680) at flow.c:1166
1166 basic_block_live_at_end[i][j] |= x;

From disassembly:

;;; 1166 basic_block_live_at_end[i][j] |= x;
0x400e61e0 : ldd,s %ret0(%r20),%r19
0x400e61e4 : ldw %r19(%r8),%r31
0x400e61e8 : add,l %r19,%r8,%r19
0x400e61ec : or %r31,%r6,%r31

(gdb) i r
flags: 2f41             r3: 83ffbfff9600        r4: 800100017c80
r5: 8000                r6:0                    r7: 83ffb790
r8:0                    r9: 80010001bec8        r10:0
r11: 80010001be10       r12: 83ffbfe8a000       r13: 80010001c2ec
r14: 83ffbfffda90       r15: 83ffbfe89fc0       r16: 690
r17: 40033b94           r18: 40033bb8           arg7/r19: 800100220438
arg6/r20: 83ffbfffcc30  arg5/r21:0              arg4/r22: 83ffbfe9c748
arg3/r23: 83ffbfff9610  arg2/r24:0              arg1/r25: 83ffbfff9608
arg0/r26: 1ac           dp/gp/r27: 800100017c80 ret0/r28: 1ac
ret1/ap/r29: 83ffb790   sp/r30: 83ffb7c0        mrp/r31:0
sar/cr11: 3c

If I am reading things right, the uses of r8 and r19 in the ldw instruction are switched around. And in fact, when I created an assembly language file, swapped them around by hand, and then assembled the result and built an executable, everything seemed to work OK (I did not get a segfault).

Steve Ellcey
[EMAIL PROTECTED]
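As background on why swapping the two operands of an indexed load can matter even though base + index is the same sum either way, here is a deliberately simplified toy model in C. It assumes, as I understand the PA addressing model, that the space used for an indexed access is selected from the top two bits of the base register only. The struct, the function name and the example values are hypothetical and are not taken from the register dump above.

```c
#include <stdint.h>

/* Toy model of an indexed memory reference.  Assumption: the space is
   chosen from the top two bits of the BASE register alone, while the
   offset is simply base + index. */
struct mem_ref {
    unsigned space_select;  /* which space register would be used */
    uint64_t offset;        /* base + index */
};

struct mem_ref indexed_ref(uint64_t base, uint64_t index)
{
    struct mem_ref ref;
    ref.space_select = (unsigned)(base >> 62);  /* top two bits of base */
    ref.offset = base + index;
    return ref;
}
```

In this model, indexed_ref(pointer, small_index) and indexed_ref(small_index, pointer) give the same offset but may give different space selections, which would be consistent with a flipped base/index producing a segfault rather than a wrong value.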
Re: Bad code generation on HPPA platform
Steve Ellcey wrote:
>> If it aborts, as in calling abort, rather than segfaulting, then it's
>> not a flipped base/index in a memory reference -- those almost always
>> segfault. This is the case that most worries me about Andrew's patch.
>
> Sorry I wasn't clearer, it is a segfault. Running under gdb:
>
> Program received signal SIGSEGV, Segmentation fault
>   si_code: 0 - SEGV_UNKNOWN - Unknown Error.
> 0x400e61e4 in life_analysis (f=, nregs=1680) at flow.c:1166
> 1166 basic_block_live_at_end[i][j] |= x;
>
> From disassembly:
>
> ;;; 1166 basic_block_live_at_end[i][j] |= x;
> 0x400e61e0 : ldd,s %ret0(%r20),%r19
> 0x400e61e4 : ldw %r19(%r8),%r31
> 0x400e61e8 : add,l %r19,%r8,%r19
> 0x400e61ec : or %r31,%r6,%r31
>
> (gdb) i r
> flags: 2f41             r3: 83ffbfff9600        r4: 800100017c80
> r5: 8000                r6:0                    r7: 83ffb790
> r8:0                    r9: 80010001bec8        r10:0
> r11: 80010001be10       r12: 83ffbfe8a000       r13: 80010001c2ec
> r14: 83ffbfffda90       r15: 83ffbfe89fc0       r16: 690
> r17: 40033b94           r18: 40033bb8           arg7/r19: 800100220438
> arg6/r20: 83ffbfffcc30  arg5/r21:0              arg4/r22: 83ffbfe9c748
> arg3/r23: 83ffbfff9610  arg2/r24:0              arg1/r25: 83ffbfff9608
> arg0/r26: 1ac           dp/gp/r27: 800100017c80 ret0/r28: 1ac
> ret1/ap/r29: 83ffb790   sp/r30: 83ffb7c0        mrp/r31:0
> sar/cr11: 3c
>
> If I am reading things right, the use of r8 and r19 in the ldw
> instruction are switched around. And in fact, when I created an
> assembly language file, swapped them around by hand, and then assembled
> the result and built an executable, everything seemed to work OK (I did
> not get a segfault).

OK. Thanks. Nearly 100% certain we've got a flipped base/index.

And just to be certain, we've used a recent GCC trunk to compile an old rev of gcc (2.7 era?), which is then segfaulting when it's trying to compile code, right?

Assuming that's the case, I'd start by identifying the pseudos corresponding to %r19 and %r8 and verify that the one associated with %r8 has REG_POINTER set and the one associated with %r19 does not. Then work backwards to find out how REG_POINTER on the pseudo associated with %r8 was set.

Also be careful that tail merging hasn't merged two threads of control which it thought were identical, but where in one thread of control you've got %r8/%r19 as the base/index and in the other thread of control you've got %r19/%r8 as the base/index (they are identical in functionality on any sane target :-). I fixed that eons ago, but it's always possible it's reared its ugly head again.

Jeff
Re: How to implement the instruction in the back end
"Mohamed Shafi" <[EMAIL PROTECTED]> writes: > For the 16-bit target that i porting now to gcc 4.1.2 doesn't have any > branch instructions. It only has jump instructions. For comparison > operation it has this instruction: > > if cond Rx Ry > execute this insn > > So compare and branch is implemented as > > if cond Rx Ry > jmp Label For gcc's purposes this is no different from having a usual conditional branch instruction. It's just a jump with a condition. > This instructions has also another form. To check whether a particular > bit in a register is set or not. > > if bs Rx, bitNo > execute this insn > > My questions is how will i be able to implement this instruction in > the back-end? Sure, this is just a conditional instruction where the condition is a ZERO_EXTRACT. Look at the ARM backend for examples of how to work with conditional instructions. Ian
Re: Bad code generation on HPPA platform
Jeff Law wrote:

> And just to be certain, we've used a recent GCC trunk to compile an old
> rev of gcc (2.7 era?), which is then segfaulting when it's trying to
> compile code, right?

Correct, I am using GCC 4.3.0 to compile the old (2.7) GCC and when I run that old GCC it segfaults. If I start with the ToT GCC instead of 4.3.0 GCC I have no problems.

> Assuming that's the case, I'd start by identifying the pseudos
> corresponding to %r19 and %r8 and verify that the one associated with
> %r8 has REG_POINTER set and %r19 is not yet. Then work backwards to
> find out how REG_POINTER on the pseudo associated with %r8 was set.

I have looked at the pseudo that started getting marked with REG_POINTER after Andrew's change. It corresponds to basic_block_live_at_end[i] and this is a pointer (basic_block_live_at_end is an array of pointers), so I don't believe (at least in this instance) that Andrew's patch is incorrectly marking something as a pointer when it isn't.

> Also be careful that tail merging hasn't merged two threads of control
> which it thought were identical, but where in one thread of control
> you've got %r8/%r19 as the base/index and in the other thread of control
> you've got %r19/%r8 as the base/index (they are identical in
> functionality on any sane target :-). I fixed that eons ago, but it's
> always possible it's reared its ugly head again.

The pseudo for %r8 does have REG_POINTER set and the pseudo for %r19 does not. I first see REG_POINTER set for ivtmp___1536 (the pseudo for %r8) in flow.c.138r.loop2_invariant. This seems interesting because Peter's patch, which fixes this problem without undoing Andrew's patch, includes a change to loop-invariant.c, though that change should be preserving REG_POINTERs during optimization, not preventing them.

I have tested a port of Peter's changes (already on the main line) to the 4.3 branch. It does fix the problem and it causes no regressions on hppa. My current inclination is to submit that patch to gcc-patches as a backport to the branch.

Steve Ellcey
[EMAIL PROTECTED]
Re: Bad code generation on HPPA platform
> If I am reading things right, the use of r8 and r19 in the ldw > instruction are switched around. Yes. If you do an rtl dump, you should be able to see where the REG_POINTER flag gets set and if the operand order gets switched. Sometimes the REG_POINTER flag gets removed by reload, etc. So, the operand order should not change after this point. Dave -- J. David Anglin [EMAIL PROTECTED] National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
Re: Use of option -fprofile-arcs is not compatible with -fprofile-use
Oops, in my cutting and pasting I omitted the -O2 that goes with all compilations. Without it no optimization gets done, so no warnings...

Regards
Edmar

Lijuan Hai wrote:

sorry that I couldn't reproduce the warning as you said.

micro# /import/dr3/s10/gcc-4.2/bin/gcc val-prof-1.c -fprofile-arcs -g -o val-prof-1.x1
micro# /import/dr3/s10/gcc-4.2/bin/gcc -v
Using built-in specs.
Target: sparc-sun-solaris2.10
Configured with: /import/dr2/starlex/1/gcc-4.2-20070228/configure --prefix=/import/dr3/s10/gcc-4.2 --enable-languages=c,c++,fortran --enable-rpath --with-mpfr=/import/dr3/s10/gcc-4.2 --with-gmp=/import/dr3/s10/gcc-4.2
Thread model: posix
gcc version 4.2.0 20070228 (prerelease)

so is gcc-4.3 on the platform.

2008/5/7 Edmar Wienskoski-RA8797 <[EMAIL PROTECTED]>:

I said if you compile val-prof-1.c the same way bprob-1.c is compiled you get a warning.

gcc -g -fprofile-arcs val-prof-1.c -o val-prof-1.x1

Lijuan Hai wrote:

seen in gcc-4.2, gcc.misc-tests/bprob-1.c is compiled with -fprofile-arcs and -fbranch-probabilities. gcc.dg/tree-prof/val-prof-1.c is compiled with -fprofile-generate and -fprofile-use. so there won't be any warnings.

2008/4/25 Edmar Wienskoski-RA8797 <[EMAIL PROTECTED]>:

The test case gcc.misc/bprob-1.c is compiled with fprofile-arcs / fprofile-use.

The option fprofile-arcs does not enable value profiling. At the second stage compilation, the option fprofile-use enables value profiling.

Within tree_find_values_to_profile, if one of the value optimization algorithms sees an optimization opportunity, it will push a histogram on the stack. Later, compute_value_histograms will call get_coverage_counts to load this histogram, but none were generated. A warning is issued, which means a FAIL under dejagnu.

I found this problem with bprob-1.c while debugging a new value profile optimization. But it can be reproduced on any target, with non-modified gcc, at any optimization level, using one of the value profile test cases and compiler options fprofile-arcs / fprofile-use (same used with bprob-1.c).

Here is an example using gcc.dg/tree-prof/val-prof-1.c:

./gcc-trunk-reference/install_e600/bin/gcc -g -fprofile-arcs val-prof-1.c -o val-prof-1.x1
./val-prof-1.x1
./gcc-trunk-reference/install_e600/bin/gcc -g -fprofile-use val-prof-1.c -o val-prof-1.x2
val-prof-1.c: In function 'main':
val-prof-1.c:17: warning: no coverage for function 'main' found

IMHO there are 3 ways to go with this:

1 - Require user behavior change (create new option -fprofile-arcs-use to match -fprofile-arcs, mismatch of options is bad user behavior)

2 - Record in the .gcda file how the first stage was done (fprofile-arcs / fprofile-generate, etc) and use it to disable other optimizations under fprofile-use (Does this already exist? I am not familiar with the .gcda layout)

3 - Let get_coverage_counts ignore inconsistencies when loading data.

Help / comments are appreciated.

Edmar
Re: Bad code generation on HPPA platform
Steve Ellcey wrote:
> The pseudo for %r8 does have REG_POINTER set and the pseudo for %r19
> does not. I first see REG_POINTER set for ivtmp___1536 (the pseudo for
> %r8) in flow.c.138r.loop2_invariant. This seems interesting because
> Peter's patch, that fixes this problem without undoing Andrew's patch,
> includes a change to loop-invariant.c, though that change should be
> preserving REG_POINTER's during optimization not preventing them.

OK. So what is ivtmp__1536 -- is it a pointer or an index? Can you show me the gimple code which declares and uses ivtmp_1536?

Hmmm, fails for 4.3... Hmmm, does 4.3 have POINTER_PLUS_EXPR? (search tree.def for POINTER_PLUS_EXPR).

Jeff
Re: Bad code generation on HPPA platform
On Thu, May 8, 2008 at 11:48 AM, Jeff Law <[EMAIL PROTECTED]> wrote: > Hmmm, fails for 4.3... Hmmm, does 4.3 have POINTER_PLUS_EXPR? > (search tree.def for POINTER_PLUS_EXPR). Yes it made it in 4.3 :). Which is why the other patch went in. Thanks, Andrew Pinski
Re: Bad code generation on HPPA platform
On Thu, 2008-05-08 at 11:38 -0700, Steve Ellcey wrote:
> The pseudo for %r8 does have REG_POINTER set and the pseudo for %r19
> does not. I first see REG_POINTER set for ivtmp___1536 (the pseudo for
> %r8) in flow.c.138r.loop2_invariant. This seems interesting because
> Peter's patch, that fixes this problem without undoing Andrew's patch,
> includes a change to loop-invariant.c, though that change should be
> preserving REG_POINTER's during optimization not preventing them.

Similar to hppa, power6 cares about knowing whether a pseudo is a pointer or not, because for regA + regB load/store addressing, we get much better performance if regA is the pointer and regB is the offset rather than the other way around.

What I found was that the loop invariant and GCSE code were creating some pseudos to copy expressions into, but were failing to copy the REG_POINTER/MEM_POINTER attribute along with them.

The hunk from:

http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html

which replaced the rtlanal.c change from the first commit was needed at -O0, because the only chance to order the operands at -O0 is at expand time.

Peter
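To make the failure mode Peter describes concrete, here is a small toy model in C. It is not GCC code: struct pseudo, copy_pseudo and order_operands are hypothetical stand-ins, and is_pointer only models the role of the REG_POINTER flag. The sketch assumes that operand ordering for a reg+reg address is decided purely from that flag.

```c
#include <stdbool.h>

/* Toy stand-in for a pseudo register; NOT GCC's rtx representation. */
struct pseudo {
    int regno;
    bool is_pointer;    /* models the REG_POINTER flag */
};

/* Create a new pseudo holding a copy of `src`.  Carrying the flag over
   is the step described above as missing in the loop invariant and
   GCSE code. */
struct pseudo copy_pseudo(struct pseudo src, int new_regno)
{
    struct pseudo dst = { new_regno, src.is_pointer };
    return dst;
}

struct address {
    struct pseudo base;
    struct pseudo index;
};

/* Put the pointer in the base slot when exactly one operand is flagged
   as a pointer; otherwise keep the incoming order. */
struct address order_operands(struct pseudo a, struct pseudo b)
{
    struct address addr;
    if (b.is_pointer && !a.is_pointer) {
        addr.base = b;
        addr.index = a;
    } else {
        addr.base = a;
        addr.index = b;
    }
    return addr;
}
```

If copy_pseudo dropped the flag, order_operands would have no way to tell the copied pointer from the index, and the original operand order would simply be kept, right or wrong.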
Re: Bad code generation on HPPA platform
Peter Bergner wrote:
> On Thu, 2008-05-08 at 11:38 -0700, Steve Ellcey wrote:
>> The pseudo for %r8 does have REG_POINTER set and the pseudo for %r19
>> does not. I first see REG_POINTER set for ivtmp___1536 (the pseudo for
>> %r8) in flow.c.138r.loop2_invariant. This seems interesting because
>> Peter's patch, that fixes this problem without undoing Andrew's patch,
>> includes a change to loop-invariant.c, though that change should be
>> preserving REG_POINTER's during optimization not preventing them.
>
> Similar to hppa, power6 cares about knowing whether a pseudo is a pointer

OK. Reasonably similar, though on the PA if you get it wrong, you get a hard failure rather than just poor performance :(

> What I found, was that the loop invariant and GCSE code were creating
> some pseudos to copy expressions into, but was failing to copy the
> REG_POINTER/MEM_POINTER attribute along with it.

I recall. I was briefly worried that GCSE might have a problem similar to the tail merging problem I mentioned briefly, but I just did a quick audit and it looks clean (basically it doesn't take commutativity into account when hashing values).

I can't offhand think of how LICM would muck things up in the way we're seeing, but obviously something isn't working the way we expect it to :-)

jeff
gcc-4.3-20080508 is now available
Snapshot gcc-4.3-20080508 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20080508/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_3-branch revision 135093

You'll find:

gcc-4.3-20080508.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.3-20080508.tar.bz2       C front end and core compiler
gcc-ada-4.3-20080508.tar.bz2        Ada front end and runtime
gcc-fortran-4.3-20080508.tar.bz2    Fortran front end and runtime
gcc-g++-4.3-20080508.tar.bz2        C++ front end and runtime
gcc-java-4.3-20080508.tar.bz2       Java front end and runtime
gcc-objc-4.3-20080508.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.3-20080508.tar.bz2  The GCC testsuite

Diffs from 4.3-20080501 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
RFH: Building and testing gimple-tuples-branch
The tuples branch is at the point now that it should bootstrap all primary languages and targets. There are things that are still broken and being worked on (http://gcc.gnu.org/wiki/tuples), but by and large things should Just Work. I expect things like code generation to be sub-par because some optimizations are still not converted (notably, loop passes, PRE, and TER).

So, for folks with free cycles to spare, could you build the branch on your favourite target and report bugs? Bugzilla and/or email reports are OK. If you are creating a bugzilla report, please add my address to the CC field.

Other than obvious brokenness, we are interested in compile time slow downs and increased memory utilization. Both of which are possible because we have spent no effort tuning the data structures yet.

To build the branch:

$ svn co svn://gcc.gnu.org/svn/gcc/branches/gimple-tuples-branch
$ mkdir bld && cd bld
$ ../gimple-tuples-branch/configure --disable-libgomp --disable-libmudflap
$ make && make -k check

Thanks. Diego.
ssa_name issues
Dear mailing list:

I am writing GCC code that constructs GIMPLE (after pass_apply_inline and before pass_all_optimizations) to take the address of each of a function's parameters and store those addresses in an array. The code is at the bottom of this message. Right now I need help in dealing with errors of the form

--
test.c:10: error: invalid operand to unary operator x_1(D)
--

where x_1(D) is an SSA_NAME for one of the variables I'm trying to store (in this case, x). The errors appear at -O2. The complaint is on a NOP_EXPR involving x; the backtrace is

--
#1 0x0087dbee in verify_expr (tp=0x2b484208, walk_subtrees=0x7fffb32ddf50, data=0x0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-cfg.c:3284
#2 0x00a7860c in walk_tree (tp=0x2b484208, func=0x87cf15, data=0x0, pset=0x0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree.c:7978
#3 0x0087f415 in verify_stmt (stmt=0x2b4841e0, last_in_block=0 '\0') at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-cfg.c:3407
#4 0x0087fbdc in verify_stmts () at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-cfg.c:3615
#5 0x00a0200b in verify_ssa (check_modified_stmt=1 '\001') at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-ssa.c:614
#6 0x007a6860 in execute_function_todo (data=0x420) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:927
#7 0x007a6315 in do_per_function (callback=0x7a65b2, data=0x420) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:770
#8 0x007a68eb in execute_todo (flags=1056) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:948
#9 0x007a6d6d in execute_one_pass (pass=0x12259a0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:1093
#10 0x007a6dfa in execute_pass_list (pass=0x12259a0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:1123
#11 0x008edc9c in tree_rest_of_compilation (fndecl=0x2b475a80) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-optimize.c:412
--

The relevant part of verify_expr is:

--
static tree
verify_expr (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
{
  tree t = *tp, x;
  bool in_phi = (data != NULL);

  if (TYPE_P (t))
    *walk_subtrees = 0;

  /* Check operand N for being valid GIMPLE and give error MSG if not. */
#define CHECK_OP(N, MSG) \
  do { if (!is_gimple_val (TREE_OPERAND (t, N))) \
       { error (MSG); return TREE_OPERAND (t, N); }} while (0)

  switch (TREE_CODE (t))
    {
    /* snip */
    case NOP_EXPR:
    case CONVERT_EXPR:
    case FIX_TRUNC_EXPR:
    case FLOAT_EXPR:
    case NEGATE_EXPR:
    case ABS_EXPR:
    case BIT_NOT_EXPR:
    case NON_LVALUE_EXPR:
    case TRUTH_NOT_EXPR:
      CHECK_OP (0, "invalid operand to unary operator");
      break;
--

Here's my code. I freely acknowledge that I may be doing this in a very bad way; if you can point me toward a replacement for my hand-rolled build_automatic, or show me the right way to make the array addressable, or bang me on the head for not executing a fixing-up pass before or after this code, all these comments are more than welcome.

--
// Returns an automatic variable of the given type.
tree build_automatic(
    tree type,          /* type of variable */
    const char* name)   /* name for variable */
{
  tree ret = NULL;

  ret = build_decl(VAR_DECL, get_identifier(name), type);

  DECL_ARTIFICIAL(ret) = 1; /* declared by the compiler */
  DECL_IGNORED_P(ret) = 1;  /* no debug info */
  TREE_READONLY(ret) = 0;   /* writable */
  DECL_EXTERNAL(ret) = 0;   /* defined */
  TREE_STATIC(ret) = 0;     /* automatic */
  TREE_USED(ret) = 1;       /* used */

  create_var_ann(ret);

  return ret;
}

tree build_pointer_array(
    tree* parms,                /* array of PARAM_DECLs */
    int num_parms,              /* number of elements in parms */
    block_stmt_iterator* iter)  /* pointer to a location where I will place
                                   the assignments to the array */
{
  tree array_size = build_int_cst(size_type_node, num_parms);
  tree array_index_type = build_index_type(array_size);
  tree array_type = build_array_type(ptr_type_node, array_index_type);
  TREE_ADDRESSABLE(array_type) = 1;

  tree pointer_array = build_automatic(array_type, "pointers");

  for(unsigned int i = 0; i < variables->size(); i++)
  {
    tree index = build_int_cst(array_index_type, i);
    tree min_value = TYPE_MIN_VALUE(array_index_type);
    tree size_in_align = build_int_cst(size_type_node, tree_low_cst(TYPE_SIZE(ptr_type_node), 0) / TYPE_ALIGN(ptr_type_node));
    tree variable
Re: ssa_name issues
On Thu, May 8, 2008 at 4:02 PM, Sean Callanan <[EMAIL PROTECTED]> wrote:
> Dear mailing list:
>
> I am writing GCC code that constructs GIMPLE (after pass_apply_inline and
> before pass_all_optimizations) to take the address of each of a function's
> parameters and store those addresses in an array. The code is at the bottom
> of this message. Right now I need help in dealing with errors of the form

Right, so you are taking a gimple register and turning it into a non gimple register. This will not work without extra work; the other way does work though (and is done in the addressable pass and the aliasing TODO). To fix it up, you have to fix up the rest of the IR to take into account that you just turned that symbol into a non gimple register. You have to create many extra statements, one for each use of the symbol.

Thanks,
Andrew Pinski
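For orientation, this is roughly the source-level shape of what the instrumentation in the original message is trying to build, written as ordinary C rather than as GIMPLE construction code. It is only an illustrative sketch; the function name and the two int parameters are made up.

```c
/* Hypothetical source-level equivalent of the instrumented function:
   the address of every parameter is stored into a local array. */
int sum_via_pointers(int x, int y)
{
    void *pointers[2];

    pointers[0] = &x;   /* once &x is taken, x has to live in memory */
    pointers[1] = &y;

    /* Every later use of x or y has to agree with whatever may have
       been written through pointers[]. */
    return *(int *)pointers[0] + *(int *)pointers[1];
}
```

When the front end sees this in source form, x and y are known to be addressable from the start. Introducing the address-taking later, after x already exists as an SSA name such as x_1(D), is the situation Andrew describes, where each existing use of the symbol needs to be fixed up.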
Re: RFH: Building and testing gimple-tuples-branch
> "Diego" == Diego Novillo <[EMAIL PROTECTED]> writes: > are OK. If you are creating a bugzilla report, please add my address > to the CC field. Me too please. Aldy
Questions about attributes
I have questions about function parameter attributes. I'm trying to use attributes to indicate parameters that are used to pass values back out of functions and then analyze how they are used. I tried something like this:

  void foo(int *a __attribute__((user("out"))));

By itself, this works (in GCC 4.3, used for all tests discussed here): in GIMPLE, call sites for foo refer to a FUNCTION_DECL node that has a PARM_DECL for a with the user "out" attribute. But if a function definition is visible, this changes. With this:

  void foo(int *a) {}

at all sites, the FUNCTION_DECL's PARM_DECL has no attributes. I tried various tests, with multiple declarations and having definitions or not, and the results seem to be:

- If there is a definition present, parameter attributes are taken from the definition.
- If there is no definition in the file, parameter attributes are taken from the first declaration.

I'd appreciate any help someone can give me with:

Q1: Is this behavior intended?
Q2: When there is a definition and a declaration, does GCC keep around both sets of attributes? If so, where are they found?
Q3: If not, what's the best way to represent this information with attributes? The same way as nonnull?

--
Dave Mandelin
Mozilla Platform Engineer
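For reference on Q3, this is how nonnull is written in C: it is attached to the function rather than to the individual parameter, and it names the parameters by 1-based position. The sketch below only shows that syntax; store_answer is a made-up function, and nothing here says how GCC merges attributes between a declaration and a definition.

```c
/* nonnull is a function-level attribute; parameter 1 is `out`. */
void store_answer(int *out) __attribute__((nonnull(1)));

/* The definition does not repeat the attribute. */
void store_answer(int *out)
{
    *out = 42;
}
```

An attribute in this style lives on the function declaration itself, so it does not depend on which PARM_DECL ends up attached to the FUNCTION_DECL.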
Division using FMAC, reciprocal estimates and Newton-Raphson - eg ia64, rs6000, SSE, ARM MaverickCrunch?
Hi all,

I was looking for ways to improve the MaverickCrunch division routine on ARM ep93xx, and noticed that there are a few other architectures that don't have a hardware divide.

IA-64 has a "frcpa" instruction that returns an estimate of the reciprocal of a float or double. Likewise, RS-6000 has a "fres" that also returns an estimate of the reciprocal of a float or double. x86 seems to have something similar with SSE - called "rcpps" - that also returns the estimated reciprocal.

They all seem to make use of FMAC / FNMAC instructions to calculate the correct answer for x/y, through Newton-Raphson iteration with MAC instructions. The algorithms they use in GCC differ because the accuracy of the reciprocal estimate differs.

http://en.wikipedia.org/wiki/N-th_root_algorithm
http://en.wikipedia.org/wiki/Multiply-accumulate

They also seem to use a similar algorithm to implement their sqrt function...

My question is, are there any other architectures in GCC that don't have a reciprocal estimate instruction, but have a FMAC?

I'd like to implement something similar for MaverickCrunch, using the integer 32-bit MAC functions, but there is no reciprocal estimate function on the MaverickCrunch. I guess a lookup table could be implemented, but how many entries will need to be generated, and how accurate will it have to be for the result to be IEEE 754 compliant (in the swdiv routine)?

Also, where should I be sticking such an instruction / table? Should I put it in the kernel, and trap an invalid instruction? Alternatively, should I put it in libgcc or in glibc/uclibc?
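As a concrete picture of the technique being asked about, here is a minimal sketch in C of division by Newton-Raphson refinement of a reciprocal estimate. It is not the IA-64, RS-6000 or SSE sequence from GCC. The linear starting estimate stands in for a hardware estimate instruction or lookup table and assumes the divisor has already been scaled into [0.5, 1.0]. All function names are made up, and the result is not guaranteed to be correctly rounded in the IEEE 754 sense.

```c
/* Crude starting estimate of 1/d, valid for d in [0.5, 1.0].
   Stands in for a hardware estimate instruction or a lookup table. */
double initial_estimate(double d)
{
    return 48.0 / 17.0 - (32.0 / 17.0) * d;
}

/* Newton-Raphson refinement of x ~ 1/d:  x' = x * (2 - d * x).
   Each step is one negated multiply-add followed by a multiply, and
   roughly doubles the number of correct bits. */
double refine_reciprocal(double d, double x, int steps)
{
    for (int i = 0; i < steps; i++) {
        double correction = 2.0 - d * x;   /* FNMAC-style step */
        x = x * correction;
    }
    return x;
}

/* n / d for d in [0.5, 1.0], computed as n * (1/d). */
double divide(double n, double d)
{
    return n * refine_reciprocal(d, initial_estimate(d), 4);
}
```

The number of refinement steps needed depends on how good the starting estimate is, which is why a more accurate table or estimate instruction leads to a shorter sequence. Getting the last bit right for IEEE 754 generally takes an extra correction step beyond what this sketch does.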