date:20080508

Re: Bad code generation on HPPA platform

2008-05-08 Thread Jeff Law


Steve Ellcey wrote:

I am investigating a bad code generation bug on the 64 bit HPPA platform
with GCC 4.3.0 and would like some help and/or ideas on how to analyze
and fix it.  The failing test is the SPEC 2000 GCC benchmark (version
2.7.2.2) and I have been unable to create a smaller test case so far.

What I have found is that if I build GCC from version 127633 (after
applying the patch for PR middle-end/33029 so that the build will work)
I can build and run the SPEC GCC benchmark.  If I update GCC to version
127634 then the benchmark will abort when it is run.

The difference between those versions is this patch:

2007-08-19  Andrew Pinski  <[EMAIL PROTECTED]>
 PR middle-end/32940
 * cfgexpand.c  (expand_one_register_var): Mark pointer
 DECL_ARTIFICIAL as REG_POINTER also.
 * stmt.c (expand_decl): Likewise.

In the PR report for 32940 there is a pointer to:

http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00020.html

That patch fixed a bootstrap problem on HPPA and reverting it (which is
what the patch for PR 32940 seems to do) seems to be reintroducing the
problem that the earlier patch was intended to fix.  While reverting
that patch didn't seem to cause bootstrap problems it does seem to cause
problems when building the older GCC version that is in SPEC2000.  Any
advice on how to proceed from here?

Steve Ellcey
[EMAIL PROTECTED]

I'm well versed with the problems in this area and I'd be very leery of
Andrew's patch.  There are some thorny issues in this space and I'm
far from 100% sure that blindly propagating a pointer type into 
REG_POINTER is always valid, particularly for compiler generated 
temporaries.


Can you describe better what you're seeing in the 2.7.2.2 build that's 
causing problems?


Jeff


I am having trouble figuring out exactly what change in the generated
code is causing the failure.  The code still compiles but it aborts when
I run the compiled code.  In talking to David Anglin I do think it is a
problem with a base reg and index reg getting mixed up but I can't
put my finger on exactly where.

Hmmm, I missed the fact that this was pa64, not pa32.  It's been so 
long, I can't remember how PA64 HPUX sets up its spaces and whether or 
not that's likely to affect things in relevant ways.


If it aborts, as in calling abort, rather than segfaulting, then it's 
not a flipped base/index in a memory reference -- those almost always 
segfault.  This is the case that most worries me about Andrew's patch.


Virtual origins in Ada in particular can break with Andrew's patch. 
Triggering the problem might be hard, but there's little doubt in my 
mind that setting REG_POINTER on every compiler temporary with a pointer 
type is a bad idea.  The cases I've seen in old source code effectively 
mimicked Ada's virtual origins in C (which result in undefined behavior 
as you get pointers outside the object they're supposed to point to).


The aliasing code queries REG_POINTER to determine how to analyze an 
address.  You could be getting REG_POINTER on an index and mucking up 
the analysis that way.  Or we could simply be getting better information 
into the aliasing code, which in turn may be exposing aliasing problems 
in the older code you're compiling.


Jeff

Re: IRA for GCC 4.4

2008-05-08 Thread FX

>  With the compiler from the ira branch on x86_64-linux, here are the
>  timings reported by "gfortran -c -time -save-temps" with and without
>  IRA (two timings provided for each set of option, to check
>  reproducibility)

OK, I come back with fresh numbers from the current IRA branch, rev.
135035, which I believe includes the fix for -O0 compilation time
(thanks, by the way!). I'm still compiling the same huge testcase
(from CP2K), which is a good example of relatively heavy use of
Fortran 95 features. Memory used during compilation was up to 3 GB
when optimization is turned on (this is a 8GB system, and I checked
that disk swap didn't come into play). This is on x86_64-linux.


At -O0: 3% decrease wrt current, no further effect for -fira-algorithm=CB
At -O0 -g: 3% decrease wrt current, slightly smaller (-1.5%) with
-fira-algorithm=CB
At -O1: 7% increase wrt current; -fira-algorithm=CB turns this into
only a 2% increase
At -O2: 5% increase for -fira; only 1.5% increase when
-fira-algorithm=CB is used
At -O2 -ffast-math, -O3 and -O3 -ffast-math: roughly same as -O2, 3%
to 5% increase for -fira, down to a 1%-2% increase when
-fira-algorithm=CB is used.
With -funroll-loops, -ftree-vectorize or both: again, roughly the same.

I've also tried gfortran's -fbounds-check option, which greatly
increases the amount of code emitted by the front-end for a given source,
and haven't seen any significant difference from the results reported
above (in particular, no performance degradation).

I've also played with -m32 at various optimization levels, and the
results are again in the same range as above for -m64.


*Conclusions*

All in all, the -O0 performance is now on par with the old allocator,
and at higher optimisation levels, we see a 3% to 5% regression. The
CB algorithm is faster, with a regression of only 1.5% to 2%.
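
The percentages above can be reproduced from the raw timings in the attached file. The following is a minimal Python sketch, assuming (as the PS says) that the smaller of the two runs is used, and that the first number on each "# f951" line is the one being compared. The helper name pct_change is made up for illustration, and only a few option sets are copied in.

```python
# Timings copied from the attached file: (run 1, run 2) of the first
# number on each "# f951" line, for a few option sets.
timings = {
    "-O0": (135.59, 135.91),
    "-O0 -fira": (131.26, 131.19),
    "-O1": (477.87, 478.26),
    "-O1 -fira": (511.43, 510.64),
    "-O1 -fira -fira-algorithm=CB": (488.57, 488.54),
    "-O2": (670.03, 669.36),
    "-O2 -fira": (701.83, 703.29),
    "-O2 -fira -fira-algorithm=CB": (682.19, 678.86),
}


def pct_change(baseline, candidate):
    """Percent change of candidate vs. baseline, using the smaller run of each."""
    base = min(timings[baseline])
    cand = min(timings[candidate])
    return 100.0 * (cand - base) / base


for level in ("-O0", "-O1", "-O2"):
    for suffix in (" -fira", " -fira -fira-algorithm=CB"):
        name = level + suffix
        if name in timings:
            print(f"{name}: {pct_change(level, name):+.1f}%")
```

Taking the minimum of the two runs follows the PS; using the mean instead would let the occasional run disturbed by other users skew the comparison.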

I'll now turn to benchmarking of generated code (I'll run the
Polyhedron benchmark, which is widely known and referred to in the
Fortran community). I don't have the guts to do a systematic check of
memory consumption of the compiler, but I think it'd be nice if
someone could do that.

FX


PS: I attach the file containing all timings. For each set of options,
I ran the compiler twice; when timings differ significantly, that's
because of other users using the machine (which is a rather underused
dual-core biprocessor, with an average load during my tests of 1.09),
and I thus take the smallest number for calculations.

-- 
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/
-O0
# f951 135.59 6.88
# f951 135.91 9.86

-O0 -fira
# f951 131.26 6.41
# f951 131.19 6.49

-O0 -fira -fira-algorithm=CB
# f951 131.20 6.76
# f951 130.84 6.80

--

-O1
# f951 477.87 14.74
# f951 478.26 14.46

-O1 -fira
# f951 511.43 14.69
# f951 510.64 13.56

-O1 -fira -fira-algorithm=CB
# f951 488.57 14.45
# f951 488.54 13.67

--

-O2 
# f951 670.03 16.17
# f951 669.36 14.80
-O2 -fira
# f951 701.83 14.23
# f951 703.29 15.17
-O2 -fira -fira-algorithm=CB
# f951 682.19 15.01
# f951 678.86 15.06

--

-O2 -ffast-math 
# f951 675.44 16.60
# f951 673.41 16.63
-O2 -ffast-math -fira
# f951 706.19 14.39
# f951 706.00 13.76
-O2 -ffast-math -fira -fira-algorithm=CB
# f951 688.10 14.68
# f951 736.99 18.26

--

-O3 
# f951 844.27 15.13
# f951 845.93 14.35
-O3 -fira
# f951 872.07 16.54
# f951 873.54 13.72
-O3 -fira -fira-algorithm=CB
# f951 854.09 14.85
# f951 847.93 16.90

--

-O3 -ffast-math 
# f951 846.92 14.47
# f951 846.12 16.58
-O3 -ffast-math -fira
# f951 877.64 14.22
# f951 883.09 13.62
-O3 -ffast-math -fira -fira-algorithm=CB
# f951 865.35 13.44
# f951 891.76 16.52

--

-O3 -ffast-math -funroll-loops 
# f951 1112.40 15.43
# f951 1091.32 15.83
-O3 -ffast-math -funroll-loops -fira
# f951 1123.51 13.97
# f951 1126.89 15.50
-O3 -ffast-math -funroll-loops -fira -fira-algorithm=CB
# f951 1106.21 15.21
# f951 1108.12 15.91

--

-O3 -ffast-math -funroll-loops -ftree-vectorize 
# f951 1093.59 14.93
# f951 1092.91 15.98
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira
# f951 1149.13 15.80
# f951 1134.78 14.84
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira -fira-algorithm=CB
# f951 1107.87 14.71
# f951 1092.80 13.97

--

-O0 -m32 
# f951 133.29 6.63
# f951 133.38 6.97
-O0 -m32 -fira
# f951 132.86 7.68
# f951 134.41 7.03
-O0 -m32 -fira -fira-algorithm=CB
# f951 133.95 6.98
# f951 132.94 5.96

--

-O2 -m32 
# f951 654.35 14.56
# f951 652.43 13.62
-O2 -m32 -fira
# f951 675.74 14.10
# f951 686.01 13.97
-O2 -m32 -fira -fira-algorithm=CB
# f951 659.19 14.44
# f951 666.36 14.48

--

-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 
# f951 974.28 15.45
# f951 1024.43 15.94
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 -fira
# f951 1028.05 13.84
# f951 1029.07 14.03
-O3 -ff

Re: IRA for GCC 4.4

2008-05-08 Thread Vladimir Makarov


FX wrote:


PS: I attach the file containing all timings. For each set of option,
I ran the compiler twice; when timings differ significantly, that's
because of other users using the machine (which is a rather underused
dual-core biprocessor, with an average load during my tests of 1.09),
and I thus take the smallest number for calculations.

  

Thanks for testing IRA.  As I understand, in

# f951 135.59 6.88

the first number is wall compilation time.  Could you tell me what is 
the second one? Is it system time?


I am trying to analyze the results and it would be useful to know what 
kind of processor you used (AMD or Intel, what model).


As for -O0, IRA does absolutely the same when -fira-algorithm=CB is used 
or not.


Thanks again for the testing.  I look forward to your results for the 
Polyhedron benchmark.  I should acknowledge that I have never tested it.  
From my observation of SPECFP2000 results, IRA gives less improvement 
than for SPECINT2000.  As I understand it, floating point benchmarks are 
more memory bound and RA cannot help much.  More important for them is 
to have memory hierarchy optimizations.

Re: IRA for GCC 4.4

2008-05-08 Thread FX

>  Thanks for testing IRA.  As I understand, in
>
>  # f951 135.59 6.88
>
>  the first number is wall compilation time.  Could you tell me what is the
> second one? Is it system time?

I suppose so. The two times are the output from "gfortran -time -S".

>  I am trying to analyze the results and it would be useful to know what kind
> of processor did you use (AMD or Intel, what model).

vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2220
stepping: 3
cpu MHz : 2814.508
cache size  : 1024 KB


FX

-- 
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/

Re: Bad code generation on HPPA platform

2008-05-08 Thread Steve Ellcey

> If it aborts, as in calling abort, rather than segfaulting, then it's 
> not a flipped base/index in a memory reference -- those almost always 
> segfault.  This is the case that most worries me about Andrew's patch.

Sorry I wasn't clearer, it is a segfault.  Running under gdb:

Program received signal SIGSEGV, Segmentation fault
  si_code: 0 - SEGV_UNKNOWN - Unknown Error.
0x400e61e4 in life_analysis (f=, nregs=1680)
at flow.c:1166
1166  basic_block_live_at_end[i][j] |= x;



From disassembly:

;;;  1166 basic_block_live_at_end[i][j] |= x;
0x400e61e0 :   ldd,s %ret0(%r20),%r19
0x400e61e4 :   ldw %r19(%r8),%r31
0x400e61e8 :   add,l %r19,%r8,%r19
0x400e61ec :   or %r31,%r6,%r31



(gdb) i r
  flags: 2f41

 r3: 83ffbfff9600   r4: 800100017c80
 r5: 8000   r6:0
 r7: 83ffb790   r8:0
 r9: 80010001bec8  r10:0
r11: 80010001be10  r12: 83ffbfe8a000
r13: 80010001c2ec  r14: 83ffbfffda90
r15: 83ffbfe89fc0  r16:  690
r17: 40033b94  r18: 40033bb8
   arg7/r19: 800100220438 arg6/r20: 83ffbfffcc30
   arg5/r21:0 arg4/r22: 83ffbfe9c748
   arg3/r23: 83ffbfff9610 arg2/r24:0
   arg1/r25: 83ffbfff9608 arg0/r26:  1ac
  dp/gp/r27: 800100017c80 ret0/r28:  1ac
ret1/ap/r29: 83ffb790   sp/r30: 83ffb7c0
mrp/r31:0 sar/cr11:   3c

If I am reading things right, the use of r8 and r19 in the ldw
instruction are switched around.  And in fact, when I created an
assembly language file, swapped them around by hand, and then assembled
the result and built an executable, everything seemed to work OK (I did
not get a segfault).

Steve Ellcey
[EMAIL PROTECTED]

Re: Bad code generation on HPPA platform

2008-05-08 Thread Jeff Law


Steve Ellcey wrote:
If it aborts, as in calling abort, rather than segfaulting, then it's 
not a flipped base/index in a memory reference -- those almost always 
segfault.  This is the case that most worries me about Andrew's patch.


Sorry I wasn't clearer, it is a segfault.  Running under gdb:

Program received signal SIGSEGV, Segmentation fault
  si_code: 0 - SEGV_UNKNOWN - Unknown Error.
0x400e61e4 in life_analysis (f=, nregs=1680)
at flow.c:1166
1166  basic_block_live_at_end[i][j] |= x;



From disassembly:

;;;  1166 basic_block_live_at_end[i][j] |= x;
0x400e61e0 :   ldd,s %ret0(%r20),%r19
0x400e61e4 :   ldw %r19(%r8),%r31
0x400e61e8 :   add,l %r19,%r8,%r19
0x400e61ec :   or %r31,%r6,%r31



(gdb) i r
  flags: 2f41

 r3: 83ffbfff9600   r4: 800100017c80
 r5: 8000   r6:0
 r7: 83ffb790   r8:0
 r9: 80010001bec8  r10:0
r11: 80010001be10  r12: 83ffbfe8a000
r13: 80010001c2ec  r14: 83ffbfffda90
r15: 83ffbfe89fc0  r16:  690
r17: 40033b94  r18: 40033bb8
   arg7/r19: 800100220438 arg6/r20: 83ffbfffcc30
   arg5/r21:0 arg4/r22: 83ffbfe9c748
   arg3/r23: 83ffbfff9610 arg2/r24:0
   arg1/r25: 83ffbfff9608 arg0/r26:  1ac
  dp/gp/r27: 800100017c80 ret0/r28:  1ac
ret1/ap/r29: 83ffb790   sp/r30: 83ffb7c0
mrp/r31:0 sar/cr11:   3c

If I am reading things right, the use of r8 and r19 in the ldw
instruction are switched around.  And in fact, when I created an
assembly language file, swapped them around by hand, and then assembled
the result and built an executable, everything seemed to work OK (I did
not get a segfault).



OK.  Thanks.  Nearly 100% certain we've got a flipped base/index.

And just to be certain, we've used a recent GCC trunk to compile an old 
rev of gcc (2.7 era?), which is then segfaulting when it's trying to 
compile code, right?


Assuming that's the case, I'd start by identifying the pseudos 
corresponding to %r19 and %r8 and verify that the one associated with 
%r8 has REG_POINTER set and the one associated with %r19 does not.  Then 
work backwards to find out how REG_POINTER on the pseudo associated with 
%r8 was set.


Also be careful that tail merging hasn't merged two threads of control 
which it thought were identical, but where in one thread of control 
you've got %r8/%r19 as the base/index and in the other thread of control 
you've got %r19/%r8 as the base/index (they are identical in 
functionality on any sane target :-).  I fixed that eons ago, but it's 
always possible it's reared its ugly head again.
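
A toy model shows why operand order can matter even though base+index addition is commutative. The sketch below assumes a simplified machine, loosely modelled on 32-bit PA-RISC implicit space selection, where the top two bits of the base register pick one of four spaces. The register values and the function name are invented, and how PA64 HP-UX actually lays out its spaces is not something this thread establishes.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base, index):
    """Toy model: the offset is base+index, but the space is chosen
    from the two most significant bits of the *base* register only."""
    offset = (base + index) & MASK32
    space_selector = (base >> 30) & 0x3
    return space_selector, offset


pointer = 0xC0001000   # made-up pointer value living in space 3
index = 0x00000010     # small array offset

print(effective_address(pointer, index))  # pointer used as base
print(effective_address(index, pointer))  # operands flipped
```

In this model both orders compute the same offset but select different spaces, which is the kind of mismatch that would fault at run time rather than merely run slowly.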


Jeff

Re: How to implement the instruction in the back end

2008-05-08 Thread Ian Lance Taylor

"Mohamed Shafi" <[EMAIL PROTECTED]> writes:

> The 16-bit target that I am porting to gcc 4.1.2 doesn't have any
> branch instructions. It only has jump instructions. For comparison
> operations it has this instruction:
>
> if cond Rx Ry
>  execute this insn
>
> So compare and branch is implemented as
>
> if cond Rx Ry
>   jmp Label

For gcc's purposes this is no different from having a usual
conditional branch instruction.  It's just a jump with a condition.

> This instruction also has another form, to check whether a particular
> bit in a register is set or not.
>
> if bs Rx, bitNo
>  execute this insn
>
> My question is: how can I implement this instruction in
> the back-end?

Sure, this is just a conditional instruction where the condition is a
ZERO_EXTRACT.

Look at the ARM backend for examples of how to work with conditional
instructions.
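
As an illustration of the semantics only (this is a Python model, not machine-description syntax): a ZERO_EXTRACT of width 1 at position bitNo yields the tested bit, so "if bs Rx, bitNo" is a condition on whether that extract is nonzero. The function names are hypothetical, and the sketch assumes bit positions count from the least significant bit, which in GCC depends on the target's BITS_BIG_ENDIAN setting.

```python
def zero_extract(value, size, pos):
    """Return the `size`-bit field of `value` starting at bit `pos`,
    zero-extended (bit 0 assumed to be the least significant bit)."""
    return (value >> pos) & ((1 << size) - 1)


def bit_is_set(rx, bit_no):
    """Condition of the hypothetical 'if bs Rx, bitNo' form."""
    return zero_extract(rx, 1, bit_no) != 0


rx = 0b00100100  # bits 2 and 5 are set
print(bit_is_set(rx, 2), bit_is_set(rx, 3))
```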

Ian

Re: Bad code generation on HPPA platform

2008-05-08 Thread Steve Ellcey

Jeff Law wrote:

> And just to be certain, we've used a recent GCC trunk to compile an old 
> rev of gcc (2.7 era?), which is then segfaulting when it's trying to 
> compile code, right?

Correct, I am using GCC 4.3.0 to compile the old (2.7) GCC and when I
run that old GCC it segfaults.  If I start with the ToT GCC instead of
4.3.0 GCC I have no problems.

> Assuming that's the case, I'd start by identifying the pseudos 
> corresponding to %r19 and %r8 and verify that the one associated with 
> %r8 has REG_POINTER set and %r19 is not yet.  Then work backwards to 
> find out how REG_POINTER on the pseudo associated with %r8 was set.

I have looked at the pseudo that started getting marked with REG_POINTER
after Andrew's change.  It corresponds to basic_block_live_at_end[i] and
this is a pointer (basic_block_live_at_end is an array of pointers), so
I don't believe (at least in this instance) that Andrew's patch is
incorrectly marking something as a pointer when it isn't.

> Also be careful that tail merging hasn't merged two threads of control 
> which it thought were identical, but where in one thread of control 
> you've got %r8/%r19 as the base/index and in the other thread of control 
> you've got %r19/%r8 as the base/index (they are identical in 
> functionality on any sane target :-).  I fixed that eons ago, but it's 
> always possible it's reared its ugly head again.

The pseudo for %r8 does have REG_POINTER set and the pseudo for %r19
does not.  I first see REG_POINTER set for ivtmp___1536 (the pseudo for
%r8) in flow.c.138r.loop2_invariant.  This seems interesting because
Peter's patch, which fixes this problem without undoing Andrew's patch,
includes a change to loop-invariant.c, though that change should be
preserving REG_POINTERs during optimization, not preventing them.

I have tested a port of Peter's changes (already on the main line) to
the 4.3 branch.  It does fix the problem and it causes no regressions on
hppa.  My current inclination is to submit that patch to gcc-patches as
a backport to the branch.

Steve Ellcey
[EMAIL PROTECTED]

Re: Bad code generation on HPPA platform

2008-05-08 Thread John David Anglin

> If I am reading things right, the use of r8 and r19 in the ldw
> instruction are switched around.

Yes.  If you do an rtl dump, you should be able to see where the
REG_POINTER flag gets set and if the operand order gets switched.
Sometimes the REG_POINTER flag gets removed by reload, etc.  So,
the operand order should not change after this point.

Dave
-- 
J. David Anglin  [EMAIL PROTECTED]
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

Re: Use of option -fprofile-arcs is not compatible with -fprofile-use

2008-05-08 Thread Edmar Wienskoski-RA8797

Oops, in my cutting and pasting I omitted the -O2 that goes with all 
compilations.

Without it no optimization gets done, so no warnings...

Regards
Edmar


Lijuan Hai wrote:

sorry that I couldn't re-produce the warning as you said.

micro# /import/dr3/s10/gcc-4.2/bin/gcc val-prof-1.c -fprofile-arcs -g
-o val-prof-1.x1
micro# /import/dr3/s10/gcc-4.2/bin/gcc -v
Using built-in specs.
Target: sparc-sun-solaris2.10
Configured with: /import/dr2/starlex/1/gcc-4.2-20070228/configure
--prefix=/import/dr3/s10/gcc-4.2 --enable-languages=c,c++,fortran
--enable-rpath --with-mpfr=/import/dr3/s10/gcc-4.2
--with-gmp=/import/dr3/s10/gcc-4.2
Thread model: posix
gcc version 4.2.0 20070228 (prerelease)

so is gcc-4.3 on the platform.

2008/5/7 Edmar Wienskoski-RA8797 <[EMAIL PROTECTED]>:
  

I said if you compile val-prof-1.c the same way bprob-1.c is compiled you
get a warning.


 gcc -g -fprofile-arcs val-prof-1.c -o val-prof-1.x1




 Lijuan Hai wrote:



seen in gcc-4.2, gcc.misc-tests/bprob-1.c is compiled with
-fprofile-arcs and -fbranch-probabilities.
gcc.dg/tree-prof/val-prof-1.c is compiled with -fprofile-generate and
-fprofile-use. so there won't be any warnings.

2008/4/25 Edmar Wienskoski-RA8797 <[EMAIL PROTECTED]>:


  

The test case gcc.misc/bprob-1.c is compiled with fprofile-arcs /
fprofile-use.

 The option fprofile-arcs does not enable value profiling.

 At the second stage compilation, the option fprofile-use enables value
profiling. Within tree_find_values_to_profile, if one of the value
optimization algorithms sees an optimization opportunity, it will push
a histogram on stack. Later, compute_value_histograms will call
get_coverage_counts to load this histogram, but none were generated.

 A warning is issued which means a FAIL under dejagnu.

 I found this problem with bprob-1.c while debugging a new value profile
optimization. But it can be reproduced on any target, with non-modified
gcc, at any optimization level, using one of the value profile test cases
and compiler options fprofile-arcs / fprofile-use (same used with
bprob-1.c).

 Here is an example using gcc.dg/tree-prof/val-prof-1.c:
 ./gcc-trunk-reference/install_e600/bin/gcc -g -fprofile-arcs val-prof-1.c -o val-prof-1.x1
 ./val-prof-1.x1
 ./gcc-trunk-reference/install_e600/bin/gcc -g -fprofile-use val-prof-1.c -o val-prof-1.x2
 val-prof-1.c: In function 'main':
 val-prof-1.c:17: warning: no coverage for function 'main' found

 IMHO there are 3 ways to go with this:
 1 - Require user behavior change (create new option -fprofile-arcs-use to
match -fprofile-arcs, mismatch of options is bad user behavior)
 2 - Record on the .gcda file how the first stage was done (fprofile-arcs /
fprofile-generate, etc) and use it to disable other optimizations under
fprofile-use (Does this already exist?  I am not familiar with the .gcda
layout)
 3 - Let get_coverage_counts ignore inconsistencies when loading data.

 Help / comments are appreciated.

 Edmar

Re: Bad code generation on HPPA platform

2008-05-08 Thread Jeff Law


Steve Ellcey wrote:


The psuedo for %r8 does have REG_POINTER set and the psuedo for %r19
does not.  I first see REG_POINTER set for ivtmp___1536 (the psuedo for
%r8) in flow.c.138r.loop2_invariant.  This seems interesting because
Peter's patch, that fixes this problem without undoing Andrews patch,
includes a change to loop-invariant.c, though that change should be
preserving REG_POINTER's during optimization not preventing them.

OK.  So what is ivtmp__1536 -- is it a pointer or an index?  Can you 
show me the gimple code which declares and uses ivtmp_1536?



Hmmm, fails for 4.3...  Hmmm, does 4.3 have POINTER_PLUS_EXPR?
(search tree.def for POINTER_PLUS_EXPR).

Jeff

Re: Bad code generation on HPPA platform

2008-05-08 Thread Andrew Pinski

On Thu, May 8, 2008 at 11:48 AM, Jeff Law <[EMAIL PROTECTED]> wrote:
> Hmmm, fails for 4.3...  Hmmm, does 4.3 have POINTER_PLUS_EXPR?
> (search tree.def for POINTER_PLUS_EXPR).

Yes it made it in 4.3 :).  Which is why the other patch went in.

Thanks,
Andrew Pinski

Re: Bad code generation on HPPA platform

2008-05-08 Thread Peter Bergner

On Thu, 2008-05-08 at 11:38 -0700, Steve Ellcey wrote:
> The psuedo for %r8 does have REG_POINTER set and the psuedo for %r19
> does not.  I first see REG_POINTER set for ivtmp___1536 (the psuedo for
> %r8) in flow.c.138r.loop2_invariant.  This seems interesting because
> Peter's patch, that fixes this problem without undoing Andrews patch,
> includes a change to loop-invariant.c, though that change should be
> preserving REG_POINTER's during optimization not preventing them.

Similar to hppa, power6 cares about knowing whether a pseudo is a pointer
or not, because for regA + regB load/store addressing, we get much better
performance if regA is the pointer and regB is the offset rather than
the other way around.  What I found was that the loop invariant and
GCSE code were creating some pseudos to copy expressions into, but were
failing to copy the REG_POINTER/MEM_POINTER attribute along with it.

The hunk from:

  http://gcc.gnu.org/ml/gcc-patches/2008-04/msg00693.html

which replaced the rtlanal.c from the first commit was needed at -O0,
because the only chance to order the operands at -O0 is at expand time.

Peter

Re: Bad code generation on HPPA platform

2008-05-08 Thread Jeff Law


Peter Bergner wrote:

On Thu, 2008-05-08 at 11:38 -0700, Steve Ellcey wrote:

The psuedo for %r8 does have REG_POINTER set and the psuedo for %r19
does not.  I first see REG_POINTER set for ivtmp___1536 (the psuedo for
%r8) in flow.c.138r.loop2_invariant.  This seems interesting because
Peter's patch, that fixes this problem without undoing Andrews patch,
includes a change to loop-invariant.c, though that change should be
preserving REG_POINTER's during optimization not preventing them.


Similar to hppa, power6 cares about knowing whether a pseudo is a pointer

OK.  Reasonably simpler, though on the PA if you get it wrong, you get a 
hard failure rather than just poor performance :(



  What I found, was that the loop invariant and
GCSE code were creating some pseudos to copy expressions into, but was
failing to copy the REG_POINTER/MEM_POINTER attribute along with it.

I recall.  I was briefly worried that GCSE might have a problem similar 
to the tail merging problem I mentioned briefly, but I just did a quick 
audit and it looks clean (basically it doesn't take commutativity into
account when hashing values).

I can't offhand think of how LICM would muck things up in the 
way we're seeing, but obviously something isn't working the way we 
expect it to :-)


jeff

gcc-4.3-20080508 is now available

2008-05-08 Thread gccadmin

Snapshot gcc-4.3-20080508 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20080508/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_3-branch 
revision 135093

You'll find:

gcc-4.3-20080508.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.3-20080508.tar.bz2 C front end and core compiler

gcc-ada-4.3-20080508.tar.bz2  Ada front end and runtime

gcc-fortran-4.3-20080508.tar.bz2  Fortran front end and runtime

gcc-g++-4.3-20080508.tar.bz2  C++ front end and runtime

gcc-java-4.3-20080508.tar.bz2 Java front end and runtime

gcc-objc-4.3-20080508.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.3-20080508.tar.bz2  The GCC testsuite

Diffs from 4.3-20080501 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

RFH: Building and testing gimple-tuples-branch

2008-05-08 Thread Diego Novillo



The tuples branch is at the point now that it should bootstrap all 
primary languages and targets.  There are things that are still broken 
and being worked on (http://gcc.gnu.org/wiki/tuples), but by and large 
things should Just Work.


I expect things like code generation to be sub-par because some 
optimizations are still not converted (notably, loop passes, PRE, and TER).


So, for folks with free cycles to spare, could you build the branch on 
your favourite target and report bugs?  Bugzilla and/or email reports 
are OK.  If you are creating a bugzilla report, please add my address to 
the CC field.


Other than obvious brokenness, we are interested in compile time 
slowdowns and increased memory utilization, both of which are possible 
because we have spent no effort tuning the data structures yet.


To build the branch:

$ svn co svn://gcc.gnu.org/svn/gcc/branches/gimple-tuples-branch
$ mkdir bld && cd bld
$ ../gimple-tuples-branch/configure --disable-libgomp --disable-libmudflap
$ make && make -k check


Thanks.  Diego.

ssa_name issues

2008-05-08 Thread Sean Callanan


Dear mailing list:

I am writing GCC code that constructs GIMPLE (after pass_apply_inline  
and before pass_all_optimizations) to take the address of each of a  
function's parameters and store those addresses in an array.  The code  
is at the bottom of this message.  Right now I need help in dealing  
with errors of the form


--
test.c:10: error: invalid operand to unary operator
x_1(D)
--

where x_1(D) is an SSA_NAME for one of the variables I'm trying to  
store (in this case, x).  The errors appear at -O2.  The complaint is  
on a NOP_EXPR involving x; the backtrace is


--
#1  0x0087dbee in verify_expr (tp=0x2b484208, walk_subtrees=0x7fffb32ddf50, data=0x0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-cfg.c:3284
#2  0x00a7860c in walk_tree (tp=0x2b484208, func=0x87cf15, data=0x0, pset=0x0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree.c:7978
#3  0x0087f415 in verify_stmt (stmt=0x2b4841e0, last_in_block=0 '\0') at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-cfg.c:3407
#4  0x0087fbdc in verify_stmts () at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-cfg.c:3615
#5  0x00a0200b in verify_ssa (check_modified_stmt=1 '\001') at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-ssa.c:614
#6  0x007a6860 in execute_function_todo (data=0x420) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:927
#7  0x007a6315 in do_per_function (callback=0x7a65b2, data=0x420) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:770
#8  0x007a68eb in execute_todo (flags=1056) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:948
#9  0x007a6d6d in execute_one_pass (pass=0x12259a0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:1093
#10 0x007a6dfa in execute_pass_list (pass=0x12259a0) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/passes.c:1123
#11 0x008edc9c in tree_rest_of_compilation (fndecl=0x2b475a80) at /home/sean/fsl/aristotle/src/modular-gcc/build-svn/../gcc-svn/gcc/tree-optimize.c:412

--

The relevant part of verify_expr is:

--
static tree
verify_expr (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
{
  tree t = *tp, x;
  bool in_phi = (data != NULL);

  if (TYPE_P (t))
    *walk_subtrees = 0;

  /* Check operand N for being valid GIMPLE and give error MSG if not.  */

#define CHECK_OP(N, MSG) \
  do { if (!is_gimple_val (TREE_OPERAND (t, N)))\
       { error (MSG); return TREE_OPERAND (t, N); }} while (0)

  switch (TREE_CODE (t))
    {
    /* snip */
    case NOP_EXPR:
    case CONVERT_EXPR:
    case FIX_TRUNC_EXPR:
    case FLOAT_EXPR:
    case NEGATE_EXPR:
    case ABS_EXPR:
    case BIT_NOT_EXPR:
    case NON_LVALUE_EXPR:
    case TRUTH_NOT_EXPR:
      CHECK_OP (0, "invalid operand to unary operator");
      break;
--

Here's my code.  I freely acknowledge that I may be doing this in a  
very bad way; if you can point me toward a replacement for my hand- 
rolled build_automatic, or show me the right way to make the array  
addressable, or bang me on the head for not executing a fixing-up pass  
before or after this code, all these comments are more than welcome.


--
// Returns an automatic variable of the given type.
tree build_automatic(
  tree type,/* type of variable */
  const char* name) /* name for variable */
{
  tree ret = NULL;

  ret = build_decl(VAR_DECL, get_identifier(name), type);

  DECL_ARTIFICIAL(ret) = 1; /* declared by the compiler */
  DECL_IGNORED_P(ret) = 1;  /* no debug info */
  TREE_READONLY(ret) = 0;   /* writable */
  DECL_EXTERNAL(ret) = 0;   /* defined */
  TREE_STATIC(ret) = 0; /* automatic */
  TREE_USED(ret) = 1;   /* used */

  create_var_ann(ret);

  return ret;
}

tree build_pointer_array(
  tree* parms,                /* array of PARAM_DECLs */
  int num_parms,              /* number of elements in parms */
  block_stmt_iterator* iter)  /* pointer to a location where I will
                                 place the assignments to the array */
{
  tree array_size       = build_int_cst(size_type_node, num_parms);
  tree array_index_type = build_index_type(array_size);
  tree array_type       = build_array_type(ptr_type_node, array_index_type);

  TREE_ADDRESSABLE(array_type) = 1;
  tree pointer_array    = build_automatic(array_type, "pointers");

  for(unsigned int i = 0; i < variables->size(); i++)
    {
      tree index         = build_int_cst(array_index_type, i);
      tree min_value     = TYPE_MIN_VALUE(array_index_type);
      tree size_in_align = build_int_cst(size_type_node,
        tree_low_cst(TYPE_SIZE(ptr_type_node), 0) / TYPE_ALIGN(ptr_type_node));

  tree variable

Re: ssa_name issues

2008-05-08 Thread Andrew Pinski

On Thu, May 8, 2008 at 4:02 PM, Sean Callanan <[EMAIL PROTECTED]> wrote:
> Dear mailing list:
>
> I am writing GCC code that constructs GIMPLE (after pass_apply_inline and
> before pass_all_optimizations) to take the address of each of a function's
> parameters and store those addresses in an array.  The code is at the bottom
> of this message.  Right now I need help in dealing with errors of the form

Right, so you are taking a gimple register and turning it into a non
gimple register.
This will not work without extra work; the other way does work though
(and is done in the addressable pass and the aliasing TODO).
To fix it up, you have to fix up the rest of the IR to take into
account that you just turned that symbol into a non gimple register.  You
have to create many extra statements, one for each use of the symbol.

Thanks,
Andrew Pinski

Re: RFH: Building and testing gimple-tuples-branch

2008-05-08 Thread Aldy Hernandez

> "Diego" == Diego Novillo <[EMAIL PROTECTED]> writes:

 > are OK.  If you are creating a bugzilla report, please add my address
 > to the CC field.

 Me too please.

Aldy

Questions about attributes

2008-05-08 Thread David Mandelin

I have questions about function parameter attributes. I'm trying to use 
attributes to indicate parameters that are used to pass values back out 
of functions and then analyze how they are used. I tried something like 
this:


 void foo(int *a __attribute__((user("out"))));

By itself, this works (in GCC 4.3, used for all tests discussed here): 
in GIMPLE, call sites for foo refer to a FUNCTION_DECL node that has a 
PARM_DECL for a with the user "out" attribute.


But if a function definition is visible, this changes. With this:

 void foo(int *a) {}

at all sites, the FUNCTION_DECL's PARM_DECL has no attributes.

I tried various tests, with multiple declarations and having definitions 
or not, and the results seem to be:


 - If there is a definition present, parameter attributes are taken from
   the definition.

 - If there is no definition in the file, parameter attributes are taken
   from the first declaration.
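
Given the behavior observed above, one possible workaround is to repeat the
attribute on the definition as well as on the declaration, so that whichever
PARM_DECL is kept still carries it. The sketch below is only an illustration:
OUT_PARAM and HAVE_USER_ATTRIBUTE are hypothetical names, and the guard is
there because a compiler that does not know the "user" attribute would warn
about it.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_USER_ATTRIBUTE is a hypothetical macro: define it only when the
   compiler in use understands the "user" attribute. */
#ifdef HAVE_USER_ATTRIBUTE
#define OUT_PARAM __attribute__((user("out")))
#else
#define OUT_PARAM
#endif

/* Declaration, as it might appear in a header. */
void foo(int *a OUT_PARAM);

/* Definition: the attribute is repeated here, so that it is present
   whether the parameter attributes come from the declaration or from
   the definition. */
void foo(int *a OUT_PARAM)
{
    assert(a != NULL);
    *a = 42;   /* value passed back out through the parameter */
}
```

Whether this is the preferred representation is exactly what Q3 asks; the
sketch only shows that a single macro keeps the two spellings in sync.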

I'd appreciate any help someone can give me with:

Q1: Is this behavior intended?

Q2: When there is a definition and a declaration, does GCC keep around 
both sets of attributes? If so, where are they found?



Q3: If not, what's the best way to represent this information with 
attributes? The same way as nonnull?


--
Dave Mandelin
Mozilla Platform Engineer

Division using FMAC, reciprocal estimates and Newton-Raphson - eg ia64, rs6000, SSE, ARM MaverickCrunch?

2008-05-08 Thread Hasjim Williams

Hi all,

I was looking for ways to improve the MaverickCrunch division routine on
ARM ep93xx, and noticed that there are a few other architectures that
don't have a hardware divide.

IA-64 has a "frcpa" instruction that returns an estimate of the
reciprocal of a float or double.
Likewise, RS-6000 has a "fres" that also returns an estimate of the
reciprocal of a float or double.
x86 seems to have something similar with SSE - called "rcpps" - that
also returns the estimated reciprocal.

They all seem to make use of FMAC / FNMAC instructions to calculate the
correct answer for x/y, refining the estimate through Newton-Raphson
iterations built from MAC instructions.  The algorithms they use in GCC
differ because the accuracy of the reciprocal estimate differs.
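
As a rough illustration of the scheme described above (not the code GCC
emits for any of these targets), the sketch below divides two floats using a
crude reciprocal seed, three Newton-Raphson steps and a final residual
correction. recip_estimate() and fmac() are hypothetical stand-ins for the
hardware estimate and multiply-accumulate instructions; the linear seed
48/17 - 32/17*m is an assumption chosen because its relative error is at most
1/17, and each Newton-Raphson step squares that error.

```c
#include <assert.h>

/* Stand-in for a fused multiply-accumulate: a*b + c with the product
   kept exact in double.  Hardware FMAC, or C99 fmaf(), would replace it. */
static float fmac(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Hypothetical stand-in for frcpa / fres / rcpps: a low-precision
   estimate of 1/y.  Assumes y is finite and nonzero. */
static float recip_estimate(float y)
{
    float m = y < 0.0f ? -y : y;
    float scale = 1.0f;              /* invariant: |y| == m / scale */

    while (m >= 1.0f) { m *= 0.5f; scale *= 0.5f; }
    while (m < 0.5f)  { m *= 2.0f; scale *= 2.0f; }

    /* Linear seed for 1/m on [0.5, 1), relative error at most 1/17. */
    float r = (48.0f / 17.0f - (32.0f / 17.0f) * m) * scale;
    return y < 0.0f ? -r : r;
}

/* x / y by Newton-Raphson refinement of the reciprocal estimate. */
float nr_divide(float x, float y)
{
    /* y - y == 0 is false for infinities and NaN. */
    assert(y != 0.0f && y - y == 0.0f);

    float r = recip_estimate(y);
    for (int i = 0; i < 3; i++) {
        float err = fmac(-y, r, 1.0f);   /* err = 1 - y*r   */
        r = fmac(r, err, r);             /* r   = r + r*err */
    }

    float q = x * r;                     /* first quotient     */
    float rem = fmac(-y, q, x);          /* rem = x - y*q      */
    return fmac(rem, r, q);              /* corrected quotient */
}
```

With a seed error of 1/17, three steps bring the reciprocal error below
single-precision rounding; a more accurate hardware estimate would need fewer
steps, which is one reason the per-target sequences differ. The sketch makes
no claim of correctly rounded results.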

http://en.wikipedia.org/wiki/N-th_root_algorithm
http://en.wikipedia.org/wiki/Multiply-accumulate

They also seem to use a similar algorithm to implement their sqrt
function...

My question is, are there any other architectures in GCC that don't have
a reciprocal estimate instruction, but have a FMAC?

I'd like to implement something similar for MaverickCrunch, using the
integer 32-bit MAC functions, but there is no reciprocal estimate
function on the MaverickCrunch.  I guess a lookup table could be
implemented, but how many entries will need to be generated, and how
accurate will it have to be for the result to be IEEE754 compliant (in
the swdiv routine)?
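
To make the table-size question concrete, here is a minimal sketch of a
table-driven reciprocal seed. It is written in floating point for readability
rather than with the integer MAC, and every name in it (RECIP_INDEX_BITS,
recip_table, recip_seed) is hypothetical. The table is indexed by the top
bits of the normalized operand and stores the reciprocal of each interval's
midpoint, so with k index bits the seed's relative error is about 2^-(k+1).

```c
#include <assert.h>

#define RECIP_INDEX_BITS 6                      /* k: 64 table entries */
#define RECIP_ENTRIES    (1 << RECIP_INDEX_BITS)

static float recip_table[RECIP_ENTRIES];
static int recip_table_ready;

/* Fill the table with 1/midpoint for each slice of [0.5, 1).  On a
   target without a divide this would be precomputed at build time. */
static void recip_table_init(void)
{
    for (int i = 0; i < RECIP_ENTRIES; i++) {
        float mid = 0.5f + (i + 0.5f) / (2.0f * RECIP_ENTRIES);
        recip_table[i] = 1.0f / mid;
    }
    recip_table_ready = 1;
}

/* Low-precision estimate of 1/y for finite, normal y > 0. */
float recip_seed(float y)
{
    /* y - y == 0 is false for infinities and NaN. */
    assert(y > 0.0f && y - y == 0.0f);
    if (!recip_table_ready)
        recip_table_init();

    float m = y;
    float scale = 1.0f;              /* invariant: y == m / scale */
    while (m >= 1.0f) { m *= 0.5f; scale *= 0.5f; }
    while (m < 0.5f)  { m *= 2.0f; scale *= 2.0f; }

    /* The top RECIP_INDEX_BITS bits of m in [0.5, 1) select the entry. */
    int idx = (int)((m - 0.5f) * (2.0f * RECIP_ENTRIES));
    return recip_table[idx] * scale;
}
```

Since each Newton-Raphson step roughly doubles the number of correct bits, a
64-entry table (about 7 good bits) would need two steps to approach a 24-bit
single-precision mantissa, and a larger table trades memory for fewer steps.
Whether the final result is correctly rounded in the IEEE754 sense depends on
the last correction step, which this sketch does not address.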

Also, where should I be sticking such an instruction / table?  Should I
put it in the kernel, and trap an invalid instruction?  Alternatively,
should I put it in libgcc or in glibc/uclibc?
