Feature request: Globalize symbol

2005-02-24 Thread Fredrik Hugosson
Hi!
When working with unit tests I frequently need to override a
function or variable in a shared library. This works just as I want for
global symbols, but if the symbol is local (declared static) I have to
modify the source (removing the static via a STATIC preprocessor define)
to make it work.
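
A minimal sketch of that workaround (my own illustration; STATIC is the
macro mentioned above, the UNIT_TEST guard name is my own choice):

    /* src.c (sketch) */
    #ifdef UNIT_TEST
    #define STATIC              /* expose file-local symbols to the test build */
    #else
    #define STATIC static
    #endif

    STATIC int helper(int x)    /* overridable from tests.c when built with -DUNIT_TEST */
    {
      return x + 1;
    }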

The setup is as follows:
app/Makefile
app/src.c
app/checktests/Makefile
app/checktests/tests.c
In the application Makefile I have a target to compile the application 
as a shared library. This target is invoked from the checktests Makefile 
and the lib is then linked with the tests. So I compile the application 
source under test from scratch and can control the flags. (To mess even 
less with the application under test I may change the setup in the 
future to include the application Makefile in a wrapper Makefile instead 
of adding a shared library target to it.)

All this makes it possible to override any global symbol in src.c by
defining the symbol in the tests.c file. What I miss is the ability
to override local symbols in a similar manner, without touching the
source. This problem could be fixed by adding some options to gcc for
globalizing symbols.

My proposal is the following new options:
-fglobalize-symbol=SYMBOLNAME
-fglobalize-symbols=FILENAME
-fglobalize-all-symbols
The first option makes the symbol SYMBOLNAME global.
The second option makes all symbols in FILENAME global.
The third option makes all symbols global.
The globalization should apply to all symbols that are visible at
file scope but not globally visible, i.e. both functions declared
'static' and variables declared 'static' outside functions. The
hidden-visibility attribute could be overridden too, but for my
purposes I have no need for this.
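
A small example (my own illustration) of the kinds of symbols the
proposal would cover:

    static int counter;          /* file-scope static variable: would be globalized */

    static int helper(int x)     /* file-scope static function: would be globalized */
    {
      return x + counter;
    }

    int api_entry(int x)         /* already global: unaffected */
    {
      return helper(x);
    }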

Waiting hopefully
/HUGO.


Re: gcse pass: expression hash table

2005-02-24 Thread Tarun Kawatra
On Wed, 23 Feb 2005, James E Wilson wrote:
Tarun Kawatra wrote:
During expression hash table construction in the gcse pass (gcc version 3.4.1), 
expressions like a*b do not get included in the expression hash table. 
Such expressions occur in a PARALLEL along with clobbers.
You didn't mention the target, or exactly what the mult looks like.
Target is i386 and the mult instruction looks like the following in RTL
(insn 22 21 23 1 (parallel [
(set (reg/v:SI 62 [ c ])
(mult:SI (reg:SI 66 [ a ])
(reg:SI 67 [ b ])))
(clobber (reg:CC 17 flags))
]) 172 {*mulsi3_1} (nil)
(nil))

However, this isn't hard to answer just by using the source. hash_scan_set 
calls want_to_cse_p calls can_assign_to_reg_p calls added_clobbers_hard_reg_p 
which presumably returns true, which prevents the optimization.  This makes 
sense.  If the pattern clobbers a hard reg, then we can't safely insert it at 
any place in the function.  It might be clobbering the hard reg at a point 
where it holds a useful value.
If that is the reason, then even the plus expression (shown below) should not
be subjected to PRE, as it also clobbers a hard register (CC). But it is being
subjected to PRE. The multiplication expression, while it looks the same, does
not even get into the hash table.
(insn 35 34 36 1 (parallel [
(set (reg/v:SI 74 [ c ])
(plus:SI (reg:SI 78 [ a ])
(reg:SI 79 [ b ])))
(clobber (reg:CC 17 flags))
]) 138 {*addsi_1} (nil)
(nil))
-tarun
While looking at this, I noticed can_assign_to_reg_p does something silly. 
It uses "FIRST_PSEUDO_REGISTER * 2" to try to generate a test pseudo 
register, but this can fail if a target has less than 4 registers, or if the 
set of virtual registers increases in the future. This should probably be 
LAST_VIRTUAL_REGISTER + 1 as used in another recent patch.
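
A sketch of the suggested change (my own rendering; it only swaps the
register index in the existing test-insn construction, as proposed above):

    test_insn
      = make_insn_raw (gen_rtx_SET (VOIDmode,
                                    gen_rtx_REG (word_mode,
                                                 LAST_VIRTUAL_REGISTER + 1),
                                    const0_rtx));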



Re: C++ math optimization problem...

2005-02-24 Thread Richard Guenther
On Wed, 23 Feb 2005 10:36:07 -0800, Benjamin Redelings I
<[EMAIL PROTECTED]> wrote:
> Hi,
> I have a C++ program that runs slower under 4.0 CVS than 3.4.  So, I 
> am
> trying to make some test-cases that might help deduce the reason.
> However, when I reduced this testcase sufficiently, it began behaving
> badly under BOTH 3.4 and 4.0 but I guess I should start with the
> most reduced case first.
> 
> Basically, the code just does a lot of multiplies and adds.  However,
> if I take the main loop outside of an if-block, it goes 5x faster.
> Also, if I implement an array as 'double*' instead of 'vector<double>'
> it also goes 5x faster.  Using valarray<double> instead of
> vector<double> does not give any improvement.

I'm sure this is an aliasing problem.  The compiler cannot deduce that storing
to result does not affect d.  Otherwise the generated code looks reasonable.
What is interesting, though, is that removing the if makes the compiler
recognize that result and d do not alias.  In fact, the alias analysis seems
to be confused by the scope of d - moving it outside of the if fixes the
problem, too.  Maybe Diego can shed some light on this effect.  The testcase
looks like

#include <cstdio>
#include <cstdlib>
#include <vector>

const int OUTER = 10;
const int INNER = 1000;

using namespace std;

int main(int argn, char *argv[])
{
  int s = atoi(argv[1]);

  double result;

  {
    vector<double> d(INNER); // move outside of this scope to fix

    // initialize d
    for (int i = 0; i < INNER; i++)
      d[i] = double(1+i) / INNER;

    // calc result
    result = 0;
    for (int i = 0; i < OUTER; ++i)
      for (int j = 1; j < INNER; ++j)
        result += d[j]*d[j-1] + d[j-1];
  }

  printf("result = %f\n",result);
  return 0;
}


Suggestion: Different exit code for ICE

2005-02-24 Thread Volker Reichelt
Regressions that cause ICEs on invalid code often go unnoticed in the
testsuite, since regular errors and ICEs both match { dg-error "" }.
See for example g++.dg/parse/error16.C, which has been ICEing since yesterday,
but the testsuite still reports "PASS":

  Executing on host: /Work/reichelt/gccbuild/src-4.0/build/gcc/testsuite/../g++ 
-B/Work/reichelt/gccbuild/src-4.0/build/gcc/testsuite/../ 
/Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C  
-nostdinc++ 
-I/home/reichelt/Work/gccbuild/src-4.0/build/i686-pc-linux-gnu/libstdc++-v3/include/i686-pc-linux-gnu
 
-I/home/reichelt/Work/gccbuild/src-4.0/build/i686-pc-linux-gnu/libstdc++-v3/include
 -I/home/reichelt/Work/gccbuild/src-4.0/gcc/libstdc++-v3/libsupc++ 
-I/home/reichelt/Work/gccbuild/src-4.0/gcc/libstdc++-v3/include/backward 
-I/home/reichelt/Work/gccbuild/src-4.0/gcc/libstdc++-v3/testsuite 
-fmessage-length=0   -ansi -pedantic-errors -Wno-long-long  -S  -o error16.s
(timeout = 300)
  /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
error: redefinition of 'struct A::B'
  /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:5: 
error: previous definition of 'struct A::B'
  /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
internal compiler error: tree check: expected class 'type', have 'exceptional' 
(error_mark) in cp_parser_class_specifier, at cp/parser.c:12407
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See <http://gcc.gnu.org/bugs.html> for instructions.
  compiler exited with status 1
  output is:
  /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
error: redefinition of 'struct A::B'
  /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:5: 
error: previous definition of 'struct A::B'
  /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
internal compiler error: tree check: expected class 'type', have 'exceptional' 
(error_mark) in cp_parser_class_specifier, at cp/parser.c:12407
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See <http://gcc.gnu.org/bugs.html> for instructions.

  PASS: g++.dg/parse/error16.C  (test for errors, line 5)
  PASS: g++.dg/parse/error16.C  (test for errors, line 8)
  PASS: g++.dg/parse/error16.C (test for excess errors)

(Btw, Mark, I think the regression was caused by your patch for
PR c++/20152, could you please have a look?)

The method used right now is to not use "" in the last error message,
but that's forgotten too often.

This calls for a more robust method IMHO.
One way would be to make the testsuite smarter and have it recognize
typical ICE patterns itself. This can indeed be done (I use such patterns,
for example, to monitor the testcases in Bugzilla, and Phil borrowed them
for his regression tester).

An easier way IMHO would be to return a different error code when
encountering an ICE. There are only a couple of places in diagnostic.c
and errors.c where we now have "exit (FATAL_EXIT_CODE);".
We could return an (appropriately defined) ICE_ERROR_CODE instead.
The testsuite would then just have to check the return value.
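
A minimal sketch of that proposal (ICE_ERROR_CODE is the name suggested
above; the value and the exact placement are my own assumptions):

    /* Next to wherever FATAL_EXIT_CODE is defined, e.g. system.h: */
    #define ICE_ERROR_CODE 4    /* any value distinct from FATAL_EXIT_CODE */

    /* In the ICE paths of diagnostic.c and errors.c: */
    exit (ICE_ERROR_CODE);      /* instead of exit (FATAL_EXIT_CODE); */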

What do you think?

Regards,
Volker




Re: gcse pass: expression hash table

2005-02-24 Thread Steven Bosscher
On Feb 24, 2005 11:13 AM, Tarun Kawatra <[EMAIL PROTECTED]> wrote:
> >> Such expressions occur in PARALLEL along with clobbers.
> >
> > You didn't mention the target, or exactly what the mult looks like.
> 
> Target is i386 and the mult instruction looks like the following in RTL
> 
> (insn 22 21 23 1 (parallel [
>  (set (reg/v:SI 62 [ c ])
>  (mult:SI (reg:SI 66 [ a ])
>  (reg:SI 67 [ b ])))
>  (clobber (reg:CC 17 flags))
>  ]) 172 {*mulsi3_1} (nil)
>  (nil))

Hmm...,

Does GCSE look into stuff in PARALLELs at all?  From gcse.c:

1804:  Single sets in a PARALLEL could be handled, but it's an extra complication
1805:  that isn't dealt with right now.  The trick is handling the CLOBBERs that
1806:  are also in the PARALLEL.  Later.

IIRC it is one of those things that worked on the cfg-branch or
the rtlopt-branch (and probably on the hammer-branch) but that
never got merged to the mainline.  Honza knows more about it, I
think...

Gr.
Steven



Is 'mfcr' a legal opcode for RS6000 RIOS1?

2005-02-24 Thread Kai Ruottu
On the crossgcc list there was a problem with gcc-3.4 generating the opcode
'mfcr' with '-mcpu=power' for the second created multilib, when the
GCC target is 'rs6000-ibm-aix4.3'. The other multilibs produced by
default are for '-pthread', '-mcpu=powerpc' and '-maix64'... The AIX
users can judge whether all of these are normally required, but when the
builder also used '--without-threads', the first sounds redundant
or even likely to clash with something. Building no multilibs at all with
'--disable-multilib' is of course possible...
But what is the situation with 'mfcr' and POWER?  Is it a bug in GNU as (the
Linux binutils-2.15.94.0.2.2 was tried) or in GCC (both gcc-3.3.5 and
gcc-3.4.3 were tried)?


Inlining and estimate_num_insns

2005-02-24 Thread Richard Guenther
Hi!

I'm looking at improving the inlining heuristics at the moment,
especially by questioning estimate_num_insns.  All uses
of that function assume it to return a size cost, not a computation
cost - is that correct?  If so, why do we penalize f.i. EXACT_DIV_EXPR
compared to MULT_EXPR?

Also, for the simple function

double foo1(double x)
{
return x;
}

we return 4 as a cost, because we have

   double tmp = x;
   return tmp;

and count the move cost (MODIFY_EXPR) twice.  We could fix this
by not walking (i.e. ignoring) RETURN_EXPR.
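
A minimal sketch of what that could look like in estimate_num_insns_1,
assuming the usual walk_tree callback shape (my own illustration, not a
tested patch):

    case RETURN_EXPR:
      /* Don't walk into the RETURN_EXPR; the move it wraps would
         otherwise be counted a second time.  */
      *walk_subtrees = 0;
      break;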

Also, INSNS_PER_CALL is rather high (10) - what is this choice
based on?  Wouldn't it be better to at least make it proportional
to the argument chain length?  Or even more advanced to the move
cost of the arguments?

Finally, is there a set of testcases that can be used as a metric
on whether improvements are improvements?

Thanks,
Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



Re: Inlining and estimate_num_insns

2005-02-24 Thread Steven Bosscher
On Feb 24, 2005 01:58 PM, Richard Guenther <[EMAIL PROTECTED]> wrote:
> I'm looking at improving inlining heuristics at the moment,
> especially by questioning the estimate_num_insns.

Good.  There is lots of room for improvement there.

>  All uses
> of that function assume it to return a size cost, not a computation
> cost - is that correct?

Yes.

> If so, why do we penalize f.i. EXACT_DIV_EXPR
> compared to MULT_EXPR?

Dunno.  Because divide usually results in more insns per tree?
 
> Also, for the simple function
> 
> double foo1(double x)
> {
> return x;
> }
> 
> we return 4 as a cost, because we have
> 
>double tmp = x;
>return tmp;
> 
> and count the move cost (MODIFY_EXPR) twice.  We could fix this
> by not walking (i.e. ignoring) RETURN_EXPR.

That would be a good idea if all estimate_num_insns ever sees
is GIMPLE.  Are you sure that is the case?  (I think it is, but
I'm not sure.)

> Also, INSNS_PER_CALL is rather high (10) - what is this choice
> based on?

History.  That's what it was in the old heuristics.

>  Wouldn't it be better to at least make it proportional
> to the argument chain length?  Or even more advanced to the move
> cost of the arguments?

That is what the RTL inliner used to do.  The problem now is that
you don't know what gets passed in a register and what is passed
on the stack.

> Finally, is there a set of testcases that can be used as a metric
> on wether improvements are improvements?

What I did in early 2003 was to add a mini-pass at the start of
rest_of_compilation that just counted the number of real insns
created for the current_function_decl, i.e. something like

int num_insn = 0;
rtx insn;
for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
  if (INSN_P (insn))
    num_insn++;

and then compare the result with the estimate of the tree inliner.
The results were quite discouraging at the time, which is why
Honza rewrote the size estimate.  No idea how well or poorly we do
today ;-)

Gr.
Steven



Re: Inlining and estimate_num_insns

2005-02-24 Thread Richard Guenther
On Thu, 24 Feb 2005, Steven Bosscher wrote:

> On Feb 24, 2005 01:58 PM, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > I'm looking at improving inlining heuristics at the moment,
> > especially by questioning the estimate_num_insns.
>
> Good.  There is lots of room for improvement there.
>
> >  All uses
> > of that function assume it to return a size cost, not a computation
> > cost - is that correct?
>
> Yes.
>
> > If so, why do we penalize f.i. EXACT_DIV_EXPR
> > compared to MULT_EXPR?
>
> Dunno.  Because divide usually results in more insns per tree?

Well, I don't know - but ia32 fdiv and fmul are certainly of the
same size ;)  Of course for f.i. ia64 inlined FP divide this is
not true, which argues for target-dependent size estimates.  So,
pragmatically, we should rather count tree nodes than try to
second-guess what the target-specific cost is.

> > Also, for the simple function
> >
> > double foo1(double x)
> > {
> > return x;
> > }
> >
> > we return 4 as a cost, because we have
> >
> >double tmp = x;
> >return tmp;
> >
> > and count the move cost (MODIFY_EXPR) twice.  We could fix this
> > by not walking (i.e. ignoring) RETURN_EXPR.
>
> That would be a good idea if all estimate_num_insns ever sees
> is GIMPLE.  Are you sure that is the case (I think it is, but
> I'm not sure).

Also for GENERIC, at least for what the C and C++ frontends are
generating.  What is discouraging at the moment is that we do
not remove the "abstraction penalty" of

inline int foo1(void)
{
return 0;
}
int foo(void)
{
return foo1();
}

currently we have a cost of 2 for foo1 and a cost of 5 for foo with
foo1 inlined.  With RETURN_EXPR ignored we get 1 for foo1 and
2 for foo with foo1 inlined.  I'll think about how to get that down
to 1.

Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



Re: __register_frame_info and unwinding shared libraries

2005-02-24 Thread Andrew Haley
Andrew Haley writes:
 > Jakub Jelinek writes:
 > 
 >  > >  > While I still like using dl_iterate_phdr instead of
 >  > >  > __register_frame_info_bases for totally aesthetic reasons, there
 >  > >  > have been changes made to the dl_iterate_phdr interface since the
 >  > >  > gcc support was written that would allow the dl_iterate_phdr
 >  > >  > results to be cached.
 >  > > 
 >  > > That would be nice.  Also, we could fairly easily build a tree of
 >  > > nodes, one for each loaded object, then we wouldn't be doing a linear
 >  > > search through them.  We could do that lazily, so it wouldn't kick in
 >  > > 'til needed.
 >  > 
 >  > Here is a rough patch for what you can do.
 > 
 > Thanks very much.  I'm working on it.

OK, I've roughed out a very simple patch and it certainly seems to
improve things.

Here's the before:

samples  cum. samples  %        cum. %   app name         symbol name
17962    17962         25.8164  25.8164  libgcc_s.so.1    _Unwind_IteratePhdrCallback
7019     24981         10.0882  35.9046  libc-2.3.3.so    dl_iterate_phdr
6966     31947         10.0121  45.9167  libgcc_s.so.1    read_encoded_value_with_base
3756     35703          5.3984  51.3151  libgcj.so.6.0.0  GC_mark_from
3643     39346          5.2360  56.5511  libgcc_s.so.1    search_object
2032     41378          2.9205  59.4717  libgcc_s.so.1    __i686.get_pc_thunk.bx
1555     42933          2.2350  61.7066  libgcj.so.6.0.0  _Jv_MonitorExit
1413     44346          2.0309  63.7375  libgcj.so.6.0.0  _Jv_MonitorEnter
1288     45634          1.8512  65.5887  libgcj.so.6.0.0  java::util::IdentityHashMap::hash(java::lang::Object*)

And here's the after:

samples  cum. samples  %        cum. %   app name         symbol name
7020     7020          14.7674  14.7674  libgcc_s.so.1    read_encoded_value_with_base
3808     10828          8.0106  22.7780  libgcc_s.so.1    _Unwind_IteratePhdrCallback
3680     14508          7.7413  30.5194  libgcj.so.6.0.0  GC_mark_from
3463     17971          7.2849  37.8042  libgcc_s.so.1    search_object
1587     19558          3.3385  41.1427  libgcj.so.6.0.0  _Jv_MonitorExit
1577     21135          3.3174  44.4601  libc-2.3.3.so    dl_iterate_phdr
1288     22423          2.7095  47.1696  libgcj.so.6.0.0  _Jv_MonitorEnter
1230     23653          2.5875  49.7570  libgcj.so.6.0.0  java::util::IdentityHashMap::hash(java::lang::Object*)

So, the time spent unwinding before was about 50% of the total
runtime, and after about 28%.  I measured a miss rate of 0.006%
with 27 entries used.

Still, 28% is a heavy overhead.  I think it's because we're doing a
great many class lookups, and each of those does a stack trace as a security
check.  I'll look at caching security contexts in libgcj.

Andrew.





Benchmark of gcc 4.0

2005-02-24 Thread Biagio Lucini
I ran, for my personal pleasure (since I am a number cruncher), the
Scimark2 tests on my P4 Linux machine.  I tested GCC 4.0 (today's CVS) vs. GCC 
3.4.1 vs. Intel's ICC 8.1.

For GCC, I used in both cases the flags
-march=pentium4 -mfpmath=sse -O3 -fomit-frame-pointer -ffast-math

In case it is of some interest, for ICC I used
-ipo -tpp7 -xW -align -Zp16 -O3

The results were surprisingly bad, and this is why I am writing this message:


                   GCC 4.0   GCC 3.4.1      ICC
Composite Score:    270.51      345.28   430.47
FFT Mflops:         192.10      203.77   206.66
SOR Mflops:         257.61      252.88   258.30
MC Mflops:           58.61       67.96   312.13
matmult Mflops:     376.64      557.75   564.97
LU Mflops:          467.58      644.03   810.29

I leave aside any personal comments, except that, being involved in Monte Carlo 
calculations, I would love it if GCC were not outperformed by a factor of ~4.5 
in MC by ICC. 

I would also like to ask whether you see anything wrong with those benchmarks 
and/or whether you have suggestions to improve them.

Thanks,
Biagio
-- 
=

Biagio Lucini 
Institut Fuer Theoretische Physik
ETH Hoenggerberg  
CH-8093 Zuerich - Switzerland   
Tel. +41 (0)1 6332562  
 
=


Re: Is 'mfcr' a legal opcode for RS6000 RIOS1?

2005-02-24 Thread David Edelsohn
> Kai Ruottu writes:

Kai> In the crossgcc list was a problem with gcc-3.4 generating the opcode
Kai> 'mfcr' with '-mcpu=power' for the second created multilib, when the
Kai> GCC target is 'rs6000-ibm-aix4.3'. The other multilibs produced as
Kai> default are for '-pthread', '-mcpu=powerpc' and '-maix64'... The AIX
Kai> users could judge if all these are normally required, but when the
Kai> builder also used the '--without-threads', the first sounds being vain
Kai> or even clashing with something. Building no multilibs using
Kai> '--disable-multilib' of course is possible...

Kai> But what is the case with the 'mfcr' and POWER ?  Bug in GNU as (the
Kai> Linux binutils-2.15.94.0.2.2 was tried) or in GCC (both gcc-3.3.5 and
Kai> gcc-3.4.3 were tried) ?

First, AIX assembler is recommended on AIX.  This is mentioned in
the platform-specific installation information.  The use of GNU Assembler
on AIX probably is the source of your problems.

The mfcr instruction has existed since the original POWER
architecture.  It always is valid.

The instruction was updated in POWER4 and later chips to accept an
optional operand to specify which field to move.  That variant only is
enabled for processors that support the instruction.  The variant is not
enabled for -mcpu=power.

David


Bug in tree-inline.c:estimate_num_insns_1?

2005-02-24 Thread Richard Guenther
Hi!

In estimate_num_insns_1 we currently have:

/* Recognize assignments of large structures and constructors of
   big arrays.  */
case INIT_EXPR:
case MODIFY_EXPR:
  x = TREE_OPERAND (x, 0);
  /* FALLTHRU */
case TARGET_EXPR:
case CONSTRUCTOR:
  {
HOST_WIDE_INT size;
...

shouldn't TARGET_EXPR be moved up before the x = TREE_OPERAND (x, 0); ?

Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



Re: Inlining and estimate_num_insns

2005-02-24 Thread Jan Hubicka
> Hi!
> 
> I'm looking at improving inlining heuristics at the moment,
> especially by questioning the estimate_num_insns.  All uses
> of that function assume it to return a size cost, not a computation
> cost - is that correct?  If so, why do we penalize f.i. EXACT_DIV_EXPR
> compared to MULT_EXPR?

Well, not really.  At least for inlining the idea of cost is mixed - if
the function is either slow or big, inlining is not a good idea.

For the post-inline CFG world, I plan to disambiguate these, but in the
current implementation both quantities seemed so raw that doing
something more precise with them didn't seem to make much sense.

But I have the patch for separating the code/size computations for
tree-profiling around on my notebook (I believe), so I can pass you one
in case you want to help with tuning this.
We can do pretty close estimation there, since we can build an estimated
profile, so we know that functions with loops take longer, while tree-like
functions can be fast even if they are big.
> 
> Also, for the simple function
> 
> double foo1(double x)
> {
> return x;
> }
> 
> we return 4 as a cost, because we have
> 
>double tmp = x;
>return tmp;
> 
> and count the move cost (MODIFY_EXPR) twice.  We could fix this
> by not walking (i.e. ignoring) RETURN_EXPR.

That would work, yes.  I was also thinking about ignoring MODIFY_EXPR
for var = var, as those likely get propagated later.
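
A sketch of that var = var idea in the same switch (my own illustration,
untested):

    case MODIFY_EXPR:
      /* Skip plain variable-to-variable copies; they are likely to be
         propagated away later and shouldn't count against the size.  */
      if (DECL_P (TREE_OPERAND (x, 0)) && DECL_P (TREE_OPERAND (x, 1)))
        break;
      /* ... otherwise fall through to the existing size-based cost.  */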
> 
> Also, INSNS_PER_CALL is rather high (10) - what is this choice
> based on?  Wouldn't it be better to at least make it proportional
> to the argument chain length?  Or even more advanced to the move
> cost of the arguments?

Probably.  The choice of constant is completely arbitrary.  It is not
too high cycle-count-wise (the Athlon, at least, spends over 10 cycles per
call), but I never experimented with different values of this.

There are two copies of this constant (I believe), one in tree-inline and
the other in cgraphunit, that need to be kept in sync.  I have to clean this up.
> 
> Finally, is there a set of testcases that can be used as a metric
> on wether improvements are improvements?

This is a major problem here - I use a combination of SPEC (for C
benchmarks), Gerald's application and tramp3d, but all of these have
very different behaviour and thus they hardly cover the "common cases".
If someone can come up with a more reasonable testing method, I would
be very happy - so far I simply test on all of those, and when the results
seem to be a win in all three tests (or at least no loss), I apply the patch.

Honza
> 
> Thanks,
> Richard.
> 
> --
> Richard Guenther 
> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/


Re: Bug in tree-inline.c:estimate_num_insns_1?

2005-02-24 Thread Andrew Pinski
On Feb 24, 2005, at 10:07 AM, Richard Guenther wrote:
Hi!
In estimate_num_insns_1 we currently have:
/* Recognize assignments of large structures and constructors of
   big arrays.  */
case INIT_EXPR:
case MODIFY_EXPR:
  x = TREE_OPERAND (x, 0);
  /* FALLTHRU */
case TARGET_EXPR:
case CONSTRUCTOR:
  {
HOST_WIDE_INT size;
...
shouldn't TARGET_EXPR be moved up before the x = TREE_OPERAND (x, 0); ?
TARGET_EXPR is not in gimple at all, so it really does not matter.
-- Pinski


Re: Bug in tree-inline.c:estimate_num_insns_1?

2005-02-24 Thread Richard Guenther
On Thu, 24 Feb 2005 10:13:11 -0500, Andrew Pinski
<[EMAIL PROTECTED]> wrote:
> 
> On Feb 24, 2005, at 10:07 AM, Richard Guenther wrote:
> 
> > Hi!
> >
> > In estimate_num_insns_1 we currently have:
> >
> > /* Recognize assignments of large structures and constructors of
> >big arrays.  */
> > case INIT_EXPR:
> > case MODIFY_EXPR:
> >   x = TREE_OPERAND (x, 0);
> >   /* FALLTHRU */
> > case TARGET_EXPR:
> > case CONSTRUCTOR:
> >   {
> > HOST_WIDE_INT size;
> >   ...
> >
> > shouldn't TARGET_EXPR be moved up before the x = TREE_OPERAND (x, 0); ?
> 
> TARGET_EXPR is not in gimple at all so really does not matter.

Then how do I get bitten by this?  I guess cgraph gets fed GENERIC.

Richard.


Re: Benchmark of gcc 4.0

2005-02-24 Thread Vladimir Makarov
Biagio Lucini wrote:
I run for my personal pleasure (since I am a number cruncher) the
Scimark2 tests on my P4 Linux machine. I tested GCC 4.0 (today's CVS) vs. GCC 
3.4.1 vs. Intel's ICC 8.1

For GCC, I used in both cases the flags
-march=pentium4 -mfpmath=sse -O3 -fomit-frame-pointer -ffast-math
Should be of some interest, for ICC I used
-ipo -tpp7 -xW -align -Zp16 -O3
The results were surprisingly bad, and this is why I am writing this message:
                   GCC 4.0   GCC 3.4.1      ICC
Composite Score:    270.51      345.28   430.47
FFT Mflops:         192.10      203.77   206.66
SOR Mflops:         257.61      252.88   258.30
MC Mflops:           58.61       67.96   312.13
matmult Mflops:     376.64      557.75   564.97
LU Mflops:          467.58      644.03   810.29
I leave aside any personal comments, except that being involved in Monte Carlo 
calculations, I would love if GCC were not outperformed by a factor of ~ 4.5 
in MC by ICC. 

I also would like to ask whether you see anything wrong with those benchmarks 
and/or you have suggestions to improve them.

 

Thanks for reporting this.  Although it would be more useful if you 
did some analysis of what is wrong with gcc.  For example, icc reports 
loop vectorization.  Or maybe it is a memory hierarchy optimization, 
or the usage of a better standard function, like the random function (I am 
not very familiar with MC).  Usually vectorization is the reason for such a 
big difference.  People in the gcc community are working on vectorization, 
although I don't know when it will be usable for x86.  We don't have the 
resources Intel has (several hundred engineers working mainly on 
optimizations for only three of their architectures).

As for gcc4 vs. gcc3.4, the degradation on the x86 architecture is most 
probably because of the higher register pressure created by the more 
aggressive SSA optimizations in gcc4.  The current register allocator 
does not deal well with this problem, so code generated by gcc4 can be 
worse for architectures with few registers.  For architectures with many 
registers (like ia64), gcc4 generates better code than gcc3.4.  Again, 
the gcc community is working on the register allocator problem too.

Vlad



Change in treelang maintainership

2005-02-24 Thread Gerald Pfeifer
It is my pleasure to announce that the steering committee has appointed 
James A. Morrison maintainer of our treelang frontend; Jim has been 
working on that for some time now.

We'd also like to take the opportunity and thank Tim Josling for the
time and effort he has spent on this frontend.


Please adjust the MAINTAINERS file accordingly, Jim.  Happy hacking!

Gerald


Re: Benchmark of gcc 4.0

2005-02-24 Thread Paolo Bonzini
For GCC, I used in both cases the flags
-march=pentium4 -mfpmath=sse -O3 -fomit-frame-pointer -ffast-math
>
As for gcc4 vs gcc3.4,  degradataion on x86 architecture is most 
probably because of higher register pressure created with more 
aggressive SSA optimizations in gcc4.
Try these five combinations:
-O2 -fomit-frame-pointer -ffast-math
-O2 -fomit-frame-pointer -ffast-math -fno-tree-pre
-O2 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse
-O3 -fomit-frame-pointer -ffast-math -fno-tree-pre
-O3 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse
You may also want to try -mfpmath=sse,387 in case your benchmarks use 
sin, cos and other transcendental functions that GCC knows about when 
using 387 instructions.

Paolo


Re: Benchmark of gcc 4.0

2005-02-24 Thread Richard Guenther
I just got interested and did a test myself.  Comparing gcc 4.0 (-O2
-funroll-loops -D__NO_MATH_INLINES -ffast-math -march=pentium4
-mfpmath=sse -ftree-vectorize)
and icc 9.0 beta (-O3 -xW -ip):
                         gcc 4.0   icc 9.0
Composite Score:          543.65    609.20
FFT Mflops:               313.71    318.29
SOR Mflops:               441.96    426.32
MonteCarlo Mflops:        105.68     71.20
Sparse matmult Mflops:    574.88    891.65
LU Mflops:               1282.00   1338.56

which looks not too bad ;)

Richard.


Re: Benchmark of gcc 4.0

2005-02-24 Thread Biagio Lucini
On Thursday 24 February 2005 16.52, Paolo Bonzini wrote:
>
> Try these five combinations:
>
[...]
>
> -O3 -fomit-frame-pointer -ffast-math -fno-tree-pre

[...]

This + 387 math is the one with the largest impact: it raises MC to around 80, 
but the composite is still 279 (vs. ~345 for GCC 3.4). I will test on amd64, 
just to see whether there is any difference.

Thanks,
Biagio

-- 
=

Biagio Lucini 
Institut Fuer Theoretische Physik
ETH Hoenggerberg  
CH-8093 Zuerich - Switzerland   
Tel. +41 (0)1 6332562  
 
=


Re: Suggestion: Different exit code for ICE

2005-02-24 Thread Sam Lauber

> Regressions that cause ICE's on invalid code often go unnoticed in the
> testsuite, since regular errors and ICE's both match { dg-error "" }.
> See for example g++.dg/parse/error16.C which ICE's since yesterday,
> but the testsuite still reports "PASS":
> 
>Executing on host: 
> /Work/reichelt/gccbuild/src-4.0/build/gcc/testsuite/../g++ 
> -B/Work/reichelt/gccbuild/src-4.0/build/gcc/testsuite/../ 
> /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C  
> -nostdinc++ 
> -I/home/reichelt/Work/gccbuild/src-4.0/build/i686-pc-linux-gnu/libstdc++-v3/include/i686-pc-linux-gnu
>  
> -I/home/reichelt/Work/gccbuild/src-4.0/build/i686-pc-linux-gnu/libstdc++-v3/include
>  -I/home/reichelt/Work/gccbuild/src-4.0/gcc/libstdc++-v3/libsupc++ 
> -I/home/reichelt/Work/gccbuild/src-4.0/gcc/libstdc++-v3/include/backward 
> -I/home/reichelt/Work/gccbuild/src-4.0/gcc/libstdc++-v3/testsuite 
> -fmessage-length=0   -ansi -pedantic-errors -Wno-long-long  -S  -o error16.s  
>   (timeout = 
> 300)
>
> /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
> error: redefinition of 'struct A::B'
>
> /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:5: 
> error: previous definition of 'struct A::B'
>
> /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
> internal compiler error: tree check: expected class 'type', have 
> 'exceptional' (error_mark) in cp_parser_class_specifier, at 
> cp/parser.c:12407
>Please submit a full bug report,
>with preprocessed source if appropriate.
>See <http://gcc.gnu.org/bugs.html> for instructions.
>compiler exited with status 1
>output is:
>
> /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
> error: redefinition of 'struct A::B'
>
> /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:5: 
> error: previous definition of 'struct A::B'
>
> /Work/reichelt/gccbuild/src-4.0/gcc/gcc/testsuite/g++.dg/parse/error16.C:8: 
> internal compiler error: tree check: expected class 'type', have 
> 'exceptional' (error_mark) in cp_parser_class_specifier, at 
> cp/parser.c:12407
>Please submit a full bug report,
>with preprocessed source if appropriate.
>See <http://gcc.gnu.org/bugs.html> for instructions.
> 
>PASS: g++.dg/parse/error16.C  (test for errors, line 5)
>PASS: g++.dg/parse/error16.C  (test for errors, line 8)
>PASS: g++.dg/parse/error16.C (test for excess errors)
> 
> (Btw, Mark, I think the regression was caused by your patch for
> PR c++/20152, could you please have a look?)
> 
> The method used right now is to not use "" in the last error message,
> but that's forgotten too often.
> 
> This calls for a more robust method IMHO.
> One way would be to make the testsuite smarter and make it recognize
> typical ICE patterns itself. This can indeed be done (I for example
> use it to monitor the testcases in Bugzilla, Phil borrowed the patterns
> for his regression tester).
> 
> An easier way IMHO would be to return a different error code when
> encountering an ICE. That's only a couple of places in diagnostic.c
> and errors.c where we now have "exit (FATAL_EXIT_CODE);".
> We could return an (appropriately defined) ICE_ERROR_CODE instead.
> The testsuite would then just have to check the return value.
> 
> What do you think?
That would certainly be a Good Thing.  As far as I know, 
regular errors return exit code 1.  I have a few suggestions 
on that:

 a) use a testsuite that supports regexps and match a 1 
exit code against

/^Please submit a full bug report/

 b) make it return a different exit code (say -127 or even 
2 ;-).  

 c) make a separate function for ICEs and make _that_ 
return an exit code indicating an ICE.  There would be a 
disadvantage: as with any code moving, there would still be 
some code that didn't call that function.

Samuel Lauber
-- 
_
Web-based SMS services available at http://www.operamail.com.
From your mailbox to local or overseas cell phones.

Powered by Outblaze


Re: Benchmark of gcc 4.0

2005-02-24 Thread Richard Guenther
On Thu, 24 Feb 2005 17:09:46 +0100, Biagio Lucini <[EMAIL PROTECTED]> wrote:
> On Thursday 24 February 2005 16.52, Paolo Bonzini wrote:
> >
> > Try these five combinations:
> >
> [...]
> >
> > -O3 -fomit-frame-pointer -ffast-math -fno-tree-pre
> 
> [...]
> 
> This + 387 math is the one with the larger impact: it rises MC to around 80,
> but composite is still 279 (vs. ~ 345 for GCC 3.4). I will test on amd64,
> just to see whether there is any difference.

I think the Intel compiler with -ipo will inline Random_nextDouble, which should
explain the difference you see.  The best options for gcc I found were compiling
and linking via
  gcc-4.0 -O3 -funroll-loops -D__NO_MATH_INLINES -ffast-math
-march=pentium4 -mfpmath=sse -ftree-vectorize -onestep -o scimark2
scimark2.c FFT.c kernel.c Stopwatch.c Random.c SOR.c SparseCompRow.c
array.c MonteCarlo.c LU.c -lm -fomit-frame-pointer -finline-functions

Note that gcc with -onestep still cannot inline over unit-boundaries.

Richard.


Re: Benchmark of gcc 4.0

2005-02-24 Thread Biagio Lucini
On Thursday 24 February 2005 17.06, Richard Guenther wrote:
> I just got interested and did a test myself.  Comparing gcc 4.0 (-O2
> -funroll-loops -D__NO_MATH_INLINES -ffast-math -march=pentium4
> -mfpmath=sse -ftree-vectorize)
> and icc 9.0 beta (-O3 -xW -ip):
>                          gcc 4.0   icc 9.0
> Composite Score:          543.65    609.20
> FFT Mflops:               313.71    318.29
> SOR Mflops:               441.96    426.32
> MonteCarlo Mflops:        105.68     71.20
> Sparse matmult Mflops:    574.88    891.65
> LU Mflops:               1282.00   1338.56
>
> which looks not too bad ;)
>
> Richard.

Hi Richard,

thanks a lot for your test.  I have redone it the way you suggest, and I find:

                        GCC 4.0   ICC 8.1   GCC 3.4.1
Composite Score:         330.18    384.53      361.55
FFT Mflops:              206.66    193.80      206.66
SOR Mflops:              264.91    398.13      253.55
MC Mflops:                63.91     61.29       67.45
Sparse matmult Mflops:   348.60    436.91      469.79
LU Mflops:               767.04    832.52      810.29

I would leave aside ICC 8.1 because (as I showed in my previous message) 
I can choose other flags and get a speed rise of about 50%. I would rather
take your optimisation flags for GCC than mine, since they increase the 
composite score of both (which is what matters to me). Even so, there is at 
least one place where - if I may say so - we have a regression.

Ready to test again,
Biagio


-- 
=

Biagio Lucini 
Institut Fuer Theoretische Physik
ETH Hoenggerberg  
CH-8093 Zuerich - Switzerland   
Tel. +41 (0)1 6332562  
 
=


Re: gcse pass: expression hash table

2005-02-24 Thread Tarun Kawatra
On Wed, 23 Feb 2005, James E Wilson wrote:
Tarun Kawatra wrote:
During expression hash table construction in the gcse pass (gcc version 3.4.1), 
expressions like a*b do not get included in the expression hash table. 
Such expressions occur in a PARALLEL along with clobbers.
You didn't mention the target, or exactly what the mult looks like.
However, this isn't hard to answer just by using the source. hash_scan_set 
calls want_to_cse_p calls can_assign_to_reg_p calls added_clobbers_hard_reg_p 
which presumably returns true, which prevents the optimization.  This makes 
sense.  If the pattern clobbers a hard reg, then we can't safely insert it at 
any place in the function.  It might be clobbering the hard reg at a point 
where it holds a useful value.

While looking at this, I noticed can_assign_to_reg_p does something silly.

I could not find this function anywhere in the gcc 3.4.1 source, although 
FIRST_PSEUDO_REGISTER * 2 is used directly in the make_insn_raw call in 
want_to_gcse_p, as follows:

  if (test_insn == 0)
    {
      test_insn
        = make_insn_raw (gen_rtx_SET (VOIDmode,
                                      gen_rtx_REG (word_mode,
                                                   FIRST_PSEUDO_REGISTER * 2),
                                      const0_rtx));
      NEXT_INSN (test_insn) = PREV_INSN (test_insn) = 0;
    }
It uses "FIRST_PSEUDO_REGISTER * 2" to try to generate a test pseudo 
register, but this can fail if a target has less than 4 registers, or if the 
set of virtual registers increases in the future. This should probably be 
LAST_VIRTUAL_REGISTER + 1 as used in another recent patch.
I could not get this point.
-tarun


Re: Suggestion: Different exit code for ICE

2005-02-24 Thread Janis Johnson
On Thu, Feb 24, 2005 at 11:46:20AM +0100, Volker Reichelt wrote:
> Regressions that cause ICE's on invalid code often go unnoticed in the
> testsuite, since regular errors and ICE's both match { dg-error "" }.
> See for example g++.dg/parse/error16.C which ICE's since yesterday,
> but the testsuite still reports "PASS":
> 
[snip]
> 
> This calls for a more robust method IMHO.
> One way would be to make the testsuite smarter and make it recognize
> typical ICE patterns itself. This can indeed be done (I for example
> use it to monitor the testcases in Bugzilla, Phil borrowed the patterns
> for his regression tester).
> 
> An easier way IMHO would be to return a different error code when
> encountering an ICE. That's only a couple of places in diagnostic.c
> and errors.c where we now have "exit (FATAL_EXIT_CODE);".
> We could return an (appropriately defined) ICE_ERROR_CODE instead.
> The testsuite would then just have to check the return value.
> 
> What do you think?

I don't think that it's appropriate for any test to use { dg-error "" };
there should always be some substring of the expected message there.  If
the message changes then tests need to be updated, but that's better
than not noticing when the message changes unexpectedly or, worse yet,
the message is for an ICE.  A quick count, however, shows that 1022
tests use { dg-error "" }.  Given that, using and detecting a different
error code for an ICE is an excellent idea.

Janis


Seeking patch for bug in lifetime of __cur in deque::_M_fill_initialize (powerpc dw2 EH gcc 3.4.2)

2005-02-24 Thread Earl Chew
Is there a patch for the following problem?
I am having problems with _M_fill_initialize in deque on the powerpc
version compiled at -O2.
  template <typename _Tp, typename _Alloc>
    void
    deque<_Tp,_Alloc>::
    _M_fill_initialize(const value_type& __value)
    {
      _Map_pointer __cur;
      try
        {
          for (__cur = this->_M_impl._M_start._M_node;
               __cur < this->_M_impl._M_finish._M_node;
               ++__cur)
            std::uninitialized_fill(*__cur, *__cur + _S_buffer_size(),
                                    __value);
          /*** HERE ***/
          std::uninitialized_fill(this->_M_impl._M_finish._M_first,
                                  this->_M_impl._M_finish._M_cur,
                                  __value);
        }
      catch(...)
        {
          std::_Destroy(this->_M_impl._M_start, iterator(*__cur, __cur));
          __throw_exception_again;
        }
    }

The test code is reproduced below. The assembler output of the
salient part of _M_fill_initialize is:
.L222:
lwz 3,0(31)
mr 5,30
addi 6,1,8
addi 4,3,512
.LEHB3:
bl _ZSt24__uninitialized_fill_auxIP9TestClassS0_EvT_S2_RKT0_12__false_type
lwz 0,36(29)
addi 31,31,4
cmplw 7,0,31
bgt+ 7,.L222
li 31,0
.L240:

Examining the output, .L240 corresponds to /*** HERE ***/.
When the for() loop terminates, it appears __cur is in r31 and is
zapped with 0. I suspect the optimizer has marked __cur as dead
at this point.
An exception caught while executing the 2nd uninitialized_fill()
attempts to use __cur, but is thwarted because the value has been
lost.
I'm working with a port of dw2-based EH for powerpc VxWorks on
gcc 3.4.2. The compiler builds and is working. I have ported the EH
test from STLport, and most of the tests run.
(BTW the ported EH tests run to completion on cygwin.)
Earl
-
bash-2.05b$ /gnu/local/bin/powerpc-wrs-vxworks-g++ -v -S -O2 bug.cpp
Reading specs from /gnu/local/lib/gcc/powerpc-wrs-vxworks/3.4.2/specs
Configured with: ../gcc-3.4.2/configure --target=powerpc-wrs-vxworks 
--disable-libstdcxx-pch --disable-shared --with-included-gettext 
--with-gnu-as --with-gnu-ld --with-ld=powerpc-wrs-vxworks-ld 
--with-as=powerpc-wrs-vxworks-as --exec-prefix=/gnu/local 
--prefix=/gnu/local --enable-languages=c,c++
Thread model: vxworks
gcc version 3.4.2
 /gnu/local/libexec/gcc/powerpc-wrs-vxworks/3.4.2/cc1plus.exe -quiet -v 
-DCPU_FAMILY=PPC -D__ppc -D__EABI__ -DCPU=PPC604 -D__hardfp bug.cpp 
-mcpu=604 -mstrict-align -quiet -dumpbase bug.cpp -auxbase bug -O2 
-version -o bug.s
ignoring nonexistent directory "/gnu/local/lib/gcc/powerpc-wrs-vxworks/3.4.2/../../../../powerpc-wrs-vxworks/sys-include"
ignoring nonexistent directory "*CYGWIN1512PATH"
#include "..." search starts here:
#include <...> search starts here:
 /gnu/local/lib/gcc/powerpc-wrs-vxworks/3.4.2/../../../../include/c++/3.4.2
 /gnu/local/lib/gcc/powerpc-wrs-vxworks/3.4.2/../../../../include/c++/3.4.2/powerpc-wrs-vxworks
 /gnu/local/lib/gcc/powerpc-wrs-vxworks/3.4.2/../../../../include/c++/3.4.2/backward
 /gnu/local/lib/gcc/powerpc-wrs-vxworks/3.4.2/include
 /gnu/local/lib/gcc/powerpc-wrs-vxworks/3.4.2/../../../../powerpc-wrs-vxworks/include
End of search list.
GNU C++ version 3.4.2 (powerpc-wrs-vxworks)
compiled by GNU C version 3.4.1 (cygming special).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
bug.cpp:6: warning: inline function `TestClass::TestClass()' used but never defined
bug.cpp:9: warning: inline function `TestClass::~TestClass()' used but never defined
bug.cpp:8: warning: inline function `TestClass::TestClass(const TestClass&)' used but never defined
bug.cpp:11: warning: inline function `TestClass& TestClass::operator=(const TestClass&)' used but never defined
-
#include <climits>
#include <deque>

class TestClass
{
public:
inline TestClass();
inline TestClass( int value );
inline TestClass( const TestClass& rhs );
inline ~TestClass();
inline TestClass& operator=( const TestClass& rhs );
inline int value() const;

inline TestClass operator!() const;
bool operator==( const TestClass& rhs ) const;
bool operator<( const TestClass& rhs ) const;
protected:
static inline unsigned int get_random(unsigned range = UINT_MAX);
private:
inline void Init( int value );
};
template class std::deque<TestClass>;



Re: Suggestion: Different exit code for ICE

2005-02-24 Thread Mark Mitchell
Janis Johnson wrote:
On Thu, Feb 24, 2005 at 11:46:20AM +0100, Volker Reichelt wrote:
Regressions that cause ICE's on invalid code often go unnoticed in the
testsuite, since regular errors and ICE's both match { dg-error "" }.
See for example g++.dg/parse/error16.C which ICE's since yesterday,
but the testsuite still reports "PASS":
[snip]
This calls for a more robust method IMHO.
One way would be to make the testsuite smarter and make it recognize
typical ICE patterns itself. This can indeed be done (I for example
use it to monitor the testcases in Bugzilla, Phil borrowed the patterns
for his regression tester).
An easier way IMHO would be to return a different error code when
encountering an ICE. That's only a couple of places in diagnostic.c
and errors.c where we now have "exit (FATAL_EXIT_CODE);".
We could return an (appropriately defined) ICE_ERROR_CODE instead.
The testsuite would then just have to check the return value.
What do you think?

I don't think that it's appropriate for any test to use { dg-error "" };
I actually disagree; I think that sometimes it's important to know that 
there's some kind of diagnostic, but trying to match the wording seems 
like overkill to me.  I don't feel that strongly about it, but I don't 
see anything wrong with the empty string.

the message is for an ICE.  A quick count, however, shows that 1022
tests use { dg-error "" }.  Given that, using and detecting a different
error code for an ICE is an excellent idea.
I definitely agree.  I think that would be great.
--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304


Re: [wwwdocs] CVS annotate brings me to GNATS

2005-02-24 Thread Gerald Pfeifer
On Sat, 11 Dec 2004, Gerald Pfeifer wrote:
>>> http://gcc.gnu.org/cgi-bin/cvsweb.cgi/old-gcc/PROBLEMS?annotate=1.1
>> The thing matching "PR" must be a little overzealous :)
> Yup.  I think I know how to fix this and hope to do it in the next few
> days (after some other technical issues have been clarified).

Fixed now, with the following change to httpd.conf. (I believe we actually 
might be able to remove the Rewrite... stuff.)

Sorry for the delay, various things happened in between...

Gerald

   # Support short URLs for referring to PRs.
   RewriteCond %{QUERY_STRING}  ([0-9]+)$
-  RewriteRule PR   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=%1 [R]
+  RewriteRule ^PR  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=%1 [R]

   RedirectMatch ^/PR([0-9]+)$  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=$1
   include /etc/httpd/conf/spamblock



Quick 4.0 status update

2005-02-24 Thread Mark Mitchell
Those of you who read my status reports closely will recognize that 
today is the day I announced as the day on which I would create the 4.0 
release branch.  I still plan to do that sometime today, where "today" 
is generously defined as "before I go to sleep tonight here in California".

I've received a lot of good proposals for 4.1, and am working on 
ordering them as best I can.  I'll be posting that information later 
today -- before I create the branch.

FYI,
--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304


Re: Inlining and estimate_num_insns

2005-02-24 Thread Richard Guenther
Jan Hubicka wrote:
Also, for the simple function
double foo1(double x)
{
   return x;
}
we return 4 as a cost, because we have
  double tmp = x;
  return tmp;
and count the move cost (MODIFY_EXPR) twice.  We could fix this
by not walking (i.e. ignoring) RETURN_EXPR.

That would work, yes.  I was also thinking about ignoring MODIFY_EXPR
for var = var as those likely gets propagated later.
This looks like a good idea.  In fact going even further and ignoring
all assigns to DECL_IGNORED_P allows us to have the same size estimates
for all functions down the inlining chain for
int foo(int x) { return x; }
int foo1(int x) { return foo(x); }
...
and for the equivalent with int foo(void) { return 0; }
and all related functions.  Which is what we want.  Of course ignoring
all stores to artificial variables may have other bad side-effects.
This results in a tramp3d-v3 performance increase from 1m56s to 27s 
(leafify brings us down to 23.5s).

Note that we still assign reasonable cost to memory stores:
inline void foo(double *x)
{
*x = 1.0;
}
has a cost of 2, and
double y;
void bar(void)
{
foo(&y);
}
too, if foo is inlined.  Nice.  Patch attached (with some unrelated 
stuff that is just cleanup) for you to play with.

Any thoughts on this radical approach?  A testcase that could be 
pessimized by this?  Of course default inlining limits would need to be 
adjusted if we do this.

Richard.
Index: cgraphunit.c
===
RCS file: /cvs/gcc/gcc/gcc/cgraphunit.c,v
retrieving revision 1.93
diff -c -3 -p -r1.93 cgraphunit.c
*** cgraphunit.c21 Feb 2005 14:39:46 -  1.93
--- cgraphunit.c24 Feb 2005 19:04:18 -
*** Software Foundation, 59 Temple Place - S
*** 190,197 
  #include "function.h"
  #include "tree-gimple.h"
  
- #define INSNS_PER_CALL 10
- 
  static void cgraph_expand_all_functions (void);
  static void cgraph_mark_functions_to_output (void);
  static void cgraph_expand_function (struct cgraph_node *);
--- 190,195 
Index: tree-inline.h
===
RCS file: /cvs/gcc/gcc/gcc/tree-inline.h,v
retrieving revision 1.14
diff -c -3 -p -r1.14 tree-inline.h
*** tree-inline.h   8 Nov 2004 22:40:09 -   1.14
--- tree-inline.h   24 Feb 2005 19:04:18 -
*** bool tree_inlinable_function_p (tree);
*** 29,34 
--- 29,35 
  tree copy_tree_r (tree *, int *, void *);
  void clone_body (tree, tree, void *);
  tree save_body (tree, tree *, tree *);
+ int estimate_move_cost (tree type);
  int estimate_num_insns (tree expr);
  
  /* 0 if we should not perform inlining.
*** int estimate_num_insns (tree expr);
*** 38,41 
--- 39,47 
  
  extern int flag_inline_trees;
  
+ /* Instructions per call.  Used in estimate_num_insns and in the
+inliner to account for removed calls.  */
+ 
+ #define INSNS_PER_CALL 10
+ 
  #endif /* GCC_TREE_INLINE_H */
Index: tree-inline.c
===
RCS file: /cvs/gcc/gcc/gcc/tree-inline.c,v
retrieving revision 1.170
diff -c -3 -p -r1.170 tree-inline.c
*** tree-inline.c   27 Jan 2005 14:36:17 -  1.170
--- tree-inline.c   24 Feb 2005 19:04:19 -
*** inlinable_function_p (tree fn)
*** 1165,1170 
--- 1165,1189 
return inlinable;
  }
  
+ /* Estimate the number of instructions needed for a move of
+the specified type.  */
+ 
+ int
+ estimate_move_cost (tree type)
+ {
+   HOST_WIDE_INT size;
+ 
+   if (VOID_TYPE_P (type))
+ return 0;
+ 
+   size = int_size_in_bytes (type);
+ 
+   if (size < 0 || size > MOVE_MAX_PIECES * MOVE_RATIO)
+ return INSNS_PER_CALL;
+   else
+ return ((size + MOVE_MAX_PIECES - 1) / MOVE_MAX_PIECES);
+ }
+ 
  /* Used by estimate_num_insns.  Estimate number of instructions seen
 by given statement.  */
  
*** estimate_num_insns_1 (tree *tp, int *wal
*** 1245,1266 
  
  /* Recognize assignments of large structures and constructors of
 big arrays.  */
- case INIT_EXPR:
  case MODIFY_EXPR:
x = TREE_OPERAND (x, 0);
/* FALLTHRU */
- case TARGET_EXPR:
  case CONSTRUCTOR:
!   {
!   HOST_WIDE_INT size;
! 
!   size = int_size_in_bytes (TREE_TYPE (x));
! 
!   if (size < 0 || size > MOVE_MAX_PIECES * MOVE_RATIO)
! *count += 10;
!   else
! *count += ((size + MOVE_MAX_PIECES - 1) / MOVE_MAX_PIECES);
!   }
break;
  
/* Assign cost of 1 to usual operations.
--- 1264,1278 
  
  /* Recognize assignments of large structures and constructors of
 big arrays.  */
  case MODIFY_EXPR:
+   if (DECL_P (TREE_OPERAND (x, 0)) && DECL_IGNORED_P (TREE_OPERAND (x, 0)))
+   break;
+ case INIT_EXPR:
+ case TARGET_EXPR:
x = TREE_OPERAND (x, 0);
/* FALLTHRU */
  case CONSTRUCTOR:
!   *count += estimate_move_cost

Re: Inlining and estimate_num_insns

2005-02-24 Thread Richard Guenther
On Thu, 24 Feb 2005 20:05:37 +0100, Richard Guenther
<[EMAIL PROTECTED]> wrote:
> Jan Hubicka wrote:
> 
> >>Also, for the simple function
> >>
> >>double foo1(double x)
> >>{
> >>return x;
> >>}
> >>
> >>we return 4 as a cost, because we have
> >>
> >>   double tmp = x;
> >>   return tmp;
> >>
> >>and count the move cost (MODIFY_EXPR) twice.  We could fix this
> >>by not walking (i.e. ignoring) RETURN_EXPR.
> >
> >
> > That would work, yes.  I was also thinking about ignoring MODIFY_EXPR
> > for var = var as those likely gets propagated later.
> 
> This looks like a good idea.  In fact going even further and ignoring
> all assigns to DECL_IGNORED_P allows us to have the same size estimates
> for all functions down the inlining chain for

Note that this behavior also more closely matches the counting of gcc 3.4
that has a cost of zero for
  inline int foo(void) { return 0; }
and a cost of one for
  int bar(void) { return foo(); }
while with the patch we have zero for foo and zero for bar.

For
  inline void foo(double *x) { *x = 1.0; }
  double y; void bar(void) { foo(&y); }
3.4 has 3 and 5 after inlining, with the patch we get 2 and 2.

For
  inline double foo(double x) { return x*x; }
  inline double foo1(double x) { return foo(x); }
  double foo2(double x) { return foo1(x); }
3.4 has 1, 2 and 3, with the patch we get 1, 1 and 1.

For a random collection of C files out of scimark2 we get
         3.4               4.0                4.0 patched
SOR      54, 10            125, 26            63, 14
FFT      44, 11, 200, 59   65, 10, 406, 111   51, 10, 243, 71

so, apart from a constant factor, 4.0 patched goes back to 3.4
behavior (at least it doesn't show weird numbers).  Given that
we didn't change the inlining limits between 3.4 and 4.0, that
looks better anyway.  And of course the testcases above show
we are better at removing the abstraction penalty.

Richard.


Re: gcse pass: expression hash table

2005-02-24 Thread James E Wilson
On Thu, 2005-02-24 at 02:13, Tarun Kawatra wrote:
> If that is the reason, then even plus expression (shown below) should not
> be subjected to PRE as it also clobbers a hard register(CC). But it is being
> subjected to PRE. Multiplication expression while it looks same does not
> get even in hash table.

My assumption here was that if I gave you a few pointers, you would try
to debug the problem yourself.  If you want someone else to debug it for
you, then you need to give much better info.  See for instance
http://gcc.gnu.org/bugs.html
which gives info on how to properly report a bug.  I have the target and
gcc version, but I need a testcase, compiler options, and perhaps other
info.

How do you know that adds are getting optimized?  Did you judge this by
looking at one of the dump files, or looking at the assembly output? 
Maybe you are looking at the wrong thing, or misunderstanding what you
are looking at?  You need to give more details here.

If I try compiling a trivial example with -O2 -da -S for both IA-64 and
x86, and then looking at the .gcse dump file, I see that both the
multiply and the add are in the hash table dump for the IA-64, but
neither are in the hash table dump for the x86.  The reason why is as I
explained, the added_clobbers_hard_reg_p call returns true for both on
x86, but not on IA-64.

If you are seeing something different, then you need to give more
details.  Perhaps you are looking at a different part of gcse than I am.
-- 
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com




Re: gcse pass: expression hash table

2005-02-24 Thread James E Wilson
On Thu, 2005-02-24 at 09:20, Tarun Kawatra wrote:
> On Wed, 23 Feb 2005, James E Wilson wrote:
> > While looking at this, I noticed can_assign_to_reg_p does something silly.
>   ^^^
>   I could not find this function anywhere in gcc 
> 3.4.1 source.

I was looking at current gcc sources.

>> but this can fail if a target has less than 4 registers
> I could not get this point.

Don't worry about that, you don't need to understand this bit.
-- 
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com




Re: gcse pass: expression hash table

2005-02-24 Thread James E Wilson
On Thu, 2005-02-24 at 03:15, Steven Bosscher wrote:
> On Feb 24, 2005 11:13 AM, Tarun Kawatra <[EMAIL PROTECTED]> wrote:
> Does GCSE look into stuff in PARALLELs at all?  From gcse.c:

Shrug.  The code in hash_scan_set seems to be doing something
reasonable.

The problem I saw wasn't with finding expressions to gcse, it was with
inserting them later.  The insertion would create a cc reg clobber, so
we don't bother adding it to the hash table.  I didn't look any further,
but it seemed reasonable that if it isn't in the hash table, then it
isn't going to be optimized.

It seems that switching the x86 backend from using cc0 to using a cc
hard register has effectively crippled the RTL gcse pass for it.
-- 
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com




-Wfatal-errors=n

2005-02-24 Thread Benjamin Kosnik

From here:
http://gcc.gnu.org/ml/gcc/2005-02/msg00923.html

I so want this. I've created a bugzilla entry for this as an enhancement so 
this does not get lost.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20201

-benjamin


Re: gcse pass: expression hash table

2005-02-24 Thread Tarun Kawatra
On Thu, 24 Feb 2005, James E Wilson wrote:
> On Thu, 2005-02-24 at 03:15, Steven Bosscher wrote:
> > On Feb 24, 2005 11:13 AM, Tarun Kawatra <[EMAIL PROTECTED]> wrote:
> > Does GCSE look into stuff in PARALLELs at all?  From gcse.c:
>
> Shrug.  The code in hash_scan_set seems to be doing something
> reasonable.
>
> The problem I saw wasn't with finding expressions to gcse, it was with
> inserting them later.  The insertion would create a cc reg clobber, so
> we don't bother adding it to the hash table.  I didn't look any further,
> but it seemed reasonable that if it isn't in the hash table, then it
> isn't going to be optimized.

You are right that if some expression doesn't get into the hash table,
it will not get optimized.  But since plus expressions on x86 also
clobber CC, as shown below,

(insn 40 61 42 2 (parallel [
            (set (reg/v:SI 74 [ c ])
                (plus:SI (reg:SI 86)
                    (reg:SI 85)))
            (clobber (reg:CC 17 flags))
        ]) 138 {*addsi_1} (nil)
    (nil))
then why does the same reasoning not apply to plus expressions?  Why
will their insertion later not create any problems?

Actually, I am trying to extend the PRE implementation so that it performs
strength reduction as well.  That requires multiplication expressions to
get into the hash table.

I am debugging the code to find where the difference between the two kinds
of expressions arises.  I will let you all know if I find anything
interesting.  If you already know this, please share it with me.

Thanks
-tarun
> It seems that switching the x86 backend from using cc0 to using a cc
> hard register has effectively crippled the RTL gcse pass for it.


Re: gcse pass: expression hash table

2005-02-24 Thread Andrew Pinski
On Feb 24, 2005, at 3:55 PM, Tarun Kawatra wrote:
> Actually I am trying to extend PRE implementation so that it performs
> strength reduction as well. it requires multiplication expressions to
> get into hash table.
Why do you want to do that?
Strength reduction is done already in loop.c.
Thanks,
Andrew Pinski


Re: gcse pass: expression hash table

2005-02-24 Thread Daniel Berlin
On Thu, 2005-02-24 at 15:59 -0500, Andrew Pinski wrote:
> On Feb 24, 2005, at 3:55 PM, Tarun Kawatra wrote:
> 
> > Actually I am trying to extend PRE implementation so that it performs 
> > strength reduction as well. it requires multiplication expressions to 
> > get into hash table.
> 
> Why do you want to do that?
> Strength reduction is done already in loop.c.
> 

Generally, PRE-based strength reduction also includes straight-line
strength reduction.
Non-SSA-based implementations don't do much better in terms of redundancy
elimination, but the SSA-based ones can eliminate many more redundancies
when you integrate strength reduction into them.

I.e., given something like:


b = a * b;

if (argc)
  {
    a = a + 1;
  }
else
  {
    a = a + 2;
  }
c = a * b;


It will remove the second multiply in favor of additions at the sites
where a changes.
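
Concretely, for a variant of that example in which the first product is
kept in a separate temporary (so only a changes between the two
occurrences of a * b), the strength-reduced form would look roughly like
the sketch below; d and t are names introduced for illustration, and
this shows the idea rather than the output of any actual GCC pass:

    t = a * b;            /* the only multiply that remains */
    d = t;                /* was: d = a * b; */

    if (argc)
      {
        a = a + 1;
        t = t + b;        /* repair a*b at the site where a grew by 1 */
      }
    else
      {
        a = a + 2;
        t = t + b + b;    /* repair a*b where a grew by 2 */
      }
    c = t;                /* was: c = a * b; the second multiply is gone */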

--Dan



Re: gcse pass: expression hash table

2005-02-24 Thread Tarun Kawatra
On Thu, 24 Feb 2005, James E Wilson wrote:
> On Thu, 2005-02-24 at 03:15, Steven Bosscher wrote:
> > On Feb 24, 2005 11:13 AM, Tarun Kawatra <[EMAIL PROTECTED]> wrote:
> > Does GCSE look into stuff in PARALLELs at all?  From gcse.c:
>
> Shrug.  The code in hash_scan_set seems to be doing something
> reasonable.
>
> The problem I saw wasn't with finding expressions to gcse, it was with
> inserting them later.  The insertion would create a cc reg clobber, so
> we don't bother adding it to the hash table.  I didn't look any further,
> but it seemed reasonable that if it isn't in the hash table, then it
> isn't going to be optimized.

This is with reference to my latest mail.
I found that when plus-type expressions are inserted, the inserted
expressions do not contain the clobber of CC, even though it is present
in the original instruction.
For example, for the instruction
(insn 40 61 42 2 (parallel [
            (set (reg/v:SI 74 [ c ])
                (plus:SI (reg:SI 86)
                    (reg:SI 85)))
            (clobber (reg:CC 17 flags))
        ]) 138 {*addsi_1} (nil)

the instruction inserted is

(insn 72 64 36 2 (set (reg:SI 87)
        (plus:SI (reg:SI 86 [ a ])
            (reg:SI 85 [ b ]))) 134 {*lea_1} (nil)
    (nil))
That is, it converts addsi_1 to lea_1.
-tarun

> It seems that switching the x86 backend from using cc0 to using a cc
> hard register has effectively crippled the RTL gcse pass for it.


Re: gcse pass: expression hash table

2005-02-24 Thread James E Wilson
On Thu, 2005-02-24 at 12:55, Tarun Kawatra wrote:
> You are write here that if some expr doesn't get into hash table, it will 
> not get optimized.

That was an assumption on my part.  You shouldn't take it as the literal
truth.  I'm not an expert on all implementation details of the gcse.c
pass.

>  But since plus expressions on x86 also clobber CC as 
> shown below
> then why the same reasoning does not apply to plus expressions. Why will 
> there insertion later will not create any problems?

Obviously, plus expressions will have the same problem.  That is why I
question whether plus expressions are properly getting optimized.  

Since you haven't provided any example that shows that they are being
optimized, or pointed me at anything in the gcse.c file I can look at,
there isn't anything more I can do to help you.  All I can do is tell
you that you need to give more details, or debug the problem yourself.

> Actually I am trying to extend PRE implementation so that it performs 
> strength reduction as well. it requires multiplication expressions to get 
> into hash table.

Current sources have a higher level intermediate language (gimple) and
SSA based optimization passes that operate on them.  This includes a
tree-ssa-pre.c pass.  It might be more useful to extend this to do
strength reduction than to try to extend the RTL gcse pass.

> I am debugging the code to find where the differences for the two kind of 
> expressions occur.
> Will let you all know if I found anything interesting. 

Good.

> If you know this already please share with me.

It is unlikely that anyone already knows this info offhand.
-- 
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com




Re: gcse pass: expression hash table

2005-02-24 Thread Steven Bosscher
On Thursday 24 February 2005 21:16, James E Wilson wrote:
> On Thu, 2005-02-24 at 03:15, Steven Bosscher wrote:
> > On Feb 24, 2005 11:13 AM, Tarun Kawatra <[EMAIL PROTECTED]> wrote:
> > Does GCSE look into stuff in PARALLELs at all?  From gcse.c:
>
> Shrug.  The code in hash_scan_set seems to be doing something
> reasonable.
>
> The problem I saw wasn't with finding expressions to gcse, it was with
> inserting them later.  The insertion would create a cc reg clobber, so
> we don't bother adding it to the hash table.  I didn't look any further,
> but it seemed reasonable that if it isn't in the hash table, then it
> isn't going to be optimized.
>
> It seems that switching the x86 backend from using cc0 to using a cc
> hard register has effectively crippled the RTL gcse pass for it.

Not that it matters so much.  GCSE does more harm than good for
lots of code (including SPEC - the mean for int and fp goes *up*
if you disable GCSE for x86*).

The problem indeed appears to be inserting the expressions.  I
am quite sure there was a patch to allow GCSE to do more with
PARALLELs, but I can't find it anywhere.  I did stumble into this
mail: http://gcc.gnu.org/ml/gcc/2003-07/msg02064.html:

"
 - My code for GCSE on parallels that is actually in cfg branch only and
   first halve of the changes went into mainline (basic code motion
   infrastructure)
"

In one of the replies, rth said:
"I'm not sure these are worthwhile long term.  I expect the rtl GCSE
optimizer to collapse to almost nothing with the tree-ssa merge."

Which probably explains why these bits were never merged from
the cfg-branch for GCC 3.4.

Ah, archeology, so much fun.

Gr.
Steven




Re: gcse pass: expression hash table

2005-02-24 Thread Steven Bosscher
On Thursday 24 February 2005 21:59, Andrew Pinski wrote:
> On Feb 24, 2005, at 3:55 PM, Tarun Kawatra wrote:
> > Actually I am trying to extend PRE implementation so that it performs
> > strength reduction as well. it requires multiplication expressions to
> > get into hash table.
>
> Why do you want to do that?
> Strength reduction is done already in loop.c.

First, that's a different kind of strength reduction.  Second,
we'd like to blow away loop.c so replacing it would not be a 
bad thing ;-)  But the kind of strength reduction PRE can do
is something different.  Didn't Dan already have patches for
that in the old tree SSAPRE, and some ideas on how to do it
in GVN-PRE?
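
(For contrast, the loop.c flavor is the classic induction-variable
transformation; the following is only a rough sketch of the idea in C,
not GCC's actual transformation, with p, stride, and the loop bounds
made up for illustration:)

    /* Before: one multiply per iteration to form the index.  */
    for (i = 0; i < n; i++)
      sum += p[i * stride];

    /* After loop strength reduction: the multiply becomes an addition
       on an induction variable that is bumped each iteration.  */
    j = 0;
    for (i = 0; i < n; i++)
      {
        sum += p[j];
        j += stride;
      }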

Gr.
Steven


Re: gcse pass: expression hash table

2005-02-24 Thread Tarun Kawatra
On Thu, 24 Feb 2005, Andrew Pinski wrote:
> On Feb 24, 2005, at 3:55 PM, Tarun Kawatra wrote:
> > Actually I am trying to extend PRE implementation so that it performs
> > strength reduction as well. it requires multiplication expressions to get
> > into hash table.
>
> Why do you want to do that?
> Strength reduction is done already in loop.c.

We may then be able to get rid of the loop optimization pass, if the
optimizations captured by the extended PRE approach are comparable to
those of loop.c.

Maybe not all of them, but this approach can also capture straight-line
strength reduction (which does not depend on any loop, unlike
induction-variable-based optimization).

-tarun

> Thanks,
> Andrew Pinski



Re: gcse pass: expression hash table

2005-02-24 Thread Tarun Kawatra
On Thu, 24 Feb 2005, James E Wilson wrote:
> On Thu, 2005-02-24 at 12:55, Tarun Kawatra wrote:
> > You are write here that if some expr doesn't get into hash table, it will
            ^^^^^
    right.
    -tarun
> > not get optimized.
>
> That was an assumption on my part.  You shouldn't take it as the literal
> truth.  I'm not an expert on all implementation details of the gcse.c
> pass.
>
> > But since plus expressions on x86 also clobber CC as
> > shown below
> > then why the same reasoning does not apply to plus expressions. Why will
> > there insertion later will not create any problems?
>
> Obviously, plus expressions will have the same problem.  That is why I
> question whether plus expressions are properly getting optimized.
>
> Since you haven't provided any example that shows that they are being
> optimized, or pointed me at anything in the gcse.c file I can look at,
> there isn't anything more I can do to help you.  All I can do is tell
> you that you need to give more details, or debug the problem yourself.
>
> > Actually I am trying to extend PRE implementation so that it performs
> > strength reduction as well. it requires multiplication expressions to get
> > into hash table.
>
> Current sources have a higher level intermediate language (gimple) and
> SSA based optimization passes that operate on them.  This includes a
> tree-ssa-pre.c pass.  It might be more useful to extend this to do
> strength reduction that to try to extend the RTL gcse pass.
>
> > I am debugging the code to find where the differences for the two kind of
> > expressions occur.
> > Will let you all know if I found anything interesting.
>
> Good.
>
> > If you know this already please share with me.
>
> It is unlikely that anyone already knows this info offhand.


Re: gcse pass: expression hash table

2005-02-24 Thread Daniel Berlin
On Thu, 2005-02-24 at 22:28 +0100, Steven Bosscher wrote:
> On Thursday 24 February 2005 21:59, Andrew Pinski wrote:
> > On Feb 24, 2005, at 3:55 PM, Tarun Kawatra wrote:
> > > Actually I am trying to extend PRE implementation so that it performs
> > > strength reduction as well. it requires multiplication expressions to
> > > get into hash table.
> >
> > Why do you want to do that?
> > Strength reduction is done already in loop.c.
> 
> First, that's a different kind of strength reduction.  Second,
> we'd like to blow away loop.c so replacing it would not be a 
> bad thing ;-)  But the kind of strength reduction PRE can do
> is something different.  Didn't Dan already have patches for
> that in the old tree SSAPRE, and some ideas on how to do it
> in GVN-PRE?

yes and yes.
:)




mthumb in specs file

2005-02-24 Thread Shaun Jackman
Is it possible by hacking the specs file to change the target for
arm-elf-gcc from -marm to -mthumb? I tried a few obvious things like
changing marm in *multilib_defaults to mthumb, but this did not have
the desired effect.

Please cc me in your reply. Thanks!
Shaun


Re: gcse pass: expression hash table

2005-02-24 Thread Tarun Kawatra
> My assumption here was that if I gave you a few pointers, you would try
> to debug the problem yourself.  If you want someone else to debug it for
> you, then you need to give much better info.  See for instance
>    http://gcc.gnu.org/bugs.html
> which gives info on how to properly report a bug.  I have the target and
> gcc version, but I need a testcase, compiler options, and perhaps other
> info.
I will take this into consideration from now on.
The test case I am using (for the multiplication expression) is:

#include <stdio.h>

void foo();

int main()
{
  foo();
}

void foo()
{
  int a, b, c;
  int cond;

  scanf(" %d %d %d", &a, &b, &cond);
  if (cond)
    c = a * b;
  c = a * b;
  printf("Value of C is %d", c);
}
--
and for plus, the same with a*b replaced by a+b everywhere.
I am compiling it as

gcc --param max-gcse-passes=2 -dF -dG -O3 filename.c

The reason for max-gcse-passes=2 is that in the first pass the two
occurrences of the a+b expression use different sets of pseudo registers.
After one gcse pass both become the same (because of the intermediate
constant/copy propagation passes), and then a+b gets optimized.  This can
be seen from the dumps

filename.c.07.addressof
and filename.c.08.gcse
A part of the expression hash table for the program containing plus is:
Expression hash table (11 buckets, 11 entries)
Index 0 (hash value 3)
  (plus:SI (reg/f:SI 20 frame)
(const_int -4 [0xfffc]))

Index 8 (hash value 1)
  (mem/f:SI (plus:SI (reg/f:SI 20 frame)
(const_int -8 [0xfff8])) [2 b+0 S4 A32])
Index 9 (hash value 6)
  (plus:SI (reg:SI 78 [ a ])
(reg:SI 79 [ b ]))
Index 10 (hash value 10)
  (plus:SI (reg:SI 80 [ a ])
(reg:SI 81 [ b ]))
This clearly shows that the clobber of CC in a+b is being ignored when
the expressions that need to be inserted will not themselves contain a
clobber of CC.

> How do you know that adds are getting optimized?  Did you judge this by
> looking at one of the dump files, or looking at the assembly output?
> Maybe you are looking at the wrong thing, or misunderstanding what you
> are looking at?  You need to give more details here.

I am looking at the dump files.
Regards,
-tarun


-Ttext with -mthumb causes relocation truncated to fit

2005-02-24 Thread Shaun Jackman
When -Ttext is used in combination with -mthumb it causes a relocation
truncated to fit message. What does this mean, and how do I fix it?

Please cc me in your reply. Thanks,
Shaun

$ arm-elf-gcc --version | head -1
arm-elf-gcc (GCC) 3.4.0
$ cat hello.c
int main() { return 0; }
$ arm-elf-gcc -Ttext 0x200 -mthumb hello.c
/opt/pathport/lib/gcc/arm-elf/3.4.0/thumb/crtbegin.o(.init+0x0): In
function `$t':
: relocation truncated to fit: R_ARM_THM_PC22 frame_dummy
/opt/pathport/lib/gcc/arm-elf/3.4.0/../../../../arm-elf/lib/thumb/crt0.o(.text+0x9a):../../../../../../../gcc-3.4.0/newlib/libc/sys/arm/crt0.S:200:
relocation truncated to fit: R_ARM_THM_PC22 _init
/opt/pathport/lib/gcc/arm-elf/3.4.0/thumb/crtend.o(.init+0x0): In function `$t':
: relocation truncated to fit: R_ARM_THM_PC22 __do_global_ctors_aux
collect2: ld returned 1 exit status


Re: C++ math optimization problem...

2005-02-24 Thread Benjamin Redelings I
Hello,
	Regarding the testcase I mentioned before, I have been checking out the 
Intel compiler to see if it would generate better code.  Interestingly 
enough, it displays EXACTLY the same run-times as gcc for the two tests 
(0.2s for math in if-block, 1.0s for math out of if-block).

So this is rather strange.
Shall I file a PR if it doesn't become clear what is going on?
thanks,
-BenRI
#include <cstdio>
#include <cstdlib>
#include <vector>

const int OUTER = 10;
const int INNER = 1000;

using namespace std;

int main(int argn, char *argv[])
{
  int s = atoi(argv[1]);
  double result;
  {
    vector<double> d(INNER); // move outside of this scope to fix
    // initialize d
    for (int i = 0; i < INNER; i++)
      d[i] = double(1+i) / INNER;
    // calc result
    result = 0;
    for (int i = 0; i < OUTER; ++i)
      for (int j = 1; j < INNER; ++j)
        result += d[j]*d[j-1] + d[j-1];
  }
  printf("result = %f\n", result);
  return 0;
}
P.S. Um, is the gcc listserv intelligent enough not to send you all a 
second copy of this e-mail?


Re: -Ttext with -mthumb causes relocation truncated to fit

2005-02-24 Thread Daniel Jacobowitz
On Thu, Feb 24, 2005 at 03:23:53PM -0800, Shaun Jackman wrote:
> When -Ttext is used in combination with -mthumb it causes a relocation
> truncated to fit message. What does this mean, and how do I fix it?
> 
> Please cc me in your reply. Thanks,
> Shaun

Don't use -Ttext with an ELF toolchain; use a linker script instead.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Specifying a linker script from the specs file

2005-02-24 Thread Shaun Jackman
I have had no trouble specifiying the linker script using the -T
switch to gcc. I am now trying to specify the linker script from a
specs file like so:

%rename link old_link
*link:
-Thello.ld%s %(old_link)

gcc complains though about linking Thumb code against ARM libraries --
I've specified -mthumb to gcc --
/opt/pathport/lib/gcc/arm-elf/3.4.0/../../../../arm-elf/bin/ld:
/opt/pathport/arm-elf/lib/libc.a(memset.o)(memset): warning:
interworking not enabled.

Why does the above specs snippet cause gcc to forget it's linking
against thumb libraries?

Please cc me in your reply. Thanks,
Shaun

$ arm-elf-gcc --version | head -1
arm-elf-gcc (GCC) 3.4.0
$ cat hello.c
int main() { return 0; }
$ cat hello.specs
%rename link old_link
*link:
-Thello.ld%s %(old_link)
$ diff /opt/pathport/arm-elf/lib/ldscripts/armelf.xc hello.ld
12c12
<   PROVIDE (__executable_start = 0x8000); . = 0x8000;
---
>   PROVIDE (__executable_start = 0x200); . = 0x200;
181c181
< .stack 0x8 :
---
> .stack 0x2100 :
$ arm-elf-gcc -mthumb -Thello.ld hello.c
$ arm-elf-gcc -mthumb -specs=hello.specs hello.c 2>&1 | head -1
/opt/pathport/lib/gcc/arm-elf/3.4.0/../../../../arm-elf/bin/ld:
/opt/pathport/arm-elf/lib/libc.a(memset.o)(memset): warning:
interworking not enabled.


what's the proper way to configure/enable/disable C exception handling?

2005-02-24 Thread Paul Schlie
In attempting to configure a target limited to 32-bit C type support, it
became obvious that exception support seems to be unconditionally required,
and that it defaults to assuming target support for 64-bit data types even
when the target is not necessarily configured to support data types that
large.

- is this intentional/necessary for C language compilation?

- if not, what's the recommended way to specify the configuration to
  either eliminate the necessity, or select an exception model which
  doesn't require 64-bit target type support?

- might forcing sjlj exceptions help? With what consequences?

- or might it be best for me to attempt to refine the baseline exception
  data structure definitions to be more aware of the target's supported
  type sizes?

  - if so, which target configuration header files or facilities would be
    the officially correct ones to use to convey the target's supported
    type size configuration information to the exception handling
    implementation files?

Any insight/recommendations would be appreciated,

Thanks, -paul-




GNU INTERCAL front-end for GCC?

2005-02-24 Thread Sam Lauber
I am thinking of contributing a front-end for INTERCAL to
GCC.  INTERCAL is an esoteric programming language that was
created in 1972 with the goal of having nothing in common
with other languages (see http://catb.org/~esr/intercal).
There is a C implementation of INTERCAL (called C-INTERCAL)
that is available there.  I think it would be a good
project(1) as a front-end(2) to GCC.

Samuel Lauber

(1) -> Don't say that I'm crazy.  
(2) -> Some of us would like 

DO .1 <- #0

to be translated into 

movl $0, v1




Re: Benchmark of gcc 4.0

2005-02-24 Thread Uros Bizjak
Hello!
> I just got interested and did a test myself.  Comparing gcc 4.0 (-O2
> -funroll-loops -D__NO_MATH_INLINES -ffast-math -march=pentium4
> -mfpmath=sse -ftree-vectorize)
> and icc 9.0 beta (-O3 -xW -ip):

Here are the results of scimark with '-O3 -march=pentium4 -mfpmath=... 
-funroll-loops -ftree-vectorize -ffast-math -D__NO_MATH_INLINES 
-fomit-frame-pointer' and various -mfpmath settings:

-mfpmath=sse:
Composite Score:  664.47
FFT Mflops:   371.12(N=1024)
SOR Mflops:   511.13(100 x 100)
MonteCarlo: Mflops:   130.94
Sparse matmult  Mflops:   856.68(N=1000, nz=5000)
LU  Mflops:  1452.48(M=100, N=100)
-mfpmath=387:
Composite Score:  624.14
FFT Mflops:   391.09(N=1024)
SOR Mflops:   465.45(100 x 100)
MonteCarlo: Mflops:   188.38
Sparse matmult  Mflops:   811.59(N=1000, nz=5000)
LU  Mflops:  1264.20(M=100, N=100)
-mfpmath=sse,387:
Composite Score:  665.51
FFT Mflops:   372.70(N=1024)
SOR Mflops:   509.78(100 x 100)
MonteCarlo: Mflops:   148.72
Sparse matmult  Mflops:   832.20(N=1000, nz=5000)
LU  Mflops:  1464.16(M=100, N=100)
I think that the results will be even better once PR18463
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463) is fixed.  The LU
benchmark is one of the testcases where these problems were found.  You
can check the asm code for sequences like:

    leal    0(,%ecx,8), %edx
    movsd   (%ebx,%edx), %xmm0
instead of:
    movsd   (%ebx,%ecx,8), %xmm0
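
(For reference, a loop of roughly the following shape is enough to see
which of the two address sequences is generated; this is an illustrative
guess at the kind of access pattern involved, not the actual SciMark LU
code:)

    /* With SSE math, each a[i]/b[i] load should ideally be a single
       movsd with a scaled index, rather than a separate lea plus movsd.  */
    double dot (const double *a, const double *b, int n)
    {
      double s = 0.0;
      int i;
      for (i = 0; i < n; i++)
        s += a[i] * b[i];
      return s;
    }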
Uros.