Re: OpenMP bug with gfortran when compile under Windows platform

2006-05-30 Thread François-Xavier Coudert

[CCing the OpenMP experts]

Henry,

The -fopenmp option doesn't work under mingw32. Since I am the one
building the Windows (mingw32) binary packages you downloaded, I'm
rather interested in getting it to work... So here are a few things
we could sort out:

 1. currently, using the -fopenmp option gives:


$ gfortran -fopenmp a.f
gfortran.exe: unrecognized option '-pthread'
gfortran.exe: libgomp.spec: No such file or directory


Could we have a clearer error message? (perhaps saying that openmp is
not available on that platform) The current message is clearly... not
clear for users!


 2. I looked at pthreads win32
(http://sources.redhat.com/pthreads-win32/), an opensource thread
support for win32, including mingw32. Not all POSIX functions are
implemented, but a fair amount of them. I'll try to get libgomp
compiling against those, and report progress here.


 3. why is libgomp building conditional on target triplet, and not on
detecting a working pthread implementation?


Thanks,
FX


Re: OpenMP bug with gfortran when compile under Windows platform

2006-05-30 Thread Jakub Jelinek
On Tue, May 30, 2006 at 11:19:09AM +0200, François-Xavier Coudert wrote:
> [CCing the OpenMP experts]
> 
> Henry,
> 
> The -fopenmp option doesn't work under mingw32. Since I am the one
> building the Windows (mingw32) binary packages you downloaded, I'm
> rather interested in getting it to work... So here are a few things
> we could sort out:
> 
>  1. currently, using the -fopenmp option gives:
> 
> >$ gfortran -fopenmp a.f
> >gfortran.exe: unrecognized option '-pthread'
> >gfortran.exe: libgomp.spec: No such file or directory
> 
> Could we have a clearer error message? (perhaps saying that openmp is
> not available on that platform) The current message is clearly... not
> clear for users!

Then mingw32 should do something similar to config/i386/cygwin.h, which has
/* Every program on cygwin links against cygwin1.dll which contains
   the pthread routines.  There is no need to explicitly link them
   and the -pthread flag is not recognized.  */
#undef GOMP_SELF_SPECS
#define GOMP_SELF_SPECS ""

>  2. I looked at pthreads win32
> (http://sources.redhat.com/pthreads-win32/), an opensource thread
> support for win32, including mingw32. Not all POSIX functions are
> implemented, but a fair amount of them. I'll try to get libgomp
> compiling against those, and report progress here.
> 
> 
>  3. why is libgomp building conditional on target triplet, and not on
> detecting a working pthread implementation?

Most of the things are detected, only very few things are keyed on target
triplet and in those cases it is desirable (e.g. arch specific assembly,
etc.).  Once you do 2., you just port libgomp to mingw32 + pthreads-win32
and assuming pthreads-win32 is sufficiently rich and not too buggy, it will
just work.

Jakub


[libmudflap] build warnings...

2006-05-30 Thread Christian Joensson

I just wanted to ping the list here on current gcc trunk libmudflap
build warnings:

../../../gcc/libmudflap/mf-runtime.c:1706: warning: format '%06lu'
expects type 'long unsigned int', but argument 15 has type
'__suseconds_t'
../../../gcc/libmudflap/mf-runtime.c:1729: warning: format '%06lu'
expects type 'long unsigned int', but argument 4 has type
'__suseconds_t'
../../../gcc/libmudflap/mf-runtime.c:1998: warning: format '%06lu'
expects type 'long unsigned int', but argument 6 has type
'__suseconds_t'
../../../../gcc/libmudflap/mf-runtime.c:1706: warning: format '%06lu'
expects type 'long unsigned int', but argument 15 has type
'__suseconds_t'
../../../../gcc/libmudflap/mf-runtime.c:1729: warning: format '%06lu'
expects type 'long unsigned int', but argument 4 has type
'__suseconds_t'
../../../../gcc/libmudflap/mf-runtime.c:1998: warning: format '%06lu'
expects type 'long unsigned int', but argument 6 has type
'__suseconds_t'


Are these something one simply has to accept, or is something deeper
lurking here?

--
Cheers,

/ChJ
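
One common way to silence this class of warning is to cast the value to the type the format expects. The sketch below assumes the flagged arguments are tv_usec fields of a struct timeval; the mf-runtime.c source lines are not shown above, so that is a guess. The helper name format_timeval is made up for illustration and is not libmudflap code.

```c
#include <stdio.h>
#include <sys/time.h>

/* Illustrative helper, not libmudflap code: print a timeval as
   "seconds.microseconds".  The casts make each argument match its
   %lu conversion no matter how the platform defines time_t and
   suseconds_t.  */
static int
format_timeval (char *buf, size_t len, const struct timeval *tv)
{
  return snprintf (buf, len, "%lu.%06lu",
                   (unsigned long) tv->tv_sec,
                   (unsigned long) tv->tv_usec);
}
```

With the casts in place the format string and the argument types agree on every target, so the warning should go away without changing the output.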


Re: OpenMP bug with gfortran when compile under Windows platform

2006-05-30 Thread François-Xavier Coudert

> you just port libgomp to mingw32 + pthreads-win32
> and assuming pthreads-win32 is sufficiently rich and not too buggy, it will
> just work.


With the attached patch, I can compile libgomp with

../gcc/configure --prefix=/mingw --disable-nls --with-ld=/mingw/bin/ld
--with-as=/mingw/bin/as --disable-werror --enable-bootstrap
--enable-threads=posix --with-win32-nlsapi=unicode
--host=i386-pc-mingw32 --enable-languages=c,fortran --enable-libgomp

and the resulting compiler and generated executables seem to work (I
tried a few C and Fortran toy codes). The main changes are to
libgomp/config/posix/time.c, which used functions not available on
mingw32. Would they be acceptable in this form (protected with #ifdef
_WIN32)? If so, I'll do some more testing, and officially submit the
patch.

FX
Index: libgomp/configure
===
--- libgomp/configure	(revision 114196)
+++ libgomp/configure	(working copy)
@@ -8397,7 +8397,9 @@
 # Check for functions needed.
 
 
-for ac_func in getloadavg clock_gettime
+
+
+for ac_func in getloadavg clock_gettime gettimeofday sysconf
 do
 as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh`
 echo "$as_me:$LINENO: checking for $ac_func" >&5
Index: libgomp/configure.ac
===
--- libgomp/configure.ac	(revision 114196)
+++ libgomp/configure.ac	(working copy)
@@ -162,7 +162,7 @@
[AC_MSG_ERROR([Pthreads are required to build libgomp])])])
 
 # Check for functions needed.
-AC_CHECK_FUNCS(getloadavg clock_gettime)
+AC_CHECK_FUNCS(getloadavg clock_gettime gettimeofday sysconf)
 
 # Check for broken semaphore implementation on darwin.
 # sem_init returns: sem_init error: Function not implemented.
Index: libgomp/config.h.in
===
--- libgomp/config.h.in	(revision 114196)
+++ libgomp/config.h.in	(working copy)
@@ -18,6 +18,9 @@
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
+/* Define to 1 if you have the `gettimeofday' function. */
+#undef HAVE_GETTIMEOFDAY
+
 /* Define to 1 if you have the <inttypes.h> header file. */
 #undef HAVE_INTTYPES_H
 
@@ -42,6 +45,9 @@
 /* Define to 1 if the target supports __sync_*_compare_and_swap */
 #undef HAVE_SYNC_BUILTINS
 
+/* Define to 1 if you have the `sysconf' function. */
+#undef HAVE_SYSCONF
+
 /* Define to 1 if you have the <sys/loadavg.h> header file. */
 #undef HAVE_SYS_LOADAVG_H
 
Index: libgomp/config/posix/time.c
===
--- libgomp/config/posix/time.c	(revision 114196)
+++ libgomp/config/posix/time.c	(working copy)
@@ -48,32 +48,52 @@
 double
 omp_get_wtime (void)
 {
-#ifdef HAVE_CLOCK_GETTIME
+#ifdef HAVE_GETTIMEOFDAY
+# ifdef HAVE_CLOCK_GETTIME
   struct timespec ts;
-# ifdef CLOCK_MONOTONIC
+#  ifdef CLOCK_MONOTONIC
   if (clock_gettime (CLOCK_MONOTONIC, &ts) < 0)
-# endif
+#  endif
 clock_gettime (CLOCK_REALTIME, &ts);
   return ts.tv_sec + ts.tv_nsec / 1e9;
-#else
+# else
   struct timeval tv;
   gettimeofday (&tv, NULL);
   return tv.tv_sec + tv.tv_usec / 1e6;
+# endif
+#else
+# ifdef _WIN32
+
+#include <sys/timeb.h>
+  struct _timeb timebuf;
+  _ftime (&timebuf);
+  return (timebuf.time + (long)(timebuf.millitm) / 1e3);
+# else
+#  error "Either clock_gettime or gettimeofday are required"
+# endif
 #endif
 }
 
 double
 omp_get_wtick (void)
 {
-#ifdef HAVE_CLOCK_GETTIME
+#ifdef HAVE_SYSCONF
+# ifdef HAVE_CLOCK_GETTIME
   struct timespec ts;
-# ifdef CLOCK_MONOTONIC
+#  ifdef CLOCK_MONOTONIC
   if (clock_getres (CLOCK_MONOTONIC, &ts) < 0)
-# endif
+#  endif
 clock_getres (CLOCK_REALTIME, &ts);
   return ts.tv_sec + ts.tv_nsec / 1e9;
+# else
+  return 1.0 / sysconf(_SC_CLK_TCK);
+# endif
 #else
-  return 1.0 / sysconf(_SC_CLK_TCK);
+# ifdef _WIN32
+  return 1e-3;
+# else
+#  error "Either clock_getres or sysconf are required"
+# endif
 #endif
 }
 
Index: gcc/config/i386/mingw32.h
===
--- gcc/config/i386/mingw32.h	(revision 114196)
+++ gcc/config/i386/mingw32.h	(working copy)
@@ -108,3 +108,8 @@
 /* Define as short unsigned for compatibility with MS runtime.  */
 #undef WINT_TYPE
 #define WINT_TYPE "short unsigned int"
+
+/* The mingw32 compiler doesn't know the -pthread option, but requires
+   explicitly linking the libpthread.  */
+#undef GOMP_SELF_SPECS
+#define GOMP_SELF_SPECS "-lpthread"


Re: OpenMP bug with gfortran when compile under Windows platform

2006-05-30 Thread Jakub Jelinek
On Tue, May 30, 2006 at 04:37:35PM +0200, François-Xavier Coudert wrote:
> >you just port libgomp to mingw32 + pthreads-win32
> >and assuming pthreads-win32 is sufficiently rich and not too buggy, it will
> >just work.
> 
> With the attached patch, I can compile libgomp with
> 
> ../gcc/configure --prefix=/mingw --disable-nls --with-ld=/mingw/bin/ld
> --with-as=/mingw/bin/as --disable-werror --enable-bootstrap
> --enable-threads=posix --with-win32-nlsapi=unicode
> --host=i386-pc-mingw32 --enable-languages=c,fortran --enable-libgomp
> 
> and the resulting compiler and generated executables seem to work (I
> tried a few C and Fortran toy codes). The main changes are to
> libgomp/config/posix/time.c, which used functions not available on
> mingw32. Would they be acceptable in this form (protected with #ifdef
> _WIN32)? If so, I'll do some more testing, and officially submit the
> patch.

_WIN32 #ifdefs are just too ugly, additionally including a system
header inside of a routine is a big no no.
I think it would be much cleaner if you added config/mingw32/time.c
instead and tweaked config.tgt, after all, the file only contains the
2 routines and you use completely different bodies for those routines
on mingw32 than on any other target.

Jakub
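
A rough sketch of the arithmetic such a separate config/mingw32/time.c might contain, written so it compiles anywhere. On mingw32 the real file would presumably include <sys/timeb.h> at file scope and fill a struct _timeb with _ftime; here a hypothetical struct toy_timeb stands in for it, and the toy_ names are invented for illustration.

```c
/* Hypothetical mirror of the two struct _timeb fields the patch reads.  */
struct toy_timeb
{
  long time;               /* whole seconds */
  unsigned short millitm;  /* milliseconds, 0..999 */
};

/* Wall-clock seconds as a double, the value omp_get_wtime returns.  */
static double
toy_wtime_from_timeb (const struct toy_timeb *tb)
{
  return (double) tb->time + tb->millitm / 1e3;
}

/* Resolution of a millisecond clock, the value omp_get_wtick returns.  */
static double
toy_wtick (void)
{
  return 1e-3;
}
```

Keeping the target-specific bodies in their own file means no system header is pulled in inside a function and the POSIX version needs no _WIN32 conditionals.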


Re: IA-64 speculation patches have bad impact on ARM

2006-05-30 Thread Maxim Kuvyrkov

Daniel Jacobowitz wrote:

> Hi Maxim and Vlad,
>
> I just tracked an ICE while building glibc for ARM to this patch,
> which introduced --param max-sched-extend-regions-iters with a default
> of two:
>
>   http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00998.html
>
> ...
>
> The register variables and their initializations get hoisted all the way out
> of the first if.  On ia64, with a million execution units to spare and a
> fat pipeline, this may make sense.  On targets with a simpler execution
> model, though, it's pretty awful.  If the condition (which we have no
> information on the likelihood of) is false, we've added lots of cycles for
> no gain.  It's not like the scheduler was filling holes; the initializations
> were scheduled as early as possible because they had no dependencies.
>
> With the parameter turned back down to one, the testcase compiles, and the
> code looks sensible again.  No, I wasn't able to work out why profiling was
> necessary to trigger this problem; I suspect it makes some register
> unavailable, but I'm not sure which.  I didn't look into that further.
>
> What's your opinion?  We could easily change the default of the parameter
> for ARM, but I assume there are other affected targets.  I don't know if we
> need the extended region scheduling to be smarter, or if it should simply be
> turned off for some targets.



Hi Daniel!

Sorry for the delay, I needed time to investigate the cause of the problem.

The real problem lies in the computation of the instruction priorities.
ARM has a fairly simple scheduling model: on each cycle, the insn standing
first in the ready list gets scheduled.  This behavior puts *all* the
responsibility for the resulting schedule on how the ready list is
arranged.  The main deciding factor in sorting the ready list is
INSN_PRIORITY.


Instructions in the inner 'if' get a somewhat greater priority than the
instructions from the dominator block and hence get hoisted from their
original block.  A good solution for this case would be a more precise
evaluation of the insn priorities.  This includes transforming the
insn priority from a region-scope to a block-scope value: e.g. in
this case, while scheduling the first block, the priorities of the insns
from the 'if'-block will be multiplied by the probability of the
'then'-branch and, therefore, will be significantly lower than the
priority of the insns from the current block.  I started to implement
this idea some time ago, but never finished :(
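
The block-scope scaling described above might look roughly like this. It is only a sketch of the idea, not the unfinished implementation: the fixed-point base TOY_PROB_BASE and the function name are hypothetical.

```c
/* Hypothetical fixed-point base: a probability of 1.0 is TOY_PROB_BASE.  */
#define TOY_PROB_BASE 10000

/* Scale a region-scope priority by the probability of reaching the
   insn's block from the block currently being scheduled.  */
static int
toy_block_scope_priority (int region_priority, int branch_prob)
{
  return (int) ((long) region_priority * branch_prob / TOY_PROB_BASE);
}
```

For instance, an insn with region priority 8 behind a branch taken 25% of the time would compete with priority 2, so insns of the current block with priority 3 or more would sort ahead of it.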


Anyway, this work is for stage 1 or 2, and for now I propose the following
fix: implement the targetm.sched.reorder hook so that it will ensure that if
there is an insn from the current block in the ready list, then an insn
from another block won't stand first in the line (and, therefore,
won't be chosen for scheduling).  I feel that this will be what you are
calling 'filling holes'.  Please find an example patch attached (arm.patch).


While debugging the testcase I found two somewhat unrelated bugs in the
handling of INSN_PRIORITY.  The first one is in haifa-sched.c: priority
().  When an insn has no forward dependencies its priority is set to its
latency.  The bug occurs when an insn has some deps and all of them get
rejected by the current_sched_info->contributes_to_priority () hook: in
this case INSN_PRIORITY should also be initialized with the insn latency,
but the present code misses that.
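
A toy model of the intended behavior, assuming priority is the maximum of (dependence cost + successor priority) over the contributing dependencies. The toy_ types and names are hypothetical and are not the haifa-sched.c data structures.

```c
#include <stddef.h>

/* Hypothetical forward dependence: whether the hook accepts it, its
   cost, and the already computed priority of the dependent insn.  */
struct toy_dep
{
  int contributes;
  int cost;
  int succ_priority;
};

/* Priority is the best (cost + successor priority) over contributing
   deps.  If there are no deps, or every dep is rejected, fall back to
   the insn's own latency.  */
static int
toy_priority (int latency, const struct toy_dep *deps, size_t n_deps)
{
  int best = 0;
  int any = 0;
  size_t i;

  for (i = 0; i < n_deps; i++)
    {
      int p;

      if (!deps[i].contributes)
        continue;
      p = deps[i].cost + deps[i].succ_priority;
      if (!any || p > best)
        best = p;
      any = 1;
    }
  return any ? best : latency;
}
```

The point of the `any` flag is that "has deps" and "has contributing deps" are different conditions, which is where the described bug hides.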


The second one is not as critical as the first one.  It is in
haifa-sched.c: adjust_priority ().  This function plainly calls the
targetm.sched_adjust_priority () hook when an insn is being added to the
ready list.  As I understand it, all targets assume this hook is invoked
once: after all priorities are computed, but before the insn is added to
the ready list.  But for insns with no dependencies from the source
blocks this hook can be called many times, so the priorities of those
insns can become noticeably wrong.
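
One possible shape of a guard against repeated adjustment, as a toy model only. The per-insn flag and all toy_ names are hypothetical; the attached patch may well solve this differently.

```c
/* Hypothetical per-insn scheduling data.  */
struct toy_sched_insn
{
  int priority;
  int adjusted;   /* nonzero once the target hook has run */
};

/* Stand-in for a target hook that tweaks a priority.  */
static int
toy_target_adjust (int priority)
{
  return priority + 1;
}

/* Run the target hook at most once per insn, however many times the
   insn is offered to the ready list.  */
static void
toy_adjust_priority_once (struct toy_sched_insn *insn)
{
  if (insn->adjusted)
    return;
  insn->priority = toy_target_adjust (insn->priority);
  insn->adjusted = 1;
}
```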


The patch for these two small bugs is also attached
(priority-bugs.patch).  Is it ok for trunk?  If so, I will repost it to
the gcc-patches list.



Best regards,
Maxim
--- config/arm/arm.c(/gcc-local/trunk/gcc)  (revision 19877)
+++ config/arm/arm.c(/gcc-local/arm-bug/gcc)(revision 19877)
@@ -52,6 +52,7 @@
 #include "target-def.h"
 #include "debug.h"
 #include "langhooks.h"
+#include "sched-int.h"
 
 /* Forward definitions of types.  */
 typedef struct minipool_node    Mnode;
@@ -118,6 +119,9 @@ static void thumb_output_function_prolog
 static int arm_comp_type_attributes (tree, tree);
 static void arm_set_default_type_attributes (tree);
 static int arm_adjust_cost (rtx, rtx, rtx, int);
+static void arm_reorder (rtx *, int);
+static int arm_reorder1 (FILE *, int, rtx *, int *, int);
+static int arm_reorder2 (FILE *, int, rtx *, int *, int);
 static int count_insns_for_constant (HOST_WIDE_INT, int);
 static int arm_get_strip_length (int);
 static bool arm_function_ok_for_sibcall (tree, tree);
@@ -245,6 +249,12 @@ static bool arm_tls_symbol_p (rtx x);
 #undef  TARGET_SCHED_ADJ

Re: IA-64 speculation patches have bad impact on ARM

2006-05-30 Thread David Edelsohn
> Maxim Kuvyrkov writes:

Maxim> Anyway, this work is for stage 1 or 2 and for now I propose following 
Maxim> fix: implement targetm.sched.reorder hook so that it will ensure that if 
Maxim> there is an insn from the current block in the ready list, then insn 
Maxim> from the other block won't stand first in the line (and, therefore, 
Maxim> won't be chosen for schedule).  I feel that this will be what you are 
Maxim> calling 'filling holes'.  Please find an example patch attached
Maxim> (arm.patch).

What about all of the other GCC targets?

If your patch changed the default behavior of the scheduler
assumed by all other ports, you should fix the scheduler and modify the
IA-64 port to get the behavior desired.

David



Re: IA-64 speculation patches have bad impact on ARM

2006-05-30 Thread Mark Mitchell
David Edelsohn wrote:
>> Maxim Kuvyrkov writes:
> 
> Maxim> Anyway, this work is for stage 1 or 2 and for now I propose following 
> Maxim> fix: implement targetm.sched.reorder hook so that it will ensure that if 
> Maxim> there is an insn from the current block in the ready list, then insn 
> Maxim> from the other block won't stand first in the line (and, therefore, 
> Maxim> won't be chosen for schedule).  I feel that this will be what you are 
> Maxim> calling 'filling holes'.  Please find an example patch attached 
> Maxim> (arm.patch).
> 
>   What about all of the other GCC targets?
> 
>   If your patch changed the default behavior of the scheduler
> assumed by all other ports, you should fix the scheduler and modify the
> IA-64 port to get the behavior desired.

Exactly.

I think this is a serious regression, and I would like to consider our
options.  Daniel has suggested changing the default value of the
max-sched-extend-regions-iters param to 1.  However, I think we should
conservatively change it to zero, for now, and then use a target macro
to allow IA64 to set it to two, and other ports to gradually turn this
on if useful.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Successful gcc 4.1.1 build on alphaev68-dec-osf5.1b Tru64(c,c++,fortran,objc,treelang)

2006-05-30 Thread Stefano Curtarolo, Ph.D.


[EMAIL PROTECTED]:~#gcc -v
Using built-in specs.
Target: alphaev68-dec-osf5.1b
Configured with:
../configure
--host=alphaev68-dec-osf5.1b 
--enable-threads=posix
--enable-languages=c,c++,fortran,objc,treelang 
--prefix=/usr/local

--enable-version-specific-runtime-libs
--enable-shared 
--enable-libgcj

--enable-nls
--enable-interpreter
Thread model: posix
gcc version 4.1.1


I had some problems with java so I turned it off.

See my previous email:
http://gcc.gnu.org/ml/gcc/2005-07/msg00601.html
for f95 and java compilation.


Sincerely,
Stefano Curtarolo


--
Prof. Stefano Curtarolo
Assistant Professor of Materials Science
Duke University, Dept. Mechanical Engineering and Materials Science
144 Hudson Hall, Box 90300, Durham, NC  27708-0300
phone 919-660-5506 [EMAIL PROTECTED] http://alpha.mems.duke.edu
--


--
[This email was composed with renewable energy.
 When you are done reading this email, please dispose of it in an
 environmentally friendly manner, such as electronic composting.]
--


Re: IA-64 speculation patches have bad impact on ARM

2006-05-30 Thread Vladimir Makarov

Mark Mitchell wrote:
> David Edelsohn wrote:
>>> Maxim Kuvyrkov writes:
>>
>> Maxim> Anyway, this work is for stage 1 or 2 and for now I propose following 
>> Maxim> fix: implement targetm.sched.reorder hook so that it will ensure that if 
>> Maxim> there is an insn from the current block in the ready list, then insn 
>> Maxim> from the other block won't stand first in the line (and, therefore, 
>> Maxim> won't be chosen for schedule).  I feel that this will be what you are 
>> Maxim> calling 'filling holes'.  Please find an example patch attached (arm.patch).
>>
>> What about all of the other GCC targets?
>>
>> If your patch changed the default behavior of the scheduler
>> assumed by all other ports, you should fix the scheduler and modify the
>> IA-64 port to get the behavior desired.
>
> Exactly.
>
> I think this is a serious regression, and I would like to consider our
> options.  Daniel has suggested changing the default value of the
> max-sched-extend-regions-iters param to 1.  However, I think we should
> conservatively change it to zero, for now, and then use a target macro
> to allow IA64 to set it to two, and other ports to gradually turn this
> on if useful.
 

I agree with this.  Two months ago Maxim submitted patches which
affect only ia64, except for one thing affecting all targets: the patch
which builds more scheduling regions and as a consequence permits more
aggressive interblock scheduling.


Insn scheduling before register allocation, even without Maxim's
patches, is not safe when hard registers are used in RTL.  It is a known
bug (e.g. for x86_64) and it is in bugzilla.  Jim Wilson wrote up several
possible solutions for this; none is easy to implement except for
switching off insn scheduling before RA (which is what is done for x86_64).


But we can restore the state (probably safe for most programs) that existed
before Maxim's patch.  So Maxim, could you do this?  (Of course you can
keep the max-sched-extend-regions-iters value for ia64 because it is
probably safe for targets with many registers.)


Vlad




Re: c++ regression in trunk

2006-05-30 Thread Mike Stump

On May 29, 2006, at 1:17 PM, Jack Howarth wrote:
> In building xplor-nih against the gcc trunk, I noticed that there
> is a c++ related regression


I'll let Andrew comment if it sounds like anything he's seen.  I'd  
recommend a binary search to narrow down the translation unit and the  
compiler version that went bad if the prospects of actually trying to  
debug this are daunting.


Off-hand, sounds like throw across dylib/bundle boundary problem, or  
someone using the wrong visibility on classes.


You can check for the latter by something like:

nm -m *.o | c++filt | grep info | grep -v external

and seeing if you get any hits.  If so, they could be the cause of  
the problem.


The first case might be easier to test in the small, as if it is  
totally busted, even the most trivial of programs will show the  
breakage.


Re: Freeing memory for basic-blocks and edges

2006-05-30 Thread Mike Stump

On May 29, 2006, at 1:07 PM, sean yang wrote:
> For example, I know that allocating a chunk of memory to hold BB
> information is done by "ggc_alloc_cleared()". But after a function
> analysis/optimization is done, the memory should be freed.


  ggc_free (ptr);

can be used, but, if you use it, you have to be absolutely certain  
that there doesn't exist a reachable pointer to that memory,  
anywhere.  You must explicitly zero out any old pointers that pointed  
to the data, if those pointers would have been reachable otherwise.
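
The same discipline in plain C, as an analogue only: malloc/free stand in for the GGC allocator here, and the toy_ types are hypothetical. The point is just that the release and the clearing of the surviving pointer happen together.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a basic block and the function that owns it.  */
struct toy_bb
{
  int index;
};

struct toy_function
{
  struct toy_bb *entry_block;
};

/* Release the block and zero the pointer that could still reach it,
   so nothing reachable refers to freed memory afterwards.  */
static void
toy_release_entry_block (struct toy_function *fn)
{
  free (fn->entry_block);
  fn->entry_block = NULL;
}
```

With a garbage collector the stakes are higher than with free alone, because a stale but reachable pointer will be followed during the next collection.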


A reload failure which I can't figure out

2006-05-30 Thread Rask Ingemann Lambertsen
Hi.

I have problems figuring out why reload gives up on this:

reload failure for reload 1
../../../cvssrc/gcc/gcc/libgcc2.c: In function '__moddi3':
../../../cvssrc/gcc/gcc/libgcc2.c:1101: error: unable to find a register to
spill in class 'DX_REGS'
../../../cvssrc/gcc/gcc/libgcc2.c:1101: error: these are the reloads for
insn # 425:
Reload 0: reload_in (HI) = (reg:HI 4 d [+2 ])
reload_out (HI) = (reg:HI 0 c [199])
AX_REGS, RELOAD_OTHER (opnum = 0)
reload_in_reg: (reg:HI 4 d [+2 ])
reload_out_reg: (reg:HI 0 c [199])
Reload 1: reload_out (HI) = (scratch:HI)
DX_REGS, RELOAD_FOR_OUTPUT (opnum = 3)
reload_out_reg: (scratch:HI)
../../../cvssrc/gcc/gcc/libgcc2.c:1101: error: this is the insn:
(insn 425 422 432 30 ../../../cvssrc/gcc/gcc/libgcc2.c:911 (parallel [
(set (reg:HI 0 c [199])
(mult:HI (reg:HI 4 d [+2 ])
(reg:HI 6 b [orig:629 __d0 ] [629])))
(clobber (scratch:HI))
(clobber (reg:CC 12 cc))
]) 309 {*mulhi3} (nil)
(expr_list:REG_UNUSED (reg:CC 12 cc)
(expr_list:REG_UNUSED (scratch:HI)
(expr_list:REG_DEAD (reg:HI 4 d [+2 ])
(expr_list:REG_UNUSED (reg:CC 12 cc)
(expr_list:REG_UNUSED (scratch:HI)
(nil)))
../../../cvssrc/gcc/gcc/libgcc2.c:1101: internal compiler error: in
spill_failure, at reload1.c:1915

The *mulhi3 pattern is this:
(define_insn "*mulhi3"
[(set (match_operand:HI 0 "single_register_operand" "=a")
  (mult:HI (match_operand:HI 1 "single_register_operand" "%0")
   (match_operand:HI 2 "general_operand" "rm")))
 (clobber (match_scratch:HI 3 "=d"))
 (clobber (reg:CC CC_REG))]
""
"mulw\t%2"
)

where the constraint "a" matches the register class AX_REGS consisting of
(reg:HI 2 a), the constraint "d" matches the register class DX_REGS
consisting of (reg:HI 4 d) and the predicate "single_register_operand" is:

(define_predicate "single_register_operand"
(and (match_operand 0 "register_operand")
 (ior (not (match_code "subreg"))
  (match_test "GET_MODE_SIZE (GET_MODE (SUBREG_REG (op)))
   <= UNITS_PER_WORD")))
)

In other words, like "register_operand", but rejecting subregs of
registers larger than a word (16 bits).

The insn seems perfectly possible to reload:
(set (mem:HI stack-slot-d) (reg:HI 4 d))
(set (mem:HI stack-slot-a) (reg:HI 2 a))
(set (reg:HI 2 a) (mem:HI stack-slot-d))

(parallel [(set (reg:HI 2 a)
(mult:HI (reg:HI 2 a) (reg:HI 6 b)))
   (clobber (reg:HI 4 d))
   (clobber (reg:CC CC_REG))])

(set (reg:HI 0 c) (reg:HI 2 a))
(set (reg:HI a) (mem:HI stack-slot-a))
(set (reg:HI d) (mem:HI stack-slot-d))

What am I missing?

-- 
Rask Ingemann Lambertsen


Re: IA-64 speculation patches have bad impact on ARM

2006-05-30 Thread Paolo Bonzini

Maxim Kuvyrkov wrote:

Anyway, this work is for stage 1 or 2 and for now I propose following 
fix: implement targetm.sched.reorder hook so that it will ensure that if 
there is an insn from the current block in the ready list, then insn 
from the other block won't stand first in the line (and, therefore, 
won't be chosen for schedule).  I feel that this will be what you are 
calling 'filling holes'.  Please find an example patch attached 
(arm.patch).


Do you think this could be a default implementation of the reorder hook,
like this?  (After suitable performance testing.  This looks O(n^2) to me).

--- in defaults.h ---

#ifndef TARGET_SCHED_REORDER
#define TARGET_SCHED_REORDER default_reorder
#endif

#ifndef TARGET_SCHED_REORDER2
#define TARGET_SCHED_REORDER2 default_reorder2
#endif

--- in targhooks.c ---

int
default_reorder (FILE *dump, int sched_verbose,
                 rtx *ready, int *pn_ready, int clock_var)
{
  default_reorder2 (dump, sched_verbose, ready, pn_ready, clock_var);
  if (targetm.sched.issue_rate)
    return targetm.sched.issue_rate ();
  else
    return 1;
}

int
default_reorder2 (FILE *dump ATTRIBUTE_UNUSED,
                  int sched_verbose ATTRIBUTE_UNUSED,
                  rtx *ready, int *pn_ready,
                  int clock_var ATTRIBUTE_UNUSED)
{
  int n_ready = *pn_ready;

  /* This is correct for sched-rgn.c only.  */
  if (reload_completed
      && (flag_sched2_use_superblocks || flag_sched2_use_traces))
    return 0;

  if (n_ready > 1)
    {
      basic_block bb = BLOCK_FOR_INSN (current_sched_info->prev_head);

      if (BLOCK_FOR_INSN (ready[n_ready - 1]) != bb)
        {
          int i;

          for (i = n_ready - 1; i >= 0; i--)
            {
              rtx insn = ready[i];

              if (BLOCK_FOR_INSN (insn) != bb)
                continue;

              memcpy (ready + i, ready + i + 1,
                      (n_ready - i - 1) * sizeof (*ready));
              ready[n_ready - 1] = insn;
              break;
            }
        }
    }

  return 0;
}

Paolo
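
A standalone toy model of the reordering step in default_reorder2, for readers who want to try it outside GCC. It assumes, as the posted code does, that the insn issued first sits at the end of the array. The toy_ types are hypothetical, and the sketch uses memmove rather than memcpy because the source and destination ranges overlap.

```c
#include <string.h>

/* Hypothetical stand-in for an insn: only the owning block matters.  */
struct toy_insn
{
  int uid;
  int block;
};

/* If the insn at the head of the ready list (last array slot) is not
   from CUR_BLOCK, move the nearest insn that is from CUR_BLOCK into
   that slot, shifting the ones in between down by one.  */
static void
toy_prefer_current_block (struct toy_insn *ready, int n_ready, int cur_block)
{
  int i;

  if (n_ready <= 1 || ready[n_ready - 1].block == cur_block)
    return;

  for (i = n_ready - 1; i >= 0; i--)
    {
      struct toy_insn insn = ready[i];

      if (insn.block != cur_block)
        continue;

      /* Overlapping ranges, hence memmove.  */
      memmove (ready + i, ready + i + 1,
               (size_t) (n_ready - i - 1) * sizeof (*ready));
      ready[n_ready - 1] = insn;
      break;
    }
}
```

A single call is linear in the length of the ready list, and the relative order of the other insns is preserved.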



Re: IA-64 speculation patches have bad impact on ARM

2006-05-30 Thread Daniel Jacobowitz
On Tue, May 30, 2006 at 08:57:57PM +0200, Paolo Bonzini wrote:
> int
> default_reorder2 (FILE *dump ATTRIBUTE_UNUSED,
>                   int sched_verbose ATTRIBUTE_UNUSED,
>                   rtx *ready, int *pn_ready,
>                   int clock_var ATTRIBUTE_UNUSED)
> {
>   int n_ready = *pn_ready;
> 
>   /* This is correct for sched-rgn.c only.  */
>   if (reload_completed
>       && (flag_sched2_use_superblocks || flag_sched2_use_traces))
>     return 0;
> 
>   if (n_ready > 1)
>     {
>       basic_block bb = BLOCK_FOR_INSN (current_sched_info->prev_head);
> 
>       if (BLOCK_FOR_INSN (ready[n_ready - 1]) != bb)
>         {
>           int i;
> 
>           for (i = n_ready - 1; i >= 0; i--)
>             {
>               rtx insn = ready[i];
> 
>               if (BLOCK_FOR_INSN (insn) != bb)
>                 continue;
> 
>               memcpy (ready + i, ready + i + 1,
>                       (n_ready - i - 1) * sizeof (*ready));
>               ready[n_ready - 1] = insn;
>               break;
>             }
>         }
>     }
> 
>   return 0;
> }

Not even a single comment - shame on you both! :-)  If this is the
solution we choose, can we make sure that there's at least a comment
explaining what's going on?

-- 
Daniel Jacobowitz
CodeSourcery


Re: call_insns in RTX form--two questions

2006-05-30 Thread Mike Stump
[ I wasn't going to answer this, because you've left out all the  
details that would be required for me to answer it well, but since  
you've asked me specifically to answer, I'll try.  ]


On May 29, 2006, at 9:09 PM, sean yang wrote:
> The first question is: If I want to find a BB containing a
> specific function call (say 'foo'), is there an easy way at the RTX
> level?


Yes or no, depending on what you consider easy and how you define the  
problem.  Anyway, if you look at the rtl:


(call_insn 8 28 9 0 (parallel [
(call (mem:SI (symbol_ref:SI ("&L_foo$stub")  
) [0 S4 A8])

(const_int 32 [0x20]))
(use (const_int 0 [0x0]))
(clobber (reg:SI 65 lr))
]) 368 {*call_nonlocal_sysv} (nil)
(nil)
(nil))

all you have to do is wander through the insns looking for a call_insn and
then check to see if the form is (mem (symbol_ref X)), and then check
to see if the X mentioned is the one you're interested in.  The above
is a nice example, because it shows that the spelling of X need not
be very portable, so doing this reliably can be `hard'.
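
A toy model of that walk, outside GCC. The toy_rtx list is a hypothetical stand-in for the insn chain, with the callee symbol already extracted; real code would use the RTL accessors instead.

```c
#include <stddef.h>
#include <string.h>

enum toy_code { TOY_INSN, TOY_CALL_INSN };

/* Hypothetical insn: its code, the callee symbol for direct calls
   (NULL for indirect calls), and the next insn in the chain.  */
struct toy_rtx
{
  enum toy_code code;
  const char *callee;
  const struct toy_rtx *next;
};

/* Count direct calls whose symbol is spelled exactly NAME.  */
static int
toy_count_calls (const struct toy_rtx *first, const char *name)
{
  const struct toy_rtx *p;
  int n = 0;

  for (p = first; p != NULL; p = p->next)
    if (p->code == TOY_CALL_INSN
        && p->callee != NULL
        && strcmp (p->callee, name) == 0)
      n++;
  return n;
}
```

Searching for "foo" in this model misses a call spelled "&L_foo$stub" and any indirect call, which is exactly the portability problem described above.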


Further, the compiler can use a register and an indirect call, thus  
making strcmp for the function name non-trivial.  If you want to wave  
your hands and not solve that problem, it is then easier, if not,  
then it can be impossible to solve.  As impossible as:


int main() {
  extern int (*bar)();
  (*bar)();
}

is to figure out.  However, it need not be that hard, could only be  
as hard as checking the NOTEs structure and doing a strcmp on them.   
That is easy to do.


Hint, if pr can show you the information you're interested in, then,  
trivially, the answer is yes.  Just follow the rtl dumper to the info  
you want.


> The second one is: how can I get the order of the different calls to foo in
> the final assembly code?


The language gap causes this question to be hard for us to  
understand.  If you told us what you're trying to do and why in 20  
pages or more it would be easier to answer.


> Can I get it by dumping some information, say the order of the
> instruction linked list, in the RTL representation?


First, you never defined what ordering you want.  Do you want order  
as defined by the address of the instruction?  Do you want source  
code order?  Do you want order in the rtl for the function?


For rtl order, yes, just walk the rtl in order, and the first call to  
foo is the first, and the second call to foo is the second...


For source code order, you can approximate it by checking the debug  
line information associated with call in question.  Just collect all  
that info, sort increasing by line, the first one is first, the  
second one is second and so on.  This, as you can tell can't help  
with multiple calls to foo on the same line nor with compiles that  
lack debug information.  Though, if one throws the column numbers  
into the debug information, then one can solve for multiple calls on  
one line as well.  EXPR_LOCATION and EXPR_LINENO might have some of  
the data you're interested in.
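
The source-order approximation can be sketched as a sort on (line, column). The toy_call_site record is hypothetical; in GCC the values would come from the location information attached to each call.

```c
#include <stdlib.h>

/* Hypothetical record for one call to foo: where it appears in the
   source and its position in rtl order.  */
struct toy_call_site
{
  int line;
  int column;
  int rtl_index;
};

/* Order by line, then by column to separate calls on the same line.  */
static int
toy_compare_sites (const void *pa, const void *pb)
{
  const struct toy_call_site *a = pa;
  const struct toy_call_site *b = pb;

  if (a->line != b->line)
    return (a->line > b->line) - (a->line < b->line);
  return (a->column > b->column) - (a->column < b->column);
}

static void
toy_sort_by_source (struct toy_call_site *sites, size_t n)
{
  qsort (sites, n, sizeof (*sites), toy_compare_sites);
}
```

Without column data the comparison falls back to line only, and calls sharing a line come out in unspecified order, since qsort is not guaranteed to be stable.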


For address ordering, well, generally that is the same as rtl order,  
but hot/cold partitioning can alter that.  I'll not answer this, as  
I'm hoping you don't care about the details.


Re: Modifying ARM code generator for elimination of 8bit writes - need help

2006-05-30 Thread Paul Brook
> I found arm.md and the movqi insns, but because of the different
> addressing modes of strb and swpb, it's not easy to make the change.
> And there must be a compiler option for this, too.
>
> Could somebody please tell me how to implement this change?

Short answer is probably not.

There are a couple of complications that spring to mind. The different 
addressing modes and the fact that swp clobbers a register are the most 
immediate ones.

You'll need to modify at least the movqi insn patterns, memory constraints and 
the legitimate address stuff. I'm not sure about the clobber, that might need 
additional reload-related machinery.

Paul


Re: Modifying ARM code generator for elimination of 8bit writes - need help

2006-05-30 Thread Daniel Jacobowitz
On Tue, May 30, 2006 at 09:03:54PM +0100, Paul Brook wrote:
> > I found arm.md and the movqi insns, but because of the different
> > addressing modes of strb and swpb, it's not easy to make the change.
> > And there must be a compiler option for this, too.
> >
> > Could somebody please tell me how to implement this change?
> 
> Short answer is probably not.
> 
> There are a couple of complications that spring to mind. The different 
> addressing modes and the fact that swp clobbers a register are the most 
> immediate ones.
> 
> You'll need to modify at least the movqi insn patterns, memory constraints
> and the legitimate address stuff. I'm not sure about the clobber, that
> might need additional reload-related machinery.

I suspect it would be better to make GCC do halfword stores instead
(read/modify/write).

-- 
Daniel Jacobowitz
CodeSourcery


Re: c++ regression in trunk

2006-05-30 Thread Jack Howarth
Mike,
   I've checked all of the object files in xplor-nih with...

nm -m *.o | c++filt | grep info | grep -v external

...and I get no hits suggesting it can't be a wrong visibility problem.
Is there some approach I can use to figure out if it is a throw across
a dylib or bundle boundary? My initial guess is that it isn't that either.
The .cc source files that show up in the backtrace (dinternal.cc,
dint-xplor.cc and dint-powell.cc) are all linked together into
libintVar.dylib using...

g++-4 -dynamiclib -flat_namespace -undefined suppress -single_module 
dinternal.o dint-atom.o dint-node.o dint-loop.o dint-step.o dint-powell.o 
dint-conmin.o dint-simplex.o dint-pc6.o dint-xplor.o publicIVM.o -o 
libintVar.dylib -lcrypto 
/System/Library/Frameworks/vecLib.framework/Versions/A/vecLib 

Also, I don't see this c++ regression when building xplor-nih with
the gcc/g++ from Xcode 2.3 and gfortran from gcc trunk or with
gcc/g++/gfortran from gcc 4.1.1. If it really were a problem with
a throw across a dylib wouldn't those versions break as well?
I noticed from the 4.2 change page that...

The configure variable enable-__cxa_atexit is now enabled by default for more 
targets. Enabling this variable is necessary in order for static destructors to 
be executed in the correct order, but it depends upon the presence of a 
non-standard C library in the target library in order to work. The variable is 
now enabled for more targets which are known to have suitable C libraries.

Is darwin one of those targets that changed at 4.2? If so is there a flag
to reverse that behavior for individual source files?
  Jack


Re: c++ regression in trunk

2006-05-30 Thread Mike Stump

On May 30, 2006, at 3:25 PM, Jack Howarth wrote:
> ...and I get no hits suggesting it can't be a wrong visibility
> problem.


I prefer the idea that it reduces the likelihood of such a problem.  :-)


> Is there some approach I can use to figure out if it is a throw across
> a dylib or bundle boundary?


Yes.  If you aren't using bundles forget about that issue.  The  
approach is to write a dylib and app that mirror the direction you  
are throwing.


For example, the obvious:

#include <stdio.h>
void dolib();  /* defined in the dylib */

int main() {
  try { dolib(); } catch(...) { printf("It worked.\n"); }
}

and in the dylib:

void dolib() {
  throw 1;
}

tests to see if a throw from a dylib into the app works.


> I noticed from the 4.2 change page that...
>
> The configure variable enable-__cxa_atexit is now enabled by default
>
> Is darwin one of those targets that changed at 4.2? If so is there a flag
> to reverse that behavior for individual source files?


Why ask me, when you can ask the compiler with the documented
-fuse-cxa-atexit and -fno-use-cxa-atexit flags and have it tell you with
amazing accuracy?  Yes, Geoff has been playing in this area.  It had
been off, and in later versions, it is on.  You can check the
ChangeLog file and find:


2006-03-15  Geoffrey Keating  <[EMAIL PROTECTED]>

* config.gcc (*-*-darwin*): Don't build crt2.o for all Darwin ports.
Do switch on default_use_cxa_atexit.
(powerpc*-*-darwin*): Build crt2.o on powerpc.
* config/darwin-crt3.o: New.
* config/darwin.h (LINK_SPEC): If -shared-libgcc, make linker default
to 10.3.  Pass '-multiply_defined suppress' if crt3.o is in use.
(STARTFILE_SPEC): Add crt3.o when -shared-libgcc and appropriate
OS version.
* config/rs6000/t-darwin: Move crt2.o building to here.
* config/rs6000/darwin.h (C_COMMON_OVERRIDE_OPTIONS): Update
Mac OS version for using __cxa_get_exception_ptr.  Don't test versions
of __cxa_atexit.

and you can also test before and after it to see if it is related,  
though, in theory the -f flags should be enough.


Also, this list isn't the right place to obtain help with broken user  
code, just broken compiler code.  Ideally, we want you to first debug  
it and then ask here for help after broken user code is ruled out.   
If the app shows an issue with the new cxa code-gen, I'd hazard a  
guess that it is an app bug [ fingers crossed ].


Re: Modifying ARM code generator for elimination of 8bit writes - need help

2006-05-30 Thread Paul Brook
> > There are a couple of complications that spring to mind. The different
> > addressing modes and the fact that swp clobbers a register are the most
> > immediate ones.
> >
> > You'll need to modify at least the movqi insn patterns, memory
> > constraints and the legitimate address stuff. I'm not sure about the
> > clobber, that might need additional reload-related machinery.
>
> I suspect it would be better to make GCC do halfword stores instead
> (read/modify/write).

Does gcc have mechanisms for doing this, or would you have to hide it all 
inside the movqi pattern?
If the latter I don't see much difference in terms of implementation. strb and 
strh use different addressing modes :-)

Whether ldrh/strh are faster than swpb probably depends on whether your cpu 
supports unaligned loads. It also means byte stores are no longer atomic, 
though I don't know whether that is important.
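
To make the read/modify/write idea concrete, here is a minimal C sketch of a
byte store built from one halfword load and one halfword store. It is not GCC
code and does not show how the movqi pattern would be changed; the helper name
store_byte_rmw and the choice of the low-order 8 bits as byte lane 0 (the
lower address on a little-endian target) are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: emulate a byte store using only a
   16-bit load and a 16-bit store on the halfword containing the byte.
   Lane 0 is taken to be the low-order 8 bits of the halfword, which is
   the lower address on a little-endian target (an assumption). */
void store_byte_rmw(uint16_t *mem, size_t byte_index, uint8_t value)
{
    assert(mem != NULL);

    uint16_t *slot = &mem[byte_index / 2];            /* containing halfword */
    unsigned shift = (unsigned)(byte_index % 2) * 8;  /* 0 or 8 */

    uint16_t old = *slot;                             /* read   (ldrh) */
    uint16_t merged = (uint16_t)((old & ~(0xFFu << shift))
                                 | ((unsigned)value << shift)); /* modify */
    *slot = merged;                                   /* write  (strh) */
}
```

If something else updates the neighbouring byte between the read and the
write, that update is overwritten by the stale value in the merged halfword;
this is the loss of atomicity mentioned above.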

Paul


independent study

2006-05-30 Thread Matt Davis

Greetings,
I am working on my Master's Degree, and have started an independent
study this summer.  Two of my goals for this study are to
1 - Provide something useful for GCC
2 - Learn more about the innards of GCC so that I can be of benefit
towards the community

I figured what better way to get credit and accomplish a personal goal.
While I have barely begun my study, my advisor and I have elected to
continue with my previous independent study revolving around 
evolutionary programming.  While at first this might seem a bit of a
benefit for GCC optimizations, further thought contradicts that notion.
I still would like to provide help on GCC optimizations, and was
wondering if there is anyone willing to allow me on their team so that I
can accomplish my goals.  I am also wondering if evolutionary/genetic
algorithms do seem of benefit to one particular optimization construct
that I might have overlooked or not considered.  While the project page
does list some interesting projects, I am putting this out in hopes of
finding others that might provide a stable direction for me.
I do realize that there is a project out there now that uses 
evolutionary concepts to optimize GCC with the current available command 
line options.  While I think that is a great project, I would like to 
develop, or tweak, some type of internal aspect of GCC.


Thanks!

-Matt


Re: c++ regression in trunk

2006-05-30 Thread Jack Howarth
Mike,
   Actually the problem appears unrelated to cxa_atexit as neither
-fuse-cxa-atexit nor -fno-use-cxa-atexit eliminates the problem
with the throw aborting the program. 
   I do believe I have found a work-around to the problem which
identifies the nature of the issue as well. The xplor-nih program
(xplor) is linked with gcc (since it contains only a single object
file of c). However shared libraries of c++ and fortran code are
linked to the main xplor program as well. The main c routine in
xplor calls a fortran routine in one of its fortran shared libraries
which in turn calls and returns from c++ code in a c++ shared library.
  I am able to suppress the aborts on the throw in xplor-nih when
built under gcc 4.2 if I link the main xplor program with g++ rather
than gcc. Should this be considered a regression from gcc 4.0/4.1?
Since the c++ shared libraries are linked with g++ and the fortran
shared libraries are linked with gfortran, it seemed that the main
program (containing only a single object file from c code) should
be linkable with gcc (despite c++ and fortran shared libraries being
linked in to the main xplor program).
   Jack



Re: independent study

2006-05-30 Thread Ben Elliston
> I do realize that there is a project out there now that uses
> evolutionary concepts to optimize GCC with the current available
> command line options.  While I think that is a great project, I
> would like to develop, or tweak, some type of internal aspect of
> GCC.

I did a Masters project that extended this concept to re-ordering the
tree optimisers and it was certainly interesting work.  There are
plenty of unanswered questions, so if you would like, contact me by
private mail and I'll send you a copy of the paper.

Cheers, Ben


Can libcalls be nested?

2006-05-30 Thread Steven Bosscher
Hello,

In some places in the RTL optimizers we assume that libcalls are
never nested.  In other places we assume that they do nest.  In
the documentation, I can't find which assumption is right.  So I
am asking here :-)  Can libcalls be nested?

Thanks,

Gr.
Steven


Problem about gcc 4.1 + binutil 2.16.92 + glibc 2.4 + ARM EABI

2006-05-30 Thread Fengwei Yin

Hi,
When I enabled gcc 4.1 with EABI support for ARM, I ran into the following
alignment behaviour.

Here is my test case:

#include <stdio.h>
#include 

struct test {
    char c1;
    long long  n;
    char c2;
};

int my_temp(void)
{
    int i;
    int j;
    char str[2];
    long long l1;
    long long l2;
    char str2;

    printf("%s:\tj address: 0x%08x, str address:0x%08x\n",
           __func__, &j, str);
    printf("%s:\tl1 address:0x%08x, l2 address: 0x%08x\n",
           __func__, &l1, &l2);
    printf("%s:\tstr2 address:  0x%08x\n", __func__, &str2);
    return 0;
}

main()
{
    struct test t;

    my_temp();

    printf("%s:\ttest.c1 address:   0x%08x\n", __func__, &(t.c1));
    printf("%s:\ttest.n address:0x%08x\n", __func__, &(t.n));
    printf("%s:\ttest.c2 address:   0x%08x\n", __func__, &(t.c2));
}

With gcc 4.1 + EABI for ARM (build options: -mabi=aapcs-linux -O2),
I get the following output:

[EMAIL PROTECTED] /test]#./test.gnu.eabi
my_temp:j address:  0xbea85ac0, str address:0xbea85abe
my_temp:l1 address: 0xbea85ab0, l2 address: 0xbea85aa8
my_temp:str2 address:   0xbea85aa7
main:   test.c1 address:0xbea85ad8
main:   test.n address: 0xbea85ae0
main:   test.c2 address:0xbea85ae8

With gcc 3.4.3 without EABI (build option: -O2), I get the following output:
[EMAIL PROTECTED] /test]#./test.gnu
my_temp:j address:  0xbeb69bf8, str address:0xbeb69bf0
my_temp:l1 address: 0xbeb69be8, l2 address: 0xbeb69be0
my_temp:str2 address:   0xbeb69bdf
main:   test.c1 address:0xbeb69c10
main:   test.n address: 0xbeb69c18
main:   test.c2 address:0xbeb69c20

Please notice the addresses of str and j: there IS NO memory hole
between them when using gcc 3.4.3, and there IS a memory hole when using
gcc 4.1. My question is: why is there a difference, and what is its root
cause (EABI or the gcc update)? Is there any gcc option to make them align?
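
One way to separate the two effects in the test case: struct member offsets
are fixed by the ABI, while the relative placement of local variables on the
stack is left to the compiler. In both outputs above the struct members sit
8 bytes apart, so only the locals differ. The following is a minimal sketch,
assuming a C99 or later compiler, that reports the struct layout without
relying on stack addresses; the helper names padding_before_n and
report_layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct test {
    char c1;
    long long n;
    char c2;
};

/* Hypothetical helper: bytes of padding the compiler inserts between
   c1 and n so that n meets its alignment requirement. */
size_t padding_before_n(void)
{
    return offsetof(struct test, n) - sizeof(char);
}

/* Hypothetical helper: print the ABI-determined layout of struct test. */
void report_layout(void)
{
    /* char has alignment 1, so in practice c2 follows n directly. */
    assert(offsetof(struct test, c2)
           == offsetof(struct test, n) + sizeof(long long));

    printf("offsetof(c1) = %zu\n", offsetof(struct test, c1));
    printf("offsetof(n)  = %zu\n", offsetof(struct test, n));
    printf("offsetof(c2) = %zu\n", offsetof(struct test, c2));
    printf("sizeof       = %zu\n", sizeof(struct test));
    printf("padding before n = %zu\n", padding_before_n());
}
```

Calling report_layout() from both builds and comparing the output would show
whether the struct layout itself changed; if it did not, the remaining
difference is in how each compiler arranges and aligns stack slots.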


Thanks & Regards
Yin, Fengwei