Re: OpenMP bug with gfortran when compile under Windows platform
[CCing the OpenMP experts]

Henry,

The -fopenmp option doesn't work under mingw32. Since I am the one
building the Windows (mingw32) binary packages you downloaded, I'm
rather interested in getting it to work... So here are a few things
we could sort out:

1. Currently, using the -fopenmp option gives:

   $ gfortran -fopenmp a.f
   gfortran.exe: unrecognized option '-pthread'
   gfortran.exe: libgomp.spec: No such file or directory

   Could we have a clearer error message? (perhaps saying that OpenMP is
   not available on that platform) The current message is clearly... not
   clear for users!

2. I looked at pthreads-win32
   (http://sources.redhat.com/pthreads-win32/), an open-source thread
   support library for win32, including mingw32. Not all POSIX functions
   are implemented, but a fair number of them are. I'll try to get
   libgomp compiling against those, and report progress here.

3. Why is libgomp building conditional on the target triplet, and not on
   detecting a working pthread implementation?

Thanks,
FX
Re: OpenMP bug with gfortran when compile under Windows platform
On Tue, May 30, 2006 at 11:19:09AM +0200, François-Xavier Coudert wrote:
> [CCing the OpenMP experts]
>
> Henry,
>
> The -fopenmp option doesn't work under mingw32. Since I am the one
> building the Windows (mingw32) binary packages you downloaded, I'm
> rather interested in getting it to work... So here are a few things
> we could sort out:
>
> 1. currently, using the -fopenmp options gives:
>
> >$ gfortran -fopenmp a.f
> >gfortran.exe: unrecognized option '-pthread'
> >gfortran.exe: libgomp.spec: No such file or directory
>
> Could we have a clearer error message? (perhaps saying that openmp is
> not available on that platform) The current message is clearly... not
> clear for users!

Then mingw32 should do something similar to config/i386/cygwin.h,
which has

/* Every program on cygwin links against cygwin1.dll which contains
   the pthread routines.  There is no need to explicitly link them
   and the -pthread flag is not recognized.  */
#undef GOMP_SELF_SPECS
#define GOMP_SELF_SPECS ""

> 2. I looked at pthreads win32
> (http://sources.redhat.com/pthreads-win32/), an opensource thread
> support for win32, including mingw32. Not all POSIX functions are
> implemented, but a fair amount of them. I'll try to get libgomp
> compiling against those, and report progress here.
>
>
> 3. why is libgomp building conditional on target triplet, and not on
> detecting a working pthread implementation?

Most things are detected; only a very few are keyed on the target
triplet, and in those cases it is desirable (e.g. arch-specific
assembly, etc.).
Once you do 2., you just port libgomp to mingw32 + pthreads-win32,
and assuming pthreads-win32 is sufficiently rich and not too buggy, it
will just work.

	Jakub
[libmudflap] build warnings...
I just wanted to ping the list here on current gcc trunk libmudflap
build warnings:

../../../gcc/libmudflap/mf-runtime.c:1706: warning: format '%06lu' expects type 'long unsigned int', but argument 15 has type '__suseconds_t'
../../../gcc/libmudflap/mf-runtime.c:1729: warning: format '%06lu' expects type 'long unsigned int', but argument 4 has type '__suseconds_t'
../../../gcc/libmudflap/mf-runtime.c:1998: warning: format '%06lu' expects type 'long unsigned int', but argument 6 has type '__suseconds_t'
../../../../gcc/libmudflap/mf-runtime.c:1706: warning: format '%06lu' expects type 'long unsigned int', but argument 15 has type '__suseconds_t'
../../../../gcc/libmudflap/mf-runtime.c:1729: warning: format '%06lu' expects type 'long unsigned int', but argument 4 has type '__suseconds_t'
../../../../gcc/libmudflap/mf-runtime.c:1998: warning: format '%06lu' expects type 'long unsigned int', but argument 6 has type '__suseconds_t'

Are these something one simply has to accept, or is something deeper
lurking here?

-- 
Cheers,

/ChJ
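
For reference, warnings of this kind usually go away when the argument
is cast to the type the conversion expects. The following is only a
minimal sketch of that idea: the helper name format_timestamp is
hypothetical and the code is not taken from mf-runtime.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper, not taken from mf-runtime.c: formats a timeval
   as "seconds.microseconds".  The explicit casts make the argument
   types match the %lu conversions no matter how the platform defines
   time_t and suseconds_t.  */
int
format_timestamp (char *buf, size_t size, const struct timeval *tv)
{
  assert (buf != NULL && tv != NULL);
  return snprintf (buf, size, "%lu.%06lu",
                   (unsigned long) tv->tv_sec,
                   (unsigned long) tv->tv_usec);
}
```

Whether a cast or a different conversion specifier is the better fix
for libmudflap itself is a separate question for its maintainers.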
Re: OpenMP bug with gfortran when compile under Windows platform
> you just port libgomp to mingw32 + pthreads-win32
> and assuming pthreads-win32 is sufficiently rich and not too buggy, it
> will just work.

With the attached patch, I can compile libgomp with

../gcc/configure --prefix=/mingw --disable-nls --with-ld=/mingw/bin/ld
--with-as=/mingw/bin/as --disable-werror --enable-bootstrap
--enable-threads=posix --with-win32-nlsapi=unicode
--host=i386-pc-mingw32 --enable-languages=c,fortran --enable-libgomp

and the resulting compiler and generated executables seem to work (I
tried a few C and Fortran toy codes). The main changes are to
libgomp/config/posix/time.c, which used functions not available on
mingw32. Would they be acceptable in this form (protected with #ifdef
_WIN32)? If so, I'll do some more testing, and officially submit the
patch.

FX

Index: libgomp/configure
===================================================================
--- libgomp/configure   (revision 114196)
+++ libgomp/configure   (working copy)
@@ -8397,7 +8397,9 @@
 
 # Check for functions needed.
 
-for ac_func in getloadavg clock_gettime
+
+
+for ac_func in getloadavg clock_gettime gettimeofday sysconf
 do
 as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh`
 echo "$as_me:$LINENO: checking for $ac_func" >&5
Index: libgomp/configure.ac
===================================================================
--- libgomp/configure.ac        (revision 114196)
+++ libgomp/configure.ac        (working copy)
@@ -162,7 +162,7 @@
   [AC_MSG_ERROR([Pthreads are required to build libgomp])])])
 
 # Check for functions needed.
-AC_CHECK_FUNCS(getloadavg clock_gettime)
+AC_CHECK_FUNCS(getloadavg clock_gettime gettimeofday sysconf)
 
 # Check for broken semaphore implementation on darwin.
 # sem_init returns: sem_init error: Function not implemented.
Index: libgomp/config.h.in
===================================================================
--- libgomp/config.h.in (revision 114196)
+++ libgomp/config.h.in (working copy)
@@ -18,6 +18,9 @@
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
+/* Define to 1 if you have the `gettimeofday' function. */
+#undef HAVE_GETTIMEOFDAY
+
 /* Define to 1 if you have the <inttypes.h> header file. */
 #undef HAVE_INTTYPES_H
 
@@ -42,6 +45,9 @@
 /* Define to 1 if the target supports __sync_*_compare_and_swap */
 #undef HAVE_SYNC_BUILTINS
 
+/* Define to 1 if you have the `sysconf' function. */
+#undef HAVE_SYSCONF
+
 /* Define to 1 if you have the <sys/loadavg.h> header file. */
 #undef HAVE_SYS_LOADAVG_H
 
Index: libgomp/config/posix/time.c
===================================================================
--- libgomp/config/posix/time.c (revision 114196)
+++ libgomp/config/posix/time.c (working copy)
@@ -48,32 +48,52 @@
 
 double
 omp_get_wtime (void)
 {
-#ifdef HAVE_CLOCK_GETTIME
+#ifdef HAVE_GETTIMEOFDAY
+# ifdef HAVE_CLOCK_GETTIME
   struct timespec ts;
-# ifdef CLOCK_MONOTONIC
+#  ifdef CLOCK_MONOTONIC
   if (clock_gettime (CLOCK_MONOTONIC, &ts) < 0)
-# endif
+#  endif
     clock_gettime (CLOCK_REALTIME, &ts);
   return ts.tv_sec + ts.tv_nsec / 1e9;
-#else
+# else
   struct timeval tv;
   gettimeofday (&tv, NULL);
   return tv.tv_sec + tv.tv_usec / 1e6;
+# endif
+#else
+# ifdef _WIN32
+
+#include <sys/timeb.h>
+  struct _timeb timebuf;
+  _ftime (&timebuf);
+  return (timebuf.time + (long)(timebuf.millitm) / 1e3);
+# else
+# error "Either clock_gettime or gettimeofday are required"
+# endif
 #endif
 }
 
 double
 omp_get_wtick (void)
 {
-#ifdef HAVE_CLOCK_GETTIME
+#ifdef HAVE_SYSCONF
+# ifdef HAVE_CLOCK_GETTIME
   struct timespec ts;
-# ifdef CLOCK_MONOTONIC
+#  ifdef CLOCK_MONOTONIC
   if (clock_getres (CLOCK_MONOTONIC, &ts) < 0)
-# endif
+#  endif
     clock_getres (CLOCK_REALTIME, &ts);
   return ts.tv_sec + ts.tv_nsec / 1e9;
+# else
+  return 1.0 / sysconf(_SC_CLK_TCK);
+# endif
 #else
-  return 1.0 / sysconf(_SC_CLK_TCK);
+# ifdef _WIN32
+  return 1e-3;
+# else
+# error "Either clock_getres or sysconf are required"
+# endif
 #endif
 }
Index: gcc/config/i386/mingw32.h
===================================================================
--- gcc/config/i386/mingw32.h   (revision 114196)
+++ gcc/config/i386/mingw32.h   (working copy)
@@ -108,3 +108,8 @@
 /* Define as short unsigned for compatibility with MS runtime.  */
 #undef WINT_TYPE
 #define WINT_TYPE "short unsigned int"
+
+/* The mingw32 compiler doesn't know the -pthread option, but requires
+   explicitly linking the libpthread.  */
+#undef GOMP_SELF_SPECS
+#define GOMP_SELF_SPECS "-lpthread"
Re: OpenMP bug with gfortran when compile under Windows platform
On Tue, May 30, 2006 at 04:37:35PM +0200, François-Xavier Coudert wrote:
> >you just port libgomp to mingw32 + pthreads-win32
> >and assuming pthreads-win32 is sufficiently rich and not too buggy, it will
> >just work.
>
> With the attached patch, I can compile libgomp with
>
> ../gcc/configure --prefix=/mingw --disable-nls --with-ld=/mingw/bin/ld
> --with-as=/mingw/bin/as --disable-werror --enable-bootstrap
> --enable-threads=posix --with-win32-nlsapi=unicode
> --host=i386-pc-mingw32 --enable-languages=c,fortran --enable-libgomp
>
> and the resulting compiler and generated executables seem to work (I
> tried a few C and Fortran toy codes). The main changes are to
> libgomp/config/posix/time.c, which used functions not available on
> mingw32. Would they be acceptable in this form (protected with #ifdef
> _WIN32)? If so, I'll do some more testing, and officially submit the
> patch.

_WIN32 #ifdefs are just too ugly; additionally, including a system
header inside of a routine is a big no-no.
I think it would be much cleaner if you added config/mingw32/time.c
instead and tweaked config.tgt. After all, the file only contains the
2 routines, and you use completely different bodies for those routines
on mingw32 than on any other target.

	Jakub
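
One possible reading of the suggestion above, as a rough sketch only: a
target-specific file that holds both routines and includes the system
header at file scope. The path config/mingw32/time.c comes from the
message; the helper timeb_to_seconds is hypothetical, and the
#ifdef _WIN32 guards exist here solely so the sketch also compiles on
other hosts (a real target-specific file would not need them). The
matching config.tgt change is not shown.

```c
#include <assert.h>

#ifdef _WIN32
# include <sys/timeb.h>   /* struct _timeb, _ftime */
#endif

/* Hypothetical helper: combine whole seconds and milliseconds, as
   reported by _ftime, into seconds expressed as a double.  */
double
timeb_to_seconds (long long seconds, unsigned short millitm)
{
  assert (millitm < 1000);
  return (double) seconds + millitm / 1e3;
}

#ifdef _WIN32
/* Wall-clock time in seconds, with millisecond granularity.  */
double
omp_get_wtime (void)
{
  struct _timeb timebuf;
  _ftime (&timebuf);
  return timeb_to_seconds ((long long) timebuf.time, timebuf.millitm);
}

/* _ftime reports milliseconds, so this sketch assumes a 1 ms tick.  */
double
omp_get_wtick (void)
{
  return 1e-3;
}
#endif
```

Keeping the conversion in a separate function means the arithmetic can
be checked on any host, while the two OpenMP entry points stay free of
conditional compilation in the real file.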
Re: IA-64 speculation patches have bad impact on ARM
Daniel Jacobowitz wrote:
> Hi Maxim and Vlad,
>
> I just tracked an ICE while building glibc for ARM to this patch, which
> introduced --param max-sched-extend-regions-iters with a default of two:
>   http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00998.html
>
> ...
>
> The register variables and their initializations get hoisted all the
> way out of the first if. On ia64, with a million execution units to
> spare and a fat pipeline, this may make sense. On targets with a
> simpler execution model, though, it's pretty awful. If the condition
> (which we have no information on the likelihood of) is false, we've
> added lots of cycles for no gain. It's not like the scheduler was
> filling holes; the initializations were scheduled as early as possible
> because they had no dependencies.
>
> With the parameter turned back down to one, the testcase compiles, and
> the code looks sensible again. No, I wasn't able to work out why
> profiling was necessary to trigger this problem; I suspect it makes
> some register unavailable, but I'm not sure which. I didn't look into
> that further.
>
> What's your opinion? We could easily change the default of the
> parameter for ARM, but I assume there are other affected targets. I
> don't know if we need the extended region scheduling to be smarter, or
> if it should simply be turned off for some targets.

Hi Daniel!

Sorry for the delay, I needed time to investigate the cause of the
problem.

The real problem lies in the computation of the instruction
priorities. ARM has a fairly simple scheduling model: on each cycle
the insn standing first in the ready list gets scheduled. This
behavior puts *all* the responsibility for the resulting schedule on
how the ready list is arranged. The main decision factor in sorting
the ready list is INSN_PRIORITY. Instructions in the inner 'if' get a
somewhat greater priority than the instructions from the dominator
block and hence get hoisted from their original block.

A good solution for this case would be a more precise evaluation of
the insn priorities. This includes transformation of the insn priority
from a region-scope to a block-scope value: e.g. in this case, while
scheduling the first block, the priorities of the insns from the
'if'-block will be multiplied by the probability of the 'then'-branch
and, therefore, will be significantly lower than the priority of the
insns from the current block. I started to implement this idea some
time ago, but never finished :(

Anyway, this work is for stage 1 or 2 and for now I propose following
fix: implement targetm.sched.reorder hook so that it will ensure that
if there is an insn from the current block in the ready list, then
insn from the other block won't stand first in the line (and,
therefore, won't be chosen for schedule). I feel that this will be
what you are calling 'filling holes'. Please find an example patch
attached (arm.patch).

While debugging the testcase I found two somewhat unrelated bugs in
the handling of INSN_PRIORITY. The first one is in haifa-sched.c:
priority (). When an insn has no forward dependencies, its priority is
set to its latency. The bug occurs when an insn has some deps and all
of them get rejected by the
current_sched_info->contributes_to_priority () hook: in this case
INSN_PRIORITY should also be initialized with the insn latency, but
the present code misses that.

The second one is not as critical as the first one. It is in
haifa-sched.c: adjust_priority (). This function plainly calls the
targetm.sched_adjust_priority () hook when an insn is being added to
the ready list. As I understand it, all targets assume this hook is
invoked once: after all priorities are computed, but before the insn
is added to the ready list. But for insns with no dependencies from
the source blocks this hook can be called many times, so the
priorities of those insns can become noticeably inadequate.

The patch for these two small bugs is also attached
(priority-bugs.patch). Is it ok for trunk? If so I will repost it to
the gcc-patches list.

Best regards,
Maxim

--- config/arm/arm.c    (/gcc-local/trunk/gcc)  (revision 19877)
+++ config/arm/arm.c    (/gcc-local/arm-bug/gcc)        (revision 19877)
@@ -52,6 +52,7 @@
 #include "target-def.h"
 #include "debug.h"
 #include "langhooks.h"
+#include "sched-int.h"
 
 /* Forward definitions of types.  */
 typedef struct minipool_node    Mnode;
@@ -118,6 +119,9 @@ static void thumb_output_function_prolog
 static int arm_comp_type_attributes (tree, tree);
 static void arm_set_default_type_attributes (tree);
 static int arm_adjust_cost (rtx, rtx, rtx, int);
+static void arm_reorder (rtx *, int);
+static int arm_reorder1 (FILE *, int, rtx *, int *, int);
+static int arm_reorder2 (FILE *, int, rtx *, int *, int);
 static int count_insns_for_constant (HOST_WIDE_INT, int);
 static int arm_get_strip_length (int);
 static bool arm_function_ok_for_sibcall (tree, tree);
@@ -245,6 +249,12 @@ static bool arm_tls_symbol_p (rtx x);
 #undef TARGET_SCHED_ADJ
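
The first priority bug described in the message can be illustrated
with a toy model. This is only a sketch under simplifying assumptions:
struct dep and insn_priority are hypothetical stand-ins and not the
data structures or code of haifa-sched.c, and costs are assumed to be
non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a forward dependence.  */
struct dep
{
  int cost;           /* latency along this dependence */
  int next_priority;  /* priority of the dependent insn */
  int contributes;    /* nonzero if the hook accepts this dependence */
};

/* Priority is the longest accepted path through the forward
   dependences.  If there are no dependences, or every dependence is
   rejected, fall back to the insn's own latency.  The second case is
   the one the message says is missed.  */
int
insn_priority (int latency, const struct dep *deps, size_t n_deps)
{
  int priority = -1;
  size_t i;

  assert (latency >= 0);
  for (i = 0; i < n_deps; i++)
    {
      int candidate;

      if (!deps[i].contributes)
        continue;
      candidate = deps[i].cost + deps[i].next_priority;
      if (candidate > priority)
        priority = candidate;
    }

  /* Nothing contributed: initialize with the latency.  */
  if (priority < 0)
    priority = latency;
  return priority;
}
```

With two rejected dependences and a latency of 2, this returns 2, the
same value an insn without any dependences would get.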
Re: IA-64 speculation patches have bad impact on ARM
> Maxim Kuvyrkov writes:

Maxim> Anyway, this work is for stage 1 or 2 and for now I propose following
Maxim> fix: implement targetm.sched.reorder hook so that it will ensure that if
Maxim> there is an insn from the current block in the ready list, then insn
Maxim> from the other block won't stand first in the line (and, therefore,
Maxim> won't be chosen for schedule). I feel that this will be what you are
Maxim> calling 'filling holes'. Please find an example patch attached (arm.patch).

	What about all of the other GCC targets?

	If your patch changed the default behavior of the scheduler
assumed by all other ports, you should fix the scheduler and modify the
IA-64 port to get the behavior desired.

David
Re: IA-64 speculation patches have bad impact on ARM
David Edelsohn wrote:
>> Maxim Kuvyrkov writes:
>
> Maxim> Anyway, this work is for stage 1 or 2 and for now I propose following
> Maxim> fix: implement targetm.sched.reorder hook so that it will ensure that if
> Maxim> there is an insn from the current block in the ready list, then insn
> Maxim> from the other block won't stand first in the line (and, therefore,
> Maxim> won't be chosen for schedule). I feel that this will be what you are
> Maxim> calling 'filling holes'. Please find an example patch attached (arm.patch).
>
> 	What about all of the other GCC targets?
>
> 	If your patch changed the default behavior of the scheduler
> assumed by all other ports, you should fix the scheduler and modify the
> IA-64 port to get the behavior desired.

Exactly.

I think this is a serious regression, and I would like to consider our
options. Daniel has suggested changing the default value of the
max-sched-extend-regions-iters param to 1. However, I think we should
conservatively change it to zero, for now, and then use a target macro
to allow IA64 to set it to two, and other ports to gradually turn this
on if useful.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Successful gcc 4.1.1 build on alphaev68-dec-osf5.1b Tru64(c,c++,fortran,objc,treelang)
[EMAIL PROTECTED]:~#gcc -v
Using built-in specs.
Target: alphaev68-dec-osf5.1b
Configured with: ../configure --host=alphaev68-dec-osf5.1b --enable-threads=posix --enable-languages=c,c++,fortran,objc,treelang --prefix=/usr/local --enable-version-specific-runtime-libs --enable-shared --enable-libgcj --enable-nls --enable-interpreter
Thread model: posix
gcc version 4.1.1

I had some problems with java so I turned it off.
See my previous email:
http://gcc.gnu.org/ml/gcc/2005-07/msg00601.html
for f95 and java compilation.

Sincerely,
Stefano Curtarolo

-- 
Prof. Stefano Curtarolo
Assistant Professor of Materials Science
Duke University, Dept. Mechanical Engineering and Materials Science
144 Hudson Hall, Box 90300, Durham, NC 27708-0300
phone 919-660-5506
[EMAIL PROTECTED]
http://alpha.mems.duke.edu

[This email was composed with renewable energy. When you are done
reading this email, please dispose of it in an environmentally
friendly manner, such as electronic composting.]
Re: IA-64 speculation patches have bad impact on ARM
Mark Mitchell wrote:
> David Edelsohn wrote:
>>> Maxim Kuvyrkov writes:
>>
>> Maxim> Anyway, this work is for stage 1 or 2 and for now I propose following
>> Maxim> fix: implement targetm.sched.reorder hook so that it will ensure that if
>> Maxim> there is an insn from the current block in the ready list, then insn
>> Maxim> from the other block won't stand first in the line (and, therefore,
>> Maxim> won't be chosen for schedule). I feel that this will be what you are
>> Maxim> calling 'filling holes'. Please find an example patch attached (arm.patch).
>>
>> 	What about all of the other GCC targets?
>>
>> 	If your patch changed the default behavior of the scheduler
>> assumed by all other ports, you should fix the scheduler and modify the
>> IA-64 port to get the behavior desired.
>
> Exactly.
>
> I think this is a serious regression, and I would like to consider our
> options. Daniel has suggested changing the default value of the
> max-sched-extend-regions-iters param to 1. However, I think we should
> conservatively change it to zero, for now, and then use a target macro
> to allow IA64 to set it to two, and other ports to gradually turn this
> on if useful.

I agree with this. Two months ago Maxim submitted patches which affect
only ia64, except for one thing affecting all targets: the patch which
builds more scheduling regions and as a consequence permits more
aggressive interblock scheduling.

Insn scheduling before register allocation, even without Maxim's
patches, is not safe when hard registers are used in RTL. It is a
known bug (e.g. for x86_64) and it is in bugzilla. Jim Wilson wrote up
several possible solutions for this; none is easy to implement except
for switching off insn scheduling before RA (which is what is done for
x86_64).

But we can restore the state that existed before Maxim's patch
(probably safe for most programs). So Maxim, could you do this? (Of
course you can keep the max-sched-extend-regions-iters value for ia64,
because it is probably safe for targets with many registers.)

Vlad
Re: c++ regression in trunk
On May 29, 2006, at 1:17 PM, Jack Howarth wrote:
> In building xplor-nih against the gcc trunk, I noticed that there is
> a c++ related regression

I'll let Andrew comment if it sounds like anything he's seen. I'd
recommend a binary search to narrow down the translation unit and the
compiler version that went bad if the prospects of actually trying to
debug this are daunting.

Off-hand, it sounds like a throw across a dylib/bundle boundary
problem, or someone using the wrong visibility on classes. You can
check for the latter with something like:

  nm -m *.o | c++filt | grep info | grep -v external

and seeing if you get any hits. If so, they could be the cause of the
problem.

The first case might be easier to test in the small, as if it is
totally busted, even the most trivial of programs will show the
breakage.
Re: Freeing memory for basic-blocks and edges
On May 29, 2006, at 1:07 PM, sean yang wrote:
> For example, I know to allocate a chunk of memory to hold BB
> information is done by "ggc_alloc_cleared()". But after a function
> analysis/optimization is done, the memory should be freed.

  ggc_free (ptr);

can be used, but, if you use it, you have to be absolutely certain
that there doesn't exist a reachable pointer to that memory, anywhere.
You must explicitly zero out any old pointers that pointed to the
data, if those pointers would have been reachable otherwise.
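
The "zero out old pointers first" discipline can be sketched outside
of GCC as follows. This is an illustration only: it uses plain
malloc/free as a stand-in for the GGC allocator, and struct
block_info, struct function_info and release_entry_block are
hypothetical names, not GCC types or functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-block data.  */
struct block_info
{
  int index;
};

/* Hypothetical stand-in for a structure that stays reachable.  */
struct function_info
{
  struct block_info *entry;
};

/* Clear the reachable pointer before releasing the memory, so nothing
   that is still reachable points at freed storage.  */
void
release_entry_block (struct function_info *fn)
{
  struct block_info *old;

  assert (fn != NULL);
  old = fn->entry;
  fn->entry = NULL;   /* zero out the old pointer first */
  free (old);         /* stand-in for ggc_free (old) */
}
```

If several structures point at the same data, each of those pointers
needs the same treatment before the memory is released.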
A reload failure which I can't figure out
Hi.

I have problems figuring out why reload gives up on this:

reload failure for reload 1
../../../cvssrc/gcc/gcc/libgcc2.c: In function '__moddi3':
../../../cvssrc/gcc/gcc/libgcc2.c:1101: error: unable to find a register to spill in class 'DX_REGS'
../../../cvssrc/gcc/gcc/libgcc2.c:1101: error: these are the reloads for insn # 425:
Reload 0: reload_in (HI) = (reg:HI 4 d [+2 ])
        reload_out (HI) = (reg:HI 0 c [199])
        AX_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg:HI 4 d [+2 ])
        reload_out_reg: (reg:HI 0 c [199])
Reload 1: reload_out (HI) = (scratch:HI)
        DX_REGS, RELOAD_FOR_OUTPUT (opnum = 3)
        reload_out_reg: (scratch:HI)
../../../cvssrc/gcc/gcc/libgcc2.c:1101: error: this is the insn:
(insn 425 422 432 30 ../../../cvssrc/gcc/gcc/libgcc2.c:911 (parallel [
            (set (reg:HI 0 c [199])
                (mult:HI (reg:HI 4 d [+2 ])
                    (reg:HI 6 b [orig:629 __d0 ] [629])))
            (clobber (scratch:HI))
            (clobber (reg:CC 12 cc))
        ]) 309 {*mulhi3} (nil)
    (expr_list:REG_UNUSED (reg:CC 12 cc)
        (expr_list:REG_UNUSED (scratch:HI)
            (expr_list:REG_DEAD (reg:HI 4 d [+2 ])
                (expr_list:REG_UNUSED (reg:CC 12 cc)
                    (expr_list:REG_UNUSED (scratch:HI)
                        (nil)))
../../../cvssrc/gcc/gcc/libgcc2.c:1101: internal compiler error: in spill_failure, at reload1.c:1915

The *mulhi3 pattern is this:

(define_insn "*mulhi3"
  [(set (match_operand:HI 0 "single_register_operand" "=a")
        (mult:HI (match_operand:HI 1 "single_register_operand" "%0")
                 (match_operand:HI 2 "general_operand" "rm")))
   (clobber (match_scratch:HI 3 "=d"))
   (clobber (reg:CC CC_REG))]
  ""
  "mulw\t%2"
)

where the constraint "a" matches the register class AX_REGS consisting
of (reg:HI 2 a), the constraint "d" matches the register class DX_REGS
consisting of (reg:HI 4 d), and the predicate "single_register_operand"
is:

(define_predicate "single_register_operand"
  (and (match_operand 0 "register_operand")
       (ior (not (match_code "subreg"))
            (match_test "GET_MODE_SIZE (GET_MODE (SUBREG_REG (op))) <= UNITS_PER_WORD")))
)

In other words, it is like "register_operand", but rejects subregs of
registers larger than a word (16 bits).

The insn seems perfectly possible to reload:

(set (mem:HI stack-slot-d) (reg:HI 4 d))
(set (mem:HI stack-slot-a) (reg:HI 2 a))
(set (reg:HI 2 a) (mem:HI stack-slot-d))
(parallel [(set (reg:HI 2 a) (mult:HI (reg:HI 2 a) (reg:HI 6 b)))
           (clobber (reg:HI 4 d))
           (clobber (reg:CC CC_REG))])
(set (reg:HI 0 c) (reg:HI 2 a))
(set (reg:HI a) (mem:HI stack-slot-a))
(set (reg:HI d) (mem:HI stack-slot-d))

What am I missing?

-- 
Rask Ingemann Lambertsen
Re: IA-64 speculation patches have bad impact on ARM
Maxim Kuvyrkov wrote:
> Anyway, this work is for stage 1 or 2 and for now I propose following
> fix: implement targetm.sched.reorder hook so that it will ensure that
> if there is an insn from the current block in the ready list, then
> insn from the other block won't stand first in the line (and,
> therefore, won't be chosen for schedule). I feel that this will be
> what you are calling 'filling holes'. Please find an example patch
> attached (arm.patch).

Do you think this could be a default implementation of the reorder
hook, like this? (After suitable performance testing. This looks
O(n^2) to me).

--- in defaults.h ---

#ifndef TARGET_SCHED_REORDER
#define TARGET_SCHED_REORDER default_reorder
#endif

#ifndef TARGET_SCHED_REORDER2
#define TARGET_SCHED_REORDER2 default_reorder2
#endif

--- in targhooks.c ---

int
default_reorder (FILE *dump, int sched_verbose, rtx *ready,
                 int *pn_ready, int clock_var)
{
  default_reorder2 (dump, sched_verbose, ready, pn_ready, clock_var);
  if (targetm.sched.issue_rate)
    return targetm.sched.issue_rate ();
  else
    return 1;
}

int
default_reorder2 (FILE *dump ATTRIBUTE_UNUSED,
                  int sched_verbose ATTRIBUTE_UNUSED,
                  rtx *ready, int *pn_ready,
                  int clock_var ATTRIBUTE_UNUSED)
{
  int n_ready = *pn_ready;

  /* This is correct for sched-rgn.c only.  */
  if (reload_completed
      && (flag_sched2_use_superblocks || flag_sched2_use_traces))
    return 0;

  if (n_ready > 1)
    {
      basic_block bb = BLOCK_FOR_INSN (current_sched_info->prev_head);

      if (BLOCK_FOR_INSN (ready[n_ready - 1]) != bb)
        {
          int i;

          for (i = n_ready - 1; i >= 0; i--)
            {
              rtx insn = ready[i];

              if (BLOCK_FOR_INSN (insn) != bb)
                continue;

              memcpy (ready + i, ready + i + 1,
                      (n_ready - i - 1) * sizeof (*ready));
              ready[n_ready - 1] = insn;
              break;
            }
        }
    }

  return 0;
}

Paolo
Re: IA-64 speculation patches have bad impact on ARM
On Tue, May 30, 2006 at 08:57:57PM +0200, Paolo Bonzini wrote:
> int
> default_reorder2 (FILE *dump ATTRIBUTE_UNUSED,
>                   int sched_verbose ATTRIBUTE_UNUSED,
>                   rtx *ready, int *pn_ready,
>                   int clock_var ATTRIBUTE_UNUSED)
> {
>   int n_ready = *pn_ready;
>
>   /* This is correct for sched-rgn.c only.  */
>   if (reload_completed
>       && (flag_sched2_use_superblocks || flag_sched2_use_traces))
>     return 0;
>
>   if (n_ready > 1)
>     {
>       basic_block bb = BLOCK_FOR_INSN (current_sched_info->prev_head);
>
>       if (BLOCK_FOR_INSN (ready[n_ready - 1]) != bb)
>         {
>           int i;
>
>           for (i = n_ready - 1; i >= 0; i--)
>             {
>               rtx insn = ready[i];
>
>               if (BLOCK_FOR_INSN (insn) != bb)
>                 continue;
>
>               memcpy (ready + i, ready + i + 1,
>                       (n_ready - i - 1) * sizeof (*ready));
>               ready[n_ready - 1] = insn;
>               break;
>             }
>         }
>     }
>
>   return 0;
> }

Not even a single comment - shame on you both! :-) If this is the
solution we choose, can we make sure that there's at least a comment
explaining what's going on?

-- 
Daniel Jacobowitz
CodeSourcery
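
For readers following the thread, the reordering idea in the quoted
function can be sketched on plain arrays, with the kind of comments
asked for above. This is a toy model only: struct toy_insn and
prefer_current_block are hypothetical, there is no RTL involved, and
as in the quoted code the last element of the ready list is taken to
be the one issued next. The sketch uses memmove rather than memcpy
because the source and destination ranges overlap.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an insn: an id plus the block it came
   from.  */
struct toy_insn
{
  int uid;
  int block;
};

/* If the insn that would be issued next (the last element of READY)
   comes from another block, look for the highest-placed insn that
   belongs to CURRENT_BLOCK and move it to the end of the list, keeping
   the relative order of everything else.  Returns the index the insn
   was moved from, or -1 if nothing was changed.  */
int
prefer_current_block (struct toy_insn *ready, int n_ready,
                      int current_block)
{
  int i;

  /* Nothing to do for short lists, or when the next insn already
     belongs to the current block.  */
  if (n_ready <= 1 || ready[n_ready - 1].block == current_block)
    return -1;

  for (i = n_ready - 2; i >= 0; i--)
    if (ready[i].block == current_block)
      {
        struct toy_insn insn = ready[i];

        /* Close the gap at index I by shifting the tail down one
           slot; the ranges overlap, hence memmove.  */
        memmove (ready + i, ready + i + 1,
                 (n_ready - i - 1) * sizeof (*ready));
        ready[n_ready - 1] = insn;
        return i;
      }

  /* No insn from the current block is ready; leave the list alone.  */
  return -1;
}
```

Each call is linear in the length of the ready list; how often the
hook runs per block is what determines the overall cost.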
Re: call_insns in RTX form--two questions
[ I wasn't going to answer this, because you've left out all the
details that would be required for me to answer it well, but since
you've asked me specifically to answer, I'll try. ]

On May 29, 2006, at 9:09 PM, sean yang wrote:
> The first question is: If I want to find a BB that containing a
> specific function call (say 'foo'), is there an easy way in the RTX
> level?

Yes or no, depending on what you consider easy and how you define the
problem. Anyway, if you look at the rtl:

(call_insn 8 28 9 0 (parallel [
            (call (mem:SI (symbol_ref:SI ("&L_foo$stub") ) [0 S4 A8])
                (const_int 32 [0x20]))
            (use (const_int 0 [0x0]))
            (clobber (reg:SI 65 lr))
        ]) 368 {*call_nonlocal_sysv} (nil)
    (nil)
    (nil))

all you have to do is wander through the insns looking for a
call_insn, then check to see if the form is (mem (symbol_ref X)), and
then check to see if the X mentioned is the one you're interested in.
The above is a nice example, because it shows that the spelling of X
need not be very portable, so doing this reliably can be `hard'.
Further, the compiler can use a register and an indirect call, thus
making strcmp for the function name non-trivial. If you want to wave
your hands and not solve that problem, it is then easier; if not, then
it can be impossible to solve. As impossible as:

  main() {
    extern (*bar)();
    (*bar)();
  }

is to figure out.

However, it need not be that hard; it could be only as hard as
checking the NOTEs structure and doing a strcmp on them. That is easy
to do. Hint: if pr can show you the information you're interested in,
then, trivially, the answer is yes. Just follow the rtl dumper to the
info you want.

> The second one is: how can i get the order of different call foo in
> the final assembly code.

The language gap causes this question to be hard for us to understand.
If you told us what you're trying to do and why in 20 pages or more it
would be easier to answer.

> Can I get it by dumping some information, say the order of the
> instruction link list, in RTL representation?

First, you never defined what ordering you want. Do you want order as
defined by the address of the instruction? Do you want source code
order? Do you want order in the rtl for the function?

For rtl order, yes, just walk the rtl in order, and the first call to
foo is the first, and the second call to foo is the second...

For source code order, you can approximate it by checking the debug
line information associated with the call in question. Just collect
all that info, sort increasing by line; the first one is first, the
second one is second and so on. This, as you can tell, can't help with
multiple calls to foo on the same line nor with compiles that lack
debug information. Though, if one throws the column numbers into the
debug information, then one can solve for multiple calls on one line
as well. EXPR_LOCATION and EXPR_LINENO might have some of the data
you're interested in.

For address ordering, well, generally that is the same as rtl order,
but hot/cold partitioning can alter that. I'll not answer this, as I'm
hoping you don't care about the details.
Re: Modifying ARM code generator for elimination of 8bit writes - need help
> I found arm.md and the moveqi insns, but because of the different
> addressing modes of strb and swpb, its not easy to make the change.
> And there must be a compiler option for this, too.
>
> Could somebody please tell me how to implement this change?

Short answer is probably not.

There are a couple of complications that spring to mind. The different
addressing modes and the fact that swp clobbers a register are the
most immediate ones.

You'll need to modify at least the movqi insn patterns, memory
constraints and the legitimate address stuff. I'm not sure about the
clobber; that might need additional reload-related machinery.

Paul
Re: Modifying ARM code generator for elimination of 8bit writes - need help
On Tue, May 30, 2006 at 09:03:54PM +0100, Paul Brook wrote:
> > I found arm.md and the moveqi insns, but because of the different
> > addressing modes of strb and swpb, its not easy to make the change.
> > And there must be a compiler option for this, too.
> >
> > Could somebody please tell me how to implement this change?
>
> Short answer is probably not.
>
> There are a couple of complications that spring to mind. The different
> addressing modes and the fact that swp clobbers a register are the most
> immediate ones.
>
> You'll need to modify at least the movqi insn patterns, memory constraints
> and the legitimate address stuff. I'm not sure about the clobber, that
> might need additional reload-related machinery.

I suspect it would be better to make GCC do halfword stores instead
(read/modify/write).

-- 
Daniel Jacobowitz
CodeSourcery
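
At the source level, the read/modify/write approach mentioned above
looks roughly like this. It is a sketch under stated assumptions, not
compiler output: store_byte_via_halfword is a hypothetical helper, the
memory is assumed to be accessed only in 16-bit units, and bytes are
assumed to be laid out little-endian within each halfword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store VALUE at byte offset BYTE_INDEX of a buffer without ever
   issuing an 8-bit write: read the containing halfword, replace one
   byte in it, and write the halfword back.  Assumes little-endian
   byte order within each halfword.  */
void
store_byte_via_halfword (volatile uint16_t *base, size_t byte_index,
                         uint8_t value)
{
  volatile uint16_t *slot;
  unsigned int shift;
  uint16_t old, merged;

  assert (base != NULL);
  slot = base + byte_index / 2;
  shift = (unsigned int) (byte_index % 2) * 8;

  old = *slot;                                        /* read */
  merged = (uint16_t) ((old & ~(0xFFu << shift))      /* modify */
                       | ((unsigned int) value << shift));
  *slot = merged;                                     /* write */
}
```

Note that the sequence is not atomic: an interrupt or another bus
master that changes the neighbouring byte between the read and the
write would have its update overwritten.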
Re: c++ regression in trunk
Mike,

I've checked all of the object files in xplor-nih with...

  nm -m *.o | c++filt | grep info | grep -v external

...and I get no hits, suggesting it can't be a wrong visibility
problem. Is there some approach I can use to figure out if it is a
throw across a dylib or bundle boundary? My initial guess is that it
isn't that either. The .cc source files that show up in the backtrace
(dinternal.cc, dint-xplor.cc and dint-powell.cc) are all linked
together into libintVar.dylib using...

  g++-4 -dynamiclib -flat_namespace -undefined suppress -single_module dinternal.o dint-atom.o dint-node.o dint-loop.o dint-step.o dint-powell.o dint-conmin.o dint-simplex.o dint-pc6.o dint-xplor.o publicIVM.o -o libintVar.dylib -lcrypto /System/Library/Frameworks/vecLib.framework/Versions/A/vecLib

Also, I don't see this c++ regression when building xplor-nih with the
gcc/g++ from Xcode 2.3 and gfortran from gcc trunk, or with
gcc/g++/gfortran from gcc 4.1.1. If it really were a problem with a
throw across a dylib, wouldn't those versions break as well?

I noticed from the 4.2 change page that...

  The configure variable enable-__cxa_atexit is now enabled by default
  for more targets. Enabling this variable is necessary in order for
  static destructors to be executed in the correct order, but it
  depends upon the presence of a non-standard C library in the target
  library in order to work. The variable is now enabled for more
  targets which are known to have suitable C libraries.

Is darwin one of those targets that changed at 4.2? If so, is there a
flag to reverse that behavior for individual source files?

Jack
Re: c++ regression in trunk
On May 30, 2006, at 3:25 PM, Jack Howarth wrote:
> ...and I get no hits suggesting it can't be a wrong visibility problem.

I prefer the idea that it reduces the likelihood of such a problem. :-)

> Is there some approach I can use to figure out if it is a throw across a
> dylib or bundle boundary?

Yes. If you aren't using bundles forget about that issue. The approach is to write a dylib and app that mirror the direction you are throwing. For example, the obvious:

    #include <stdio.h>
    void dolib();
    int main() { try { dolib(); } catch(...) { printf("It worked.\n"); } }

and in the dylib:

    void dolib() { throw 1; }

tests to see if a throw from a dylib into the app works.

> I noticed from the 4.2 change page that...
> The configure variable enable-__cxa_atexit is now enabled by default
>
> Is darwin one of those targets that changed at 4.2? If so is there a flag
> to reverse that behavior for individual source files?

Why ask me, when you can ask the compiler with documented -fuse-cxa-atexit and -fno-use-cxa-atexit flags and have it tell you with amazing accuracy? Yes, Geoff has been playing in this area. It had been off, and in later versions, it is on. You can check the ChangeLog file and find:

2006-03-15  Geoffrey Keating  <[EMAIL PROTECTED]>

    * config.gcc (*-*-darwin*): Don't build crt2.o for all Darwin ports.
    Do switch on default_use_cxa_atexit.
    (powerpc*-*-darwin*): Build crt2.o on powerpc.
    * config/darwin-crt3.o: New.
    * config/darwin.h (LINK_SPEC): If -shared-libgcc, make linker default
    to 10.3. Pass '-multiply_defined suppress' if crt3.o is in use.
    (STARTFILE_SPEC): Add crt3.o when -shared-libgcc and appropriate
    OS version.
    * config/rs6000/t-darwin: Move crt2.o building to here.
    * config/rs6000/darwin.h (C_COMMON_OVERRIDE_OPTIONS): Update
    Mac OS version for using __cxa_get_exception_ptr. Don't test
    versions of __cxa_atexit.

and you can also test before and after it to see if it is related, though in theory the -f flags should be enough.
Also, this list isn't the right place to obtain help with broken user code, just broken compiler code. Ideally, we want you to first debug it and then ask here for help after broken user code is ruled out. If the app shows an issue with the new cxa code-gen, I'd hazard a guess that it is an app bug [ fingers crossed ].
Re: Modifying ARM code generator for elimination of 8bit writes - need help
> > There are a couple of complications that spring to mind. The different
> > addressing modes and the fact that swp clobbers a register are the most
> > immediate ones.
> >
> > You'll need to modify at least the movqi insn patterns, memory
> > constraints and the legitimate address stuff. I'm not sure about the
> > clobber, that might need additional reload-related machinery.
>
> I suspect it would be better to make GCC do halfword stores instead
> (read/modify/write).

Does gcc have mechanisms for doing this, or would you have to hide it all inside the movqi pattern? If the latter I don't see much difference in terms of implementation. strb and strh use different addressing modes :-)

Whether ldrh/strh are faster than swpb probably depends on whether your cpu supports unaligned loads. It also means byte stores are no longer atomic, though I don't know whether that is important.

Paul
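For readers following the halfword suggestion, here is a minimal sketch in portable C of what a byte store built from a halfword read/modify/write looks like. It only models the memory effect. It is not GCC's code generator or a movqi pattern, and the function name `store_byte_via_halfword` is made up for illustration. The buffer is assumed to be halfword-aligned and to contain the whole halfword around the target byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write buf[index] using only 16-bit accesses.
   Assumes buf is 2-byte aligned and covers the containing halfword. */
void store_byte_via_halfword(unsigned char *buf, size_t index,
                             unsigned char value)
{
    size_t half_index = index & ~(size_t)1; /* start of containing halfword */
    uint16_t half;
    unsigned char lanes[2];

    assert(buf != NULL);

    memcpy(&half, buf + half_index, sizeof half); /* read: halfword load */
    memcpy(lanes, &half, sizeof half);            /* split into two bytes */
    lanes[index & 1] = value;                     /* modify one byte lane */
    memcpy(&half, lanes, sizeof half);
    memcpy(buf + half_index, &half, sizeof half); /* write: halfword store */
}
```

The neighbouring byte is read and written back unchanged, which is why such a store is no longer atomic: anything else that updates the neighbour between the load and the store gets overwritten.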
independent study
Greetings,

I am working on my Master's Degree, and have started an independent study this summer. Two of my goals for this study are to

1 - Provide something useful for GCC
2 - Learn more about the innards of GCC so that I can be of benefit towards the community

I figured what better way to get credit and accomplish a personal goal. While I have barely begun my study, my advisor and I have elected to continue with my previous independent study revolving around evolutionary programming. While at first this might seem a bit of a benefit for GCC optimizations, further thought contradicts that notion.

I still would like to provide help on GCC optimizations, and was wondering if there is anyone willing to allow me on their team so that I can accomplish my goals. I am also wondering if evolutionary/genetic algorithms do seem of benefit to one particular optimization construct that I might have overlooked or not considered. While the project page does list some interesting projects, I am putting this out in hopes of finding others that might provide a stable direction for me.

I do realize that there is a project out there now that uses evolutionary concepts to optimize GCC with the current available command line options. While I think that is a great project, I would like to develop, or tweak, some type of internal aspect of GCC.

Thanks!
-Matt
Re: c++ regression in trunk
Mike,

Actually the problem appears unrelated to cxa_atexit, as neither -fuse-cxa-atexit nor -fno-use-cxa-atexit eliminates the problem with the throw aborting the program. I do believe I have found a work-around to the problem which identifies the nature of the issue as well.

The xplor-nih program (xplor) is linked with gcc (since it contains only a single object file of c). However, shared libraries of c++ and fortran code are linked to the main xplor program as well. The main c routine in xplor calls a fortran routine in one of its fortran shared libraries, which in turn calls and returns from c++ code in a c++ shared library. I am able to suppress the aborts on the throw in xplor-nih when built under gcc 4.2 if I link the main xplor program with g++ rather than gcc.

Should this be considered a regression from gcc 4.0/4.1? Since the c++ shared libraries are linked with g++ and the fortran shared libraries are linked with gfortran, it seemed that the main program (containing only a single object file from c code) should be linkable with gcc (despite c++ and fortran shared libraries being linked in to the main xplor program).

Jack
Re: independent study
> I do realize that there is a project out there now that uses > evolutionary concepts to optimize GCC with the current available > command line options. While I think that is a great project, I > would like to develop, or tweak, some type of internal aspect of > GCC. I did a Masters project that extended this concept to re-ordering the tree optimisers and it was certainly interesting work. There are plenty of unanswered questions, so if you would like, contact me by private mail and I'll send you a copy of the paper. Cheers, Ben
Can libcalls be nested?
Hello, In some places in the RTL optimizers we assume that libcalls are never nested. In other places we assume that they do nest. In the documentation, I can't find which assumption is right. So I am asking here :-) Can libcalls be nested? Thanks, Gr. Steven
Problem about gcc 4.1 + binutil 2.16.92 + glibc 2.4 + ARM EABI
Hi,

When using gcc 4.1 with EABI support for ARM, I ran into the following alignment behaviour. Here is my test case:

    #include <stdio.h>
    /* second #include: header name lost in the archived post */

    struct test {
        char c1;
        long long n;
        char c2;
    };

    int my_temp(void)
    {
        int i;
        int j;
        char str[2];
        long long l1;
        long long l2;
        char str2;

        printf("%s:\tj address: 0x%08x, str address:0x%08x\n", __func__, &j, str);
        printf("%s:\tl1 address:0x%08x, l2 address: 0x%08x\n", __func__, &l1, &l2);
        printf("%s:\tstr2 address: 0x%08x\n", __func__, &str2);
        return 0;
    }

    int main(void)
    {
        struct test t;

        my_temp();
        printf("%s:\ttest.c1 address: 0x%08x\n", __func__, &(t.c1));
        printf("%s:\ttest.n address:0x%08x\n", __func__, &(t.n));
        printf("%s:\ttest.c2 address: 0x%08x\n", __func__, &(t.c2));
        return 0;
    }

With gcc 4.1 + EABI for ARM (build option: -mabi=aapcs-linux -O2), I get the following output:

    [EMAIL PROTECTED] /test]#./test.gnu.eabi
    my_temp: j address: 0xbea85ac0, str address:0xbea85abe
    my_temp: l1 address: 0xbea85ab0, l2 address: 0xbea85aa8
    my_temp: str2 address: 0xbea85aa7
    main: test.c1 address:0xbea85ad8
    main: test.n address: 0xbea85ae0
    main: test.c2 address:0xbea85ae8

With gcc 3.4.3 without EABI (build option: -O2), I get the following output:

    [EMAIL PROTECTED] /test]#./test.gnu
    my_temp: j address: 0xbeb69bf8, str address:0xbeb69bf0
    my_temp: l1 address: 0xbeb69be8, l2 address: 0xbeb69be0
    my_temp: str2 address: 0xbeb69bdf
    main: test.c1 address:0xbeb69c10
    main: test.n address: 0xbeb69c18
    main: test.c2 address:0xbeb69c20

Please notice the addresses of str and j. There IS NO memory hole between them when using gcc 3.4.3 and there IS a memory hole when using gcc 4.1.

My question is: why is there a difference? What is the root cause of the difference (EABI or the gcc update)? Is there any gcc option to make them align?

Thanks & Regards
Yin, Fengwei
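When comparing layouts across compilers, a small helper can make the gaps explicit instead of reading them off hex addresses by eye. The sketch below is an illustration only, not part of the original test case. It reuses `struct test` from the post, while `padding_before_n`, `hole_between` and `check_layout` are hypothetical names. `offsetof` reports where the compiler placed each member for the ABI in use, and `hole_between` is plain address arithmetic on values copied from a printed run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct test {
    char c1;
    long long n;
    char c2;
};

/* Padding bytes the compiler inserted between c1 and n. */
size_t padding_before_n(void)
{
    return offsetof(struct test, n)
           - (offsetof(struct test, c1) + sizeof(char));
}

/* Unused bytes between an object of `size` bytes starting at `lower`
   and the next object starting at `upper` (32-bit addresses, as printed
   with %08x in the test case). */
uint32_t hole_between(uint32_t lower, uint32_t size, uint32_t upper)
{
    return upper - (lower + size);
}

/* Sanity checks that hold for any ABI: c1 is first, and c2 (alignment 1)
   directly follows n. */
void check_layout(void)
{
    assert(offsetof(struct test, c1) == 0);
    assert(offsetof(struct test, c2)
           == offsetof(struct test, n) + sizeof(long long));
}
```

For the two runs above, `hole_between(0xbea85abe, 2, 0xbea85ac0)` gives the gap between `str` (2 bytes) and `j` for the gcc 4.1 EABI binary, and `hole_between(0xbeb69bf0, 2, 0xbeb69bf8)` gives it for the gcc 3.4.3 binary. Printing `padding_before_n()` and `sizeof(struct test)` under each toolchain shows whether the struct layout itself changed, separately from how locals are placed on the stack.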