Fortran regressions on Cygwin_NT
The failures below have all come up in the last few days using GNU Fortran (GCC) 4.3.0 20070815 (experimental) on Cygwin_NT/amd64.

Cheers

Paul

FAIL: gfortran.dg/g77/980310-3.f (internal compiler error)
FAIL: gfortran.dg/g77/980310-3.f (test for excess errors)
Running /svn/trunk/gcc/testsuite/gfortran.dg/gomp/gomp.exp ...
Running /svn/trunk/gcc/testsuite/gfortran.dg/vect/vect.exp ...
Running /svn/trunk/gcc/testsuite/gfortran.fortran-torture/compile/compile.exp ...
Running /svn/trunk/gcc/testsuite/gfortran.fortran-torture/execute/execute.exp ...
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O0
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O1
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O2
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O3 -fomit-frame-pointer
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O3 -g
FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -Os
Re: Fortran regressions on Cygwin_NT
> FAIL: gfortran.dg/g77/980310-3.f (internal compiler error)
> FAIL: gfortran.dg/g77/980310-3.f (test for excess errors)

I saw this one on x86_64-linux with -m32, and filed it as PR33074. I asked about it on IRC yesterday, and if I understood Andrew Pinski correctly, it is probably a middle-end problem, as people have been messing with reload recently.

> FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90 execution, -O0

This one apparently appeared between rev. 127178 and 2007-08-06 (see http://gcc.gnu.org/ml/gcc-testresults/2007-08/msg00161.html and http://gcc.gnu.org/ml/gcc-testresults/2007-08/msg00278.html; there is no revision number for the second one), and it is also seen on a few platforms. It was probably introduced by me (recent NINT patch) and has been fixed by http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00902.html

FX
Re: Fortran regressions on Cygwin_NT
Paul Thomas wrote:
> FAIL: gfortran.dg/g77/980310-3.f (internal compiler error)
> FAIL: gfortran.dg/g77/980310-3.f (test for excess errors)

I get the same error on x86-64/openSUSE with "-m32 -O"; with -m64, or without "-O", it works. FX reported it yesterday as PR 33074.

> FAIL: gfortran.fortran-torture/execute/intrinsic_integer.f90
> execution, -O0

It works here (with both -m32 and -m64, including under valgrind). I only get a failure for random_7.f90 (PR33077).

Tobias
Re: GCC 4.3.0 Status Report (2007-08-09)
> Jan Hubicka wrote:
>
> > One thing I would like to see in is the sharing checker. The criteria
> > of bootstrap/regtesting on primary platforms is almost met now with
> > exception of regmove pass that I sent patch for some time ago.
> > http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01441.html
> > I will do re-testing now and see if some new problems have appeared.
>
> Thank you for bringing this up. I'd like to get the checker in too.
> But, I don't really understand the regrename.c patch. Are you saying
> that regrename.c is broken, and that we need to make these copies
> because of a real bug? Or just to make the checker happy? If the
> latter, have you measured the compile-time and memory usage to see what
> impact that has? We'd like to avoid making the compiler slower just to
> make the checker happy -- but, of course, it might be worth a small hit
> to get the checking benefit.
>
> Thanks,
>
> --
> Mark Mitchell
> CodeSourcery
> [EMAIL PROTECTED]
> (650) 331-3385 x713

Introducing wrong sharing is a real bug :) But I know of no testcase where it leads to an ICE or produces wrong code without the checker. Regrename is run late, sharing is introduced only for complex instruction patterns, and not many passes afterwards care about sharing.

The copying occurs only when nontrivial RTX expressions are matched. That generally happens only in combiner patterns dealing with arithmetic and the corresponding setting of flags, which are not terribly common, so the memory-use growth is under 1% for combine.c on PPC and 0% on i386.

However, I am no longer sure I fully understand why the sharing is needed in the first place. Regrename seems to have a later mechanism to deal with match_dup, and it seems to me that a mismatch can only result when there was already invalid sharing before regrename (so that updating the insn altered one copy of the matched RTX but not the other). I am now re-testing an alternate patch that simply disables the code introducing the sharing. I hope that code was originally just a symptomatic fix for a sharing issue and that testing will simply pass now. I will know the results tonight.

Honza
bootstrapping with -fopenmp
Hi,

I'm trying to bootstrap (parloop branch) with -ftree-parallelize-loops=4, which also requires -fopenmp. I'm using:

make BOOTCFLAGS="-O2 -ftree-parallelize-loops=4 -fopenmp" bootstrap -j 16

I'm failing at the beginning of stage2 because the compiler can't find libgomp.spec. How do I bootstrap correctly with -fopenmp?

Thanks,
Razya
treelang: can we replace 'unsigned char *chars' by 'char *chars'?
Hi All,

In file treelang.h, the structure token_part is defined as follows:

struct token_part GTY(())
{
  location_t location;
  unsigned int charno;
  unsigned int length;
  /* The value.  */
  const unsigned char *chars;   <-- HERE
};

'unsigned char *chars' is used instead of just 'char *chars'. Is there any reason (speed, memory) why 'unsigned' is used?

I am building an autogenerated version of 'treelang' and I am trying to generate 'tree' nodes directly in the file parse.y. That is the reason why I am asking.

Thanks,
Laurent
Re: bootstrapping with -fopenmp
Razya Ladelsky wrote:
> Hi,
> I'm trying to bootstrap (parloop branch) with -ftree-parallelize-loops=4,
> which requires also -fopenmp.
> I'm using:
> make BOOTCFLAGS="-O2 -ftree-parallelize-loops=4 -fopenmp" bootstrap -j 16
> I'm failing at the begining of stage2 because the compiler can't find
> libgomp.spec
> How do I bootstrap correctly with fopenmp?

You have to add bootstrap=true to libgomp, and regenerate Makefile.in.

Paolo
RE: gcc on SCO
Dave Korn wrote:
> But consider also
> http://gcc.gnu.org/svn/gcc/trunk/README.SCO

Which calls them "not a serious threat." I hadn't been following this closely, but that sure seems to be the case given last week's ruling.

http://arstechnica.com/news.ars/post/20070812-sco-never-owned-unix-copyrights-owes-novell-95-percent-of-unix-royalties.html
http://en.wikipedia.org/wiki/Sco_group#SCO-Linux_lawsuits_and_controversies

gsw
Re: RFC: Simplify rules for ctz/clz patterns and RTL
On 8/15/07, Zack Weinberg <[EMAIL PROTECTED]> wrote:
> Is popcount really slow on PowerPC? (Compared to clz?)

popcount is really a popcount in bytes, and then you do a multiply to get the real popcount. This is why it is slower than count leading zeros. Also, popcount does not exist on most PowerPCs, while count leading zeros exists on all of them.

Thanks,
Andrew Pinski
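Andrew's description (a per-byte population count followed by a multiply) can be modelled in C roughly as below. popcount_bytes is a hypothetical software stand-in for the byte-wise instruction (called popcountb later in this thread), not the instruction itself. The multiply-and-shift step is the usual way to add four byte counts in one operation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of a byte-wise population count: each byte of
   the result holds the number of set bits in the same byte of x. */
static uint32_t popcount_bytes(uint32_t x)
{
    uint32_t result = 0;
    for (int byte = 0; byte < 4; byte++) {
        uint32_t b = (x >> (8 * byte)) & 0xffu;
        uint32_t count = 0;
        for (; b != 0; b >>= 1)
            count += b & 1u;
        result |= count << (8 * byte);
    }
    return result;
}

/* Sum the four byte counts with one multiply: multiplying by 0x01010101
   accumulates every byte count into the top byte (each count is at most
   8, so no partial sum carries into the next byte); the shift extracts it. */
static uint32_t popcount32_via_bytes(uint32_t x)
{
    return (popcount_bytes(x) * 0x01010101u) >> 24;
}

/* Compare against GCC's __builtin_popcount on a spread of inputs. */
static void check_popcount_via_bytes(void)
{
    for (uint32_t x = 0; x < 70000u; x++) {
        uint32_t y = x * 2654435761u;   /* scatter bits across all bytes */
        assert(popcount32_via_bytes(y) == (uint32_t)__builtin_popcount(y));
    }
}
```

The extra multiply (and shift) on top of the byte-wise count is why this route costs more than a single count-leading-zeros instruction.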
Re: RFC: Simplify rules for ctz/clz patterns and RTL
Andrew Pinski wrote:
> On 8/15/07, Zack Weinberg <[EMAIL PROTECTED]> wrote:
>> Is popcount really slow on PowerPC? (Compared to clz?)
>
> popcount is really popcount in bytes and then you do a multiply to get
> the real popcount. This is why it is slower than count leading zeros.
> Also popcount does not exist in most powerpc's while count leading zeros
> exist in all.

Makes sense. I don't suppose I could persuade you to teach rs6000 RTX_COSTS about clz and popcount...?

zw
Re: RFC: Simplify rules for ctz/clz patterns and RTL
Segher Boessenkool wrote:
>> * I would like to do the same for __builtin_ctz, but there is a catch.
>> The synthetic ctz sequence in terms of popcount (as presently
>> implemented by ia64.md, and potentially usable for at least i386 and
>> rs6000 as well if moved to optabs.c) produces the canonical behavior at
>> zero, but the synthetic sequence in terms of clz (as presently
>> implemented by optabs.c) produces the value -1 at zero.
>
> I suppose you're using (assuming 32-bit)
>
>   ctz(x) := 31 - clz(x & -x)
>
> now, which gives -1 for 0; and the version you're looking for is
>
>   ctz(x) := 32 - clz(~x & (x-1))
>
> which gives 32 for 0.

Thanks! That's, unfortunately, one more instruction, although I guess a lot of chips have "a & ~b" as one operation.

> What does the popcount version look like? Never seen that before, but
> I think it will be really expensive on PowerPC.

  ctz(x) := popcount(~x & (x-1))

Just the same thing as your version of the ctz-as-clz operation, but without the final adjustment. It looks like ~x & (x-1) turns any number into 000...111..., where the boundary between zeroes and ones lies at the lowest 1 in the original.

Is popcount really slow on PowerPC? (Compared to clz?) Ideally one would choose between the two expansions based on RTL costs, but the only architectures it matters for are i386 and powerpc, and neither of them defines the cost of either clz or popcount.

zw
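The three 32-bit ctz formulations above can be checked side by side with a small C sketch. clz32 and popcount32 are hypothetical loop-based helpers, used so that the value at zero is well defined: clz32(0) is 32, whereas GCC's __builtin_clz(0) is undefined. The sketch confirms that all three forms agree on nonzero inputs and differ only at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Portable clz with the "canonical" result at zero: clz32(0) == 32. */
static int clz32(uint32_t x)
{
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Portable popcount: clear the lowest set bit until none remain. */
static int popcount32(uint32_t x)
{
    int n = 0;
    for (; x != 0; x &= x - 1u)
        n++;
    return n;
}

/* ctz(x) := 31 - clz(x & -x): x & -x isolates the lowest set bit.
   Gives -1 for x == 0. */
static int ctz_via_isolate(uint32_t x)
{
    return 31 - clz32(x & (0u - x));
}

/* ctz(x) := 32 - clz(~x & (x-1)): the mask holds the bits below the
   lowest set bit. Gives 32 for x == 0. */
static int ctz_via_mask_clz(uint32_t x)
{
    return 32 - clz32(~x & (x - 1u));
}

/* ctz(x) := popcount(~x & (x-1)): same mask, no final adjustment.
   Gives 32 for x == 0. */
static int ctz_via_mask_popcount(uint32_t x)
{
    return popcount32(~x & (x - 1u));
}

/* All three agree whenever x != 0; only the value at zero differs. */
static void check_ctz_forms(void)
{
    for (int k = 0; k < 32; k++) {
        uint32_t x = (uint32_t)1 << k;
        uint32_t y = x | 0x80000000u;   /* an extra high bit must not matter */
        assert(ctz_via_isolate(x) == k && ctz_via_isolate(y) == k);
        assert(ctz_via_mask_clz(x) == k && ctz_via_mask_clz(y) == k);
        assert(ctz_via_mask_popcount(x) == k && ctz_via_mask_popcount(y) == k);
    }
    assert(ctz_via_isolate(0) == -1);
    assert(ctz_via_mask_clz(0) == 32);
    assert(ctz_via_mask_popcount(0) == 32);
}
```

The popcount form is the mask-based clz form without the "32 -" adjustment, which is why it is one operation shorter when a cheap popcount exists.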
Re: RFC: Simplify rules for ctz/clz patterns and RTL
Joern Rennecke wrote:
>> The score, sh and sparc instructions may or may not display canonical
>> behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was
>> not able to find documentation of the relevant instruction.
>
> The operation the nsb instruction of the SHmedia instruction set performs
> is 'count number of sign bit copies'.
> [...]

It sounds like the SH should probably be lumped in with the x86 as not doing "canonical behavior". Conveniently enough for my grand plan, it already uses an UNSPEC for the actual instruction :-)

What is the result of the instruction for (64-bit) all-bits-zero or all-bits-one? 64? Assuming so, it occurs to me that the result of an unsigned clz() on any negative 64-bit value will be zero; thus, you could get a "canonical" clz out of nsb by doing (pseudo-assembly)

	mov	result, 0
	cmp/pz	arg
	bf	1f
	nsb	result, arg
1:

Similarly, the x & (x-1) operation used to set up for ctz/ffs in terms of clz will leave the high bit set *only* for x == 0x8000 ; which can be tested for as x == (x&(x-1)) and the nsb skipped.

Would these sequences be slower than the current logic?

> The ARC700 has a NORM instruction, which again counts the number of
> sign bit copies. There is a variant NORM.F which sets the N flag if the
> input is negative.

Sorry, I don't recognize the ARC700 - which GCC back end is that? It might be worth teaching optabs.c about sign-bit-count operations, but only if we have more than one architecture that can use it.

zw
Re: Announce: VCG support for Graph::Easy
Hi,

On Sunday 12 August 2007 20:11:34 Tels wrote:
> Hi,

The signature on my email was bad/broken when it came back to me from the mailing list. Did this happen to anybody else?

Since this has never happened to me before, here is another email as a test. Let's see if the signature is still bad (i.e. whether the list garbles my text somehow).

Sorry for the noise,

Tels

-- 
Signed on Sun Aug 12 23:44:32 2007 with key 0x93B84C15.
View my photo gallery: http://bloodgate.com/photos
PGP key on http://bloodgate.com/tels.asc or per email.

"Duke Nukem Forever is a 1999 game and we think that timeframe matches very well with what we have planned for the game." -- George Broussard, 1998 (http://tinyurl.com/6m8nh)
Re: RFC: Simplify rules for ctz/clz patterns and RTL
I think the cost would be something like:

Index: rs6000.c
===
--- rs6000.c	(revision 127484)
+++ rs6000.c	(working copy)
@@ -20292,10 +20292,15 @@
       *total += COSTS_N_INSNS (2);
       return false;
 
+    case CTZ:
     case FFS:
       *total = COSTS_N_INSNS (4);
       return false;
 
+    case POPCOUNT:
+      *total = COSTS_N_INSNS (3);
+      return false;
+
     case NOT:
       if (outer_code == AND || outer_code == IOR || outer_code == XOR)
 	{
@@ -20305,6 +20310,7 @@
       /* FALLTHRU */
 
     case AND:
+    case CLZ:
     case IOR:
     case XOR:
     case ZERO_EXTRACT:
Re: RFC: Simplify rules for ctz/clz patterns and RTL
>> I suppose you're using (assuming 32-bit)
>>
>>   ctz(x) := 31 - clz(x & -x)
>>
>> now, which gives -1 for 0; and the version you're looking for is
>>
>>   ctz(x) := 32 - clz(~x & (x-1))
>>
>> which gives 32 for 0.
>
> Thanks! That's, unfortunately, one more instruction, although I guess a
> lot of chips have "a & ~b" as one operation.

Yes, it's exactly the same cost on PowerPC, and on most other RISC architectures.

> It looks like ~x & (x-1) turns any number into 000...111... where the
> boundary between zeroes and ones lies at the lowest 1 in the original.

Exactly. "To the right of the lowest 1".

> Is popcount really slow on PowerPC? (Compared to clz?) Ideally one would
> choose between the two expansions based on RTL costs, but the only
> architectures it matters for are i386 and powerpc, and neither of them
> define the cost of either clz or popcount.

Andrew answered this already. Adding clz/popcount to the cost tables seems like a good idea, yes.

Segher
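The "to the right of the lowest 1" description can be made concrete alongside the two related masks that come up in this discussion: x & -x isolates the lowest 1, and x ^ (x-1) (mentioned in the SHmedia subthread) keeps the lowest 1 plus everything below it. The helper names below are hypothetical and only illustrate the bit patterns.

```c
#include <assert.h>
#include <stdint.h>

/* For nonzero x whose lowest set bit is bit k:
     x & -x       -> only bit k    (isolate the lowest 1)
     ~x & (x - 1) -> bits 0..k-1   ("to the right of the lowest 1")
     x ^ (x - 1)  -> bits 0..k     (the lowest 1 and everything below) */
static uint32_t lowest_one(uint32_t x)         { return x & (0u - x); }
static uint32_t below_lowest_one(uint32_t x)   { return ~x & (x - 1u); }
static uint32_t through_lowest_one(uint32_t x) { return x ^ (x - 1u); }

static void check_masks(void)
{
    uint32_t x = 0x58u;   /* binary 0101 1000: lowest set bit is bit 3 */
    assert(lowest_one(x) == 0x08u);
    assert(below_lowest_one(x) == 0x07u);
    assert(through_lowest_one(x) == 0x0fu);
    /* For nonzero x, the "below" mask is the isolated bit minus one. */
    for (uint32_t v = 1; v < 4096u; v++)
        assert(below_lowest_one(v) == lowest_one(v) - 1u);
    /* At zero, both masks become all ones. */
    assert(below_lowest_one(0u) == 0xffffffffu);
    assert(through_lowest_one(0u) == 0xffffffffu);
}
```

The all-ones result at zero is what makes the mask-based clz and popcount forms of ctz return 32 rather than -1 there.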
Re: RFC: Simplify rules for ctz/clz patterns and RTL
> Zack Weinberg writes:
Zack> Makes sense. I don't suppose I could persuade you to teach rs6000
Zack> RTX_COSTS about clz and popcount...?

Sure. It's not that difficult to add to the table.

David
Re: RFC: Simplify rules for ctz/clz patterns and RTL
>
> Is popcount really slow on PowerPC? (Compared to clz?) Ideally one
> would choose between the two expansions based on RTL costs, but the only
> architectures it matters for are i386 and powerpc, and neither of them
> define the cost of either clz or popcount.

Of course, adding a popcount/clz cost to the i386 cost tables is easy and probably the most correct thing to do :)

Honza

>
> zw
Re: Announce: VCG support for Graph::Easy
On Sun, 2007-08-12 23:45:09 +0200, Tels <[EMAIL PROTECTED]> wrote:
>
> The signature on my email was bad/broken, when it came back to me from the
> mailing-list. Did this happen to anybody else?
>
> Since this never happened to me before, here is another email, as test.
> Let's see if the signature is still bad (e.g. the list garbles my text
> somehow).

This unfortunately happens regularly on this list. I don't know whether this is considered a problem or merely annoying.

What actually *is* a problem is that you're most probably using a non-working "From: " header. Or do you actually read nospam-abuse?

Best regards, JBG

-- 
Jan-Benedict Glaw [EMAIL PROTECTED] +49-172-7608481
Signature of: ...and when you think it can't go on any more,
the second : a little light will come from somewhere.
Re: RFC: Simplify rules for ctz/clz patterns and RTL
> I think the cost would be something like:
>
> +case POPCOUNT:
> +  *total = COSTS_N_INSNS (3);
> +  return false;

Is that the cost when using popcountb? It is a lot more expensive when that instruction isn't available (like on most current machines).

The rest (i.e. CLZ, CTZ) looks good to me.

Segher
Re: Announce: VCG support for Graph::Easy
Hi,

On Wednesday 15 August 2007 21:30:16 Jan-Benedict Glaw wrote:
> On Sun, 2007-08-12 23:45:09 +0200, Tels <[EMAIL PROTECTED]> wrote:
> > The signature on my email was bad/broken, when it came back to me from
> > the mailing-list. Did this happen to anybody else?
> >
> > Since this never happened to me before, here is another email, as test.
> > Let's see if the signature is still bad (e.g. the list garbles my text
> > somehow).
>
> This unfortunately happens regularly on this list. I don't think if
> this is considered a problem or only annoying.

It happened to my second mail, too, and a quick inspection suggests that some lines are probably just being wrapped. Still, it is very annoying because it breaks my signatures, and I consider it a problem. (Of course, apart from this announcement, I don't intend to post much on this list, except maybe if someone asks me a VCG-related question.)

> What actually *is* a problem is that you're most probably using a
> non-working "From: " header. Or do you actually read nospam-abuse?

Of course. Why should I not read it?

All the best,

Tels

-- 
Signed on Wed Aug 15 22:02:13 2007 with key 0x93B84C15.
View my photo gallery: http://bloodgate.com/photos
PGP key on http://bloodgate.com/tels.asc or per email.

"We have problems like this all of the time," Kirk said, trying to reassure me. "Sometimes its really hard to get things burning." -- http://tinyurl.com/qmg5
Re: RFC: Simplify rules for ctz/clz patterns and RTL
> Segher Boessenkool writes:
>> I think the cost would be something like:
>> +case POPCOUNT:
>> +  *total = COSTS_N_INSNS (3);
>> +  return false;

Segher> Is that the cost when using popcountb? It is a lot more
Segher> expensive when that instruction isn't available (like on
Segher> most current machines).

Yes, but do we even create a POPCOUNT rtx if the insn isn't supported? Wouldn't we expand it or create a libcall early?

David
Re: RFC: Simplify rules for ctz/clz patterns and RTL
>>> I think the cost would be something like:
>>> +case POPCOUNT:
>>> +  *total = COSTS_N_INSNS (3);
>>> +  return false;
>
> Segher> Is that the cost when using popcountb? It is a lot more
> Segher> expensive when that instruction isn't available (like on
> Segher> most current machines).
>
> Yes, but do we even create POPCOUNT rtx if the insn isn't supported?
> Wouldn't we expand or create libcall early?

I don't know, there's only one way to find out... :-)

Segher
Re: RFC: Simplify rules for ctz/clz patterns and RTL
> Segher Boessenkool writes:
>> Yes, but do we even create POPCOUNT rtx if the insn isn't
>> supported? Wouldn't we expand or create libcall early?

Segher> I don't know, there's only one way to find out... :-)

I did check. Didn't you?

David
Re: RFC: Simplify rules for ctz/clz patterns and RTL
On Wed, Aug 15, 2007 at 11:55:02AM -0700, Zack Weinberg wrote:
> Joern Rennecke wrote:
> >The operation the nsb instruction of the SHmedia instruction set performs
> >is 'count number of sign bit copies'.
> >[...]
>
> It sounds like the SH should probably be lumped in with the x86 as not
> doing "canonical behavior". Conveniently enough for my grand plan, it
> already uses an UNSPEC for the actual instruction :-)
>
> What is the result of the instruction for (64-bit) all-bits-zero or
> all-bits-one? 64?

No, it is 63. There is one essential sign bit and 63 more copies.

> Assuming so, it occurs to me that the result of an
> unsigned clz() on any negative 64-bit value will be zero; thus, you
> could get a "canonical" clz out of nsb by doing (pseudo-assembly)
>
>	mov	result, 0
>	cmp/pz	arg
>	bf	1f
>	nsb	result, arg
> 1:

We are talking about SHmedia code here. cmp/pz and bf are not SHmedia instructions. Loading a zero into result would be movi 0,result. If you want to special-case the negative input, that would be:

	shari	arg,31,tmp
	nsb	arg,result
	cmvne	tmp,tmp,result
	addi	result,1,result

> Similarly, the x & (x-1) operation

It's x ^ (x-1) (xor) or x &~(x-1) (andc).

> used to set up for ctz/ffs in terms
> of clz will leave the high bit set *only* for x == 0x8000
> ; which can be tested for as x == (x&(x-1)) and the nsb skipped.
>
> Would these sequences be slower than the current logic?
Currently we have for ffs:

	addi	arg,-1,tmp
	xor	arg,tmp,tmp
	shlri	tmp,1,tmp
	nsb	tmp,tmp
	addi	tmp,-64,tmp
	cmveq	arg,r63,tmp
	sub	r63,tmp,result

Using the above sequence, we get the more register-hungry:

	addi	arg,-1,tmp
	xor	arg,tmp,tmp
	shari	tmp,31,tmp2
	nsb	tmp,tmp
	cmvne	tmp2,tmp2,tmp
	addi	tmp,1-64,tmp
	sub	r63,tmp,result

You propose:

	pt	after_nsb,trtmp
	addi	arg,-1,tmp
	andc	arg,tmp,tmp
	movi	-1,tmp2
	beq	arg,tmp,trtmp
	nsb	tmp,tmp2
after_nsb:
	addi	tmp2,-63,tmp
	sub	r63,tmp,result

or is that:

	pt	after_nsb,trtmp
	addi	arg,-1,tmp
	xor	arg,tmp,tmp
	bgt	r63,tmp,trtmp
	nsb	tmp,tmp
after_nsb:
	addi	tmp,-63,tmp
	sub	r63,tmp,result

At any rate, the introduction of the branch makes the code worse.

But for the ARC, it would make an interesting shortcut. Although norm can't be conditionalized, we can use the -1 from the xor to save on a long immediate for 32-bit ffs.

	sub_s	tmp,arg,1
	xor.f	tmp,tmp,arg	; for -Os this can be xor_s and
	norm	result,tmp	; then norm.f produces the flag.
	mov.mi	result,tmp
	rsub	result,31,result

> >The ARC700 has a NORM instruction, which again counts the number of
> >sign bit copies. There is a variant NORM.F which sets the N flag if the
> >input is negative.
>
> Sorry, I don't recognize the ARC700 - which GCC back end is that?

It belongs in config/arc; however, proper ARC700 support is not in the FSF mainline yet. We are working on it.

> It
> might be worth teaching optabs.c about sign-bit-count operations, but
> only if we have more than one architecture that can use it.

The NORM instruction is also available as an optional extension operation for ARCtangent-A5 and ARC600.
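A minimal C model of the SHmedia part of this exchange may help when comparing the sequences. Its pieces are:

- nsb64 is a hypothetical software emulation of nsb. It relies on the thread's statement that all-bits-zero and all-bits-one both give 63.
- clz64_from_nsb computes the "canonical" clz that the special-casing sequence aims for: 0 for negative inputs, nsb + 1 otherwise.
- ffs64_shmedia follows the current seven-instruction ffs sequence step by step.

Two details are assumptions consistent with how the sequences use them: cmveq moves its second operand when the first operand is zero, and r63 reads as zero. The ARC variant is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of SHmedia nsb ("count number of sign bit
   copies"): how many bits below the sign bit equal it.  As stated in
   the thread, all-bits-zero and all-bits-one both give 63. */
static int nsb64(int64_t x)
{
    uint64_t u = (uint64_t)x;
    uint64_t sign = u >> 63;
    int n = 0;
    for (int bit = 62; bit >= 0 && ((u >> bit) & 1u) == sign; bit--)
        n++;
    return n;
}

/* Canonical clz (64 at zero) from nsb: a negative value has no leading
   zeros; otherwise the sign bit itself adds one more leading zero. */
static int clz64_from_nsb(int64_t x)
{
    return x < 0 ? 0 : nsb64(x) + 1;
}

/* Step-by-step model of the current SHmedia ffs sequence (assumed
   semantics: cmveq moves when its first operand is zero; r63 is zero). */
static int ffs64_shmedia(int64_t arg)
{
    uint64_t u = (uint64_t)arg;
    uint64_t tmp = u ^ (u - 1u);           /* addi + xor: bits 0..k set  */
    tmp >>= 1;                             /* shlri: bits 0..k-1 set     */
    int64_t t = nsb64((int64_t)tmp) - 64;  /* nsb + addi: -(k + 1)       */
    if (arg == 0)                          /* cmveq arg,r63,tmp          */
        t = 0;
    return (int)-t;                        /* sub r63,tmp,result         */
}

static void check_shmedia_models(void)
{
    assert(nsb64(0) == 63 && nsb64(-1) == 63);
    assert(clz64_from_nsb(0) == 64 && clz64_from_nsb(-5) == 0);
    assert(ffs64_shmedia(0) == 0);
    for (int k = 0; k < 64; k++) {
        int64_t x = (int64_t)((uint64_t)1 << k);
        assert(clz64_from_nsb(x) == 63 - k);
        assert(ffs64_shmedia(x) == k + 1);
        assert(ffs64_shmedia(x | INT64_MIN) == k + 1);  /* high bit irrelevant */
    }
}
```

Because nsb of 0 is 63 rather than 64, the shift right by one before nsb is what lets the final "-64" adjustment produce k + 1 for every nonzero input. The cmveq handles the zero case separately.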
gcc-4.2-20070815 is now available
Snapshot gcc-4.2-20070815 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.2-20070815/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.2 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_2-branch revision 127526

You'll find:

gcc-4.2-20070815.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.2-20070815.tar.bz2       C front end and core compiler
gcc-ada-4.2-20070815.tar.bz2        Ada front end and runtime
gcc-fortran-4.2-20070815.tar.bz2    Fortran front end and runtime
gcc-g++-4.2-20070815.tar.bz2        C++ front end and runtime
gcc-java-4.2-20070815.tar.bz2       Java front end and runtime
gcc-objc-4.2-20070815.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.2-20070815.tar.bz2  The GCC testsuite

Diffs from 4.2-20070627 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.2 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.