Nonzero result when left-shift greater than width of unsigned type
What's going on here? I'm expecting the answer 0, but get 2.

#include <stdio.h>
int main(){
  unsigned x=1;
  printf("%u\n",(x<<33));
  /* outputs "2" on gcc 4.1.2 on x86_32 */

  /*
[#4] The result of E1 << E2 is E1 left-shifted E2 bit
positions; vacated bits are filled with zeros. If E1 has an
unsigned type, the value of the result is E1*2^E2, reduced
modulo one more than the maximum value representable in the
result type.

http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.htm
  */
  return 0;
}
Re: Nonzero result when left-shift greater than width of unsigned type
On Fri, Apr 13, 2007 at 05:38:03AM -0400, Ken Takusagawa wrote:
> What's going on here? I'm expecting the answer 0, but get 2.
>
> #include <stdio.h>
> int main(){
>   unsigned x=1;
>   printf("%u\n",(x<<33));
>   /* outputs "2" on gcc 4.1.2 on x86_32 */
>
>   /*
> [#4] The result of E1 << E2 is E1 left-shifted E2 bit
> positions; vacated bits are filled with zeros. If E1 has an
> unsigned type, the value of the result is E1*2^E2, reduced
> modulo one more than the maximum value representable in the
> result type.
>
> http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.htm
>   */

Read from just 1 paragraph above the last sentence:

[#3] The integer promotions are performed on each of the
operands. The type of the result is that of the promoted
left operand. If the value of the right operand is negative
or is greater than or equal to the width of the promoted
left operand, the behavior is undefined.

Ciao, Marcus
Re: Nonzero result when left-shift greater than width of unsigned type
On 4/13/07, Marcus Meissner <[EMAIL PROTECTED]> wrote:
> On Fri, Apr 13, 2007 at 05:38:03AM -0400, Ken Takusagawa wrote:
> > What's going on here? I'm expecting the answer 0, but get 2.
> >
> > #include <stdio.h>
> > int main(){
> >   unsigned x=1;
> >   printf("%u\n",(x<<33));
> >   /* outputs "2" on gcc 4.1.2 on x86_32 */
> >
> >   /*
> > [#4] The result of E1 << E2 is E1 left-shifted E2 bit
> > positions; vacated bits are filled with zeros. If E1 has an
> > unsigned type, the value of the result is E1*2^E2, reduced
> > modulo one more than the maximum value representable in the
> > result type.
> >
> > http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.htm
> >   */
>
> Read from just 1 paragraph above the last sentence:
>
> [#3] The integer promotions are performed on each of the
> operands. The type of the result is that of the promoted
> left operand. If the value of the right operand is negative
> or is greater than or equal to the width of the promoted
> left operand, the behavior is undefined.
>
> Ciao, Marcus

I see, so in this case, the undefined behavior happens to return "2".
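For anyone who wants the "shift everything out and get 0" result portably, here is a minimal sketch that checks the count against the width before shifting. The helper name shl_or_zero and the UINT_WIDTH_BITS macro are made up for illustration, and the width computation assumes unsigned has no padding bits. The "2" seen above is most likely the x86 shift instruction using only the low 5 bits of the count (33 & 31 == 1), but since the behavior is undefined nothing guarantees that.

```c
#include <assert.h>
#include <limits.h>

/* Width in bits of unsigned int; assumes no padding bits. */
#define UINT_WIDTH_BITS ((unsigned)(sizeof(unsigned) * CHAR_BIT))

/* Hypothetical helper: left shift that yields 0 once the count reaches
   the width of the type, instead of invoking undefined behavior. */
unsigned shl_or_zero(unsigned value, unsigned count)
{
    if (count >= UINT_WIDTH_BITS)
        return 0u;              /* every bit has been shifted out */
    return value << count;      /* well defined: count < width */
}

/* Small self-check showing the intended use. */
void shl_self_check(void)
{
    assert(shl_or_zero(1u, 3u) == 8u);
    assert(shl_or_zero(1u, UINT_WIDTH_BITS) == 0u);
}
```

With this, shl_or_zero(x, 33) gives 0 on a target where unsigned is 32 bits wide, which is the answer the original post expected.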
RE: Call to arms: testsuite failures on various targets
On 12 April 2007 22:22, FX Coudert wrote: > Hi all, > Note2: I also omitted a couple of gfortran.dg/secnds.f failures; this > testcase should be reworked I was about to report that myself! Both secnds.f /and/ secnds-1.f have some kind of race condition or indeterminacy. cheers, DaveK -- Can't think of a witty .sigline today
Re: [MIPS] MADD issue
> (define_insn "adddi3_internal_1"
>   [(set (match_operand:DI 0 "register_operand" "=d,&d")
>         (plus:DI (match_operand:DI 1 "register_operand" "0,d")
>                  (match_operand:DI 2 "register_operand" "d,d")))
>    (clobber (match_operand:SI 3 "register_operand" "=d,d"))]
>   "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16"
> {
>   return (REGNO (operands[0]) == REGNO (operands[1])
>           && REGNO (operands[0]) == REGNO (operands[2]))
>     ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3"
>     : "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3";
> }

This should be a post-reload (i.e. predicated on reload_completed) split, I think.

Paolo
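As a reading aid for the second assembler template (addu/sltu/addu/addu), here is a rough C sketch of the same four steps on a 64-bit value held as two 32-bit words. The struct u64_pair and the function name are invented for illustration; the comments map each statement to the instruction it is meant to mirror, on the assumption that %L and %M denote the low and high words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation: a 64-bit value as two 32-bit words. */
struct u64_pair { uint32_t lo, hi; };

/* 64-bit addition built from four 32-bit operations. */
struct u64_pair add64_via_32(struct u64_pair a, struct u64_pair b)
{
    struct u64_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;      /* addu %L0,%L1,%L2 : low words, wraps mod 2^32 */
    carry = r.lo < b.lo;     /* sltu %3,%L0,%L2  : 1 if the low add wrapped  */
    r.hi = a.hi + b.hi;      /* addu %M0,%M1,%M2 : high words                */
    r.hi += carry;           /* addu %M0,%M0,%3  : propagate the carry       */
    return r;
}

/* Small self-check: 0x1FFFFFFFF + 0x200000001 == 0x400000000. */
void add64_self_check(void)
{
    struct u64_pair a = { 0xFFFFFFFFu, 1u };
    struct u64_pair b = { 1u, 2u };
    struct u64_pair r = add64_via_32(a, b);
    assert(r.lo == 0u && r.hi == 4u);
}
```

Seen this way, it is easier to picture why a pass that looks at a multiply feeding one of these four operations has a hard time recognising the whole thing as a single multiply-add.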
Re: Call to arms: testsuite failures on various targets
> * powerpc-apple-darwin8.5.0: gfortran.dg/edit_real_1.f90

I don't see these failures on my weekly snapshot build on OSX 10.3.9 (nor in a month-old build on OSX 10.4.8 or 9, I cannot remember which). Could it be related to the 10.4.5 gcc failures gcc.dg/torture/builtin-pow-mpfr-1.c and gcc.dg/torture/builtin-sin-mpfr-1.c, which I don't see with my builds either?

> Note2: I also omitted a couple of gfortran.dg/secnds.f failures; this
> testcase should be reworked

I see that too now and then.

I have also done the following experiment with large_real_kind_2.F90:

(1) compiled with -S
(2) edited large_real_kind_2.s to add $LDBL128 where appropriate (e.g., .indirect_symbol _expl$LDBL128)
(3) assembled the result
(4) linked the object

and the executable passed the tests. So it should not be too difficult for someone who knows what to do to fix this part of the problem.

Making large_real_kind_form_io_2.f90 work will probably require more work because, as far as I understand the problem, the corresponding changes have to be made in libgfortran. I don't have the knowledge, nor the time, to make the change myself, but I can test patches.

Dominique
Re: [MIPS] MADD issue
"Fu, Chao-Ying" <[EMAIL PROTECTED]> writes: > After tracing GCC 4.x to see why MADD is not generated for MIPS32, > I found out the main issue is that the pattern "adddi3" > is not available for MIPS32. Because the missing > of adddi3, GCC 4.x needs to split 64-bit addition to 4 separate > RTL insns. This leads to that the combining phase fails > to combine RTL insns to a single madd pattern. > > Could we enable "adddi3" for MIPS32 in GCC 4.x? Or is there a > better way to generate MADD? Thanks a lot! The problem with: > Ex: (mips.md in GCC 3.4) > (define_expand "adddi3" > [(parallel [(set (match_operand:DI 0 "register_operand" "") >(plus:DI (match_operand:DI 1 "register_operand" "") > (match_operand:DI 2 "arith_operand" ""))) > (clobber (match_dup 3))])] > "TARGET_64BIT || (!TARGET_DEBUG_G_MODE && !TARGET_MIPS16)" > { > > > (define_insn "adddi3_internal_1" > [(set (match_operand:DI 0 "register_operand" "=d,&d") > (plus:DI (match_operand:DI 1 "register_operand" "0,d") > (match_operand:DI 2 "register_operand" "d,d"))) >(clobber (match_operand:SI 3 "register_operand" "=d,d"))] > "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16" > { > return (REGNO (operands[0]) == REGNO (operands[1]) > && REGNO (operands[0]) == REGNO (operands[2])) > ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3" > : > "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3"; > } > [(set_attr "type" "darith") >(set_attr "mode" "DI") >(set_attr "length" "16")]) ...this was that it tended to be very poor for the additions themselves. When optabs.c implements the additions instead, the early RTL optimisers get to see the individual instructions, and are often able to handle constant or part-constant operands better. This led to a noticable size improvement when I tested it originally. (I imagine the effects are even better now, thanks to the subreg lowering pass.) 
See: http://gcc.gnu.org/ml/gcc-patches/2004-05/msg00947.html for the patch that made this change, and some rationale. As far as madd goes, I think it would be better to either (a) get combine to handle this situation or (b) get expand to generate a fused multiply-add from the outset. (b) sounds like it might be useful in its own right. At the moment we treat the generation of floating-point multiply-adds as an optimisation, but in some applications it's critical not to round the intermediate result. (I don't know if there's a bugzilla entry about this.) If we treated fused multiply-add as a primitive operation, we could extend it to integer types too. In this case we'd also need to handle widening multiplications, but we already need to do that for stand-alone multiplications. Just random musings, and probably not the answer you wanted to hear, sorry. Richard
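For context, the kind of source code this discussion is about looks roughly like the sketch below: a 64-bit accumulator updated with the product of two 32-bit values, i.e. a widening multiply followed by a 64-bit add. The function name and the data are invented for illustration, and whether a given GCC version turns the loop body into a single MADD is exactly the open question in this thread, not something the sketch claims.

```c
#include <assert.h>
#include <stdint.h>

/* Dot product with 32-bit inputs and a 64-bit accumulator. Each
   iteration is a widening 32x32->64 multiply followed by a 64-bit
   add, the shape an integer multiply-accumulate instruction covers. */
int64_t dot_product(const int32_t *a, const int32_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];   /* widen before multiplying */
    return acc;
}

/* Small self-check; the first product alone exceeds 32 bits. */
void dot_product_self_check(void)
{
    const int32_t a[] = { 100000, -3, 7 };
    const int32_t b[] = { 100000,  5, 2 };
    assert(dot_product(a, b, 3) == 9999999999LL);
}
```

The cast on one operand is what makes the multiplication widening; without it the product would be computed in 32 bits and could overflow before it ever reached the accumulator.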
Re: Recent dataflow branch SPEC2000 benchmarking
Steven Bosscher wrote:
> On 4/12/07, Steven Bosscher <[EMAIL PROTECTED]> wrote:
>> On 4/12/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
>>> An interesting observation is that the more hard registers the processor
>>> has, the bigger slowdown is. Although it might be a coincidence.
>>
>> Yes, I noticed this too. I don't believe this is a coincidence. It's the first thing I was planning to look into, in fact.
>
> The problem seems to have to do with how the RTL loop optimizers work. They analyze a subcfg with df, and this is incredibly slow right now. To give you some idea, we spend more than 35% of the total compile time in "loop analysis" (TV_LOOP) for a fairly typical Fortran 90 program with lots of small loops, and almost all that extra time is spent in df_reorganize_refs_by_reg.
>
> So, Vlad, thanks for helping us identify a serious bottleneck! Kenny said he has some ideas about how to fix this.

No problem. The new DF infrastructure is inevitable (we differ only on the question of when to merge). I see real progress in compilation speed. I am going to do more benchmarking (at merge points) to check the progress.

As you probably saw from my email, there is a SPECFP2000 score degradation on ppc64 (all other ports look good). The biggest degradation is wupwise (about 7%). Wupwise is a relatively small program. It would be nice if you fixed it. I also see bigger code size on all platforms. As I wrote, if you fix that, you will be closer to the 5% target or even there. And ppc is the best platform to do it on.
Re: [MIPS] MADD issue
Richard Sandiford wrote:
> As far as madd goes, I think it would be better to either (a) get combine to handle this situation or (b) get expand to generate a fused multiply-add from the outset.
>
> (b) sounds like it might be useful in its own right. At the moment we treat the generation of floating-point multiply-adds as an optimisation, but in some applications it's critical not to round the intermediate result. (I don't know if there's a bugzilla entry about this.) If we treated fused multiply-add as a primitive operation, we could extend it to integer types too. In this case we'd also need to handle widening multiplications, but we already need to do that for stand-alone multiplications.
>
> Richard

While I agree with you philosophically, it feels like (b) might be quite a major task. A number of optimisation passes which currently recognise MUL and PLUS separately (e.g. loop strength reduction) would now need to be extended to handle the fused MULPLUS and MULSUB operators.

And although the reduction in instruction count due to your previous change is good, what is it as a percentage of the total? After all it only helps code which uses 64-bit integer types with a 32-bit ABI, which is probably quite a small proportion of most real-life applications -- whereas for some algorithms the ability to use MADD is absolutely critical to performance, and for them losing the ability to generate MADD is a significant backward step for the compiler.

How about, as a workaround until (b) sees the light of day, we reimplement adddi3 and subdi3 only (not the other di mode patterns), qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented more cleanly nowadays, using define_insn_and_split and/or a "#" template, to avoid generating multi-instruction assembler sequences.

Nigel
Re: [MIPS] MADD issue
Nigel Stephens <[EMAIL PROTECTED]> writes: > While I agree with you philosophically, it feels like (b) might be quite > a major task. A number of optimisation passes which currently recognise > and MUL and PLUS separately (e.g. loop strength reduction) would now > need to be extended to handle the fused MULPLUS and MULSUB operators. > > And although the reduction in instruction count due to your previous > change is good, what is it as a percentage of the total? After all it > only helps code which uses 64-bit integer types with a 32-bit ABI, which > is probably quite a small proportion of most real-life applications -- > whereas for some algorithms the ability to use MADD is absolutely > critical to performance, and for them losing the ability to generate > MADD is a significant backward step for the compiler. > > How about, as a workaround until (b) sees the light of day, we > reimplement adddi3 and subdi3 only (not the other di mode patterns), > qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented > more cleanly nowadays, using define_insn_and_split and/or a "#" > template, to avoid generating multi-instruction assembler sequences. The old patterns had a define_split too. That wasn't really the problem. If you don't want to add a tree code yet, it would still be possible to add the optab and expand support, recognising mult-add sequences in a similar way to how we recognise widening multiplies now. I feel at least that's a step in the right direction. Richard
Re: [MIPS] MADD issue
Richard Sandiford wrote:
> Nigel Stephens <[EMAIL PROTECTED]> writes:
>> While I agree with you philosophically, it feels like (b) might be quite a major task. A number of optimisation passes which currently recognise MUL and PLUS separately (e.g. loop strength reduction) would now need to be extended to handle the fused MULPLUS and MULSUB operators.
>>
>> And although the reduction in instruction count due to your previous change is good, what is it as a percentage of the total? After all it only helps code which uses 64-bit integer types with a 32-bit ABI, which is probably quite a small proportion of most real-life applications -- whereas for some algorithms the ability to use MADD is absolutely critical to performance, and for them losing the ability to generate MADD is a significant backward step for the compiler.
>>
>> How about, as a workaround until (b) sees the light of day, we reimplement adddi3 and subdi3 only (not the other di mode patterns), qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented more cleanly nowadays, using define_insn_and_split and/or a "#" template, to avoid generating multi-instruction assembler sequences.
>
> The old patterns had a define_split too. That wasn't really the problem.
>
> If you don't want to add a tree code yet, it would still be possible to add the optab and expand support, recognising mult-add sequences in a similar way to how we recognise widening multiplies now. I feel at least that's a step in the right direction.

OK, we'll have a think about that.

Thanks

Nigel
Re: RFA: i386 is running out target mask bits
On Thu, Apr 12, 2007 at 05:31:36PM -0700, H. J. Lu wrote: > http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00738.html Ok. r~
tree_code and type safety
Hi,

while waiting for my copyright assignment, I continued compiling gcc with a C++ compiler. Most problems are minor, but now I encountered one where I am unsure what to do.

The basic tree codes are defined by the enum tree_code, which basically looks like this:

enum tree_code {
  LAST_AND_UNUSED_TREE_CODE
}

The C front end apparently needs additional tree codes, and defines them like this:

enum c_tree_code {
  C_DUMMY_TREE_CODE = LAST_AND_UNUSED_TREE_CODE,
}

So far OK, but then the C front end passes its private tree codes to functions expecting tree_code values. This is not accepted by the C++ compiler, as tree_code and c_tree_code are distinct types. In fact I think the code is undefined even in C unless something like MAXIMUM_TREE_CODE = 65535 is added to tree_code and MINIMUM_TREE_CODE=0 is added to c_tree_code. (At least if the C++ standard paragraph 7.2.6 is similar to its C counterpart)

Now I have multiple options to fix this issue:

1) I could just explicitly cast from c_tree_code to tree_code. This avoids the error, and is only needed about two or three times in the whole code, as there is only one C tree code currently. But more tree codes might be added in the future, and other front ends might use more tree codes.

2) I could use an integer data type instead of an enum to hold the tree code values. This avoids all problems, but is massively invasive: grep "enum tree_code" returns 530 hits, and changing all of these into tree_code (as an integer typedef would not be in the enum namespace) would touch many files. Probably not a good idea.

3) I could use preprocessor magic to add the front end tree codes into the tree code enum, somewhat like this (just a rough sketch):

enum tree_code {
  LAST_AND_UNUSED_TREE_CODE,
  FIRST_C_CODE = LAST_AND_UNUSED_TREE_CODE,
#include "c-common.def"
  FIRST_FOOLANG_CODE = LAST_AND_UNUSED_TREE_CODE,
#include "foolang-common.def"
}

This keeps the enum sane, but it has to know about the front ends somehow. Or perhaps it is enough to include the tree codes of the _current_ front end, whatever it is. The preprocessor magic is probably not trivial, but the rest of the code should not be affected.

I tend to just go for 1) and add the casts, but this is not very future-proof. Any suggestions?

Thomas
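To make option 1) concrete, here is a small sketch in which the cast lives in a single helper rather than at each call site. Everything except the names LAST_AND_UNUSED_TREE_CODE and C_DUMMY_TREE_CODE is invented for illustration (the EXAMPLE_* enumerators, the helper, and the predicate are not GCC code), and it says nothing about whether the out-of-range conversion is well defined in C++, which is the concern raised above.

```c
#include <assert.h>

/* Stand-in for the core enum; EXAMPLE_* members are hypothetical. */
enum tree_code {
    EXAMPLE_CORE_CODE_A,
    EXAMPLE_CORE_CODE_B,
    LAST_AND_UNUSED_TREE_CODE
};

/* Stand-in for the C front end's private codes. */
enum c_tree_code {
    C_DUMMY_TREE_CODE = LAST_AND_UNUSED_TREE_CODE,
    C_EXAMPLE_CODE                      /* hypothetical front-end code */
};

/* Hypothetical generic function that takes the core enum type. */
int tree_code_is_front_end(enum tree_code code)
{
    return code >= LAST_AND_UNUSED_TREE_CODE;
}

/* Option 1: one helper that performs the explicit cast, so call sites
   stay free of casts and there is a single place to revisit later. */
enum tree_code c_to_tree_code(enum c_tree_code code)
{
    return (enum tree_code)code;
}

/* Small self-check showing the intended use. */
void tree_code_self_check(void)
{
    assert(tree_code_is_front_end(c_to_tree_code(C_EXAMPLE_CODE)));
    assert(!tree_code_is_front_end(EXAMPLE_CORE_CODE_A));
}
```

Funnelling the conversion through one function does not make it more future-proof in itself, but it does mean that a later switch to option 2) or 3) only has to touch the helper.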
Re: [MIPS] MADD issue
Paolo Bonzini <[EMAIL PROTECTED]> writes: > > (define_insn "adddi3_internal_1" > > [(set (match_operand:DI 0 "register_operand" "=d,&d") > > (plus:DI (match_operand:DI 1 "register_operand" "0,d") > > (match_operand:DI 2 "register_operand" "d,d"))) > >(clobber (match_operand:SI 3 "register_operand" "=d,d"))] > > "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16" > > { > > return (REGNO (operands[0]) == REGNO (operands[1]) > > && REGNO (operands[0]) == REGNO (operands[2])) > > ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3" > > : > > "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3"; > > } > > This should be a post-reload (i.e. predicated on reload_completed) > split, I think. Actually, with the relatively recent lower-subreg work, it is desirable to split this sort of instruction before reload. That is, do an unconditional split. Ian
Re: [MIPS] MADD issue
>> This should be a post-reload (i.e. predicated on reload_completed) split, I think.
>
> Actually, with the relatively recent lower-subreg work, it is desirable to split this sort of instruction before reload. That is, do an unconditional split.

Right. Combine cannot cope with the resulting 4-insn sequence and merge it back with the multiplication, but split is run after combine and before the second lower-subreg pass.

So, making this an unconditional split should be the best of both worlds.

Paolo
Re: [MIPS] MADD issue
Paolo Bonzini <[EMAIL PROTECTED]> writes: >>> This should be a post-reload (i.e. predicated on reload_completed) >>> split, I think. >> >> Actually, with the relatively recent lower-subreg work, it is >> desirable to split this sort of instruction before reload. That is, >> do an unconditional split. > > Right. Combine cannot cope with the resulting 4-insn sequence and merge > it back with the multiplication, but split is ran after combine and > before the second lower-subreg pass. > > So, making this an unconditional split should be the best of both worlds. The problem is, combine is also one of the passes that was able to optimise the split form so effectively. It's not the best of both worlds from that point of view. Richard
Re: Call to arms: testsuite failures on various targets
Dave Korn wrote:
> On 12 April 2007 22:22, FX Coudert wrote:
>> Note2: I also omitted a couple of gfortran.dg/secnds.f failures; this testcase should be reworked
>
> I was about to report that myself! Both secnds.f /and/ secnds-1.f have some kind of race condition or indeterminacy.

It's an indeterminacy, and a somewhat pernicious one -- I tried to fix it a few months ago, and didn't get very far. My guess is that what's happening is inconsistent rounding between two different intrinsic functions that both return the current time as a floating-point number, or something equivalent to that, but I haven't had any time to poke at it further.

- Brooks
Re: RFC: Add target_isa_flags
On 4/13/07, H. J. Lu <[EMAIL PROTECTED]> wrote: You don't need to do all this, You can just use variable with MASK which was added by JSM when PPC64-linux-gnu's target bits overflowed. -- Pinski
Re: RFC: Add target_isa_flags
On Fri, Apr 13, 2007 at 12:04:04PM -0700, Andrew Pinski wrote:
> On 4/13/07, H. J. Lu <[EMAIL PROTECTED]> wrote:
>
> You don't need to do all this, You can just use variable with MASK
> which was added by JSM when PPC64-linux-gnu's target bits overflowed.

For i386, we are adding new target mask bits for new instruction sets. The new bits are used together with existing ISA bits to selectively enable builtins. I don't know how to make the new variable work when it will have a set of bits overlapping with the existing ones. For example, SSE2 has

#define MASK_SSE2 (1 << 21)

But the new SSE4.1 will have something like

#define OPTION_MASK_SSE4_1 (1 << 2)

I can't use

MASK_SSE2 | OPTION_MASK_SSE4_1

since the two masks belong to different sets of bits.

H.J.
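The overlap problem can be shown with a few lines of C. This is only a sketch: the two mask values are the ones quoted in the message, but the struct flag_words, the extra MASK_EXAMPLE_OTHER bit, and both functions are hypothetical and are not how the i386 back end is written.

```c
#include <assert.h>

#define MASK_SSE2           (1 << 21)  /* bit in target_flags (from the message) */
#define OPTION_MASK_SSE4_1  (1 << 2)   /* bit in the new flag word (from the message) */
#define MASK_EXAMPLE_OTHER  (1 << 2)   /* hypothetical unrelated bit in target_flags */

/* Hypothetical container for the two separate flag words. */
struct flag_words {
    int target_flags;
    int target_isa_flags;
};

/* Correct: each mask is tested against the word it belongs to. */
int has_sse2_and_sse4_1(const struct flag_words *f)
{
    return (f->target_flags & MASK_SSE2) != 0
        && (f->target_isa_flags & OPTION_MASK_SSE4_1) != 0;
}

/* Wrong: masks from different words are OR-ed and tested against one
   word, so an unrelated bit at position 2 is mistaken for SSE4.1. */
int has_both_wrong(const struct flag_words *f)
{
    int combined = MASK_SSE2 | OPTION_MASK_SSE4_1;
    return (f->target_flags & combined) == combined;
}

/* Small self-check: SSE2 plus an unrelated bit, no SSE4.1. */
void flag_words_self_check(void)
{
    struct flag_words f = { MASK_SSE2 | MASK_EXAMPLE_OTHER, 0 };
    assert(!has_sse2_and_sse4_1(&f));
    assert(has_both_wrong(&f));
}
```

Any table that stores a single combined mask per builtin runs into the second form, which is why keeping all the ISA bits in one word is attractive.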
GIMPLE tuples document uploaded to wiki
I have added the design document and links to most of the discussions we've had so far. Aldy updated the document to reflect the latest thread. http://gcc.gnu.org/wiki/tuples
Re: RFC: Add target_isa_flags
On 4/13/07, H. J. Lu <[EMAIL PROTECTED]> wrote: On Fri, Apr 13, 2007 at 12:04:04PM -0700, Andrew Pinski wrote: > On 4/13/07, H. J. Lu <[EMAIL PROTECTED]> wrote: > > You don't need to do all this, You can just use variable with MASK > which was added by JSM when PPC64-linux-gnu's target bits overflowed. For i386, we are adding new target mask bits for new instruction sets. The new bits are used togther with existing ISA bits to selectively enable builtins. I don't know how to make the new variable to work when a new variable will have a set of bits overlapping with the exist ones. For example, SSE2 has I mean you can move over all the ISA flags to their own mask variable without changing the common part of the compilers as you can use MASK and variable together in the .opt files. You don't need to have an extra option called ISA for these variables. Thanks, Andrew Pinski
Re: RFC: Add target_isa_flags
On Fri, Apr 13, 2007 at 02:13:34PM -0700, Andrew Pinski wrote: > On 4/13/07, H. J. Lu <[EMAIL PROTECTED]> wrote: > >On Fri, Apr 13, 2007 at 12:04:04PM -0700, Andrew Pinski wrote: > >> On 4/13/07, H. J. Lu <[EMAIL PROTECTED]> wrote: > >> > >> You don't need to do all this, You can just use variable with MASK > >> which was added by JSM when PPC64-linux-gnu's target bits overflowed. > > > >For i386, we are adding new target mask bits for new instruction sets. > >The new bits are used togther with existing ISA bits to selectively > >enable builtins. I don't know how to make the new variable to work > >when a new variable will have a set of bits overlapping with the exist > >ones. For example, SSE2 has > > I mean you can move over all the ISA flags to their own mask variable > without changing the common part of the compilers as you can use MASK > and variable together in the .opt files. > > You don't need to have an extra option called ISA for these variables. It won't work due to ;; Support Athlon 3Dnow builtins Mask(3DNOW_A) in i386.opt. It will put 3DNOW_A in target_flags even if I use ;; Support Athlon 3Dnow builtins Mask(3DNOW_A) Var(target_isa_flags) To put it on target_isa_flags, I have to add something like -m3dnowa But this option isn't needed before. Also it will change MASK_SSE to OPTION_MASK_SSE and change TARGET_SSE to OPTION_SSE. They look odd to me. H.J.
Re: RFC: Add target_isa_flags
On 4/13/07, H. J. Lu <[EMAIL PROTECTED]> wrote: But this option isn't needed before. This option should have been there anyways, I don't understand why the option does not exist. -- Pinski
gcc-4.3-20070413 is now available
Snapshot gcc-4.3-20070413 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20070413/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 123799

You'll find:

gcc-4.3-20070413.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.3-20070413.tar.bz2       C front end and core compiler
gcc-ada-4.3-20070413.tar.bz2        Ada front end and runtime
gcc-fortran-4.3-20070413.tar.bz2    Fortran front end and runtime
gcc-g++-4.3-20070413.tar.bz2        C++ front end and runtime
gcc-java-4.3-20070413.tar.bz2       Java front end and runtime
gcc-objc-4.3-20070413.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.3-20070413.tar.bz2  The GCC testsuite

Diffs from 4.3-20070406 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.