RE: Optimize flag breaks code on many versions of gcc (not all)
On 19 June 2006 00:04, Paolo Carlini wrote: > Zdenek Dvorak wrote: > >> ... I suspect there is something wrong with your >> code (possibly invoking some undefined behavior, using uninitialized >> variable, sensitivity to rounding errors, or something like that). >> >> > A data point apparently in favor of this suspect is that the "problem" > goes away if double is replaced everywhere with long double... > > Paolo. Is this another case of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 then? cheers, DaveK -- Can't think of a witty .sigline today
Re: Coroutines
Ross Ridge wrote:
>Hmm? I don't see how the "Lua-style" coroutines you're looking for are any
>more lightweight than what Maurizio Vitale is looking for. They're actually
>more heavyweight because you need to implement some method of returning
>values to the "coroutine" being yielded to.

Dustin Laurence wrote:
>I guess that depends on whether the userspace thread package in question
>provides for a return value as pthreads does.

Maurizio Vitale clearly wasn't looking for pthreads.

> In any case, coroutines don't need a scheduler, even a cooperative one.

He also made it clear he wanted to schedule his threads himself, just like
you want to do. In fact, what he seems to be trying to implement are true
symmetric coroutines.

Ross Ridge
gcc port based on MIPS
hi,

I'm trying to port gcc to a processor which is very similar to MIPS. Today I tried to compile gcc-4.1.0 for this processor by changing configuration files.

First I changed the config.sub file in the base directory and just added the name of the processor, ABC. Then I changed the configure.ac file in the gcc/ subdirectory and added the following lines:

    ABC*)
      conftest_s='
            .section .tdata,"awT",@progbits
    x:
            .word 2
            .text
            addiu $4, $28, %tlsgd(x)
            addiu $4, $28, %tlsldm(x)
            lui $4, %dtprel_hi(x)
            addiu $4, $4, %dtprel_lo(x)
            lw $4, %gottprel(x)($28)
            lui $4, %tprel_hi(x)
            addiu $4, $4, %tprel_lo(x)'
      tls_first_major=2
      tls_first_minor=16
      tls_as_opt='-32 --fatal-warnings'
      ;;

As you can see, it is just a copy and paste of the mips*-*-*) option. Then I made the following changes to the config.gcc file in the gcc/ subdirectory:

    ABC*)
      cpu_type=ABC
      ;;
    - - - - - - - - - - - - - - -- - - - - - --
    ABC*)
      tm_file="dbxelf.h elfos.h svr4.h linux.h ${tm_file} ABC/linux.h"
      ;;

Then I made a directory gcc-4.1.0/gcc/config/ABC/. I copied all files of gcc-4.1.0/gcc/config/mips to the ABC directory and renamed the following files:

    mips.h          -> ABC.h
    mips.md         -> ABC.md
    mips.c          -> ABC.c
    mips-modes.def  -> ABC-modes.def
    mips-protos.h   -> ABC-protos.h
    mips.opt        -> ABC.opt

But when I issued the make all-gcc command, the following error occurred:

    ../../gcc-4.1.0/gcc/config/ABC/ABC.md: unknown mode `V2SF'

Would you please explain why this error is being generated? A bit of explanation of the 'V2SF' mode would also be helpful.

Then I removed the 'V2SF' mode from the patterns in the ABC.md file. But now the following error was generated:

    ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `' for `mode' attribute
    ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `' for `mode' attribute
    ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `' for `mode' attribute
    ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `' for `mode' attribute

Would you please tell me why this error is being generated?

thanks,
shahzad
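On the question of what the 'V2SF' mode is: in GCC's mode naming it is a vector of two SFmode (single-precision float) elements, 8 bytes in total. The following is only a rough user-level sketch of such a value, assuming GCC's `vector_size` extension; the typedef name `v2sf` and the helper function are made up for illustration and are not part of the port.

```c
/* Hypothetical typedef name; vector_size is a GCC extension.
   Two 4-byte floats packed into one 8-byte vector. */
typedef float v2sf __attribute__ ((vector_size (8)));

/* Hypothetical helper: add two 2-element float vectors lane by lane,
   then return the sum of both result lanes. */
static float sum_of_pairwise_add(float a0, float a1, float b0, float b1)
{
    v2sf a = { a0, a1 };
    v2sf b = { b0, b1 };
    v2sf c = a + b;          /* one operation applied to both lanes */
    return c[0] + c[1];
}
```

Whether a target actually has patterns for this mode is a separate matter decided by its .md and -modes.def files.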
Re: Usage of -ftrapv
> I'd like to catch automatically over/underflows on floating point > and integer arithmetic. I thought -ftrapv would do the trick but I > don't really understand how it works. By the way, -ftrapv only works on integral types. Ben
Re: Optimize flag breaks code on many versions of gcc (not all)
On 6/19/06, Dave Korn <[EMAIL PROTECTED]> wrote:
> On 19 June 2006 00:04, Paolo Carlini wrote:
>
> > Zdenek Dvorak wrote:
> >
> >> ... I suspect there is something wrong with your
> >> code (possibly invoking some undefined behavior, using uninitialized
> >> variable, sensitivity to rounding errors, or something like that).
> >
> > A data point apparently in favor of this suspect is that the "problem"
> > goes away if double is replaced everywhere with long double...
> >
> > Paolo.
>
> Is this another case of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
> then?
>
> cheers,
> DaveK

It is the same case. Fundamentally, this is not fixable by the compiler alone without a significant performance penalty. There are very few implementations [1] that are completely IEEE754 conformant, and making them so is often prohibitively expensive, hence it's not done, or at least not by default. So whenever you're writing floating-point code, you need to be aware of the caveats of a particular implementation. Besides x86's well-known extended precision issue, other processors have things like flush-denormal-inputs-to-zero, or a multiply-add instruction that is not equivalent to a separate multiply and add.

I think it's not fair to expect gcc to somehow "fix" this whole mess alone. Of course, whenever there's a reasonable workaround for a particular issue, I'm sure gcc developers will try to accommodate it, but IMHO this one (bug 323) isn't such.

[1] by implementation, I mean the combination of: microprocessor, OS, compiler and runtime libraries.

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
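To make the extended precision caveat concrete, here is a minimal sketch in C. It is not the original poster's code. It assumes that the trouble comes from comparing a value that was rounded to a 64-bit double in memory against one that may still sit in a wider x87 register. The helper names are made up for illustration; the volatile store stands in for what a forced memory round trip does.

```c
/* Push a value through a 64-bit memory slot.  The volatile store drops
   any extra precision a wider FPU register might still carry. */
static double round_to_double(double v)
{
    volatile double slot = v;
    return slot;
}

/* Returns 1 when a stored quotient compares equal to a recomputed one.
   Both sides are rounded to double before the comparison, so the result
   does not depend on whether intermediates use extended precision. */
static int quotient_is_stable(double a, double b)
{
    double stored = round_to_double(a / b);
    return stored == round_to_double(a / b);
}
```

Without the explicit rounding on both sides, the same comparison can go either way on x87 depending on optimization level, which matches the symptom discussed in this thread.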
Re: Usage of -ftrapv
> By the way, -ftrapv only works on integral types. When it works. Last time I took a look, it was easily wiped out by optimization. -- Eric Botcazou
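For the integer side of the question, one alternative that does not rely on -ftrapv is to test for overflow before doing the arithmetic. This is a small sketch in portable C under that assumption; the function names are hypothetical and it covers addition of `int` only.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a + b would overflow int.  The addition itself is never
   performed, so no undefined behavior is involved. */
static int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* Aborts via assert (unless NDEBUG is defined) instead of overflowing. */
static int checked_add(int a, int b)
{
    assert(!add_would_overflow(a, b));
    return a + b;
}
```

Because the check is ordinary defined-behavior code, the optimizer is not entitled to remove it the way it may remove a trap on signed overflow.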
Re: gcc-4.1.0 cross-compile for MIPS
David Daney wrote:
> kernel coder wrote:
>> hi,
>> I'm trying to cross compile gcc-4.1.0 for mipsel platform. Following is
>> the sequence of commands which i'm using
>>
>> ../gcc-4.1.0/configure --target=mipsel --without-headres
>> --prefix=/home/shahzad/install/ --with-newlib --enable-languages=c
>
> Perhaps you should try to disable libssp. Try adding (untested)
> --disable-libmudflap --disable-libssp

I tried the 'mipsel-elf' target (to which the bare 'mipsel' leads) with gcc-4.1.1, using '--with-newlib --enable-languages=c,c++ --disable-shared'. The last was (maybe) required because earlier builds with other '-elf' targets stopped when trying to check for the existence of 'libgcc_s.so'... But no '--without-headers' was used; instead I copied the generic newlib headers into the $tooldir ($prefix/$target).

After that everything succeeded: 'gcc' and 'libiberty', 'libstdc++-v3' and 'libssp' for the target. So disabling libssp is unnecessary. There was no libmudflap build... So, apart from that '--disable-shared', the build worked just as it did with the earlier GCC versions!

And 'kernel coder' using:

    ../gcc-4.1.0/configure --target=mipsel --prefix=/home/shahzad/install \
        --with-newlib --enable-languages=c,c++

should have worked after copying those newlib headers into place, so that they are ready for the fixinc, limits.h check etc. that the GCC build tries to do with them.
Re: Optimize flag breaks code on many versions of gcc (not all)
On 6/19/06, Seongbae Park <[EMAIL PROTECTED]> wrote: On 6/19/06, Dave Korn <[EMAIL PROTECTED]> wrote: > On 19 June 2006 00:04, Paolo Carlini wrote: > > > Zdenek Dvorak wrote: > > > >> ... I suspect there is something wrong with your > >> code (possibly invoking some undefined behavior, using uninitialized > >> variable, sensitivity to rounding errors, or something like that). > >> > >> > > A data point apparently in favor of this suspect is that the "problem" > > goes away if double is replaced everywhere with long double... > > > > Paolo. > > Is this another case of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 > then? > > cheers, > DaveK It is the same case. Fundamentally, this is not fixable by the compiler alone without significant performance penalty. There are very few implementations [1] that are completely IEEE754 conformant and making them to be so is often prohibitively expensive, hence it's not done or at least not by default. So whenever you're programming a floating-point code, you need to be aware of the caveats of a particular implementation. Beside x86's well-known extended precision issue, other processors have things like flush-denormal-inputs-to-zero, or multiply-add instruction that is not equivalent to separate muitply and add. I think it's not fair to expect gcc to somehow "fix" this whole mess alone. Of course, whenever there's a reasonable workaround for a particular issue, I'm sure gcc developers will try to accomodate it, but IMHO this one (bug 323) isn't such. [1] by implementation, I mean the combination of: microprocessor, OS, compiler and runtime libraries.. Using -mfpmath=sse -msse2 is a workaround if you have a processor that supports SSE2 instructions. As opposed to -ffloat-store, it works reliably and with no performance impact. Richard.
Re: addressability checks in the gimplifier
Hello,

As a followup to my previous message enquiring about the intent underlying various addressability checks in the gimplifier, attached is an example of a patch which addresses the issues we're observing.

It for instance fixes an ICE in expand_expr_addr_expr_1 on the testcase below:

   procedure P5 is
      type Long_Message is record
         Data : String (1 .. 16);
      end record;

      type Short_Message is record
         B : Boolean;
         Data : String (1 .. 4);
      end record;
      pragma Pack (Short_Message);

      procedure Process (LM : Long_Message; Size : Natural) is
         SM : Short_Message;
      begin
         SM.Data (1 .. Size) := LM.Data (1 .. Size);
      end;
   begin
      null;
   end;

which is the one producing the tree excerpt quoted in the previous message (for SM.Data (1 .. Size) in Process).

The patch bootstraps fine with languages="all,ada" on i686-pc-linux-gnu, and introduces no new regression.

Regarding gimple predicates typically not recursing down trees (in accordance with the grammar), as I said

 << I'm pretty sure I'm missing implicit assumptions and/or bits of design
    intents in various places, so would appreciate input on the case and
    puzzles described above.
 >>

So this patch is posted here primarily for discussion purposes. I'd welcome suggestions on better ways to address this, if the approach is indeed considered inappropriate.

Thanks in advance for your help,

With Kind Regards,

Olivier

2006-06-19  Olivier Hainque  <[EMAIL PROTECTED]>

        * tree-gimple.c (is_gimple_lvalue, is_gimple_addressable): Account
        for possibly nested bitfield component refs, not addressable while
        still valid lvalues.

*** tree-gimple.c.ori   Tue May 30 15:55:07 2006
--- tree-gimple.c       Mon Jun 19 16:50:38 2006
*************** rhs_predicate_for (tree lhs)
*** 139,149 ****
  bool
  is_gimple_lvalue (tree t)
  {
!   return (is_gimple_addressable (t)
!           || TREE_CODE (t) == WITH_SIZE_EXPR
!           /* These are complex lvalues, but don't have addresses, so they
!              go here.  */
!           || TREE_CODE (t) == BIT_FIELD_REF);
  }

  /* Return true if T is a GIMPLE condition.  */
--- 139,148 ----
  bool
  is_gimple_lvalue (tree t)
  {
!   return (TREE_CODE (t) == WITH_SIZE_EXPR
!           || INDIRECT_REF_P (t)
!           || handled_component_p (t)
!           || is_gimple_variable (t));
  }

  /* Return true if T is a GIMPLE condition.  */
*************** is_gimple_condexpr (tree t)
*** 159,166 ****
  bool
  is_gimple_addressable (tree t)
  {
!   return (is_gimple_id (t) || handled_component_p (t)
!           || INDIRECT_REF_P (t));
  }

  /* Return true if T is function invariant.  Or rather a restricted
--- 158,181 ----
  bool
  is_gimple_addressable (tree t)
  {
!   if (is_gimple_id (t) || INDIRECT_REF_P (t))
!     return true;
!
!   switch (TREE_CODE (t))
!     {
!     case COMPONENT_REF:
!       return
!         !DECL_BIT_FIELD (TREE_OPERAND (t, 1))
!         && is_gimple_addressable (TREE_OPERAND (t, 0));
!
!     case VIEW_CONVERT_EXPR:
!     case ARRAY_REF: case ARRAY_RANGE_REF:
!     case REALPART_EXPR: case IMAGPART_EXPR:
!       return is_gimple_addressable (TREE_OPERAND (t, 0));
!
!     default:
!       return false;
!     }
  }

  /* Return true if T is function invariant.  Or rather a restricted
*** gimplify.c.ori      Tue May 30 15:54:59 2006
--- gimplify.c  Mon Jun 19 16:55:00 2006
*************** gimplify_modify_expr (tree *expr_p, tree
*** 3422,3430 ****
      return ret;

    /* If we've got a variable sized assignment between two lvalues (i.e. does
!      not involve a call), then we can make things a bit more straightforward
!      by converting the assignment to memcpy or memset.  */
!   if (TREE_CODE (*from_p) == WITH_SIZE_EXPR)
      {
        tree from = TREE_OPERAND (*from_p, 0);
        tree size = TREE_OPERAND (*from_p, 1);
--- 3422,3431 ----
      return ret;

    /* If we've got a variable sized assignment between two lvalues (i.e. does
!      not involve a call), we can make things a bit more straightforward by
!      converting the assignment to memcpy or memset as soon as both operands
!      can have their address taken.  */
!   if (TREE_CODE (*from_p) == WITH_SIZE_EXPR && is_gimple_addressable (*to_p))
      {
        tree from = TREE_OPERAND (*from_p, 0);
        tree size = TREE_OPERAND (*from_p, 1);
Re: gcc port based on MIPS
"kernel coder" <[EMAIL PROTECTED]> writes: > But when i issued the make all-gcc command .Following error occured > > ../../gcc-4.1.0/gcc/config/ABC/ABC.md: unknown mode `V2SF' > > Would u please explain why this error is being generated.Also a bit of > explaination of 'V2SF' mode will helpful. V2SF should normally be defined by ABC/ABC-modes.def. It is normally found by these lines in config.gcc when you run configure: if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-modes.def then extra_modes=${cpu_type}/${cpu_type}-modes.def fi V2SF will be created by the line VECTOR_MODES (FLOAT, 8); in ABC-modes.def (as copied from mips-modes.def). > Then i removed the 'V2SF' mode from patterns in ABC.md file.But now > following error was generated. > > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value > `' for `mode' attribute > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value > `' for `mode' attribute > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `' > for `mode' attribute > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `' > for `mode' attribute > > > Would you please tell me why this error is being generated. Hard to say without knowing what you changed. Did you simply delete the ANYF macro, or forget to remove the V2SF case from it? MD file macros are documented here: http://gcc.gnu.org/onlinedocs/gccint/Macros.html Ian
Re: gcc port based on MIPS
> V2SF will be created by the line
>
>     VECTOR_MODES (FLOAT, 8);

Yes, you are absolutely right. When I changed the name of the file ABC-modes.def to 1ABC-modes.def, I got the following error:

    make[1]: *** No rule to make target `../../gcc-4.1.0/gcc/config/ABC/ABC-modes.def', needed by `build/genmodes.o'. Stop.

This shows that ABC-modes.def is being used, and it has the required macro

    VECTOR_MODES (FLOAT, 8);

Then why is the following error still being generated?

> ../../gcc-4.1.0/gcc/config/ABC/ABC.md: unknown mode `V2SF'

As far as my changes to the ABC.md file are concerned, they are as follows:

    (define_mode_macro ANYF [(SF "TARGET_HARD_FLOAT")
                             (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
    ;;                       (V2SF "TARGET_PAIRED_SINGLE_FLOAT")])

    - - - - - -- - - - - - - - - - -- - - - - - - - - -- - - - -

    (define_mode_attr divide_condition
      [DF (SF "!TARGET_FIX_SB1 || flag_unsafe_math_optimizations")])
    ;;    (V2SF "TARGET_SB1 && (!TARGET_FIX_SB1 || flag_unsafe_math_optimizations)")])

As you can see, I just omitted the entries for V2SF.

On 19 Jun 2006 10:40:45 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> "kernel coder" <[EMAIL PROTECTED]> writes:
>
> > But when i issued the make all-gcc command .Following error occured
> >
> > ../../gcc-4.1.0/gcc/config/ABC/ABC.md: unknown mode `V2SF'
> >
> > Would u please explain why this error is being generated.Also a bit of
> > explaination of 'V2SF' mode will helpful.
>
> V2SF should normally be defined by ABC/ABC-modes.def. It is normally
> found by these lines in config.gcc when you run configure:
>
>     if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-modes.def
>     then
>       extra_modes=${cpu_type}/${cpu_type}-modes.def
>     fi
>
> V2SF will be created by the line
>
>     VECTOR_MODES (FLOAT, 8);
>
> in ABC-modes.def (as copied from mips-modes.def).
>
> > Then i removed the 'V2SF' mode from patterns in ABC.md file.But now
> > following error was generated.
> >
> > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value
> > `' for `mode' attribute
> > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value
> > `' for `mode' attribute
> > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `'
> > for `mode' attribute
> > ../../gcc-4.1.0/gcc/config/ABC/ABC.md:228: unknown value `'
> > for `mode' attribute
> >
> > Would you please tell me why this error is being generated.
>
> Hard to say without knowing what you changed. Did you simply delete
> the ANYF macro, or forget to remove the V2SF case from it?
>
> MD file macros are documented here:
> http://gcc.gnu.org/onlinedocs/gccint/Macros.html
>
> Ian
Re: gcc port based on MIPS
"kernel coder" <[EMAIL PROTECTED]> writes: > > V2SF will be created by the line > > VECTOR_MODES (FLOAT, 8); > > Yes you are absolutely right.When i changed the name of file > ABC-modes.def to 1ABC-modes.def ,i got the following error > > make[1]: *** No rule to make target > `../../gcc-4.1.0/gcc/config/ABC/ABC-modes.def', needed by > `build/genmodes.o'. Stop. > This shows that ABC-modes.def is being used and it has the required macro > > VECTOR_MODES (FLOAT, 8); > > Then why still the following error is being generated. > > > > ../../gcc-4.1.0/gcc/config/ABC/ABC.md: unknown mode `V2SF' I don't know. You'll have to debug it. > As far as my changes to ABC.md file are concerned .They are as fellows > > (define_mode_macro ANYF [(SF "TARGET_HARD_FLOAT") > (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")]) > ;; (V2SF "TARGET_PAIRED_SINGLE_FLOAT")]) > > - - - - - -- - - - - - - - - - > -- - - - - - - - - -- - - - - > (define_mode_attr divide_condition > [DF (SF "!TARGET_FIX_SB1 || flag_unsafe_math_optimizations")]) > ;; (V2SF "TARGET_SB1 && (!TARGET_FIX_SB1 || > flag_unsafe_math_optimizations)")]) > > > As you can see i just omitted the entries of V2SF. I hope that isn't really what you did, since that would comment out the "])" close brackes in each case. Ian
Re: gcc port based on MIPS
> > "kernel coder" <[EMAIL PROTECTED]> writes: > > > (define_mode_attr divide_condition > > [DF (SF "!TARGET_FIX_SB1 || flag_unsafe_math_optimizations")]) > > ;; (V2SF "TARGET_SB1 && (!TARGET_FIX_SB1 || > > flag_unsafe_math_optimizations)")]) > > > > > > As you can see i just omitted the entries of V2SF. > > I hope that isn't really what you did, since that would comment out > the "])" close brackes in each case. Except the above line has ]) also :). -- Pinski
Question regarding the "Clean up how cse works" project
Hi,

I have a question about the "Clean up how cse works" project on http://gcc.gnu.org/projects/optimize.html

Let me first explain what I am trying to do. I have seen Vlad's patch to make CSE path following remember its state at the end of a path, so that when a new path is followed, a re-scan is unnecessary for all the insns up to the point where a state is stored. I believe that in the long run we want CSE to work on extended basic blocks, i.e. more like a tree walk, without path following as we know it. And IMHO that special hash table implementation in cse.c shouldn't be necessary, so I'd like to replace it with libiberty's hashtab (which seems to do Just Fine for e.g. cselib).

So far I've mostly used Vlad's code for learning, but it seems to me that his code is not easily adapted to my "do CSE on extended basic blocks" idea, and I have also found out that with his patch the order of the elements in the hash table is not restored properly. This even causes some test suite failures for me with gcc 3.2 (the last version that the patch will apply to without too much effort).

I don't really see an easy way to efficiently implement a scoped hash table the cse.c way, such that you can invalidate and roll back while maintaining the order of the linked list of equal-valued exprs. And the cse.c hash table is also simply too slow (fixed number of buckets, so potentially quadratic if you record lots of expressions).

The first problem I immediately ran into while trying to figure out how to make cse.c use libiberty's hashtab is that we seem to use different "equivalent" checks depending on how strict we want to be. For some lookups, apparently if we only want to find the first_same_value element in the hash table, we look up without validating in exp_equiv_p. For other lookups we call exp_equiv_p with validate set to true. The most obvious example is lookup_as_function.
In the projects page "Clean up how cse works" there is a scheme described to make cse.c work without first_same_value and next_same_value. Apparently, at some point someone decided that we should not _have_ this whole thing with multiple expressions describing the same value. I couldn't agree more.

But I am _still_ not sufficiently familiar with cse.c to fully understand what it can do (other than sending email). E.g. I have been trying to figure out why we record multiple expressions with the same value in the hash table. I would like to know what the benefit is, and whether we would lose optimizations if I make it go away.

It turns out that we sometimes record widely different expressions that get the same value (due to canonicalization and so on). Usually the different expression with the same value comes from a REG_EQUAL note. When you look from gdb at what we are recording, you get things like:

    2: debug_rtx (elt->exp) = (reg:SI 91)
    void
    1: dump_class (classp) = Equivalence chain for (reg:SI 82):
    (reg:SI 82)
    (plus:SI (reg:SI 81)
        (reg:SI 59 [ D.6710 ]))
    (mult:SI (reg:SI 59 [ D.6710 ])
        (const_int 3 [0x3]))

    2: debug_rtx (elt->exp) = (reg:SI 92)
    void
    1: dump_class (classp) = Equivalence chain for (reg:SI 83):
    (reg:SI 83)
    (ashift:SI (reg:SI 82)
        (const_int 2 [0x2]))
    (mult:SI (reg:SI 59 [ D.6710 ])
        (const_int 12 [0xc]))

    2: debug_rtx (elt->exp) = (reg:QI 184 [ D.5910 ])
    void
    1: dump_class (classp) = Equivalence chain for (reg:QI 385):
    (reg:QI 385)
    (eq:QI (reg/v:SI 136 [ spec_long ])
        (const_int 0 [0x0]))
    (eq:QI (reg:CCZ 17 flags)
        (const_int 0 [0x0]))

In my collection of cc1-i files (half a million lines of preprocessed code), at -O2, we record multiple expressions with the same value in 4460 cases. My guess is that in most of these cases we record a SET_SRC and a REG_EQUAL note.
I of course still need to make sure that assumption is correct ;-) In all cases where the value leader is a constant, we can apparently fold_rtx the expression to that constant, so those are not interesting expressions to count as dups. That still leaves more than 3000 cases.

I suspect we may benefit from recording these different expressions in e.g. find_best_addr. For some machines, the ashift may be better and for others the mult is cheaper. So to know that these expressions have the same value is very important.

That brings me back to the CSE project on the projects page. Assume we'd be looking at the first case again, which comes from the following insns:

    (insn 54 53 55 6 (parallel [
                (set (reg:SI 82)
                    (plus:SI (reg:SI 81)
                        (reg:SI 59 [ D.6710 ])))
                (clobber (reg:CC 17 flags))
            ]) 208 {*addsi_1} (nil)
        (expr_list:REG_EQUAL (mult:SI (reg:SI 59 [ D.6710 ])
                (const_int 3 [0x3]))
            (nil)))

    (insn 77 76 78 8 (set (reg:SI 91)
            (reg:SI 82)) 40 {*movsi_1} (nil)
        (expr_list:REG_EQUAL (mult:SI (reg:SI 59 [ D.6710 ])
                (const_int 3 [0x3]))
            (nil)))

With the current cse, we simply brute-force record the REG_EQUAL note
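As a side note on why the expressions in the first two equivalence chains are interchangeable, the arithmetic can be checked with a few lines of C. This is only an illustration, not cse.c code. It assumes that reg 81 holds twice reg 59, which the dump does not show, and it uses unsigned arithmetic so that wraparound is well defined. All function names are hypothetical.

```c
/* SET_SRC of insn 54: (plus:SI (reg 81) (reg 59)).
   Assumption for this sketch only: reg 81 == 2 * reg 59. */
static unsigned r82_from_set_src(unsigned r59)
{
    unsigned r81 = r59 + r59;   /* assumed, not shown in the dump */
    return r81 + r59;
}

/* REG_EQUAL note: (mult:SI (reg 59) (const_int 3)). */
static unsigned r82_from_note(unsigned r59)
{
    return r59 * 3u;
}

/* Second chain, SET_SRC form: (ashift:SI (reg 82) (const_int 2)). */
static unsigned r83_from_set_src(unsigned r82)
{
    return r82 << 2;
}

/* Second chain, note form: (mult:SI (reg 59) (const_int 12)). */
static unsigned r83_from_note(unsigned r59)
{
    return r59 * 12u;
}
```

Which of the equal forms is cheaper is target dependent, which is the reason given above for wanting to keep more than one of them on record.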
Re: Output of contrib/compare_tests
On Jun 18, 2006, at 2:35 PM, Mike Stein wrote:
> Is someone else interested in the daily output

Or while (1) do, if you have the bandwidth... :-)

But, please, just email the results to yourself and try that for a week. :-) This will help shake out the trivial things. You'll also need to add in some other information, like revision numbers of things under test and platform, so that people can know what it is you're reporting. Also, be sure to care for and feed the system...
Re: MIPS RDHWR instruction reordering
On Fri, Jun 16, 2006 at 02:12:29PM -0700, Ian Lance Taylor wrote: > The computation of the address of x was moved outside the > conditional--that is, both the rdhwr and the addu moved. You'll have > to figure out why. gcc shouldn't move instructions outside of a > conditional unless they are cheap and don't trap. This instruction > doesn't trap, but it's not cheap. What metric gets used for this - rtx_cost? -- Daniel Jacobowitz CodeSourcery
Re: MIPS RDHWR instruction reordering
Daniel Jacobowitz <[EMAIL PROTECTED]> writes:

> On Fri, Jun 16, 2006 at 02:12:29PM -0700, Ian Lance Taylor wrote:
> > The computation of the address of x was moved outside the
> > conditional--that is, both the rdhwr and the addu moved. You'll have
> > to figure out why. gcc shouldn't move instructions outside of a
> > conditional unless they are cheap and don't trap. This instruction
> > doesn't trap, but it's not cheap.
>
> What metric gets used for this - rtx_cost?

I'm not sure, because I'm not sure what is hoisting the instruction.

I tried recreating this, but I couldn't. I get this:

foo:
        .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
        .mask   0x,0
        .fmask  0x,0
        .set    noreorder
        .cpload $25
        .set    reorder
        .set    noreorder
        .set    nomacro
        beq     $4,$0,$L7
        .set    push
        .set    mips32r2
        rdhwr   $3,$29
        .set    pop
        .set    macro
        .set    reorder
        lw      $2,%gottprel(x)($28)
        addu    $2,$2,$3
        lw      $2,0($2)
        j       $31
$L7:
        .set    noreorder
        .set    nomacro
        j       $31
        move    $2,$0
        .set    macro
        .set    reorder

This of course is not ideal, since it unconditionally executes the rdhwr instruction. But it is not the same as what the OP reported.

This case happens because reorg.c ignores the cost of the instruction in fill_slots_from_thread. I believe that reorg.c should not move an expensive instruction which is only conditionally executed into a delay slot. That is probably a bug. We can see a similar case with this:

int foo(int arg, int x)
{
  if (arg)
    return x * x;
  return 0;
}

which yields this:

foo:
        .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
        .mask   0x,0
        .fmask  0x,0
        .set    noreorder
        .cpload $25
        .set    reorder
        .set    noreorder
        .set    nomacro
        beq     $4,$0,$L7
        mult    $5,$5
        .set    macro
        .set    reorder
        mflo    $2
        j       $31
$L7:
        .set    noreorder
        .set    nomacro
        j       $31
        move    $2,$0
        .set    macro
        .set    reorder

which executes the "mult" instruction unconditionally, which is probably not desirable since it will tie up the multiplication pipeline.

Ian
Re: Optimize flag breaks code on many versions of gcc (not all)
On 6/19/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> Using -mfpmath=sse -msse2 is a workaround if you have a processor that
> supports SSE2 instructions. As opposed to -ffloat-store, it works reliably
> and with no performance impact.

Such a slab test can be turned into a branchless sequence of SSE min/max, even for filtering infinities around dir ~= 0; it's much simpler and more efficient to intersect 4 rays against one box at once though.

Without intrinsics, a NaN-oblivious version would look like:

static float minf(const float a, const float b) { return (a < b) ? a : b; }
static float maxf(const float a, const float b) { return (a > b) ? a : b; }

bool_t intersect_ray_box(const aabb_t &box, const rt::mono::ray_t &ray,
                         float &lmin, float &lmax)
{
    float
        l1 = (box.min.x - ray.pos.x) * ray.inv_dir.x,
        l2 = (box.max.x - ray.pos.x) * ray.inv_dir.x;
    lmin = minf(l1,l2);
    lmax = maxf(l1,l2);

    l1 = (box.min.y - ray.pos.y) * ray.inv_dir.y;
    l2 = (box.max.y - ray.pos.y) * ray.inv_dir.y;
    lmin = maxf(minf(l1,l2), lmin);
    lmax = minf(maxf(l1,l2), lmax);

    l1 = (box.min.z - ray.pos.z) * ray.inv_dir.z;
    l2 = (box.max.z - ray.pos.z) * ray.inv_dir.z;
    lmin = maxf(minf(l1,l2), lmin);
    lmax = minf(maxf(l1,l2), lmax);

    return (lmax >= lmin) & (lmax >= 0.f);
}
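The routine above depends on types that are not shown in the message. For anyone who wants to try it, here is a self-contained C rendering of the same slab test. The struct layouts (`vec3`, `aabb`, `ray`) and the `hits_example_box` helper are assumptions made up for this sketch, and pointers replace the C++ references. The ray stores the reciprocal of its direction, so an axis with zero direction gets an infinite `inv_dir` component.

```c
#include <math.h>   /* INFINITY */

/* Hypothetical stand-ins for the types used in the message. */
typedef struct { float x, y, z; } vec3;
typedef struct { vec3 min, max; } aabb;
typedef struct { vec3 pos, inv_dir; } ray;   /* inv_dir = 1 / direction */

static float minf(float a, float b) { return (a < b) ? a : b; }
static float maxf(float a, float b) { return (a > b) ? a : b; }

/* Slab test: intersect the ray with each pair of axis-aligned planes and
   keep the overlap of the three parameter intervals. */
static int intersect_ray_box(const aabb *box, const ray *r,
                             float *lmin, float *lmax)
{
    float l1 = (box->min.x - r->pos.x) * r->inv_dir.x;
    float l2 = (box->max.x - r->pos.x) * r->inv_dir.x;
    *lmin = minf(l1, l2);
    *lmax = maxf(l1, l2);

    l1 = (box->min.y - r->pos.y) * r->inv_dir.y;
    l2 = (box->max.y - r->pos.y) * r->inv_dir.y;
    *lmin = maxf(minf(l1, l2), *lmin);
    *lmax = minf(maxf(l1, l2), *lmax);

    l1 = (box->min.z - r->pos.z) * r->inv_dir.z;
    l2 = (box->max.z - r->pos.z) * r->inv_dir.z;
    *lmin = maxf(minf(l1, l2), *lmin);
    *lmax = minf(maxf(l1, l2), *lmax);

    return (*lmax >= *lmin) & (*lmax >= 0.f);
}

/* Hypothetical usage helper: a ray travelling along +x from (0, y0, 0)
   against the box [1,2] x [-1,1] x [-1,1]. */
static int hits_example_box(float y0, float *lmin, float *lmax)
{
    const aabb box = { { 1.f, -1.f, -1.f }, { 2.f, 1.f, 1.f } };
    const ray  r   = { { 0.f, y0, 0.f }, { 1.f, INFINITY, INFINITY } };
    return intersect_ray_box(&box, &r, lmin, lmax);
}
```

With y0 = 0 the ray enters the box at parameter 1 and leaves at 2; with y0 = 5 it passes outside the y slab and the test reports a miss. A ray origin lying exactly on a slab plane of a zero-direction axis would produce 0 * infinity, which is the NaN case this version deliberately ignores.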