Re: Question about merging two instructions.
This approach seems reasonable. The current structure of the code in simplify_replace_rtx is intended to handle RTL expressions rather than patterns, so normally it would be passed just SET_SRC (pat), instead of the whole set.

Which is why, OTOH, I would be *extremely* cautious doing such a change.

Leehod said:

> I tried to use simplify_replace_rtx to replace any use of (reg r) with[in]
> the right-hand-side of the extension and simplify the result.

If he wants to replace uses within the RHS of the extension, he should pass SET_SRC (pat). He may as well want to handle parallels, in which case he should write a new function similar to this:

    int i;
    if (GET_CODE (pat) == SET)
      SET_SRC (pat) = simplify_replace_rtx (SET_SRC (pat), old, new);
    else if (GET_CODE (pat) == PARALLEL)
      for (i = 0; i < XVECLEN (pat, 0); i++)
        {
          rtx s = XVECEXP (pat, 0, i);
          if (GET_CODE (s) == SET)
            SET_SRC (s) = simplify_replace_rtx (SET_SRC (s), old, new);
        }

Paolo
Re: [patch] Fix behavior of TER on unrolled loops
Maybe everyone else manages to get good code without something like TER.

That's because we lack other things that people have, like a sane instruction selection and expression reordering pass: other compilers probably have something akin to TER as part of instruction selection. Like many other parts of GCC (right Steven? ;-) TER is not evil by itself -- it is as part of IR lowering.

I think Zdenek's patch is fine, because *everything* in TER is a hack.

> Zdenek, maybe you could you try and see if enabling pre-regalloc
> scheduling would also make this particular problem you're seeing go
> away, just as an experiment...

Couldn't it also make it much worse?

Paolo
Re: Question about merging two instructions.
> 1. Can you please give me an example of something bad that can happen to
> the LHS. Maybe I'm missing something here.

In this case nothing, but if NEW were a subreg, it can change a lot.

> 3. Isn't it reasonable to expect that every instance on old_rtx will be
> replaced by new_rtx even if it can't be simplified?
> This is what I understand from the function's documentation.
> But actually every expressions that can't be simplified is not replaced.

SET is not an expression, so it is not handled by simplify_replace_rtx. I agree that it is confusing, that

    x = simplify_replace_rtx (a, b, c);

is different from

    x = simplify_rtx (replace_rtx (a, b, c));

So maybe all you need is changing the final "return x" to "return replace_rtx (x, old, new);". I reckon this change would be much cleaner than handling SET, but I would very much want to see what a GWP person thinks about this.

Paolo
Redundant limit check for switch
The following code:

#include <stdio.h>

void Switch4(int x) {
    switch (x & 7) {
    case 0: printf("0\n"); break;
    case 1: printf("1\n"); break;
    case 2: printf("2\n"); break;
    case 3: printf("3\n"); break;
    case 4: printf("4\n"); break;
    case 5: printf("5\n"); break;
    case 6: printf("6\n"); break;
    case 7: printf("7\n"); break;
    }
}

void Switch256(int x) {
    switch ((unsigned char) x) {
    case 0: printf("0\n"); break;
    case 1: printf("1\n"); break;
    case 2: printf("2\n"); break;
    // ... (all 256 cases)
    }
}

when compiled with:

gcc -S -O3 fullswitch.c

produces the following:

...
.globl _Switch4
        .def _Switch4; .scl 2; .type 32; .endef
_Switch4:
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %eax
        andl $7, %eax
        cmpl $7, %eax
        ja L12
        jmp *L11(,%eax,4)
...
.globl _Switch256
        .def _Switch256; .scl 2; .type 32; .endef
_Switch256:
        pushl %ebp
        movl %esp, %ebp
        movzbl 8(%ebp), %eax
        cmpl $255, %eax
        ja L273
        jmp *L272(,%eax,4)

cmpl+ja are redundant in both cases. Do you think it is possible for gcc to optimize them away?

An example of a real program with all 256 cases for unsigned char is Atari800. We use table-driven goto * there for better performance.
Re: ISO C++ forbids initialization in array new?
On 8/19/05, Jonathan Wakely <[EMAIL PROTECTED]> wrote:
> WU Yongwei wrote:
> > Well, I see this in the gcc error message. Can someone here kindly
> > point to me which part of the Standard specified this behaviour? I
> > thought it should be in 5.3.4, but was not able to find the words
> > there.

[snipped]

> This is OK, and default initialises the array, which default initialises
> each element (8.5/5)
>
>    int* i = new int[5]();
>
> but this is not OK:
>
>    int* i = new int[5](23);
>
> because it is not valid to initialise an array like this:
>
>    typedef int (five_ints)[5];
>    five_ints i(23);
>
> this gives:
>
> array_init.cc:8: error: cannot initialize arrays using this syntax

This sounds reasonable. The only problem is that it does not constitute proof. It would be completely OK if the standard disallowed `int a[5](23)' while allowing `new int[5](23)', just as older GCC did.

On 8/19/05, Alisdair Meredith <[EMAIL PROTECTED]> wrote:
> WU Yongwei wrote:
>
> > Well, I see this in the gcc error message. Can someone here kindly
> > point to me which part of the Standard specified this behaviour? I
> > thought it should be in 5.3.4, but was not able to find the words
> > there.
> >
> > By the way, anyone knows the rationale of this behaviour?
>
> It is not explicitly forbidden. Rather, there is no syntax defined that
> would enable it (so it is implicitly forbidden)

My observations. According to 5.3.4, an expression like `new int[5](23)' is well-formed. It is just that no semantics are formally defined for it.

The behaviour might be OK, but the message seems a little misleading. Would something like `cannot specify initializer for arrays' be better, since it really has little to do with `array new'?

Also, the trend of gcc removing existing `extensions' is a little worrisome. IMHO, the extension should be allowed and a warning should be issued in the case of `-W' or `-ansi'.

Best regards,

Yongwei
Build report for AIX 5.1
Hi,

I just built GCC 4.0.1 on AIX 5.1 using the following commands:

../gcc-4.0.1/configure --with-libiconv-prefix=/usr --disable-nls --disable-multilib
make bootstrap-lean
make install

$ config.guess
powerpc-ibm-aix5.1.0.0

$ gcc -v
Using built-in specs.
Target: powerpc-ibm-aix5.1.0.0
Configured with: /home/linke/temp/gcc-4.0.1/configure --with-libiconv-prefix=/usr --disable-nls --disable-multilib
Thread model: aix
gcc version 4.0.1

The system is an IBM pSeries M80 with AIX 5.1 at the latest patchlevel.
The building C compiler is gcc 4.0.0.
Make is GNU make 3.80.

The disable-xxx configure options shouldn't be necessary; I used them to save build time and space.

The whole build took less than two hours.

Mario Linke
PR/23303 (not using leal on ix86)
It looks like Jan's patch at http://gcc.gnu.org/ml/gcc-patches/2005-07/msg02021.html is causing a code size regression in i686. I'll consider a 32-bit target, and this testcase

    char **
    VTallocbuf(char **allbuf, unsigned long savelines)
    {
      return &allbuf[savelines];
    }

For i686 we produce

    movl 8(%esp), %eax
    movl 4(%esp), %edx
    sall $2, %eax
    addl %edx, %eax

instead of

    movl 8(%esp), %eax
    movl 4(%esp), %edx
    leal (%edx,%eax,4), %eax

However even in this case a no-lea code could be feasible:

    movl 8(%esp), %eax
    sall $2, %eax
    addl 4(%esp), %eax

And this is exactly the rtl we have until peephole2, where this peephole splits the instruction:

;; Don't do logical operations with memory inputs.
(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC FLAGS_REG))])]
  "! optimize_size && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC FLAGS_REG))])]
  "")

I think that Jan's patch should be conditionalized: if !optimize_size && !TARGET_READ_MODIFY, the transformation he removed will be done anyway, and too late in the game. Let's see what the hunks do...

Index: expr.c
===
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.806
diff -c -3 -p -r1.806 expr.c
*** expr.c 25 Jul 2005 12:04:45 - 1.806
--- expr.c 29 Jul 2005 12:10:40 -
*** expand_expr_real_1 (tree exp, rtx target
*** 6578,6595
        target = 0;
      }

-   /* If will do cse, generate all results into pseudo registers
-      since 1) that allows cse to find more things
-      and 2) otherwise cse could produce an insn the machine
-      cannot support.  An exception is a CONSTRUCTOR into a multi-word
-      MEM: that's much more likely to be most efficient into the MEM.
-      Another is a CALL_EXPR which must return in memory.  */
-
-   if (! cse_not_expected && mode != BLKmode && target
-       && (!REG_P (target) || REGNO (target) < FIRST_PSEUDO_REGISTER)
-       && ! (code == CONSTRUCTOR && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
-       && ! (code == CALL_EXPR && aggregate_value_p (exp, exp)))
-     target = 0;

    switch (code)
      {

I think this ought to be left in. Not because CSE can find more things, but because in general an instruction selection pass ought to recombine MEMs at their usage points if it is worthwhile. To this end, ix86_rtx_costs could be taught about TARGET_READ_MODIFY, with something like:

    case MEM:
      if (!optimize && !TARGET_READ_MODIFY
          && GET_RTX_CLASS (outer_code) == RTX_BIN_ARITH)
        (*total)++;
      break;

Also consider that we still lack tree-based load PRE (PR23455), so we still need CSE to "find more things". Load PRE is the major obstacle towards removing CSE path following.

--- 6578,6583
Index: optabs.c
===
RCS file: /cvs/gcc/gcc/gcc/optabs.c,v
retrieving revision 1.287
diff -c -3 -p -r1.287 optabs.c
*** optabs.c 12 Jul 2005 09:20:02 - 1.287
--- optabs.c 29 Jul 2005 12:10:41 -
*** expand_vec_cond_expr (tree vec_cond_expr
*** 5475,5481
    if (icode == CODE_FOR_nothing)
      return 0;

!   if (!target)
      target = gen_reg_rtx (mode);

    /* Get comparison rtx.  First expand both cond expr operands.  */
--- 5475,5481
    if (icode == CODE_FOR_nothing)
      return 0;

!   if (!target || !insn_data[icode].operand[0].predicate (target, mode))
      target = gen_reg_rtx (mode);

    /* Get comparison rtx.  First expand both cond expr operands.  */

This is partially unrelated, it does not hurt.

Index: config/i386/i386.c
===
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.843
diff -c -3 -p -r1.843 i386.c
*** config/i386/i386.c 18 Jul 2005 06:39:18 - 1.843
--- config/i386/i386.c 29 Jul 2005 12:10:42 -
*** ix86_fixup_binary_operands (enum rtx_cod
*** 8154,8170
        && GET_RTX_CLASS (code) != RTX_COMM_ARITH)
      src1 = force_reg (mode, src1);

-   /* If optimizing, copy to regs to improve CSE */
-   if (optimize && ! no_new_pseudos)
-     {
-       if (GET_CODE (dst) == MEM)
-         dst = gen_reg_rtx (mode);
-       if (GET_CODE (src1) == MEM)
-         src1 = force_reg (mode, src1);
-       if (GET_CODE (src2) == MEM)
-         src2 = force_reg (mode, src2);
-     }
-
    src1 = operands[1] = src1;
    src2 = operands[2] = src2;
    r
Re: Bootstrap failure on powerpc-apple-darwin8 with Ada
I disbelieve you can get this in C or C++. The fragment above is a syntax error. AFAIK, all of this is simple laziness in the Ada front end: generating & is how things were done at the beginning of time, and it was easier to change this in the gimplifier than to modify the code that generated this directly. Well, it's more that what this should get changed to is the address of a temporary and the gimplifier already has code to make the temporaries so why duplicate that code in Gigi?
Re: Redundant limit check for switch
Piotr Fusik <[EMAIL PROTECTED]> wrote:

> void Switch4(int x) {
>     switch (x & 7) {
>     case 0: printf("0\n"); break;
>     case 1: printf("1\n"); break;
>     case 2: printf("2\n"); break;
>     case 3: printf("3\n"); break;
>     case 4: printf("4\n"); break;
>     case 5: printf("5\n"); break;
>     case 6: printf("6\n"); break;
>     case 7: printf("7\n"); break;
>     }
> }
> .globl _Switch4
>         .def _Switch4; .scl 2; .type 32; .endef
> _Switch4:
>         pushl %ebp
>         movl %esp, %ebp
>         movl 8(%ebp), %eax
>         andl $7, %eax
>         cmpl $7, %eax
>         ja L12
>         jmp *L11(,%eax,4)
> cmpl+ja are redundant in both cases.
> Do you think it is possible for gcc to optimize them away?

I believe VRP could be taught about inferring ranges from bit_and_expr and similar operations. Right?

-- 
Giovanni Bajo
Re: Redundant limit check for switch
> > void Switch4(int x) {
> >     switch (x & 7) {
> >     }
> > }
> >
> > .globl _Switch4
> >         .def _Switch4; .scl 2; .type 32; .endef
> > _Switch4:
> >         pushl %ebp
> >         movl %esp, %ebp
> >         movl 8(%ebp), %eax
> >         andl $7, %eax
> >         cmpl $7, %eax
> >         ja L12
> >         jmp *L11(,%eax,4)
> >
> > cmpl+ja are redundant in both cases.
> > Do you think it is possible for gcc to optimize them away?
>
> I believe VRP could be taught about inferring ranges from bit_and_expr and
> similar operations. Right?

Yes, but the range check is not emitted until trees are expanded to RTL.

combine does a lot of simplifications, but unfortunately not this one. It is also quite hard to teach combine to *remove* jumps, though it has some ability to turn conditional jumps into unconditional.

The attached patch would at least cause simplify-rtx.c to realize that (gtu (reg:SI 61) (const_int 7)) is false, but not cause any code generation improvement.

Paolo

Index: simplify-rtx.c
===
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.230.2.6
diff -u -r1.230.2.6 simplify-rtx.c
--- simplify-rtx.c 24 Apr 2005 22:15:04 - 1.230.2.6
+++ simplify-rtx.c 22 Aug 2005 12:23:03 -
@@ -2912,6 +2912,7 @@
   if (GET_CODE (op1) == CONST_INT)
     {
+      HOST_WIDE_INT mask;
       if (INTVAL (op1) == 0 && COMPARISON_P (op0))
         {
           /* If op0 is a comparison, extract the comparison
              arguments form it.  */
@@ -2931,6 +2932,46 @@
                                             XEXP (op0, 0), XEXP (op0, 1));
             }
         }
+
+
+      mask = nonzero_bits (op0, cmp_mode);
+      switch (code)
+        {
+        case GE:
+          if (mask < 0)
+            break;
+          /* fall through */
+        case GEU:
+          if (INTVAL (op1) > mask)
+            return const0_rtx;
+
+        case GT:
+          if (mask < 0)
+            break;
+          /* fall through */
+        case GTU:
+          if (INTVAL (op1) >= mask)
+            return const0_rtx;
+
+        case LE:
+          if (mask < 0)
+            break;
+          /* fall through */
+        case LEU:
+          if (INTVAL (op1) >= mask)
+            return const_true_rtx;
+
+        case LT:
+          if (mask < 0)
+            break;
+          /* fall through */
+        case LTU:
+          if (INTVAL (op1) > mask)
+            return const_true_rtx;
+          break;
+        default:
+          break;
+        }
     }

   /* (eq/ne (plus x cst1) cst2) simplifies to (eq/ne x (cst2 - cst1))  */
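The reasoning behind folding such comparisons can be sketched outside of GCC. The following is a minimal standalone illustration, not the patch itself: `fold_unsigned_compare` and the `cmp_code` enum are hypothetical names, and `mask` stands in for what nonzero_bits would report. It assumes that every possible value x has its set bits within `mask`, which implies 0 <= x <= mask as an unsigned number. Only the unsigned codes are modelled, and each case decides on its own path rather than sharing code with the next one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the unsigned RTL comparison codes.  */
enum cmp_code { CMP_GTU, CMP_GEU, CMP_LTU, CMP_LEU };

/* Decide "x CODE c" for every x whose set bits lie within MASK.
   Returns 1 if the comparison is always true, 0 if it is always
   false, and -1 if the mask alone cannot decide it.  */
static int
fold_unsigned_compare (enum cmp_code code, uint32_t mask, uint32_t c)
{
  switch (code)
    {
    case CMP_GTU:               /* x > c can never hold once c >= mask */
      return c >= mask ? 0 : -1;
    case CMP_GEU:               /* x >= c can never hold once c > mask */
      return c > mask ? 0 : -1;
    case CMP_LTU:               /* x < c always holds once c > mask */
      return c > mask ? 1 : -1;
    case CMP_LEU:               /* x <= c always holds once c >= mask */
      return c >= mask ? 1 : -1;
    }
  return -1;
}
```

For the Switch4 case, the operand is `x & 7`, so the mask is 7 and the comparison `(x & 7) > 7` folds to false; for Switch256 the mask is 255 and `> 255` folds the same way.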
Re: Problem compiling libstdc++ is current 4.0.2 cvs (volatile strikes again)
Haren Visavadia wrote:

> You missed "The GCC team has been urged to drop support for SCO Unix
> from GCC, as a protest against SCO's irresponsible aggression against
> free software".

Having started my Unix learning with SCO Xenix/286, SCO Xenix/386 and SCO Unix (all some kind of trademarks), I have always wondered what on earth this really meant. Did it mean that the support for SCO Unix, the '3.2.[2-4]', was dropped but the support for SCO OpenServer5 and SCO UnixWare was continued, or what?

Can it be so hard to ask someone who knows something about the SCO products to write some sane clauses to the FSF documents?

Maybe "SCO Unix" means something else for Haren and some FSF people, but my thought is that the majority understands it to mean SCO's SVR3.2 release from the late 80's and early 90's... Ok, the heir of SCO Unix is OSR5, but the UnixWare acquired from Univel/Novell has always been called "UnixWare", so it cannot be what "SCO Unix" refers to...

If "SCO Unix" means all the SVR4 and SVR5 based Unices, then AIX, Solaris2, HP-UX, Irix and others were also dropped from all support
Re: GCC 4.1 Status Report (2005-08-21)
On Aug 22, 2005, at 1:27 AM, Mark Mitchell wrote:

> (Quite a few of the diagnostic messages stem from the design decision
> to issue warnings from the optimizers...)

Only 8 out of 49 at that, though some are very minor, as two are just complaints about the wording of the warning. And almost all are uninitialized warnings, which are always questionable as there is no way to warn from the front-end without flow control.

In fact all those 8 are either unreachable code, uninitialized, or return. All of these can be shown to require control flow. If you take the following example (for unreachable code):

void f(void)
{
  goto a;
b:
  goto c;
a:
  return;
c:
  goto b;
}

There is no way to know that the "goto c" and "goto b" are dead without some kind of control flow.

I don't think singling out these 8 is useful, considering 28 out of 49 are C++ bugs and at least one of those keeps on getting pushed from one release to the next. If we don't care about diagnostic bugs (which we really should but it seems like we don't), then can we just remove the target milestones for all of those too.

-- Pinski
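To make the point about control flow concrete, here is a minimal sketch of a reachability walk over the four blocks of f() above. The block names, the `successor` table and `mark_reachable` are all hypothetical and have nothing to do with GCC's actual CFG data structures. The sketch relies on the fact that every block in this particular example has at most one successor; a real analysis would need a worklist to handle branches.

```c
#include <assert.h>
#include <stdbool.h>

/* One block per statement of f(), in source order.  */
enum { ENTRY, LABEL_B, LABEL_A, LABEL_C, N_BLOCKS };

/* Single successor of each block, or -1 when the block returns.  */
static const int successor[N_BLOCKS] = {
  [ENTRY]   = LABEL_A,   /* goto a;    */
  [LABEL_B] = LABEL_C,   /* b: goto c; */
  [LABEL_A] = -1,        /* a: return; */
  [LABEL_C] = LABEL_B,   /* c: goto b; */
};

/* Follow the successor chain from the entry block and record every
   block visited; anything left unmarked is dead code.  */
static void
mark_reachable (bool reachable[N_BLOCKS])
{
  for (int i = 0; i < N_BLOCKS; i++)
    reachable[i] = false;
  for (int b = ENTRY; b != -1 && !reachable[b]; b = successor[b])
    reachable[b] = true;
}
```

Only the entry block and "a" end up marked. The blocks at "b" and "c" jump to each other, so each has a predecessor, and only a walk that starts from the entry shows that neither can be reached.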
Searching for a branch for the see optimization.
Hello, I would like to know if someone knows a suitable branch for the sign extension optimization pass. This pass stands for itself. There are not many changes to the other parts of the gcc. For details see: http://gcc.gnu.org/ml/gcc-patches/2005-08/msg01087.html Thanks, Leehod.
Re: Question about merging two instructions.
> > 1. Can you please give me an example of something bad that can happen to
> > the LHS. Maybe I'm missing something here.
>
> In this case nothing, but if NEW were a subreg, it can change a lot.

Why? Do I always need to recognize the result? If the answer is yes, then I suppose that if something bad happens, the recognition will fail.

Thanks,
Leehod.
Re: Question about merging two instructions.
> Do I always need to recognize the result?

validate_change and apply_change_group will take care of that.

> If the answer is yes, than I suppose that if something bad happens, the
> recognition will fail.

No, the problem is when recognition passes, and you have a SUBREG on the LHS that will only modify part of the pseudo. I can't think of a case off the top of my head, but I'd rather err on the side of safety.

It may still make sense changing the default case of simplify_replace_rtx to invoke replace_rtx rather than returning x. But this is unrelated, because nobody is currently passing a SET to simplify_replace_rtx (only expressions), and you should do the same: *you* said you want to replace on the RHS, so you really want to invoke simplify_replace_rtx on the RHS.

Paolo
Re: Question about merging two instructions.
On Sun, 21 Aug 2005, Leehod Baruch wrote:
> >>(insn 1 0 2 0 (set (reg/v:Xmode r)
> >>(sign_extend:Xmode (op:Ymode (...
> >>(insn 2 1 3 0 (set (lhs) (rhs)))
>
> 1. Can you please give me an example of something bad that can happen to
> the LHS. Maybe I'm missing something here.

    (set (reg:Xmode r) (sign_extend:Xmode (reg:Ymode p)))
    (set (subreg:Ymode (reg:Xmode r) 0) (reg:Ymode q))

would be transformed (by replacing all uses of "reg r" with its definition on both LHS and RHS) into:

    (set (reg:Ymode p) (reg:Ymode q))

Originally, r's high part would be set to the signbit of p and r's low part would be set to the value of q. After the transformation, we now overwrite the operand p with the value q, which isn't quite the same thing.

> 2. After calling simplify_replace_rtx I try to recognize the instruction.
> Is this been cautious or is it unnecessary?

Except for register-register moves, all synthesized instructions need to be rerecognized, especially after "RTL simplification".

> 3. Isn't it reasonable to expect that every instance on old_rtx will be
> replaced by new_rtx even if it can't be simplified?
> This is what I understand from the function's documentation.
> But actually every expressions that can't be simplified is not replaced.

Every instance of old_rtx should be replaced by new_rtx. You may be getting confused by the code to reduce memory usage. If a replacement doesn't occur within all operands/subtrees of a tree, then return this tree. The invariant in the recursion is that if a substitution has been made anywhere in the tree, it returns a newly allocated RTX. Simplification of this newly allocated RTX will itself return a newly allocated RTX. Hence testing whether the return value of simplify_replace_rtx matches its original first argument is a way of determining whether any substitution has been made (whether it was subsequently simplified or not).

The one caveat to this is that simplify_replace_rtx is less robust to unrecognized RTL codes than replace_rtx, i.e. it won't traverse UNSPECs or other non-unary/non-binary/non-comparison expressions. This can/should probably be fixed by tweaking the "default:" case to match the GET_RTX_FORMAT loop in replace_rtx. Note this isn't a simple cut'n'paste, as replace_rtx destructively overwrites its input expression, whilst simplify_replace_rtx returns a different tree if anything changed.

Roger
--
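The "same pointer means nothing changed" invariant described above can be illustrated with a toy expression tree. This is only a sketch under stated assumptions: `struct expr`, `make_expr` and `toy_replace` are hypothetical names, not GCC's rtx type or its simplify_replace_rtx, and no simplification step is modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression node: a register leaf (op == 'r') or a binary
   operation such as '+' with two operands.  */
struct expr {
  char op;
  int regno;                /* meaningful only when op == 'r' */
  struct expr *lhs, *rhs;   /* meaningful only for binary nodes */
};

static struct expr *
make_expr (char op, int regno, struct expr *lhs, struct expr *rhs)
{
  struct expr *e = malloc (sizeof *e);
  assert (e != NULL);
  e->op = op;
  e->regno = regno;
  e->lhs = lhs;
  e->rhs = rhs;
  return e;
}

/* Replace every use of register OLD_REGNO inside X with NEW_EXPR.
   Returns X itself when no substitution happened anywhere below it,
   and a different pointer otherwise; X is never modified in place.  */
static struct expr *
toy_replace (struct expr *x, int old_regno, struct expr *new_expr)
{
  if (x->op == 'r')
    return x->regno == old_regno ? new_expr : x;

  struct expr *l = toy_replace (x->lhs, old_regno, new_expr);
  struct expr *r = toy_replace (x->rhs, old_regno, new_expr);
  if (l == x->lhs && r == x->rhs)
    return x;                          /* unchanged: share the subtree */
  return make_expr (x->op, 0, l, r);   /* changed: allocate a new node */
}
```

A caller can then compare the result with its first argument to learn whether anything was substituted, and the original tree stays intact, unlike with a destructive replacement.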
Re: Question about merging two instructions.
Paolo Bonzini <[EMAIL PROTECTED]> wrote on 22/08/2005 10:10:40:

> > > I tried to use simplify_replace_rtx to replace any use of (reg r) with[in]
> > > the right-hand-side of the extension and simplify the result.
>
> If he want to replace uses within the RHS of the extension, he should
> pass SET_SRC (pat). He may as well want to handle parallels, in which
> case he should write a new function similar to this:

I think you misunderstood my original purpose. I did mean [with] and not [in]. Let me explain again. I have these two instructions:

    (insn 1 0 2 0 (set (reg/v:Xmode r)
                       (sign_extend:Xmode (op:Ymode (...
    (insn 2 1 3 0 (set (LHS) (RHS)))

where:
1. Xmode > Ymode
2. RHS and/or LHS may contain: (subreg:Ymode (reg/v:Xmode r) lowpart) and/or (reg/v:Xmode r).

Now I want to replace every *use* of (reg r) in insn 2 with the rhs of insn 1 and simplify the result. This is why the replacement may happen in the LHS of insn 2. Note that I don't want to replace any *def*, and uses may appear in the LHS.

My plan was to use replace_regs () to replace every use of (reg r) with a new pseudo register (because this is the only function that I found that separates the uses from the defs) and then use simplify_replace_rtx () to replace that new pseudo register with the rhs of insn 1 and simplify.

To make things even more complicated, insn 2 may be a PARALLEL.

Maybe I should use simplify_rtx (replace_rtx (..))? But it seems to me that simplify_rtx () doesn't deal with SET either.

Do you see a better way?

Thanks,
Leehod.
Re: [patch] Fix behavior of TER on unrolled loops
On Monday 22 August 2005 09:28, Paolo Bonzini wrote:
> I think Zdenek's patch is fine, because *everything* in TER is a hack.

My thoughts exactly ;-)

> > Zdenek, maybe you could you try and see if enabling pre-regalloc
> > scheduling would also make this particular problem you're seeing go
> > away, just as an experiment...
>
> Couldn't it also make it much worse?

Depends on what the scheduler does with the memory references vs. register pressure. I have no idea.

Gr.
Steven
Re: Searching for a branch for the see optimization.
On Monday 22 August 2005 14:46, Leehod Baruch wrote:
> Hello,
>
> I would like to know if someone knows a suitable branch for the sign
> extension optimization pass.

Why not just maintain it in a local tree and post refined versions every now and then, until stage 1 for GCC 4.2 opens? Branches are for major work and a new pass is not that major.

Gr.
Steven
Re: GCC 4.1 Status Report (2005-08-21)
Andrew Pinski wrote:
> On Aug 22, 2005, at 1:27 AM, Mark Mitchell wrote:
>
>> (Quite a few of the diagnostic messages stem from the design decision
>> to issue warnings from the optimizers...)
>
> Only 8 out of 49 at that, though some are very minor as two are just
> complaining wording of the warning.

It was a little snarky of me to throw that in there; I do realize this is considered a settled issue. As you know, it's a pet peeve of mine. I still think that until we give up on this approach we're going to see strange warning behavior; the price of our quest for better accuracy will be less predictability and less consistency from release to release.

> And almost all are uninitialized warnings which are always questionable
> as there is no way to warn from the front-end without flow control.

Most compilers use very simplistic methods for doing these warnings, in their front ends. They still use control flow, but in much simpler ways. The accuracy of the warnings is therefore less than in GCC (in that, generally, other compilers warn less often than GCC, and therefore detect fewer problems), but the number of false positives is generally nearly zero.

Though I don't agree, it's certainly true that the consensus has been to try for greater accuracy, by using the optimizers. I'm not trying to upset the apple cart; I was just throwing in a little barb.

> to the next. If we don't care about diagnostic bugs (which we really
> should but it seems like we don't), then can we just remove the target
> milestones for all of those too.

No, we should not do that. As you say, we should be trying to fix them. Though, naturally, they're less important, all things equal, than wrong-code, rejects-valid, or ice-on-valid bugs.

-- 
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Re: Question about merging two instructions.
> Note that I don't want to replace any *def* and uses may appear in the LHS.

Ok, I see. But you have to cope with *def*s appearing in the LHS: you don't want to replace them, yet your modified simplify_replace_rtx will!

> My plan was to use: replace_regs () to replace every use of (reg r) with
> the a new pseudo register (because this is the only function that I found
> that separates the uses from the defs)

Not really, as it will happily replace within a STRICT_LOW_PART, but that may as well be a bug.

I think you're best off making your own function for now. And make it use validate_change so that you won't have to call recog manually.

Paolo
Re: Question about merging two instructions.
On Mon, Aug 22, 2005 at 03:37:27PM +0200, Paolo Bonzini wrote:
> It may still make sense changing the default case of
> simplify_replace_rtx to invoke replace_rtx rather than returning x. But
> this is unrelated, because nobody is currently passing a SET to
> simplify_replace_rtx (only expressions), and you should do the same:
> *you* said you want to replace on the RHS, so you really want to invoke
> simplify_replace_rtx on the RHS.

Agreed. If you're concerned about the LHS, use reg_overlap_mentioned_p and abort the optimization.

r~
Re: GCC 4.1 Status Report (2005-08-21)
Mark Mitchell <[EMAIL PROTECTED]> writes:

> My first comment is that we had a lot of bugs targeted at 4.1.0 that
> should never have been so targeted. Please remember that bugs that do
> not effect primary or secondary targets should not have a target
> milestone. There are several PRs that seem to have had target
> milestones re-added after I removed them before, though it could also
> be that I failed to remove the milestone, even though I added a
> comment to that effect. PR 17356 is an example of such a PR, though
> in this case it looks like it was Andrew Pinski who removed the target
> milestone. PR 18190 is another example. In fact, it almost looks
> like someone went through and methodically re-added target milestones
> to all the PRs for which they had been removed. If that's the case,
> please stop!

FYI, you can find out when fields like the target milestone were changed by clicking on the "View Bug Activity" link which is just under the "Commit" button. In the case of PR 18190, for example, you removed the target milestone on 2005-01-19, Andrew set it to 4.1.0 on 2005-03-05 (without adding a comment), and you removed it again on 2005-08-22.

Ian
Re: Question of pipeline description
Ling-hua Tseng wrote:
> It's only correct if the two RISC insns reserved the same RISC function
> unit.

Try defining two separate reservations for each pipe, e.g. a risc_data_processing_r0 and a risc_data_processing_r1. Then you can write the bypass rule in the obvious way.

-- 
Jim Wilson, GNU Tools Support, http://www.specifix.com
Re: Where have all the conditional moves gone?
Piotr Wyderski wrote:
> I have disassembled my program produced by g++ 4.0.0
> and I see a very strange behaviour -- the compiler doesn't
> generate cmov-s (-O3 -march=pentium3). G++ 3.4 generates
> them. So, how can I reactivate cmov-s in the newest version
> of the compiler? -fif-conversion doesn't work...

I tried a very simple testcase. When I compile it with -O -march=pentium3 -m32 I get a cmov instruction. This is using the gcc-4.0.x branch, on an x86_64 host, with the attached testcase.

I am not convinced that anything is broken here. If there really is a problem here, you will need to give us a better bug report.

-- 
Jim Wilson, GNU Tools Support, http://www.specifix.com

int
sub (int i, int j, int k)
{
  if (k >= 0)
    i = j;
  return i;
}
The Linux binutils 2.16.91.0.3 is released
This is the beta release of binutils 2.16.91.0.3 for Linux, which is based on binutils 2005 0821 in CVS on sources.redhat.com plus various changes. It is purely for Linux.

The new i386/x86_64 assemblers no longer accept instructions for moving between a segment register and a 32bit memory location, i.e.,

	movl (%eax),%ds
	movl %ds,(%eax)

To generate instructions for moving between a segment register and a 16bit memory location without the 16bit operand size prefix, 0x66,

	mov (%eax),%ds
	mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The assembler starting from 2.16.90.0.1 will also support

	movw (%eax),%ds
	movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler is now defaulted to tune for Itanium 2 processors. To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
	CFLAGS += -Wa,-mtune=itanium1
	AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.16.91.0.3 to [EMAIL PROTECTED] and http://www.sourceware.org/bugzilla/

If you don't use

# rpmbuild -ta binutils-xx.xx.xx.xx.xx.tar.bz2

to compile the Linux binutils, please read patches/README in source tree to apply Linux patches if there are any.

Changes from binutils 2.16.91.0.2:

1. Update from binutils 2005 0821.
2. Support x86-64 medium model.
3. Fix "objdump -S --adjust-vma=xxx" (PR 1179).
4. Reduce R_IA64_NONE relocations from R_IA64_LDXMOV relaxation.
5. Fix x86 linker regression for dosemu.
6. Add "readelf -t/--section-details" to display section details.
7. Fix "as -al=file" regression (PR 1118).

Changes from binutils 2.16.91.0.1:

1. Update from binutils 2005 0720.
2. Add Intel VMX support.
3. Add AMD SVME support.
4. Add x86-64 new relocations for medium model.
5. Fix a PIE regression (PR 975).
6. Fix an x86_64 signed 32bit displacement regression.
7. Fix PPC PLT (PR 1004).
8. Improve empty section removal.

Changes from binutils 2.16.90.0.3:

1. Update from binutils 2005 0622.
2. Fix a linker versioning bug exposed by gcc 4 (PR 1022/1023/1025).
3. Optimize ia64 br->brl relaxation (PR 834).
4. Improve linker empty section removal.
5. Fix DWARF 2 line number reporting (PR 990).
6. Fix DWARF 2 line number reporting regression on assembly file (PR 1000).

Changes from binutils 2.16.90.0.2:

1. Update from binutils 2005 0510.
2. Update ia64 assembler to support comdat group section generated by gcc 4 (PR 940).
3. Fix a linker crash on bad input (PR 939).
4. Fix a sh64 assembler regression (PR 936).
5. Support linker script on executable (PR 882).
6. Fix the linker -pie regression (PR 878).
7. Fix an x86_64 disassembler bug (PR 843).
8. Fix a PPC linker regression.
9. Misc speed up.

Changes from binutils 2.16.90.0.1:

1. Update from binutils 2005 0429.
2. Fix an ELF linker regression (PR 815).
3. Fix an empty section removal related bug.
4. Fix an ia64 linker regression (PR 855).
5. Don't allow local symbol to be equated common/undefined symbols (PR 857).
6. Fix the ia64 linker to handle local dynamic symbol error reporting.
7. Make non-debugging reference to discarded section an error (PR 858).
8. Support Sparc/TLS.
9. Support rpm build with newer rpm.
10. Fix an alpha linker regression.
11. Fix the non-gcc build regression.

Changes from binutils 2.15.94.0.2.2:

1. Update from binutils 2005 0408.
2. The i386/x86_64 assemblers no longer accept instructions for moving between a segment register and a 32bit memory location.
3. The x86_64 assembler now allows movq between a segment register and a 64bit general purpose register.
4. 20x Speed up linker for input files with >64K sections.
5. Properly report ia64 linker relaxation failures.
6. Support tuning ia64 assembler for Itanium 2 processors.
7. Linker will remove empty unused output sections.
8. Add -N to readelf to display full section names.
9. Fix the ia64 linker to support linkonce text sections without unwind sections.
10. More unwind directive checkings in the ia64 assembler.
11. Speed up linker with wildcard handling.
12. Fix readelf to properly dump .debug_ranges and .debug_loc sections.

Changes from binutils 2.15.94.0.2:

1. Fix greater than 64K section support in linker.
2. Properly handle i386 and x86_64 protected symbols in linker.
3. Fix readelf for LEB128 on 64bit hosts.
4. Speed up readelf for section group process.
5. Include ia64 texinfo pages.
6. Change ia64 assembler to check hint.b for Montecito.
7. Improve relaxation failure report in ia64 linker.
8. Fix ia64 linker to allow relax backward branch in the same section.

Changes from binutils 2.15.94.0.1:

1. Update from binutils 2004 1220.
2. Fix strip for TLS symbol references.

Changes from binutils 2.15.92.0.2:

1. U
[RFA] Nonfunctioning split in rs6000 back-end
While researching who is really using flow's computed LOG_LINKS, I found a define_split in the rs6000 back-end that uses them through find_single_use. It turns out the only users are combine, this split, and a function in regmove. The split dates back to revision 1.5 of old-gcc.

    ;; If we are comparing a register for equality with a large constant,
    ;; we can do this with an XOR followed by a compare. But we need a scratch
    ;; register for the result of the XOR.
    (define_split
      [(set (match_operand:CC 0 "cc_reg_operand" "")
            (compare:CC (match_operand:SI 1 "gpc_reg_operand" "")
                        (match_operand:SI 2 "non_short_cint_operand" "")))
       (clobber (match_operand:SI 3 "gpc_reg_operand" ""))]
      "find_single_use (operands[0], insn, 0)
       && (GET_CODE (*find_single_use (operands[0], insn, 0)) == EQ
           || GET_CODE (*find_single_use (operands[0], insn, 0)) == NE)"
      [(set (match_dup 3) (xor:SI (match_dup 1) (match_dup 4)))
       (set (match_dup 0) (compare:CC (match_dup 3) (match_dup 5)))]
      (etc.)

This split would turn

    lis r0,0x1234
    ori r0,r0,0x5678
    cmpwi cr7,r3,r0
    beq cr7,L6

into

    xoris r0,r3,0x1234
    cmpwi cr7,r0,0x5678
    beq cr7,L6

Nice trick, but it will not trigger, because the non_short_cint_operand will have been split already before this pattern is matched.
I see two possibilities:

1) turning it into a peephole2 like the following:

    (define_peephole2
      [(set (match_operand:GPR 0 "register_operand")
            (match_operand:GPR 1 "logical_operand" ""))
       (set (match_dup 0)
            (match_operator:GPR 3 "boolean_or_operator"
              [(match_dup 0)
               (match_operand:GPR 2 "logical_operand" "")]))
       (set (match_operand:CC 4 "cc_reg_operand" "")
            (compare:CC (match_operand:GPR 5 "gpc_reg_operand" "")
                        (match_dup 0)))
       (set (pc)
            (if_then_else (match_operator 6 "equality_operator"
                            [(match_dup 4) (const_int 0)])
                          (match_operand 7 "" "")
                          (match_operand 8 "" "")))]
      "peep2_reg_dead_p (3, operands[0]) && printf (\"^&\")"
      [(set (match_dup 0) (xor:GPR (match_dup 5) (match_dup 9)))
       (set (match_dup 4) (compare:CC (match_dup 0) (match_dup 10)))
       (set (pc)
            (if_then_else (match_dup 6) (match_dup 7) (match_dup 8)))]
    {
      /* Get the constant we are comparing against, and see what it looks
         like when sign-extended from 16 to 32 bits. Then see what constant
         we could XOR with SEXTC to get the sign-extended value. */
      rtx cnst = simplify_const_binary_operation (GET_CODE (operands[3]),
                                                  GET_MODE (operands[3]),
                                                  operands[1], operands[2]);
      HOST_WIDE_INT c = INTVAL (cnst);
      HOST_WIDE_INT sextc = ((c & 0xffff) ^ 0x8000) - 0x8000;
      HOST_WIDE_INT xorv = c ^ sextc;

      operands[9] = GEN_INT (xorv);
      operands[10] = GEN_INT (sextc);
    })

I'm testing a patch that does this replacement, and I can post it tomorrow morning. It has triggered only a dozen times so far (half in libgcc, half in the compiler), but it may be worth keeping it. However, another possibility is...

2) ... entirely getting rid of it.

What do you suggest?

Paolo
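The arithmetic in the preparation statements of the peephole2 can be tried outside the compiler. What follows is only a minimal standalone C sketch, not GCC code: it assumes 32-bit constants and uses int32_t in place of HOST_WIDE_INT, and the names xor_cmp_split, split_constant and equal_via_xor are made up for illustration. It splits a constant into the value to XOR with (low 16 bits zero, so it fits xoris) and the sign-extended 16-bit value to compare against (so it fits cmpwi).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two constants the split needs. */
struct xor_cmp_split {
    int32_t xorv;   /* constant for the XOR; its low 16 bits are zero */
    int32_t sextc;  /* low 16 bits of C, sign-extended, for the compare */
};

/* Same arithmetic as the peephole2 preparation statements, on int32_t. */
static struct xor_cmp_split split_constant(int32_t c)
{
    struct xor_cmp_split s;

    /* Sign-extend the low 16 bits of c. */
    s.sextc = ((c & 0xffff) ^ 0x8000) - 0x8000;
    /* Whatever is left over goes into the XOR constant. */
    s.xorv = c ^ s.sextc;

    /* c and sextc agree in their low 16 bits, so the XOR clears them. */
    assert((s.xorv & 0xffff) == 0);
    return s;
}

/* x == c exactly when (x ^ xorv) == sextc, because xorv ^ sextc == c. */
static int equal_via_xor(int32_t x, int32_t c)
{
    struct xor_cmp_split s = split_constant(c);
    return (x ^ s.xorv) == s.sextc;
}
```

For the constant 0x12345678 from the assembly above, this gives xorv = 0x12340000 and sextc = 0x5678, which matches the xoris/cmpwi pair. When bit 15 of the constant is set, sextc becomes negative and xorv absorbs the difference.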
Re: Building of fixincludes with 4.0.1 uses wrong gcc
[EMAIL PROTECTED] wrote:
> 1. bootstrapping the gcc 4.0.1 under Sparc/Solaris I found that the
> building in "fixincludes" uses the gcc (with no PATH specification)
> instead of the xgcc build by the last stage. It may crash, it happens on
> my environment, because I've migrated from Solaris 9 to Solaris 10 where
> the includes are not compatible.

It looks like the attached file got chopped off, because I can't see what the error message is.

There is a problem here, but not the one you suggested. We can't use the just built xgcc to build fixincludes, because gcc isn't a usable compiler until after fixincludes has been built and run. So we have to use the host compiler, the same one used to build gcc. Also, fixincludes has to be built and run before we can bootstrap gcc.

So the problem here is that fixincludes is being built a second time, after gcc has bootstrapped, and it is being built with the wrong compiler. Presumably you used "cc" for the bootstrap? If the installed gcc is broken, then you wouldn't have been able to use it to bootstrap gcc. If so, cc should have also been used to build fixincludes. It would have been used the first time we built fixincludes, but the second time we build fixincludes, we end up using "gcc" which is wrong.

I took a quick look at the top level Makefile, but I don't see an obvious reason why fixincludes is being built twice. We probably should have a bugzilla bug report for this. Oh wait, I see the problem, the first time we build all-build-fixincludes and the second time we build all-fixincludes. We have the same problem with libiberty.

By the way, you can fix your old sol9 gcc by rerunning fixincludes. We install a copy of it so it can be re-run after an OS update. Find the directory where the old gcc installed the cc1 file (e.g. gcc --print-file-name=cc1) and then look in the install-tools directory. Not all old gcc versions have this stuff though, so this only works with some recent gcc versions.

> 2. A question: The gcc 4.0 uses the built-in specs. Is it correct, that
> a "specs" file will not be used as in the 2.x and 3.x versions of the gcc?

Sort of. gcc has always had a builtin specs file. By default, we dumped it to a file and installed the file, just in case someone wanted to modify the specs file. But almost no one did, so most of the time this just made gcc unnecessarily slower. So now we changed the default so that the specs file is not installed. So by default, we use the builtin specs. You still have the option of creating and installing a specs file, in case you want to modify it.

--
Jim Wilson, GNU Tools Support, http://www.specifix.com
Re: [RFA] Nonfunctioning split in rs6000 back-end
> Paolo Bonzini writes: Paolo> I'm testing a patch that does this replacement, and I can post it Paolo> tomorrow morning. It has triggered only a dozen times so far (half in Paolo> libgcc, half in the compiler), but it may be worth keeping it. It would be nice to keep this type of optimization if the re-engineered version works. Thanks, David
Re: GCC 4.1 Status Report (2005-08-21)
Ian Lance Taylor wrote:

Mark Mitchell <[EMAIL PROTECTED]> writes:

My first comment is that we had a lot of bugs targeted at 4.1.0 that should never have been so targeted. Please remember that bugs that do not affect primary or secondary targets should not have a target milestone.

There are several PRs that seem to have had target milestones re-added after I removed them before, though it could also be that I failed to remove the milestone, even though I added a comment to that effect. PR 17356 is an example of such a PR, though in this case it looks like it was Andrew Pinski who removed the target milestone. PR 18190 is another example. In fact, it almost looks like someone went through and methodically re-added target milestones to all the PRs for which they had been removed. If that's the case, please stop!

FYI, you can find out when fields like the target milestone were changed by clicking on the "View Bug Activity" link which is just under the "Commit" button.

Aha! I knew it must be there somewhere.

Thanks,

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Memory usage reduction in loop.c ?
I think that 'struct loop_info' in loop.c could be shrunk a bit if all the 'int has_XXX' fields were turned into bitfields, just as in 'struct iv_class' or 'struct induction' in the same file. I don't know if it is worth it (in terms of memory usage reduction), nor what the impact on performance would be. If anyone is interested, I can try it and do a bootstrap, but I don't have the tools to benchmark the compiler (memory usage or speed).

Christophe Jaillet
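To give a rough idea of the kind of saving being discussed, here is a small C sketch comparing a struct of 'int' flags with the same flags packed as one-bit bitfields. It is only an illustration under assumptions: the struct names and the helper flag_bytes_saved are invented, and the field names merely follow the 'has_XXX' pattern and may not match loop.c exactly. The actual sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Flags stored as full ints, one word each (illustrative field names). */
struct flags_as_ints {
    int has_call;
    int has_volatile;
    int has_tablejump;
    int has_indirect_jump;
};

/* The same flags as one-bit bitfields; they can share a single word. */
struct flags_as_bits {
    unsigned int has_call : 1;
    unsigned int has_volatile : 1;
    unsigned int has_tablejump : 1;
    unsigned int has_indirect_jump : 1;
};

/* Bytes saved per structure instance on the current target. */
static size_t flag_bytes_saved(void)
{
    /* Guard against unsigned wrap-around on an unusual ABI. */
    assert(sizeof(struct flags_as_bits) <= sizeof(struct flags_as_ints));
    return sizeof(struct flags_as_ints) - sizeof(struct flags_as_bits);
}
```

The trade-off is that reading or writing a bitfield needs mask and shift instructions, and one cannot take the address of a bitfield, which is why the performance impact would need measuring.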
Re: Question about pointer arithmetics in GIMPLE
On Sun, 2005-08-21 at 20:32 +0200, Falk Hueffner wrote:
> Hi,
>
> I'm trying to implement a tree pass that warns about bad array
> accesses as suggested for PR 8268 by Jeff Law. However, I have trouble
> with the following:
>
> char digit_vector[5];
> const char *ggc_alloc_string(int length) {
> return digit_vector + ((length - 17) * 2);
> }
>
> this translates to:
>
> ggc_alloc_string (length)
> {
> const char * D.1292;
> int D.1293;
> long unsigned int D.1294;
> char * D.1295;
> char * D.1296;
>
> D.1293 = length * 2;
> D.1294 = (long unsigned int) D.1293;
> D.1295 = (char *) D.1294;
> D.1296 = &digit_vector + -34B; <---
> D.1292 = D.1295 + D.1296;
> return D.1292;
> }
>
> that is, a pointer is formed that wouldn't be legal to form from C,
> and we end up with
>
> return (char *) (long unsigned int) (length * 2) +
> &digit_vector[-00022];

IIRC creating an invalid pointer is OK -- dereferencing the pointer is what's bad. You need to focus on array accesses, pointer dereferences and the like, not pointer generation.

Warning for pointer generation is going to be a *lot* harder and I suspect will always result in more false positives.

> producing a warning. Is that correct GIMPLE? If so, I fear it simply
> isn't possible to do this kind of warnings after gimplification, and,
> if at all possible, would have to be done in the front-end after all.

Putting these warnings in the front-end is IMHO wrong. They belong in the generic parts of the compiler.

Jeff
Re: Question about pointer arithmetics in GIMPLE
> Warning for pointer generation is going to be a *lot* harder and I
> suspect will always result in more false positives.

In order to increase the accuracy of the data dependence analysis, I do, at some point, plan on tracking the sizes of malloc sites and giving an upper bound on them (for cases of loops, etc.) when possible, on an interprocedural basis. That should allow you to generate at least "this is obviously wrong" warnings.
Bug in builtin_floor optimization
There is some clever code in convert_to_real that converts

    double d;
    (float)floor(d)

to

    floorf((float)d)

(on targets where floor and floorf are considered builtins.) This is wrong, because the (float)d conversion normally uses round-to-nearest and can round up to the next integer. For example, compile the following with -O2:

    double d = 1024.0 - 1.0 / 32768.0;
    extern double floor(double);
    extern float floorf(float);
    extern int printf(const char*, ...);

    int main() {
      double df = floor(d);
      float f1 = (float)floor(d);

      printf("floor(%f) = %f\n", d, df);
      printf("(float)floor(%f) = %f\n", d, f1);

      return 0;
    }

The transformation is also done for ceil, round, rint, trunc and nearbyint. I'm not a math guru, but it looks like ceil, rint, trunc and nearbyint are also unsafe for this transformation. round may be salvageable.

Comments? Should I preserve the buggy behavior with -ffast-math?
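The two orders of operation can be compared side by side without relying on what the compiler does to floor. The sketch below is only an illustration, assuming IEEE single and double precision with round-to-nearest: the helpers floor_nonneg and floorf_nonneg are made-up stand-ins for floor and floorf that only handle small non-negative values, so that the example does not depend on libm or on builtin folding.

```c
#include <assert.h>

/* Stand-in for floor(): valid only for 0 <= x < 2^31 (hypothetical helper). */
static double floor_nonneg(double x)
{
    assert(x >= 0.0 && x < 2147483648.0);
    return (double)(long)x;   /* truncation equals floor for x >= 0 */
}

/* Stand-in for floorf(), with the same restriction. */
static float floorf_nonneg(float x)
{
    assert(x >= 0.0f && x < 2147483648.0f);
    return (float)(long)x;
}

/* What the source says: take the floor in double, then narrow. */
static float floor_then_narrow(double d)
{
    return (float)floor_nonneg(d);
}

/* What the transformation produces: narrow first, then take the floor. */
static float narrow_then_floor(double d)
{
    /* volatile forces the value to really be rounded to float here. */
    volatile float narrowed = (float)d;
    return floorf_nonneg(narrowed);
}
```

With d = 1024.0 - 1.0 / 32768.0, the value lies exactly halfway between two adjacent floats, and round-to-nearest-even picks 1024.0f. So floor_then_narrow returns 1023.0f while narrow_then_floor returns 1024.0f, which is the discrepancy described above.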
gcc.c-torture/execute/stdarg-2.c: long vs int
This test assumes that integer constants passed as varargs are promoted to a type at least as big as "long", which is not valid on 16 bit hosts. For example:

    void f1 (int i, ...)
    {
      va_start (gap, i);
      x = va_arg (gap, long);
      ...

    int main (void)
    {
      f1 (1, 79);
      if (x != 79)
        abort ();
      ...

Shouldn't those constants be 79L, not just 79? That change fixes one m32c failure, but given that it's a test case I'm not going to make any assumptions about it.
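For reference, here is a minimal self-contained sketch of the matched case, where the caller passes a long and the callee reads a long. It is not the test case itself; the function name first_vararg_as_long is made up for illustration.

```c
#include <assert.h>
#include <stdarg.h>

/* Reads exactly one variadic argument and interprets it as a long.
   The caller must really pass a long (e.g. 79L), since an int argument
   is only promoted to int, never to long. */
static long first_vararg_as_long(int count, ...)
{
    va_list ap;
    long value;

    assert(count == 1);   /* this sketch handles a single argument only */

    va_start(ap, count);
    value = va_arg(ap, long);
    va_end(ap);
    return value;
}
```

Calling first_vararg_as_long (1, 79L) is well defined everywhere. Calling it with plain 79 passes an int, and reading that as long is undefined behavior; it happens to work where int and long have the same size or the calling convention widens the argument, and fails where int is 16 bits and long is 32.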
compile error for 990203-1.c
I'm porting a back-end for gcc. My back-end crashed while compiling the test case 990203-1.c, and the error message is

    main.c:7: internal compiler error: in purge_addressof, at function.c:3423

The code around that line is:

    for (insn = insns; insn; insn = NEXT_INSN (insn))
      if (INSN_P (insn))
        {
          if (! purge_addressof_1 (&PATTERN (insn), insn,
                                   asm_noperands (PATTERN (insn)) > 0, 0, 1, ht))
            /* If we could not replace the ADDRESSOFs in the insn,
               something is wrong. */
            abort ();

          /* If we find a REG_RETVAL note then the insn is a libcall.
             Such insns must have REG_EQUAL notes as well, in order
             for later passes of the compiler to work. So it is not
             safe to delete the notes here, and instead we abort. */
          if (! purge_addressof_1 (&REG_NOTES (insn), NULL_RTX, 0, 0, 0, ht))
            {
              /* If we could not replace the ADDRESSOFs in the insn's notes,
                 we can just remove the offending notes instead. */
              rtx note;

              if (REG_NOTE_KIND (note) == REG_RETVAL)
                abort ();   <--- die here
              ...

The test case 990203-1.c is

    int
    f (f)
         float f;
    {
      long long *ip = (long long *) &f;
      return (*ip & 0x7ff0000000000000LL ) != 0x7ff0000000000000LL ;
    }

The offending insn in the RTL dump is:

    (insn 26 24 27 0 (set (reg:DI 172)
            (reg:DI 172)) 10 {movdi_split} (nil)
        (insn_list:REG_RETVAL 25 (expr_list:REG_EQUAL (and:DI (mem:DI (addressof:SI (reg/v:SF 169) 166 0x402ade58) [0 S8 A64])
                    (const_double 0 [0x0] 2146435072 [0x7ff00000] 0 [0x0] 0 [0x0] 0 [0x0] 0 [0x0]))
                (nil))))

Setting a breakpoint (with cond insn->u.fld[0].rtint == 26) on the expression

    if (! purge_addressof_1 (&REG_NOTES (insn), NULL_RTX, 0, 0, 0, ht))

and using gdb to dump the RTL gives

    gdb> p print_rtl(stderr, REG_NOTES(insn))
    (insn_list:REG_RETVAL 25 (expr_list:REG_EQUAL (and:DI (mem:DI (addressof:SI (reg/v:SF 169) 166 0x402ade58) [0 S8 A64])
                (const_double 0 [0x0] 2146435072 [0x7ff00000] 0 [0x0] 0 [0x0] 0 [0x0] 0 [0x0]))
            (nil)))$2 = void

which is just the same as the RTL dump. But the mips back-end gets

    (insn_list:REG_RETVAL 21 (expr_list:REG_EQUAL (and:DI (mem:DI (addressof:SI (mem/f:SF (reg/f:SI 77 $arg) [0 S4 A32]) 182 0x402b5a6c) [0 S8 A64])
                (const_int 9218868437227405312 [0x7ff0000000000000]))
            (nil)))$13 = void

    my back-end   -> (reg/v:SF 169)
    mips back-end -> (mem/f:SF (reg/f:SI 77 $arg) [0 S4 A32])

It's the mem rtx that makes the difference, but I don't know where the mem rtx comes from. All those insn notes are generated by gcc, and my back-end knows nothing about them.

Any help appreciated. Thanks
Warning Behavior
Hello, How come the following code would not be considered a Warning? Surely there is no possible way this would be intentional? if (x<4); x++; Cheers, Ivan
Re: Warning Behavior
Ivan Novick wrote:
> Hello,
>
> How come the following code would not be considered a Warning? Surely
> there is no possible way this would be intentional?
>
> if (x<4);
> x++;

When you consider macro expansion, it could be:

    #if SIZEOF_LONG == 4
    #define WARN_FOR_BIG_VALUES \
      printf ("hey, x is too big, keeping low bits only!")
    #else
    #define WARN_FOR_BIG_VALUES /* nothing */
    #endif

    ...

    long f (long long x)
    {
      if (x > LONG_MAX)
        WARN_FOR_BIG_VALUES;
      return x & LONG_MAX;
    }

(I'm not saying this is good code).

Paolo