Re: [PATCH] Fix PR29519 Bad code on MIPS with -fnon-call-exceptions
Hi Andrew,

On Wed, 25 Oct 2006, Andrew Haley wrote:
> I must admit to being a little perplexed by this.
>
> We have an unsafe optimization that causes bad code to be generated on
> at least one platform.  However, we want to continue to perform this
> unsafe optimization on our release branch until we are sure that
> removing it doesn't cause performance regressions.  And, perhaps, if
> removing the optimization does cause performance regressions we won't
> remove it, preferring bad code to reduced performance.
>
> Is that a fair summary?

Perhaps I'm misunderstanding what you wrote.  The issue is that there are
two possible fixes to this problem...

This wrong-code regression was introduced by the recent bug-fix patch to
address an aspect of PR middle-end/28690, a missed-optimization regression
affecting gcc 4.2 and 4.3.  That change then exposed a latent bug in the
RTL optimizers that has been there since 2002.

The ultra-conservative approach would be to simply back out the patch that
exposed the bug, and accept the performance degradation.  The alternative
is to progress towards "our ideal world" by actually fixing the problem
rather than working around it.  Unfortunately, moving forward is itself
not without risk, both to the tree's stability and to the performance
impact of removing an optimization that's been there for years.  The fixes
trade off one performance issue against the possibility of another.

In my professional judgement as a middle-end maintainer, the appropriate
course is to move mainline forward to better evaluate the risks and
trade-offs of the proposed changes.  Then, depending upon the outcome,
decide which of the two solutions should be backported to the release
branches.  Given the expected release timings, it makes sense to avoid
"acting in haste" and to make the right decision.  If we're in a rush, I'd
push for reverting the PR 28690 fix from the release branch, returning us
to a known state, and accept the performance loss.  But if Richard
Sandiford's and David Daney's patch works out, we can correct the
wrong-code issue without the performance loss.

Once explained, I'd expect most maintainers would make precisely the same
call?

Roger
--
Re: [PATCH] Fix PR29519 Bad code on MIPS with -fnon-call-exceptions
On Wed, 25 Oct 2006, Richard Sandiford wrote:
> Roger Sayle <[EMAIL PROTECTED]> writes:
> > Once explained, I'd expect most maintainers would make precisely the
> > same call?
>
> I suppose the counter-argument is that we shouldn't ship 4.2 in its
> current state.  We should either back out the patch that made
> REG_POINTER more prominent or go with the fix for this PR.

I repeat that it's not a question of whether it gets fixed, but of which
is the more suitable fix.  I think it's very safe to assume that we have
at least a few days to evaluate our options on mainline rather than make
a rushed decision.  With 108 serious regressions including 12 P1s, of
course we shouldn't ship 4.2 in its current state.  We've not yet had a
release candidate, or a timeline for release.  I've absolutely no
intention of not selecting which fix to backport to the release branch
long before the release's due date.

My counter-counter-argument would be that it's the rush to backport
changes to the release branch that has caused this problem.  If the
PR 28690 fix had had longer on mainline, such that we'd identified its
side-effect of breaking libgcj on MIPS, it wouldn't have been applied to
the branch before we'd fully understood which other changes are also
required.  The 28690 fix was fully tested on IA-32 and PowerPC and there
was no fault in applying that fix; its fallout was unforeseeable.
There's also no fault on the MIPS maintainers and testers for not
identifying the issue earlier.  Indeed, you came up with a corrective fix
very quickly.  But once again, this change is not 100% safe (very few
changes to the compiler ever are).  There is a very real possibility that
disabling this optimization could have a significant performance impact
on non-MIPS platforms.  Indeed, the fact that David's seen a large number
of code generation differences means that I'd like to get a better handle
on its effects on benchmarking.

Perhaps we should discuss this further on IRC, or open up the thread to
general discussion and comment.  Back when I first started contributing
to GCC, it was not uncommon to be expected to bootstrap RTL patches on
five or six targets, including simulators, and to present SPEC figures.
Several of my large changes took many months to get approved, and for all
that, the statistics on how frequently mainline bootstrap was broken were
abysmal.  Even with any amount of contributor and reviewer testing, it's
inevitable that bugs slip through.  The huge size of software engineering
projects such as GCC means that the combinatorial number of interactions
between different components makes it impossible for an individual or any
group of maintainers to predict how they'll play with each other.  My
belief/position is that rather than become paralysed by this complexity,
the work-flow needs to be adapted to accommodate it.  One of the
contributing factors in the egcs-FSF split was poor reviewer response
time.  Hence, my vote is to be more accepting and lenient of patches,
instead relying on the power of GCC's community to support patch
validation.  Very few contributors have access to MIPS hardware, or alpha
hardware, or SPEC, or nullstone, etc..., but our project benefits from
the fact that one of the major "unseen" contributions is that folks
regularly contribute their time to build, test and benchmark: the
open-source principle that "given sufficient eyes, all bugs are shallow".
Rather than be (what I consider) unreasonable and require that David run
SPEC on at least both PowerPC and IA-32 to verify there's no slow-down,
or quantify how severe it is, I have the option to provisionally approve
patches for mainline, and then look at Diego's, Andreas's and Richi's
automatic benchmarking, and the results of the autotesters.  I honestly
would like to have the ability to look at a patch and say with 100%
confidence and certainty, and without reservation, that it should be
applied to the 4.0 branch.  However, waiting 48 or 72 hours to obtain
more feedback not only "covers my arse", but should be common sense and
"best practice".

Now I completely agree that there is a real downside to this approach.
The strategy leads to the additional complexity of large numbers of
patches in flight, applied to different branches and in different orders.
Fortunately for us, we have excellent engineering practices in place to
keep track of this.  Bugzilla (and the bug masters) continually maintain
an up-to-date list of which bugs are present on which branches.  A
release manager can instantly see which bugs still affect an upcoming
release, *and* see that they may have been fixed on mainline.  It's also
no additional overhead to the contributor.  Our rules for applying
patches to the branches require that each patch must be tested against
the branch to which it is applied.
Re: [PATCH] Fix PR29519 Bad code on MIPS with -fnon-call-exceptions
On Wed, 25 Oct 2006, David Daney wrote:
> The patch is fully tested and ready to go for the 4.2 branch.

The last thing I want is for this fix to get delayed whilst we argue over
patch testing/approval policy.  This fix addresses the known wrong-code
issue, and at worst may replace it with a missed optimization
opportunity.  Hence, although we don't know for sure whether this is
better than reverting the patch that caused the regression, it's
certainly better than where 4.2 is now, and keeps us sync'd with
mainline.

Ok for the 4.2 branch.

Sorry for the confusion.  Hopefully, there'll be no more surprises.

Roger
--
Re: [PING] fwprop in 4.3 stage 1?
Hi Paolo,

On Mon, 30 Oct 2006, Paolo Bonzini wrote:
> Given that Roger started the ball rolling, by approving Steven's
> -fcse-skip-blocks patch, I'll ping the discussion...
>
> http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01066.html

I believe the appropriate next step is to freshen these patches against
mainline, and re-bootstrap and regression test them.  Not only do the
patches you cite date back to last May, but they haven't been revised to
reflect the reviewer comments they received.  For example, Joseph Myers
pointed out that they contained the wrong FSF address.  I might also be
getting confused, but I also remember that supporting patches were
committed to the tree to address some of the performance issues with
these patches, so benchmarking should look better now.

I foresee no problems in getting the fwprop pass merged into mainline
this week.  One detail I would like resolved, however, is if you and
Steven Bosscher could confirm you're both co-ordinating your efforts.
Presumably, adding fwprop is part of the agreed-upon game plan, and not
something that will complicate Steven's CSE efforts.

Roger
--
Re: GCSE again: bypass_conditional_jumps -vs- commit_edge_insertions - problem with ccsetters?
On Tue, 31 Oct 2006, Dave Korn wrote:
> Tracking down a gcse bug while unrolling a loop where the count is
> known to be one, I've narrowed the problem down to the actions of
> commit_edge_insertions and bypass_conditional_jumps, and I could use
> a little help in appreciating the reasoning behind what they're
> /trying/ to do.

Sorry for the delay, but I only read the "gcc" list infrequently and have
only just seen your question.

> Is the bug perhaps because the bypass code notes that there are two code
> paths from BB#2 to BB#7, notices that the indirect one depends on a
> condition that it knows is true, but doesn't realise that that doesn't
> mean they are equivalent to each other?

After reading through your analysis I suspect this is precisely the
problem.  The jump bypassing code is making the assumption that there can
temporarily be two (or more) edges between the same pair of basic blocks,
the distinction being that one edge has an instruction inserted on it,
and the other one doesn't.  I suspect that somewhere this is falling foul
of some other CFG-related optimization that says that if both branches of
a conditional jump go to the same label, then the conditional jump is
redundant and can be replaced by an unconditional jump.  Once the CFG
believes that BB#2 has only one successor, commit_edge_insertions is free
to place things there.

Your investigation and understanding is flawless so far.  We now need to
figure out how these edges are getting merged.  If this is a side-effect
of recent CFG/RTL-related changes, jump bypassing might need to be
restructured/rewritten to avoid using the insn-on-edge functionality.
Steven Bosscher might even have plans for reorganizing jump bypassing
already as part of his CSE/GCSE overhaul?

Roger
--
Re: Unsure about a new warning in mainline
Hi Manuel and Paolo,

On Sun, January 14, 2007 3:59 pm, Paolo Carlini wrote:
>>> By the way, "new" also wrt current 4_2-branch, in the sense that the
>>> latter doesn't emit *any* overflow warning for the snippet in 30465,
>>> with explicit -Wconversion too...

I think I can explain everything.  The first issue is that although the C
front-end has issued these warnings for years, a recent change to the C++
front-end means that it now emits them too.  This is why Paolo hadn't
seen them before, but from the C/middle-end perspective the code is
unchanged.

The second issue is that there are two different overflow-related
warnings.  The first warning indicates that an overflow has occurred in
an expression, and the second is that the overflowed result is used in a
context required to be a compile-time constant.  Apparently, the latter
is more serious than the first, as an expression that overflows isn't
considered a "constant" according to the/some standard.

Manuel's confusion here is that recent changes mean that we now only emit
one warning of the first kind per expression.  Previously, for suitably
nasty expressions we'd generate multiple such warnings.  For example, in
"- - - - - - INT_MIN", we'd formerly issue one "overflow warning" per
negation.  However, we still continue to generate one warning for any
overflow in an expression, and another if the result of that expression
is used in a context required to be a constant.  Hopefully, these
differences can be seen by compiling the problematic code given in
PR 30465 with "gcc" and then with "g++".

I agree with Paolo that this is a change for C++, and should at least be
documented in changes.html, and probably be tweaked to avoid warning in
system headers.  However, it's also odd that in this respect C has had
stricter/better diagnostics than C++ for some time.  Specifically, for
PR 30465, "((T)1 << 31) - 1" is potentially undefined when T is a 32-bit
signed type, but well-defined if T is unsigned or wider than 32 bits.

I hope this helps.

Roger
--
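To make the "((T)1 << 31) - 1" point concrete, here is a minimal sketch
(assuming a 32-bit int; the exact diagnostics emitted may vary between
front-ends and releases):

    /* Assumes int is 32 bits.  Shifting a signed 1 into the sign bit
       overflows, so the first initializer is potentially undefined;
       the unsigned and wider variants are well defined.  */
    int       s = (1   << 31) - 1;   /* signed overflow                  */
    unsigned  u = (1u  << 31) - 1;   /* well defined: 2147483647         */
    long long w = (1LL << 31) - 1;   /* well defined: wider than 32 bits */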
Re: [patch] fold-const.c: Reorganize fold - Part 1/n
I should probably also add that if any of the branches scheduled for
merge prior to 4.1 contain/require changes to "fold", perhaps for
additional transformations they need, I'll be glad to review/approve
those specific changes now, ahead of their branch's planned merge dates,
to allow them to synchronize fold-const.c prior to/during the current
"reconstruction".  This should minimize or eliminate the conflict
headaches for pending branch maintainers (for this source file at least).
I hope this seems reasonable to everyone.

During this stage of reorganization there are no external/visible API
changes, so source files other than fold-const.c are/will be unaffected.
Hopefully everyone should find this satisfactory.  Mark, are you OK with
this single source file exception/tweak to the current plan/rules?

I'd expect few, if any, of the development branches contain any changes
to fold, and they're welcome to maintain those changes locally until the
prescribed time, but I thought I'd make the offer anyway.

Roger
--
Re: A headache with fold_ternary and CALL_EXPR.
On Thu, 3 Mar 2005, Kazu Hirata wrote:
> If we want to change fold_builtin to take operands like op0, op1, and
> op2, I would have to change so many things that depend on
> fold_builtin.  (Note that the subroutines of fold_builtin also depend
> on fold_builtin in a sense that they need the original tree, one whose
> TREE_CODE is CALL_EXPR, to be available.)
>
> So the question is: Is it worth changing all these?  Or should I stop
> folding builtins in fold_ternary?

I don't think this is as bad as it first appears.  Most of the individual
fold_builtin_foo functions in builtins.c take the decomposed CALL_EXPR,
i.e. the fndecl, the arglist and the type.  I'm fairly sure those that
take the full tree "exp" are relatively easily tweaked to take fndecl,
arglist and type instead.

Following on from the current strategy (game plan?) for fold, I think we
should do something identical with fold_builtin.  We keep the external
interface to fold_builtin the same, but underneath the surface we change
fold_builtin_1 to take the individual components.  For the time being we
could probably move the:

    if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
      return targetm.fold_builtin (exp, ignore);

from fold_builtin_1 to fold_builtin, which has the full tree.  Once the
transition of fold_builtin_1 to fndecl/arglist/type is complete, we can
export it outside of builtins.c, and call this API from your new
fold_ternary.  I appreciate it's more work, but it keeps with the
philosophy for this reorganization.

> Thoughts?

Likewise?

Roger
--
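A rough sketch of that split (illustrative only: the fold_builtin_1
signature and the decomposition shown here are assumptions, not the
actual builtins.c code):

    /* Sketch: the external entry point keeps the CALL_EXPR interface,
       handles machine-dependent builtins itself, and hands the
       decomposed pieces to fold_builtin_1.  */
    tree
    fold_builtin (tree exp, bool ignore)
    {
      tree fndecl  = get_callee_fndecl (exp);
      tree arglist = TREE_OPERAND (exp, 1);
      tree type    = TREE_TYPE (TREE_TYPE (fndecl));

      if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
        return targetm.fold_builtin (exp, ignore);

      return fold_builtin_1 (fndecl, arglist, type, ignore);
    }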
Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary
Hi Alex and Mark,

On 7 Mar 2005, mark at codesourcery dot com wrote:
> Yes, I understand.  You still need to take it up with Roger, though.

My apologies to both of you for being curiously/annoyingly silent on this
issue.  I've been getting up to speed on the internals of the C++ parser,
in order to come up with a correct solution to this problem.

For example, I believe that Alex's proposed solution to PR c++/19199
isn't an appropriate fix.  It's perfectly reasonable for fold to convert
a C++ COND_EXPR into a MIN_EXPR or MAX_EXPR, as according to the C++
front-end all three of these tree nodes are valid lvalues.  Hence it's
not this transformation in fold that's in error.  It also doesn't help
with the problematic C++ extension:

    (a >? b) = c;

which uses MIN_EXPR/MAX_EXPR as a valid lvalue in C++ without any help
from "fold".

In all fairness, though, I suspect most of the middle-end is currently
set up to handle the implications of this.  For example, in several
places we perform transformations that assume MIN_EXPR and MAX_EXPR are
commutative, but this is only true for lvalue MIN_EXPR and lvalue
MAX_EXPR.  For rvalue MIN_EXPR and rvalue MAX_EXPR, the semantics need to
specify a reference to the first operand is returned for values comparing
equal.  I also suspect that the gimplification process may not be
preserving the lvalue-ness during lowering.

Simply disabling the COND_EXPR -> MIN_EXPR/MAX_EXPR transformation is
also likely to be a serious performance penalty, especially on targets
that support efficient sequences for "min" and "max".  Clearly, it's
always safe to only perform this when one side or the other is an rvalue,
but taking the maximum of two user-defined variables is, I would expect,
a fairly common instance.

My current thinking of how best to solve this falls into one of three
plans.  The first is to see if the C++ parser provides enough information
that it knows whether we're parsing an lvalue or an rvalue.  In these
cases, the C++ front-end can avoid calling "fold" on the COND_EXPR it's
creating in build_conditional_expr.  This only impacts the C++ front-end
and doesn't impact the performance of C/fortran/java code that may
usefully take advantage of min/max.

If the parser can't be used to provide useful context, we could
potentially postpone the calling of "fold" for C++ COND_EXPRs during
parsing, but call "fold" on COND_EXPRs while lowering to gimple, where
we'll always know whether we're expecting lvalues/rvalues.  An
improvement on this approach is for the C++ front-end never to create
COND_EXPR, MIN_EXPR or MAX_EXPR as the target of MODIFY_EXPRs, and to
lower them itself upon creation.  By forbidding them in "generic",
support for COND_EXPRs as lvalues can be removed from language-
independent gimplification, fold, non_lvalue, etc..., greatly simplifying
the compiler.

Finally, the third approach would be to introduce new tree codes
LCOND_EXPR, LMIN_EXPR and LMAX_EXPR that reflect the subtly different
semantics of these operators when used as lvalues.  If these were then
created by the C++ front-end when an lvalue is potentially required, it
would provide a convenient mechanism for the middle-end to distinguish
which transformations and implementation/lowering strategies are
applicable.

My final comment is that lowering (a ? b : c) = d by using a pointer
temporary runs into problems with bitfields "b" and "c", and with marking
objects addressable when they need not be.  I.e.

    t = a ? &b : &c;
    *t = d;

runs into problems.  A better approach is to use a temporary for "d",
i.e.
    t = d;
    if (a)
      b = t;
    else
      c = t;

If C++ semantics don't allow for a new temporary (i.e. a SAVE_EXPR) in
this case, then we can do far worse than to duplicate "d", i.e.

    if (a)
      b = d;
    else
      c = d;

Sorry again for my hesitancy in replying, but I'm desperately trying to
get up to speed on the options available to the C++ front-end before
recommending the correct long-term fix.  Does anyone have opinions on
each of the above approaches?  [I predict the tree-ssa folks will argue
we disallow COND_EXPRs as lvalues, and the C++ folks will argue that
knowing whether an expression appears before "=" is difficult (requiring
deep lookahead) at parse time.]

"Stuck-in-the-middle"-end!

Roger
--
Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary
On Mon, 7 Mar 2005, Mark Mitchell wrote:
> Roger Sayle wrote:
> > For example, I believe that Alex's proposed solution to PR c++/19199
> > isn't an appropriate fix.  It's perfectly reasonable for fold to
> > convert a C++ COND_EXPR into a MIN_EXPR or MAX_EXPR, as according to
> > the C++ front-end all three of these tree nodes are valid lvalues.
> > Hence it's not this transformation in fold that's in error.
>
> Only sort-of: you added the support for MIN_EXPR/MAX_EXPR as lvalues in
> build_modify, but not the same kind of back-to-COND_EXPR transformation
> elsewhere.

I truly hope you're not trying to suggest that it was me that introduced
the concept of MIN_EXPR and MAX_EXPR as lvalues into the C++ front-end:

    cvs annotate -r1.1 cp/tree.c
    1.1 (law 11-Aug-97): int
    1.1 (law 11-Aug-97): real_lvalue_p (ref)
    1.1 (law 11-Aug-97):      tree ref;
    1.1 (law 11-Aug-97): {
    ...
    1.1 (law 11-Aug-97):   switch (TREE_CODE (ref))
    1.1 (law 11-Aug-97):     {
    ...
    1.1 (law 11-Aug-97):     case MAX_EXPR:
    1.1 (law 11-Aug-97):     case MIN_EXPR:
    1.1 (law 11-Aug-97):       return (real_lvalue_p (TREE_OPERAND (ref, 0))
    1.1 (law 11-Aug-97):               && real_lvalue_p (TREE_OPERAND (ref, 1)));

I'm just a guy trying to sort out the mess.  Clearly, a fraction of the
compiler currently supports the use of COND_EXPR, MIN_EXPR and MAX_EXPR
as lvalues and another fraction doesn't.  This issue predates CVS
revision 1.1 (a.k.a. the dawn of time).

The very simple decision that we face is whether to change the compiler
consistently to allow lvalue COND_EXPRs everywhere, or to consistently
disallow them everywhere.  Most of the decision making to date has been
based on how easy C++ is to parse/how well constructed the C++ parser is.
Disabling compiler optimizations to work around a few instances of this
problem isn't a "fix".  Getting us out of this mess has only become
possible with the advent of tree-ssa, and tree-walking passes between
parsing and RTL expansion.

Roger
--
Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary
On Mon, 7 Mar 2005, Mark Mitchell wrote:
> Roger Sayle wrote:
> > I truly hope you're not trying to suggest that it was me that
> > introduced the concept of MIN_EXPR and MAX_EXPR as lvalues into the
> > C++ front-end:
>
> I thought you were the person who introduced changes to fold that caused
> it to generate these expressions when the GNU source extension did not
> appear.  Anyhow, who did what to whom isn't important. :-)

I do appreciate the smiley, but I still feel unfairly maligned by such
accusations.  Once again, fold-const.c CVS version 1.1 converts
COND_EXPRs into MIN_EXPRs and MAX_EXPRs, though admittedly under the
control of "pedantic_lvalues".  I suspect that with the various
rearrangements of GCC's ANSI/ISO warnings since then, the distinction got
lost.

I agree "who did what to whom isn't important", but it is important (to
me) that neither subject nor object was me! :-)  I'm sure you'll agree
that it's bad for the community if the handful of people fixing problems
get blamed for causing them.  I've certainly been fixing bugzilla PRs
under the assumption that there'd be no stigma attached to fixing
wrong-code regressions.  If I only fixed problems caused by me, I'd have
an easy life.

> Actually, I think it's always been fixable.  It's not a question of
> extra passes; it's just a question of ordering.  We could have done fold
> at the end of a statement (rather than while building up a statement)
> even back in GCC 2.8, and that would have fixed the problem.

Depends upon your definition of a "pass".  fold isn't recursive, i.e. it
only rearranges the top-level node of the tree, assuming that all of its
subtrees/operands have already been folded.  Hence, to perform "fold" at
the end of a statement would still require an additional post-order
traversal of the entire tree.  Whether it's done per statement, per
function or per compilation unit isn't usually significant in the
definition of what constitutes a pass.

I still believe the issue is resolvable for 4.0, but I just haven't
completely figured out the safest, least-intrusive change to do it yet.

Roger
--
Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary
On Mon, 7 Mar 2005, Richard Henderson wrote:
> On Mon, Mar 07, 2005 at 08:55:14AM -0700, Roger Sayle wrote:
> > For rvalue MIN_EXPR and rvalue MAX_EXPR, the semantics need
> > to specify a reference to the first operand is returned for values
> > comparing equal.
>
> NOT true.  And the docs explicitly say so.

Which docs?!  There's currently *no* documentation for MIN_EXPR or
MAX_EXPR in c-tree.texi.  Indeed, the only references to these within the
entire docs subdirectory are for calling REAL_ARITHMETIC.  And the
commentary above MIN_EXPR and MAX_EXPR in tree.def also has absolutely
nothing to say on the matter.  Even the descriptions of the extensions
">?" and "<?" don't address it.  The closest thing to documentation is
this comment in the C++ front-end:

    /* In C++ a ?: expression can be an lvalue, so put the
       operand which will be used if they are equal first
       so that we can convert this back to the
       corresponding COND_EXPR.  */

And as an example of why this is relevant, see bugzilla PR c++/7503,
particularly the example in comment #1.  As has been described earlier on
this thread, GCC has folded the C++ source "(a >= b ? a : b) = c" into
"MAX_EXPR (a,b) = c", and equivalently "(a > b ? a : b) = c" into
"MAX_EXPR (b,a) = c", since the creation of current CVS.  The correct
behavior of this code relies on (and has always relied on) the fact that
the middle-end prescribes a behaviour for equal comparison.

In balance, to support your side of the argument, ever since the same
creation of current CVS, the middle-end has considered MIN_EXPR and
MAX_EXPR to be commutative operations.  We're screwed if we do and
screwed if we don't.  Basically, the lvalue forms of these tree nodes
have different algebraic properties to the rvalue forms.

I'm looking forward to any clarification you can make to your comments.

Roger
--
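To see why the equal-values rule has teeth, consider the following sketch
(C++ source, illustrating the situation in PR c++/7503 rather than
quoting its exact testcase):

    int a = 5, b = 5, c = 7;

    (a >= b ? a : b) = c;   /* a == b, condition true: must store into a;
                               folded to MAX_EXPR (a, b) = c              */
    (a >  b ? a : b) = c;   /* a == b, condition false: must store into b;
                               folded to MAX_EXPR (b, a) = c              */

    /* If MAX_EXPR were treated as freely commutative, both forms would
       become the same tree, and one of the two assignments would hit
       the wrong object.  */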
Re: expand_binop misplacing results?
Hi DJ,

On Mon, 21 Mar 2005, DJ Delorie wrote:
> 2005-03-21  DJ Delorie  <[EMAIL PROTECTED]>
>
>     * optabs.c (expand_binop): Make sure the first subword's result
>     gets stored.

This is OK for mainline, provided that you bootstrap and regression test
it somewhere.  Thanks.

You're quite right that callers can't rely on expand_binop placing the
result in target; however, most backends and RTL expansion sequences try
hard to honor the request.  There does appear to be a bug in the "i == 0"
case, which has probably never been an issue, as most targets are either
able to place the result of the addition/subtraction in the requested
destination or provide their own adddi3/addti3 expanders.

Thanks for finding/fixing this.  This might be a candidate for
backporting to the GCC 4.0 branch if we can find a target/testcase that
triggers a problem.

Roger
--
Re: [PATCH] Cleanup fold_rtx, 1/n
On Wed, 13 Apr 2005, Steven Bosscher wrote:
> My gcov attachment to PR19721 shows that the code does not
> trigger on amd64 either.

Hi Paolo and Steven,

I'm still reading through the patch, but I think the wider issue of using
profiling to determine which of GCC's transformations are worthwhile and
which are not deserves some broader discussion.  I admit that fold_rtx is
perhaps a special case, as we're trying to eliminate/reduce the
significant amount of compile time spent in CSE.

More generally, I'm uncomfortable with using profile techniques that
determine whether a piece of code ever triggers to decide whether it has
benefit or not.  Such protocols are inherently dependent upon the source
code corpus being used.  In this case the comments in PR19271 suggest
that you're using 571 source files from GCC's own cc1.  I suspect that in
this entire "coverage suite" there isn't a single "complex" or "vector"
operation, and the total number of floating point operations is severely
under-represented compared to SPECfp, for example.  We know that cc1
doesn't link against the host's libm, for example, so none of GCC's math
builtins would be triggered.

A slightly better approach would be to include the dejagnu testsuite in
the coverage analysis, as the preferred policy of adding a new testcase
to trigger each transformation/optimization might help here.  Of course,
this policy isn't rigorously enforced, so I'd still be uncomfortable
removing code based simply on the fact that we couldn't get it to
trigger.

Of course, there are no clear black-and-whites in such arguments and
everything is a compromise.  For example, even if well-written pattern
matching code should take little extra time when it never matches (e.g.
unreachable cases in switch statements), the additional code leads to
larger/slower binaries and presents a maintenance overhead.
Alternatively, code that never triggers can't cause any bugs (but may
bit-rot until it does trigger at some point in the future).

My personal preference remains that these RTL optimizations should be
moved from fold_rtx to simplify_rtx when possible, and that those that
depend upon CSE'ing different functional forms are left.  For example,
one of the "???" comments in your patch recommends that that
transformation be moved to combine.c, which is probably as true today as
it was when it was first written.

Thoughts?  Opinions?  I'm not trying to be a road block on the path of
progress, but I'd just like to get a feel for where we intend to draw the
line with these protocols.  If taken to extremes, these policies can
clearly be dangerous (if none of these cc1 files contains K&R prototypes,
perhaps we could drop parser support for pre-ANSI C, etc...).

Roger
--
Roger Sayle,                     E-mail: [EMAIL PROTECTED]
OpenEye Scientific Software,     WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.     Fax: (+1) 505-473-0833
My opinions on tree-level and RTL-level optimization
I seem to have inadvertently annoyed Steven Bosscher on IRC, so by way of
an apology and explanation I thought I'd post my current opinion and
thinking on the optimizations performed by GCC's middle-end, both at the
tree level in the tree-ssa optimizers and at the RTL level in the
existing RTL optimizers.  The intention is (i) to explain my current
position, understanding and misunderstandings and (ii) for people to
disagree and/or convince me otherwise.

My current opinion is that for the medium term, let's arbitrarily say
prior to GCC 5.x, GCC will continue to need to perform optimizations at
both the tree level and the RTL level.  The tree level I see as
fundamentally responsible for high-level and machine-independent
transformations, and the RTL optimizers for the lower-level
machine-dependent transformations.

It's unlikely that RTL, or an instruction-based representation of a
program, will ever go away.  And the RTL passes corresponding to
instruction selection (combine), register allocation (reload) and
scheduling will almost certainly remain RTL-level tasks/passes.  But with
the advent of the tree-ssa infrastructure, the other RTL "optimization"
passes, notably CSE, GCSE, loop, if-conversion, and jump bypassing, may
not be long for this world.  And certainly not preserved in their current
incarnations.

The difficulty here is that by expressing a program as low-level insns,
we expose optimization opportunities not visible as trees.  The RTL
expanders and backends' pattern expanders inadvertently introduce
redundancies and inefficiencies, or more commonly, by providing
machine-specific idioms, open up code transformations, for example with
SUBREGs, that just didn't occur previously.

For example, expand_mult may produce a sequence of shifts and additions
for an integer multiplication.  If there are two integer multiplications
by a constant in a basic block, CSE will be able to identify shared terms
between their shift/add sequences that aren't visible to the tree-level
multiplications.  One might argue that the shifts/adds should be exposed
at the tree level.  But not only does this require detailed knowledge of
the instructions available on a machine, it would also obfuscate the
multiplication from the tree-ssa optimizers, such as iv-opts, etc...
Instead, representing the multiplication at the tree level as a
MULT_EXPR, propagating constants into it, and letting expand do its thing
gets the best of both worlds.

This doesn't constrain us to keep all of GCC's original RTL optimizers,
though.  Unlike gcc 3.x and earlier, we now expect tree-ssa to do most of
the heavy lifting.  One goal is to have the initial RTL that we generate
be pretty close to its final form, perhaps only requiring minor
clean-ups.  I foresee a world in which RTL transformations become far
more local.  Perhaps the RTL-level loop optimizers will disappear
completely.  Although RTL expansion may introduce new loops, these tend
to be rare, and the expanders have all the information they need to
hoist/sink invariant expressions and unroll/peel themselves.  Certainly
there should be no need for RTL-level loop optimizations to do loop
unrolling or other large-scale reorganization.  Similarly, CSE shouldn't
need to process more than a single basic block, and GCSE shouldn't need
to move anything other than simple expressions.  The quality of alias
analysis at the RTL level shouldn't be an issue.

This brave new world also has a profound impact on how backends are
written.
Previously, because, with the exception of "fold" and "expand", all
optimization was performed at the RTL level, backends played tricks to
virtualize their hardware to the middle-end in an attempt to allow the
RTL optimizers to understand what was going on.  Virtual high-level
patterns would be introduced for the benefit of the RTL optimizers, only
to be split later.  The problem with this is that by splitting/lowering
after the RTL optimizers, any inefficiencies that are introduced are too
late to be cleaned up.

Perhaps one of the most notorious examples of this is the handling of
DImode on IA-32.  The i386.c backend pretends that it has native DImode
support prior to reload.  In gcc 3.x this was essential: constant
propagation and constant folding wouldn't work with highparts and
lowparts, and, for example, the loop analysis would be unable to do
anything with a loop using a "long long" counter once the loop exit test
had been decomposed into highpart and lowpart subtests.  Unfortunately,
the late splitting leads to terrible register allocation, and numerous
dead loads/stores to/from memory.  SPECint's crafty suffers.

With tree-ssa, however, these design decisions can and need to be
re-evaluated.  Now that a "long long" for-loop will be reversed at the
tree level, the x86 backend is far better off decomposing DImode early,
and getting the benefit of the RTL optimizers to eliminate the dead
loads/stores etc...  One side-effect of such a move is that although we
genera
Re: My opinions on tree-level and RTL-level optimization
On Sat, 16 Apr 2005, Richard Kenner wrote:
>     Although, RTL expansion may introduce new loops, these tend to be
>     rare, and the expanders have all the information they need to
>     hoist/sink invariant expressions and unroll/peel themselves.
>
> I disagree.  In order to make the proper decisions about merging givs
> and choosing which giv should represent a biv, you have to know a lot
> about the valid addressing modes on the machine and this isn't something
> the tree level optimizers should have to deal with.
> ...
>     Similarly, CSE shouldn't need to process more than a single basic
>     block,
>
> Again, not clear.  Certainly the costly stuff I put in ages ago to
> walk through comparisons and around loops needs to go, but there's
> no reason to tie CSE to a basic block: it can operate until the next
> label, like it does now.  Admittedly, the number of CSE opportunities
> won't be great, but why restrict them to a basic block?
>
>     and GCSE shouldn't need to move anything other than simple
>     expressions.
>
> Why would we need a GCSE at the RTL level at all?  I'd guess the number
> of wins it would produce would be very small.
>
>     The quality of alias analysis at the RTL level shouldn't be an issue.
>
> Here I disagree the strongest!  Instruction scheduling is rapidly
> becoming one of the most critical optimizations and must be done at
> the RTL level.  The quality of instruction scheduling depends quite
> heavily on the quality of the aliasing information.

Thanks for your feedback.  I agree with all your points.  I had greatly
underestimated the importance of RTL alias analysis, especially with
respect to scheduling.

The compile-time hog issues with CSE are primarily due to deep
path-following (AFAIU).  Simple/cheap tricks such as extended basic
blocks are clearly a win, but their benefit depends on whether we keep
GCSE.  If we don't, then EBBs of course; if we do, the EBBs are probably
subsumed by GCSE.

And it's very true that at the RTL level addressing modes have always
been an Achilles' heel.  But whilst induction variable selection is very
dependent upon target parameters, as is prefetching, it's not yet clear
whether uglifying tree-ssa or continuing to improve RTL loop analysis is
the better long-term strategy.

I take it from your comments that you are in the camp that believes "the
sun has not yet set" on the need for RTL optimizers. :-)  The intent of
my post was to defend what was seen as my pro-RTL stance; it's
interesting to see that the loudest feedback is that I'm
underestimating/down-playing the importance of the RTL optimizers.

Much appreciated,

Roger
--
Re: My opinions on tree-level and RTL-level optimization
On Mon, 18 Apr 2005, Paolo Bonzini wrote:
> Roger proposed lowering 64-bit arithmetic to 32-bit in tree-ssa!  How
> would you do it?  Take
>
>     long long a, b, c;
>     c = a + b;
>
> Would it be
>
>     c = ((int)a + (int)b)
>         + ((int) (a >> 32) + (int) (b >> 32)
>            + ((unsigned int) a < (unsigned int) b)) << 32;
>
> Or will you introduce new tree codes, uglifying tree-ssa?
> Seriously...

I think you may have misinterpreted or over-interpreted what I meant by
performing optimizations/lowering earlier vs. later.  When I suggested
the i386.c backend should lower DImode operations earlier, my intention
was to lower during RTL expansion instead of where it currently happens,
after reload.  Whilst doing it in tree-ssa would technically also be
"doing it earlier", this isn't really what I had in mind.

The fact that DImode moves aren't lowered, and that the high and low
parts of a DImode register can't be independently live/dead (for register
allocation), is the reason for PR17236, and why GCC generates poorer code
than Intel on something as simple as "a * b".

Not that it would be impossible to do this with trees.  It's not uncommon
for representations to be lowered, for example high-level trees to
low-level trees.  RTL has always done this, where addressof could be seen
as a lowering pass, as can combine, reload, regstack and the process of
splitting, where the RTL representation afterwards is a more accurate
description of the generated code than the RTL representation before.

Roger
--
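For concreteness, the expansion-time lowering described above corresponds
roughly to the following word-sized decomposition (a sketch assuming
32-bit words, not a proposed patch):

    /* Sketch of c = a + b with 64-bit a, b, c split into 32-bit halves;
       the carry out of the low part is (c_lo < a_lo).  */
    unsigned int a_lo, a_hi, b_lo, b_hi, c_lo, c_hi;

    void
    add64 (void)
    {
      c_lo = a_lo + b_lo;
      c_hi = a_hi + b_hi + (c_lo < a_lo);
    }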
Re: GCC 4.0, Fast Math, and Acovea
On Fri, 29 Apr 2005, Scott Robert Ladd wrote:
> I've been down (due to illness) for a couple of months, so I don't know
> if folk here are aware of something I discovered about GCC 4.0 on AMD64:
> -ffast-math is "broken" on AMD64/x86_64.

Hi Scott,

I was wondering if you could do some investigating for me...

The change in GCC was made following the observation that, given operands
in SSE registers, it was actually faster on x86_64 boxes to call the
optimized SSE implementations of intrinsics in libm than to shuffle the
SSE registers to x87 registers (via memory), invoke the x87 intrinsic,
and then shuffle the result back from the x87 registers to SSE registers
(again via memory).  See the threads at

    http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01877.html
    http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02119.html

(Your benchmarking with acovea 4 is even quoted in
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02154.html)

Not only are the recent libm implementations faster than the x87
intrinsics, but they are also more accurate (in terms of ulp).  This
helps explain why tbp reported that gcc is faster than icc 8.1 on
opteron, but slower than it on ia32 (contradicting your observations).

Of course, the decision to disable x87 intrinsics with (the default)
-mfpmath=sse on x86_64 is predicated on a number of requirements.  These
include that the mathematical intrinsics are implemented in libm using
fast SSE implementations, with arguments and results being passed and
returned in SSE registers (the TARGET64 ABI).  If this isn't the case,
then you'll see the slowdowns you're seeing.  Could you investigate if
this is the case?  For example, which OS and version are you using?  And
what code is being generated for:

    double test(double a, double b)
    {
      return sin(a*b);
    }

One known source of problems is old system headers for <math.h>, where
even on x86_64 targets and with various -mfpmath=foo options the header
files insist on using x87 intrinsics, forcing the compiler to shuffle
registers by default.  As pointed out previously, -D__NO_MATH_INLINES
should cure this.

Thanks in advance,

Roger
--
Roger Sayle,                     E-mail: [EMAIL PROTECTED]
OpenEye Scientific Software,     WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.     Fax: (+1) 505-473-0833
Re: GCC-3.3.6 release status
On 30 Apr 2005, Gabriel Dos Reis wrote:
> There were 36 PRs open against it (as of this morning, 6am, GMT-05).
> I went through all of them and they appeared to be very minor, or
> impossible to fix in 3.3.6 without major restructuring (typically, they
> were bugs fixed in 3.4.x or 4.0.x).  Two of them were pretty simple to
> handle.  Others were closed as "fixed in 3.4.x, won't fix in 3.3.6".
> Except this:
>    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19579
> for which I would appreciate inputs from both the author and Roger.

My apologies for the delay.  Yes, this should be safe for the 3.3 release
branch, as mentioned in the original approval:
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01714.html.  The cause of my
delay was to investigate whether the code on the gcc-3_3-branch was
significantly different, or whether there was some other reason why this
fix hadn't yet been committed there.

If I get a few hours, and nobody beats me to it, I'll try bootstrapping
and regression testing a backport of Jakub's fix myself.

Roger
--
Re: On fold_indirect_ref
On Sun, 15 May 2005, Richard Guenther wrote:
> What can be done about these issues?  First, we can use
> VIEW_CONVERT_EXPR unconditionally in fold_indirect_ref and only break
> some optimizations (like temp1.C).  Second, we can declare
> fold_indirect_ref being unsafe about types and deal with this in its
> callers where we possibly know about if we're dealing with lvalues or
> rvalues, using either NOP_EXPRs or VIEW_CONVERT_EXPRs as appropriate.
> We could ease this with providing wrappers around fold_indirect_ref (or
> a flag) - but checking if we're (possibly) dealing with a lhs is not
> possible, so uses may remain unsafe.

Exactly which optimization do we miss by changing:

         /* *&p => p */
    -    if (lang_hooks.types_compatible_p (type, optype))
    +    if (type == optype)
           return op;

This is the least intrusive and most correct change.  Normally, in
fold-const.c the invariant is (or at least should be) that

    gcc_assert (TREE_TYPE (t) == TREE_TYPE (fold (t)))

Returning a tree with any type other than "type" from fold_indirect_ref
will potentially screw something up.  Probably the reason that fold_unary
doesn't call fold_indirect_ref is precisely because it is unsafe.

If there are significant optimizations that require playing fast and
loose with the type system, then these really should be moved (back) out
of fold-const.c to gimplify.c, where the tree-ssa passes have a less
strict type system.  Creating a gimplify_indirect_ref, or better still
both a gimplify_lhs_indirect_ref and a gimplify_rhs_indirect_ref, would
allow these.

Not only, as you mention in your post, can't

    float x;
    *(int *)&x = 0x8000;

validly be reduced to (int)x = 0x8000, because of the need for a cast on
the lhs, but this transformation is just as unsafe for the RHS, where
"return *(int*)&x;" really isn't equivalent to (int)x, even when the
front-end's language hooks allow assignment/conversion from floats to
ints and ints to floats.  Interpreting a float's bits as an integer isn't
the same as implicitly casting the float to an integer.  Likewise

    int y;
    return *(char*)&y;

isn't always the same as (char)y, depending upon the endianness of the
machine, even if ints can be assigned to chars by the front-end.

In the above cases, the usual casting operators NOP_EXPR and CONVERT_EXPR
are the wrong things to use.  If these transformations are to be
performed at all, on both LHS and RHS, they must use VIEW_CONVERT_EXPR.
At the moment, it looks like whenever "optype != type" we're assuming
that the callers implicitly perform fold_convert or equivalent, which is
clearly the wrong form of conversion to use.  The issue is that if ever
fold returns a type other than its argument's, there's an ambiguity as to
which form of conversion to use to correct it.  In these cases, the
assumption that fold_convert can be used is incorrect.

You mentioned on IRC that many of the optimizers aren't expecting
VIEW_CONVERT_EXPRs, in which case it really is best to leave these
indirect references through casts of ADDR_EXPRs in their original form.

Roger
--
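Putting the two illustrations above side by side (illustrative only:
assumes a 32-bit int and IEEE single-precision float, and ignores
strict-aliasing rules):

    float x = 1.0f;
    int   a = (int) x;          /* value conversion:     a == 1          */
    int   b = *(int *) &x;      /* bit reinterpretation: b == 0x3f800000 */

    int   y = 0x12345678;
    char  c = (char) y;         /* always the low-order byte, 0x78       */
    char  d = *(char *) &y;     /* 0x78 on little-endian targets,        */
                                /* 0x12 on big-endian ones               */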
Re: On fold_indirect_ref
On Sun, 15 May 2005, Richard Guenther wrote:
> > Exactly which optimization do we miss by changing:
> >
> >          /* *&p => p */
> >     -    if (lang_hooks.types_compatible_p (type, optype))
> >     +    if (type == optype)
> >            return op;
>
> I don't know - maybe stripping sign casts.  But if we use equality
> comparison here we can as well use STRIP_TYPE_NOPS instead of
> STRIP_NOPS - but the patch doing so caused some optimization regressions.

You're confusing the types of the pointers vs. the types of what they
point to.  The call to STRIP_NOPS at the top of fold_indirect_ref is
stripping the conversions between pointer types.  Both your recent change
to STRIP_TYPE_NOPS (and its reversion) and your suggestion above about
STRIP_SIGN_NOPS are red herrings.  These calls are responsible for
eliminating the casts between "char*", "const char*", "int*", "unsigned
int*", "float*", etc.  In the C front-end, for example, these are all
equivalent types, typically all with mode Pmode and with (irrelevant?)
signedness.

The NOPS that you are trying to strip are for the pointed-to type, i.e.
that you strip a conversion from T1* to T2* if T1 and T2 have the same
mode, not that the pointers T1* and T2* have the same mode.

If you really know of a missed optimization, rather than just your
suspicion above about the potential optimization opportunity for
signedness changes, then you can try something more aggressive such as:

    /* *&p => p */
    if (TYPE_MODE (type) == TYPE_MODE (optype)
        && TYPE_MODE (type) != BLKmode)
      return fold_convert (type, op);

But this is only safe if the operand is known to be on the RHS of an
expression, and not used as an lvalue.  Unfortunately, testing
"maybe_lvalue_p (op)" here doesn't help, as op has to have been an lvalue
to be the argument of an ADDR_EXPR.  In gimplify.c, you might be able to
get away without the fold_convert, but I would like to comment on
what/when type conversions are useless in tree-ssa.  Hence, if there are
any missed optimizations, for my original suggestion these would have to
be caught at gimplification (or in fold_stmt), with a mode test like the
one above, when the context of the INDIRECT_REF expression is known.

I hope this clears up the STRIP_NOPS confusion.

Roger
--
Re: Sine and Cosine Accuracy
On Thu, 26 May 2005, Scott Robert Ladd wrote:
> I prefer breaking out the hardware intrinsics from
> -funsafe-math-optimizations, such that people can compile to use their
> hardware *without* the other transformations implicit in the current
> collective.
>
> If someone can explain how this hurts anything, please let me know.

I apologise for coming into this argument late.  I'll admit that I
haven't even caught up on the entire thread, but an interesting relevant
article that may or may not have already been mentioned is:

http://web.archive.org/web/20040409144725/http://www.naturalbridge.com/floatingpoint/intelfp.html

[My bookmark 404'd, so I had to use the wayback machine to find it!]

The crux of the issue is that only two GCC targets have ever supported
trigonometric functions in hardware: the x87 coprocessor on IA-32 systems
and the 68881 co-processor on m68k systems.  Of these two, GCC builtin
support has only ever been added for the i386 backend, and as mentioned
in the article above, the FSIN and FCOS instructions produce results well
outside the 1ulp allowed by the relevant standards, even for arguments in
the range [0...2PI].

As such, the reason why hardware support for these intrinsics is
considered part of flag_unsafe_math_optimizations is that, for some
applications, they are exactly that, "unsafe".  Admittedly, on many IA-32
systems there's little difference between using FSIN vs. calling the OS
libm's sin function, as glibc and Microsoft's runtimes (for example)
themselves use the x87 intrinsics.  GCC, however, is not to know this and
assumes that the user might provide a high-precision library, such as
Lefevre's perfect 0.5ulp implementation.  [It's nice to see him join this
argument! :)]

In this instance, "unsafe" math need only be different.  For example, if
the compile-time evaluation, the run-time library and the hardware
intrinsic could potentially return different results, even if the change
is to return a more accurate result, then it is potentially unsafe.
Tests such as:

    double x = 2.0;
    if (sin(x) != sin(2.0))
      abort();

and

    double (*fptr)(double) = &sin;
    if (sin(x) != fptr(x))
      abort();

should continue to work as expected.

This might also explain some of your accuracy results.  Even if GCC can
determine a way of evaluating an expression that results in less loss of
precision, e.g. "(x + 1.0) - 1.0" -> "x", it can't do so unless allowed
to be "unsafe".

Sorry if all this has been mentioned before.  Your own results show that
you get different results using x87 hardware intrinsics, and this alone
classifies their use as "unsafe" in GCC terminology, i.e. it may
potentially produce different results.

Roger
--
Re: ifcvt.c question
On Sun, 29 May 2005, Steven Bosscher wrote:
> I don't understand what lines 2728 to 2741 are for.  Could someone give
> an example of when we can have a then_bb that has no successors, but
> still ends in something that satisfies simplejump_p()?  What kind of
> strange indirect jump would that be?  And shouldn't the check just use
> computed_jump_p() if that is what it is looking for??

For the benefit of the list, I did some archeology to determine when this
functionality was written and how it evolved to be so broken now.  The
mistake was a bug introduced by Michael Meissner back in 2000.

The functionality was originally introduced by Richard Henderson to allow
predication of then blocks that end in noreturn calls.

    2000-05-31  Richard Henderson  <[EMAIL PROTECTED]>

        * ifcvt.c (merge_if_block): Be prepared for JOIN to have no
        remaining edges.
        (find_if_block): Allow THEN with no outgoing edges.
        * flow.c (merge_blocks_nomove): Remove a barrier not following
        a jump as well.

posted: http://gcc.gnu.org/ml/gcc-patches/2000-05/msg01711.html

Unfortunately, this exposed a bug in the d30v port that caused a
regression in gcc.c-torture/execute/920428-1.c.  As I suggested on IRC,
this was a then-block that ended in a non-local goto, or more precisely
an indirect jump.  Michael tried to resolve this problem by excluding
those blocks that ended in indirect jumps.  His mistake was to assume
that the way to eliminate indirect jumps was to limit this code to simple
jumps!  Doh!  This inadvertently disabled RTH's optimization, and the
optimization's code has been dead ever since.

    2000-08-19  Michael Meissner  <[EMAIL PROTECTED]>

        * ifcvt.c (find_if_block): Do not assume that a THEN block has
        any instructions in it before checking for indirect jumps.

        * ifcvt.c (find_if_block): Do not consider a THEN block that ends
        in an indirect jump as a potential for conditional execution.

        * d30v.h (d30v_init_expanders): Don't declare here.
        * d30v-protos.h (d30v_init_expanders): Declare here with a valid
        prototype.

posted: http://gcc.gnu.org/ml/gcc-patches/2000-08/msg00736.html

RTH should probably have added a testcase that checked whether this code
was being optimized, so we'd know when it regressed.  I also can't
immediately see why a block that ends in an indirect jump shouldn't be
considered for conditional execution, and Michael might well have been
papering over another problem in the compiler (esp. the D30V backend).
Of course, now that support for D30V has been removed from CVS, it's
difficult to diagnose whether this really was just a backend problem, and
whether we should just return the code to a form similar to the one RTH
originally contributed.

I hope this answers your question.

Roger
--
Re: PR 23046. Folding predicates involving TYPE_MAX_VALUE/TYPE_MIN_VALUE
Hi Diego,

On Fri, 5 Aug 2005, Diego Novillo wrote:
> In PR 23046 we ICE inside tree-vrp.c because fold() does not
> realize that for
>
>     enum enumtype { ENUM1, ENUM2 } x;
>
> the predicate 'if (x > 1)' is always false.  This causes VRP to
> create the impossible range [2, 1] for that predicate.
>
> While it would be trivial for VRP to paper over this problem, the
> real fix should be in fold().  I looked at the logic that detects
> these cases and it is fairly convoluted (fold-const.c:9174).
>
> I'm wondering why doesn't fold() just use TYPE_MAX_VALUE/TYPE_MIN_VALUE
> if they're available?

I'm trying to remember/find the conclusions of the many discussions on
the ranges of enumeral types that we've had in the past.  One of the
explanations describing the problems in this area was at
http://gcc.gnu.org/ml/gcc-patches/2004-05/msg02002.html

There was also a huge thread (flamewar?) at the time over the issue,
which began at http://gcc.gnu.org/ml/gcc/2004-08/msg01129.html

One of the fundamental problematic/contentious issues was whether GCC
guarantees that enumeral variables are always truncated and sign- or
zero-extended when assigned from integers, or whether the optimizers can
take advantage of the fact that the stored representation is always in
its enumerated range.  A related discussion can also be found in the
thread following http://gcc.gnu.org/ml/gcc-patches/2004-07/msg00968.html

Perhaps the issue may now be closer to resolution than it was last time
around.  But the uneasy "detente" is that the middle-end currently treats
enumerated types as having the widths of their machine modes.  Perhaps
things have now changed significantly since the last time this was
discussed.  I have no vested interest in one side vs. the other, other
than that the semantics need to be consistent, lest the same kinds of
tree-ssa bugs resurface again.

I hope this helps.  My personal advice, assuming that you're looking for
an easy life, is to place the seemingly unnecessary consistency checks in
VRP.  Some of the cases discussed in the above threads might make
interesting tests for the VRP code.  I'll admit some of this "lore"
should be documented, but the issue has never been resolved to everyone's
satisfaction, so we keep with the less than ideal "status quo".

Roger
--
Re: Question about merging two instructions.
On Sun, 21 Aug 2005, Leehod Baruch wrote:
> *** 329,334
> --- 328,341
>                                        GET_MODE (SUBREG_REG (x)),
>                                        SUBREG_BYTE (x));
>         return op0 ? op0 : x;
> +     }
> +   if (code == SET)
> +     {
> +       op0 = simplify_replace_rtx (XEXP (x, 0), old_rtx, new_rtx);
> +       op1 = simplify_replace_rtx (XEXP (x, 1), old_rtx, new_rtx);
> +       if ((op0 == XEXP (x, 0)) && (op1 == XEXP (x, 1)))
> +         return x;
> +       return gen_rtx_SET (VOIDmode, op0, op1);
>       }
>       break;
>
> This way, when the replacement succeeds and the result instruction
> is recognizable, I can eliminate the extension.
>
> Does it seem like a good change?
> Does anyone have a better solution?

This approach seems reasonable.  The current structure of the code in
simplify_replace_rtx is intended to handle RTL expressions rather than
patterns, so normally it would be passed just SET_SRC (pat), instead of
the whole SET.

The reason why no-one has wanted/required simplify_replace_rtx to work on
SETs is that all current passes tend to be cautious about changes to the
LHS of an assignment.  Performing substitution and simplifications in the
RHS/SET_SRC always produces a semantically equivalent instruction, but
blindly replacing subexpressions in the LHS/SET_DEST can significantly
change its meaning.  Hence, when combine/GCSE/CSE/reload and other RTL
optimization passes modify the target/destination of an assignment, they
only perform the transformations they expect or can guarantee are safe
(building the SET themselves), but are happy to let simplify_replace_rtx
do its thing on the source expression.

To summarise, the change above is not unreasonable and I'd be happy to
allow this change to simplify-rtx.c, but I'd be more cautious about where
and why it was used.  For example, if you're sure nothing bad can happen
in the LHS, it might be reasonable to instead place this code in a
simplify_replace_set() function.  As the SET (and perhaps support for
PARALLEL) should only ever occur at the top level, that function could
then call the recursive simplify_replace_rtx.

I hope this helps.

Roger
--
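A rough sketch of what such a wrapper might look like (the name
simplify_replace_set and its exact interface are hypothetical, simply
mirroring the patch above rather than existing code in simplify-rtx.c):

    /* Handle a SET at the top level only; the caller remains responsible
       for deciding whether touching SET_DEST is actually safe here.  */
    static rtx
    simplify_replace_set (rtx set, rtx old_rtx, rtx new_rtx)
    {
      rtx dest = simplify_replace_rtx (SET_DEST (set), old_rtx, new_rtx);
      rtx src  = simplify_replace_rtx (SET_SRC (set), old_rtx, new_rtx);

      if (dest == SET_DEST (set) && src == SET_SRC (set))
        return set;
      return gen_rtx_SET (VOIDmode, dest, src);
    }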
Re: Question about merging two instructions.
On Sun, 21 Aug 2005, Leehod Baruch wrote: > >>(insn 1 0 2 0 (set (reg/v:Xmode r) > >>(sign_extend:Xmode (op:Ymode (... > >>(insn 2 1 3 0 (set (lhs) (rhs))) > > 1. Can you please give me an example of something bad that can happen to > the LHS. Maybe I'm missing something here. (set (reg:Xmode r) (sign_extend:Xmode (reg:Ymode p))) (set (subreg:Ymode (reg:Xmode r) 0) (reg:Ymode q)) would be transformed (by replacing all uses of "reg r" with its definition on both LHS and RHS) into: (set (reg:Ymode p) (reg:Ymode q)) Originally, r's high part would be set to the signbit of p and r's low part would be set to the value of q. After the transformation, we now overwrite the operand p with the value q, which isn't quite the same thing. > 2. After calling simplify_replace_rtx I try to recognize the instruction. > Is this been cautious or is it unnecessary? Except for register-register moves, all synthesized instructions need to be rerecognized, especially after "RTL simplification". > 3. Isn't it reasonable to expect that every instance on old_rtx will be > replaced by new_rtx even if it can't be simplified? > This is what I understand from the function's documentation. > But actually every expressions that can't be simplified is not replaced. Every instance of old_rtx should be replaced by new_rtx. You may be getting confused by the code to reduce memory usage. If no replacement occurs within any of the operands/subtrees of a tree, then that same tree is returned. The invariant in the recursion is that if a substitution has been made anywhere in the tree, it returns a newly allocated RTX. Simplification of this newly allocated RTX will itself return a newly allocated RTX. Hence testing whether the return value of simplify_replace_rtx matches its original first argument is a way of determining whether any substitution has been made (whether it was subsequently simplified or not). The one caveat to this is that simplify_replace_rtx is less robust to unrecognized RTL codes than replace_rtx. i.e. it won't traverse UNSPECs or other non-unary/non-binary/non-comparison expressions. This can/should probably be fixed by tweaking the "default:" case to match the GET_RTX_FORMAT loop in replace_rtx. Note this isn't a simple cut'n'paste, as replace_rtx destructively overwrites its input expression, whilst simplify_replace_rtx returns a different tree if anything changed. Roger --
Re: Bug in builtin_floor optimization
On Mon, 22 Aug 2005, Dale Johannesen wrote: > There is some clever code in convert_to_real that converts > > double d; > (float)floor(d) > > to > > floorf((float)d) > ... > > Comments? Should I preserve the buggy behavior with -ffast-math? Good catch. This is indeed a -ffast-math (or more precisely a flag_unsafe_math_optimizations) transformation. I'd prefer to keep these transformations with -ffast-math, as Jan described them as significantly helping SPEC's mesa when they were added. My one comment is that we should try to make sure that we continue to optimize the common safe case (even without -ffast-math): float x, y; x = floor(y); i.e. that (float)floor((double)y) is the same as floorf(y). Hmm, it might be good to know the relative merits of the safe vs. unsafe variants. If the majority of the benefit is from the "safe" form, I wouldn't be opposed to removing the "unsafe" form completely, if people think it's an optimization "too far". Thanks for investigating this. Roger --
Re: fold_build1 (NOP_EXPR, ...) vs. fold_build1 (CONVERT_EXPR, ...)
On Thu, 1 Dec 2005, Richard Guenther wrote: > It looks like it is safe to exchange both of them (the first one for sure) > to fold_convert (...) due to the fact that fold_unary handles NOP_EXPR > the same way than CONVERT_EXPR apart from cases that look like oversights, > ... > In fact, I remember a plan of merging NOP_EXPR and CONVERT_EXPR. Doh! I follow gcc-patches more closely than the gcc list, so I saw your post there first and replied without cross-posting. For those interested in the topic, my thoughts are at: http://gcc.gnu.org/ml/gcc-patches/2005-12/msg00124.html Roger --
Re: Problem with gcc.c-torture/execute/960608-1.c on dataflow
Hi Dan, On Wed, 7 Dec 2005, Daniel Berlin wrote: > This is *already* wrong, AFAICT, because reg:QI 58 is uninitialized, and > we are trying to use it's value. Why do we do this? This may have been a rhetorical question, but the middle-end's RTL expanders are generating uses of uninitialized registers when writing to bitfields of a structure that has been allocated to a register. In your example, the middle-end has decided that the struct "flags" should be placed in (reg:QI 58). Then when the first statement decides to initialize the single bit corresponding to field c to zero, it uses the idiom of (set (reg:QI 58) (and:QI (reg:QI 58) (const_int))) The possible fixes to this are for the middle-end to generate either code to initialize bitfield-containing structures when they're stored in registers, i.e. something like (set (reg:QI 58) (const_int 0)) at the start of the function, or to emit some form of nop_set or clobber to make it explicit that the contents of register QI:58 are initially undefined but may be used before becoming defined. Both of these should allow the middle-end's optimizers to do a better job, for example, the and's and or's to set QI:58 to a specific value should ideally be reduced to a single (set (reg:QI 58) (const_int foo)). The potential concern with "initialization" is that this may result in additional overhead, if the RTL optimizers can't determine that an initialization is dead when the result is unused/clobbered. I've no idea whether we have another target-independent mechanism to inform dataflow that a structure or some fields of it are used while undefined. This should explain why the middle-end is doing what it's doing. As for how best to "fix" this behaviour, I'll leave that to someone else. Roger --
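For reference, a hypothetical example of the kind of code under discussion (not the original testcase): a small bit-field structure that the middle-end allocates to a QImode register, whose very first store expands to an AND of the still-uninitialized register.

  struct flags
  {
    unsigned int a : 1;
    unsigned int b : 1;
    unsigned int c : 1;
  };

  int
  foo (void)
  {
    struct flags f;   /* allocated to something like (reg:QI 58)        */
    f.c = 0;          /* expands to an AND that reads the uninitialized
                         register, e.g. (set (reg:QI 58)
                                              (and:QI (reg:QI 58)
                                                      (const_int -5)))  */
    f.a = 1;
    return f.a;
  }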
[RFC/RFT] Should we hoist FP constants on x87?
One significant (philosophical) difference between the floating point code generated by GCC vs that generated by commercial compilers for IA-32, is the decision whether or not to hoist floating point constants on the x87. Or phrased equivalently, whether to allocate an x87 stack register to hold compile-time FP constants over basic block boundaries. Consider the following code: double a[10]; double b[10]; void foo(int n) { int i; for (i=0; i * gcse.c (want_to_gcse_p): On STACK_REGS targets, look through constant pool references to identify stack mode constants. * loop.c (constant_pool_constant_p): New predicate to check whether operand is a floating point constant in the pool. (scan_loop): Avoid hoisting constants from the constant pool on STACK_REGS targets. (load_mems): Likewise. Index: gcse.c === *** gcse.c (revision 108834) --- gcse.c (working copy) *** static basic_block current_bb; *** 1184,1189 --- 1184,1197 static int want_to_gcse_p (rtx x) { + #ifdef STACK_REGS + /* On register stack architectures, don't GCSE constants from the + constant pool, as the benefits are often swamped by the overhead + of shuffling the register stack between basic blocks. */ + if (IS_STACK_MODE (GET_MODE (x))) + x = avoid_constant_pool_reference (x); + #endif + switch (GET_CODE (x)) { case REG: Index: loop.c === *** loop.c (revision 108834) --- loop.c (working copy) *** find_regs_nested (rtx deps, rtx x) *** 977,982 --- 977,991 return deps; } + /* Check whether this is a constant pool constant. */ + bool + constant_pool_constant_p (rtx x) + { + x = avoid_constant_pool_reference (x); + return GET_CODE (x) == CONST_DOUBLE; + } + + /* Optimize one loop described by LOOP. */ /* ??? Could also move memory writes out of loops if the destination address *** scan_loop (struct loop *loop, int flags) *** 1228,1233 --- 1237,1248 if (GET_MODE_CLASS (GET_MODE (SET_DEST (set))) == MODE_CC && CONSTANT_P (src)) ; + #ifdef STACK_REGS + /* Don't hoist constant pool constants into stack regs. */ + else if (IS_STACK_MODE (GET_MODE (SET_SRC (set))) + && constant_pool_constant_p (SET_SRC (set))) + ; + #endif /* Don't try to optimize a register that was made by loop-optimization for an inner loop. We don't know its life-span, so we can't compute *** load_mems (const struct loop *loop) *** 10830,10835 --- 10845,10857 && SCALAR_FLOAT_MODE_P (GET_MODE (mem))) loop_info->mems[i].optimize = 0; + #ifdef STACK_REGS + /* Don't hoist constant pool constants into stack registers. */ + if (IS_STACK_MODE (GET_MODE (mem)) + && constant_pool_constant_p (mem)) + loop_info->mems[i].optimize = 0; + #endif + /* If this MEM is written to, we must be sure that there are no reads from another MEM that aliases this one. */ if (loop_info->mems[i].optimize && written) Roger --
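A minimal example of the kind of loop in question (hypothetical, not the snippet from the original post): without hoisting, the constant 3.0 is reloaded from the constant pool on every iteration; with hoisting, it ties up an x87 stack register across the entire loop.

  double a[10];
  double b[10];

  void
  foo (int n)
  {
    int i;
    for (i = 0; i < n; i++)
      a[i] = b[i] * 3.0;   /* should 3.0 live in an x87 stack reg here?  */
  }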
Re: Is fold_binary_expr being too picky here?
Hi Diego, On Thu, 12 Jan 2006, Diego Novillo wrote: > In fold_binary() we are asserting: > > gcc_assert (IS_EXPR_CODE_CLASS (kind) > && TREE_CODE_LENGTH (code) == 2 > && op0 != NULL_TREE > && op1 != NULL_TREE); ... > DEFTREECODE (OMP_SINGLE, "omp_single", tcc_statement, 2) Interesting! If tcc_statement is the best tree_code_class for OMP_SINGLE, then I think it's reasonable for fold_binary to be more forgiving. For example, the least intrusive fix is to change the assert to: gcc_assert (IS_EXPR_CODE_CLASS (kind) && TREE_CODE_LENGTH (code) == 2); if (op0 == NULL_TREE || op1 == NULL_TREE) return NULL_TREE; If there are useful high-level optimizations of OMP_SINGLE that could be applied by fold, then this issue needs a little more thought so that fold_binary can be restructured based upon TREE_CODE_CLASS to avoid any potential NULL pointer dereferences. Anyone feel strongly for or against the above change? I'd prefer not to have to bloat the trees we use with non-NULL operands, just to work around the sanity checks we have. The types of error caught by this assertion should be extremely rare. Thoughts? Roger --
Re: Mainline bootstrap failure (revision 110017)
On Fri, 20 Jan 2006, Kenneth Zadeck wrote: > I would like permission to revert Zdenek's patch for a few days. There > is nothing wrong with zdenek's patch, pe se, but it excercises a part of > df that should work but does not. I'm going to make an executive decision on this one, to restore bootstrap immediately. I agree that Zdenek's patch should be reverted for a few days without prejudice. Both patch submitters followed the rules of bootstrap and regression test, but a bad (unforetestable) interaction occurred between the two. It turns out that it was Kenny's DF changes that were at fault; they contain paths that weren't fully tested when his changes were applied alone. Zdenek's patch is innocent, but reliant on this problematic functionality. Normally, we'd allow 48 hours of shouting before fixing the tree. But I think we agree that in this case "the mid-air collision" needs to be resolved by Kenny, and his request for a short-term work-around carries some weight. With no shame on the killloop merge, I suspect we should back it out, as it is mostly new functionality, whilst Kenny's patch also contained bug-fixes, and is therefore probably the more stable of the two bits when applied independently. I hope this is satisfactory to everyone. The judge's decision is final modulo an over-ruling by Mark or the SC. This gets the tree back to bootstrap land, and allows the rest of GOMP to be merged etc... at a busy point in the GCC lifecycle. The clock is ticking for Kenny. I propose a reverse 48 hour rule where we reinstate Zdenek's patch on Monday, either by fixing DF by then or by reverting the DF changes. i.e. swap one of the clashing patches for the other. My apologies to everyone for any inconvenience. Many thanks to Dan and H-P for investigating the problem on IRC. Roger -- Roger Sayle, E-mail: [EMAIL PROTECTED] OpenEye Scientific Software, WWW: http://www.eyesopen.com/ Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385 Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833
Re: Mainline bootstrap failure (revision 110017)
On Fri, 20 Jan 2006, Zdenek Dvorak wrote: > I propose the following workaround instead, that also restores > bootstrap. It changes the way loop-iv uses df to more conservative one, > that does not seem to cause problems. That's what I like to see... options. Yes, this is OK for mainline, please hold off on the reversion (or if necessary reapply with this change). Many thanks, Roger --
Re: [RFC/RFT] PR/25890 and PR/25905
On Mon, 23 Jan 2006, Paolo Bonzini wrote: > 2006-01-23 Paolo Bonzini <[EMAIL PROTECTED]> > > PR rtl-optimization/25890 > PR rtl-optimization/25905 > * combine.c (expand_compound_operation, expand_field_assignment): > Fail if the bitfield's final position is out of bounds. This is OK for mainline. Though we'd also do well to fix some of the underlying code that's causing this suspicious RTL. Thanks, Roger --
Inconsistency in ix86_binary_operator_ok?
I've an IA-32 backend question on the intended behaviour of the functions ix86_binary_operator_ok and ix86_fixup_binary_operands. My confusion is that these functions currently allow arithmetic operations of the form "reg = op(mem,immed)" even though this shape isn't supported by the x86 ISA. For example, the following simple test code, int foo(int x) { return x | 4; } generates the following RTL in combine: (insn 11 4 16 0 (parallel [ (set (reg:SI 60) (ior:SI (mem/f:SI (reg/f:SI 16 argp) [2 x+0 S4 A32]) (const_int 4 [0x4]))) (clobber (reg:CC 17 flags)) ]) 209 {*iorsi_1} (nil) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) which is then later fixed up by postreload. My first question is whether this is intentional or an oversight? The reason I ask/noticed this was that I'm working on an RTL patch to canonicalize shifts vs. arithmetic operations, such that we always perform the shift first. This should be a big win on ARM, but will also allow us to simplify code such as: x ^= 8; x >>= 2; x ^= 1; where by reordering the arithmetic operations relative to the shifts, we can catch more simplifications in combine (a small before/after illustration appears below). The catch is that currently i386.md allows combine to first combine commutative arithmetic operations with memory, creating reg = xor(mem,8); reg = rshift(reg,2), but then inconsistently won't allow a memory as the first operand in the non-commutative shift operation, i.e. we're not allowed to create reg = rshift(mem,2). Clearly there's an inconsistency. So the options are either to allow reg = op(mem,immed) for non-commutative operators to also be fixed up by postreload, if the current behaviour is intentional and serves some useful purpose, or alternatively to change ix86_binary_operator_ok so that we only allow valid instructions at this point. Many thanks in advance, Roger --
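To illustrate the canonicalization being proposed, a hypothetical before/after pair: performing the shift first lets the two XOR constants combine into a single operation (the two functions compute the same result, since XOR with 8 doesn't touch the sign bit and so commutes with the right shift).

  int
  before (int x)
  {
    x ^= 8;       /* flip bit 3                    */
    x >>= 2;
    x ^= 1;       /* flip bit 0                    */
    return x;
  }

  int
  after (int x)
  {
    x >>= 2;
    x ^= 3;       /* the two XOR constants merged  */
    return x;
  }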
Re: Regression introduced by your change
On Thu, 2 Mar 2006, Jeffrey A Law wrote: > Is causing 961206-1.c to regress at -O1 and -O2 on i686-pc-linux-gnu > and possibly other platforms. Doh! This doesn't fail on x86_64-unknown-linux-gnu, where I developed that patch, but I am now seeing those failures on i686-pc-linux-gnu. > OUCH. Instead of testing bit 31, we test bit 7. > It seems to me that we need to emit the shift in the mode of > the comparison's first operand. Then convert the result of the > shift into the desired mode via a subreg or something along > those lines. > > Thoughts? Sorry for the breakage. I'll have a fix before the sun goes down, that performs the shift in the correct mode, then appropriately sign extends, zero extends or truncates if necessary. Many thanks for analyzing this failure. Sorry again. Roger --
Re: [RFC] removal of the convert callback
On Mon, 27 Mar 2006, [UTF-8] Rafael Esp??ndola wrote: > I have put toghether a hack to remove the convert callback. It consists in > 1) adding a lang prefix to the implementations of convert (cxx_, c_, >gfc_, etc) > 2) defining a convert2 function that is basically a copy of c_convert > 3) converting the calls to convert to calls to the lang specif >implementations or convert2 I've a number of patches to clean-up the use of "convert" in GCC myself. If you'll give me a week or so to submit/commit them to the tree, it'll make your life much easier. My patches aim to remove all uses of "convert" from libbackend.a (simplifying the interface between the front-end and the middle-end). Once this is done, your task of renaming the functions gets much simpler. The trick is that not all of the "convert" functions behave identically, which is why basing convert2 on c_convert fails for g++, which really needs cp_convert. Instead of a new convert2, the middle-end should be using the existing fold_convert which performs the common language-independent conversions. Finally, I suspect that some front-ends, probably java, fortran and treelang, only define "convert" for use by the middle-end, which means once the middle-end is cleaned up these functions can be removed from their respective front-ends entirely. I hope this makes sense. Sorry for not noticing your post earlier. I tend to read the gcc list less frequently than gcc-patches. Roger --
[RFC] Ignore TREE_CONSTANT_OVERFLOW in integer_zerop
Following some of my recent middle-end clean-ups, I believe that we're now at the point where incrementally the middle-end can start ignoring the TREE_OVERFLOW bits on constant tree nodes. As a step in this direction, the patch below removes the TREE_CONSTANT_OVERFLOW checks from integer_zerop, integer_onep, and friends in tree.c. This patch bootstraps and regression tests on i686-pc-linux-gnu (including Ada) with no new failures. The major true user of TREE_CONSTANT_OVERFLOW is the C front-end, which doesn't use any of these functions to determine whether it should issue a diagnostic when an overflowed integer is used in a context requiring a compile-time constant. Over the years, this overflow testing in the middle-end has caused numerous bugs, the most recent being last week's PR26859. Removing this test cleans up the semantics of integer constants a little. In the unlikely event that any problems are discovered, by routines relying on these functions testing checking for overflow, the trivial fix is to rewrite the callers as: if (integer_zerop (t) && ! TREE_CONSTANT_OVERFLOW (t)) ... There is now a stronger requirement on fold-const.c and friends that it now be extra careful preserving TREE_CONSTANT_OVERFLOW for the C front-end. Optimizations such as "0 * x" -> "0" where we test for zero using integer_zerop, have been checked to make sure that we return the original zero, which the overflow bits set as appropriate. Once this patch is approved there's some follow-up clean-up that can be done, removing the duplication in the many "local" functions that test for zero but couldn't previously use integer_zerop due to the historical overflow semantics. Presumably everyone agrees that this evolution is a good thing. The contention is whether everyone agrees that we're ready for such a step. Is such a transition safe for stage 3 mainline, and/or would front-ends prefer some time to double check that this won't cause problems on conformance tests not part of GCC's testsuite. Thoughts? 2006-04-01 Roger Sayle <[EMAIL PROTECTED]> * tree.c (integer_zerop): Ignore TREE_CONSTANT_OVERFLOW. (integer_onep): Likewise. (integer_all_onesp): Likewise. (integer_pow2p): Likewise. (integer_nonzerop): Likewise. (real_zerop): Likewise. (real_onep): Likewise. (real_twop): Likewise. (real_minus_onep): Likewise. (int_size_in_bytes): Likewise. (host_integerp): Likewise. Index: tree.c === *** tree.c (revision 112378) --- tree.c (working copy) *** integer_zerop (tree expr) *** 1209,1215 STRIP_NOPS (expr); return ((TREE_CODE (expr) == INTEGER_CST - && ! TREE_CONSTANT_OVERFLOW (expr) && TREE_INT_CST_LOW (expr) == 0 && TREE_INT_CST_HIGH (expr) == 0) || (TREE_CODE (expr) == COMPLEX_CST --- 1209,1214 *** integer_onep (tree expr) *** 1226,1232 STRIP_NOPS (expr); return ((TREE_CODE (expr) == INTEGER_CST - && ! TREE_CONSTANT_OVERFLOW (expr) && TREE_INT_CST_LOW (expr) == 1 && TREE_INT_CST_HIGH (expr) == 0) || (TREE_CODE (expr) == COMPLEX_CST --- 1225,1230 *** integer_all_onesp (tree expr) *** 1250,1257 && integer_zerop (TREE_IMAGPART (expr))) return 1; ! else if (TREE_CODE (expr) != INTEGER_CST ! || TREE_CONSTANT_OVERFLOW (expr)) return 0; uns = TYPE_UNSIGNED (TREE_TYPE (expr)); --- 1248,1254 && integer_zerop (TREE_IMAGPART (expr))) return 1; ! else if (TREE_CODE (expr) != INTEGER_CST) return 0; uns = TYPE_UNSIGNED (TREE_TYPE (expr)); *** integer_pow2p (tree expr) *** 1303,1309 && integer_zerop (TREE_IMAGPART (expr))) return 1; ! 
if (TREE_CODE (expr) != INTEGER_CST || TREE_CONSTANT_OVERFLOW (expr)) return 0; prec = (POINTER_TYPE_P (TREE_TYPE (expr)) --- 1300,1306 && integer_zerop (TREE_IMAGPART (expr))) return 1; ! if (TREE_CODE (expr) != INTEGER_CST) return 0; prec = (POINTER_TYPE_P (TREE_TYPE (expr)) *** integer_nonzerop (tree expr) *** 1341,1347 STRIP_NOPS (expr); return ((TREE_CODE (expr) == INTEGER_CST - && ! TREE_CONSTANT_OVERFLOW (expr) && (TREE_INT_CST_LOW (expr) != 0 || TREE_INT_CST_HIGH (expr) != 0)) || (TREE_CODE (expr) == COMPLEX_CST --- 1338,1343 *** real_zerop (tree expr) *** 1434,1440 STRIP_NOPS (expr); return ((TREE_CODE (expr) == REAL_CST - && ! TREE_CONSTANT_OVERFLOW (expr) && REAL_VALUES_EQUAL (
Re: [RFC] Ignore TREE_CONSTANT_OVERFLOW in integer_zerop
On Sun, 2 Apr 2006, Eric Botcazou wrote: > > 2006-04-01 Roger Sayle <[EMAIL PROTECTED]> > > > > * tree.c (integer_zerop): Ignore TREE_CONSTANT_OVERFLOW. > > [...] > > (int_size_in_bytes): Likewise. > > (host_integerp): Likewise. > > Is this an oversight? Doh. Indeed. The last two lines of the ChangeLog should read (int_size_in_bytes): Ignore TREE_OVERFLOW. (host_integerp): Likewise. As one aspect of the series I've been investigating is to unify these two flags, I tend to think of them as synonymous. The only difference in semantics is that the C/C++ front-ends use both to track whether a diagnostic has been emitted. It turns out that this conceptual difference can be handled entirely in the C family front-ends, and at the same time improve the diagnostic warnings that are emitted. Indeed, many of the uses of TREE_OVERFLOW or TREE_CONSTANT_OVERFLOW currently test the wrong one, but the vast majority of writes are to clear these flags to prevent them screwing things up. Good catch. Thanks. Roger --
Re: [RFH] negate_expr_p bug?
On Mon, 3 Apr 2006, Richard Guenther wrote: > negate_expr_p currently contains > ... > where it looks bogus to simply return true for signed types but > unset flag_trapv. > ... > which is bogus as it should read || (!flag_trapv && flag_wrapv) - no? > I hit this with a patch to fold A +- CST to A -+ CST for negative CST, > which tells me it is ok to negate -INT_MAX. I suspect the problem is in the use of negate_expr_p. This predicate is designed for integer constants to check whether it is reasonable to evaluate NEGATE_EXPR at compile-time. Hence, fold will/can convert: int foo() { return -INT_MIN; } into "return INT_MIN;" during tree-ssa, provided that we don't rely on trapping math, i.e. we take advantage of the undefined behaviour of signed types, with or without flag_wrapv. The real question is whether your new transformation is using negate_expr_p correctly. It sounds as though you may be introducing an overflow that wasn't there before, in which case you need to be testing flag_wrapv && !flag_trapv in the calling clause. negate_expr_p (t) is simply an efficient version of temp = fold_build1 (NEGATE_EXPR, type, t); return TREE_CODE (temp) != NEGATE_EXPR; and its logic precisely mirrors the equivalent code in negate_expr. Perhaps some of the logic in fold_negate_const is more appropriate: temp = fold_negate_const (arg0, type); return !TREE_OVERFLOW (temp); I hope this helps. Roger --
Re: [RFH] negate_expr_p bug?
On Mon, 3 Apr 2006, Richard Guenther wrote: > > || (TREE_CODE (type) == INTEGER_TYPE >&& (TREE_CODE (arg1) == INTEGER_CST >|| TYPE_UNSIGNED (type) >|| (flag_wrapv && !flag_trapv))) > > Does this sound reasonable? Yes, this sounds reasonable. It was not the negate_expr_p call that's causing your problems but the overly restrictive guard on this transformation. Let me know the results of a bootstrap and regression test in case that points out something I've missed. Roger --
Re: Relying on precise integer calculation with double
On Thu, 6 Apr 2006, Daniel Bratell wrote: > that it would be safe to use that flag unless I relied on extreme > precision or behaviour related to NaN or +-Inf, and that normal simple > arithmetics should still give the same result. Unfortunately, with -ffast-math simple arithmetic no longer gives precisely the same result. Converting the division into multiplication by the reciprocal sometimes results in numbers, such as 1/7, that can't be represented precisely. Of course, in this case you are relying on "extreme precision" as you describe it above. As pointed out by Andrew the result is 0.4448884876874217298 which should be close enough to one for most -ffast-math applications. The real problem is the source line: >if(diff_days/7.0 != (int)(diff_days/7.0)) Here you're using C's default FP->int conversion, which is truncation towards zero. The fact that 0. is less than 1 means the result is zero. Instead, you should probably use "rint" or "round" or "lround" to return the nearest integer. Roger --
Repository Write Access: A privilege or a right?
I've a quick question regarding the policy for granting/sponsoring CVS/subversion write access. There seems to have been a recent change in thinking (or difference of opinions) on the pre-requisites for allowing a new contributor modification rights to GCC's source repository. My understanding (which is admittedly mostly personal and historical) is that someone is granted write access once they have demonstrated an understanding of the rules and policies for submissions to GCC. I'd submitted ten or twenty patches to GCC before RTH offered to sponsor my write access. Admittedly this is a bit extreme, but the rule of thumb that I currently use is to sponsor someone's write access once they've submitted a patch that doesn't require changes (correct, following GNU style, with documentation changes, new testcase(s), a ChangeLog entry, and properly tested with target triples listed). There does however appear to be another school of thought that suggests that write access should be immediate, following the acceptance of FSF copyright assignments, possibly implicitly by becoming an employee of a "blanket-assignment" company. In the interests of consistency I was wondering whether there was any official advice or guidelines on this aspect of GCC protocol? Increasingly, there have been examples of patches getting checked-in without maintainer approval, or without posting to gcc-patches, etc... I'd assumed that one reason for the rite of passage of GCC CVS access was that no-one would be able to delete/corrupt the GCC repository without first having read contribute.html. One reason for requiring a "sponsor" is that a maintainer is required to vouch for the behaviour/aptitude of a new "Write After Approval" contributor. And in theory breakage of the regulations should reflect not only on the violator but also on his/her sponsor's judgement. I also suspect that the ultimate recourse of the SC is to revoke someone's write access. Unfortunately, this aspect of GCC protocol isn't clearly documented, and I base my current actions on experience of legacy policies. Admittedly, it'd be easier to relax "admission requirements" and thereby reduce the number of commits that have to be made on behalf of those without access. Perhaps the effectiveness of the "three stage system" at stabilizing the tree has reduced the need for the earlier methods used to ensure/improve the quality of contributions. There's also the new/recent complication that companies/groups now maintain their own GCC branches, perhaps requiring write access even for programmers who don't contribute to FSF GCC. What do the other maintainers, or the SC, think? Roger --
Re: Status of SEE and Autovectorization patches?
Hi Mark, On Tue, 2 May 2006, Mark Mitchell wrote: > Roger, I know that you reviewed the SEE patches. Is there anything > more than needs to be done to get them committed, in your view? As far as I'm aware, we're still just waiting for the Haifa folks to commit them to mainline. There are a number of harmless code generation changes on some platforms, but as far as we know, no performance regressions. And the Intel folks are keen to submit follow-up patches for x86_64 to extend/enhance this framework once it's in the tree. And importantly there's a single command-line option/variable that can enable/disable this pass should there be any adverse events. However, my growing concern at the moment is the response time of the Haifa folks. If it's going to go into the tree in stage 3, I'd feel more comfortable if the patch submitters/contributors looked like they could respond to queries/issues within 24 or 48 hours. The current prolonged silence/inaction doesn't inspire great confidence. Perhaps you could make the executive decision that if it isn't in the tree by June (or some date of your choosing), it'll just have to wait for 4.3? My apologies for taking so long to review these changes, and I and the GCC maintainership should take some of the responsibility for the delays and current inconvenience. It looks from the e-mail addresses that possibly the primary developer of SEE, Leehod, is no longer at IBM but has returned to academia, with Mircea pinging the patches. This could explain things, but is pure speculation. Maybe someone could volunteer to update the posted patches for mainline and make the minor style corrections requested in the final review/approval? I've Cc'd Leehod and Mircea on this e-mail, so perhaps they'll be able to better explain the current status themselves. Roger --
Re: Status of SEE and Autovectorization patches?
Hi Mircea, On Fri, 5 May 2006, Mircea Namolaru wrote: > > That certainly does suggest a bug in the SEE patches. They needn't do > > anything useful on IA32/AMD64, but they should presumably either (a) not > > cause a bootstrap failure on these architectures, or (b) be disabled on > > these architectures. > > Agree. I will check the bootstrapping on x86. (a) seems preferable but > if not feasible in a short time frame, it will be (b). Given that this is more than a bootstrap problem with non-default flags, but testsuite regressions for gfortran and SPEC failures on a primary platform, I think this falls under GCC's 48 hour rule. This simply formalizes your phrase "short time frame" above, and means that if you're unlikely to come up with a solution to these problems in the next day or two, you/we should simply disable -fsee from being turned on by default at -O3. I appreciate your efforts to actually correct the deficiencies in SEE, which is indeed preferable, but for regression breakage in stage3, it's often better to simply band-aid the problem as quickly as possible, even if you're close to a fix, as a courtesy to other developers. Roger --
Re: Revert patch for MIPS TImode functions for 4.1.1
Hi Mark and Richard, On Fri, 19 May 2006, Mark Mitchell wrote: > Roger, would you please revert your MIPS MIN_UNITS_PER_WORD change > for MIPS on the GCC 4.1 branch? > > (My brain failed to digest the fact that the patch was on 4.1 as well as > on mainline, perhaps in part because there doesn't seem to be a PR; > Richard indicated to me that he would locate or open one now.) Please can you explicitly confirm that we want to reintroduce PR target/22209 as a regression from gcc 4.1.0, and break the build of libgfortran on all 64-bit MIPS platforms, including IRIX? The MIPS backend still claims that TImode is scalar_mode_supported_p, and it's therefore possible to write simple C and C++ testcases that will regress with this change. I know my testing of such a reversion will reintroduce at least 4800 new failures in the dejagnu testsuite, for no reported improvement vs gcc 4.1.0. It's a lesser of two evils issue, and it's only further development on mainline, and not a user reported problem, that's revealed the potential fallout in 4.1.0. The tradeoff is between unlinkable code in the default build on the affected platforms, vs. unlinkable code when explicitly using the -msoft-float flag, which isn't tested by the testsuite. I'm happy to revert the changes at a moment's notice, especially if that's what the release manager and MIPS maintainers want, but I just want to be certain that everyone's aware of the tradeoffs, so that whatever the outcome, it's not my fault. I'll start a bootstrap and regression test of the reversion against the 4.1 branch straight away; it's not clear if recent backports have introduced additional dependencies, but these machines aren't the fastest on the planet, and they're still behind on evaluating the recent set of mainline patches/changes. Roger --
Re: Revert patch for MIPS TImode functions for 4.1.1
Hi Richard, On Fri, 19 May 2006, Richard Sandiford wrote: > Mark Mitchell <[EMAIL PROTECTED]> writes: > > (My brain failed to digest the fact that the patch was on 4.1 as well as > > on mainline, perhaps in part because there doesn't seem to be a PR; > > Richard indicated to me that he would locate or open one now.) > > Opened as 27681. (And Roger: sorry for all the hassle this patch has > caused you. A good deed and all that...) Indeed, no good deed ever goes unpunished. In fact, isn't it the MIPS backend's use of the GOFAST libraries which is one of the major blockers of adding -msoft-float tests to the testsuite? :-) My very strong recommendation is to add a "target-soft-float" predicate to the dejagnu testsuite's "require" predicates; that'll allow us to test -msoft-float functionality on most platforms, by explicitly disabling the handful of problematic non-libgcc soft FP targets. This will both prevent this sort of regression in the future, and actually allow us to (automatedly) confirm that your patch has improved things. The MIN_UNITS_PER_WORD trick is used on multiple platforms, including PowerPC, IA-64 and s390, so without such tests we've no way of knowing that your recent fix isn't also required on them in addition to MIPS. See, for example, http://gcc.gnu.org/ml/gcc/2003-03/msg01365.html which was Ulrich's s390 change that inspired my problematic fix for mips.h. Roger --
Re: Revert patch for MIPS TImode functions for 4.1.1
On Fri, 19 May 2006, Mark Mitchell wrote: > > No, you can invoke it via using the attribute mode(TI) > Sure, but I'm not worried about that case. That would be the only class of C or C++ failures that I could easily construct by hand. Although the RTL optimizers will introduce TImode moves and temporary registers, it looks like they won't introduce any arithmetic/logical operations without explicit pattern support from the backend. Hence, it's only things like configure scripts that identify the presence of __int128_t and __uint128_t, and then explicitly use 128-bit operations in their source code that will be affected. This is the source of the libgfortran failure, where the runtime library detects these types are available, and uses them. It's not a problem in the gfortran compiler itself, which treats INTEGER*8 exactly the same way C/C++ treat __int128_t. If libstdc++-v3's configure checked for __int128_t and provided a specialized STL instantiation, it would exhibit the same issue. Roger --
Re: Revert patch for MIPS TImode functions for 4.1.1
On Fri, 19 May 2006, Mark Mitchell wrote: > I'm evaluating the options. It would be helpful if someone has time to > apply and test Richard's patch on 4.1, as that would let us know whether > that option is viable as well. I've bootstrapped and regression tested a backport of Richard's patch against the gcc-4_1-branch on mips-sgi-irix6.5 with no new failures. His mainline change also has no testsuite problems on the same machine. The patch applies cleanly, with the exception of some mklibgcc.in hunks, due to the fact that the _floatun* symbols were added to 4.2 and aren't available in 4.1.x libgcc, and that the LIB2FUNCS_EXCLUDE functionality isn't on the branch. For the record, the final mklibgcc.in changes that I tested are attached to this e-mail. I hope this helps. Roger -- Index: mklibgcc.in === *** mklibgcc.in (revision 113936) --- mklibgcc.in (working copy) *** *** 17,22 --- 17,23 # LIB2ADDEHSTATIC # LIB2ADDEHSHARED # LIB2ADDEHDEP + # LIB2LITERAL_CONVERSION_MODES # LIBUNWIND # LIBUNWINDDEP # SHLIBUNWIND_LINK *** echo 'dirs = libgcc' *** 54,63 echo # Library members defined in libgcc2.c. lib2funcs='_muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 ! _cmpdi2 _ucmpdi2 _floatdidf _floatdisf _fixunsdfsi _fixunssfsi ! _fixunsdfdi _fixdfdi _fixunssfdi _fixsfdi _fixxfdi _fixunsxfdi ! _floatdixf _fixunsxfsi _fixtfdi _fixunstfdi _floatditf _clear_cache _enable_execute_stack _trampoline __main _absvsi2 _absvdi2 _addvsi3 _addvdi3 _subvsi3 _subvdi3 _mulvsi3 _mulvdi3 _negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2 _ctzsi2 _ctzdi2 _popcount_tab --- 55,81 echo # Library members defined in libgcc2.c. + + # The floating-point conversion routines that involve a single-word integer. + # XX stands for the integer mode. + swfloatfuncs= + for mode in sf df xf; do + swfloatfuncs="$swfloatfuncs _fixuns${mode}XX" + done + + # Likewise double-word routines. + dwfloatfuncs= + for mode in sf df xf tf; do + dwfloatfuncs="$dwfloatfuncs _fix${mode}XX _fixuns${mode}XX" + dwfloatfuncs="$dwfloatfuncs _floatXX${mode}" + done + + # Entries of the form :: indicate that libgcc2.c + # should be compiled with L defined and with LIBGCC2_UNITS_PER_WORD + # set to . is the name of the associated object file + lib2funcs='_muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 ! _cmpdi2 _ucmpdi2 _floatdidf _clear_cache _enable_execute_stack _trampoline __main _absvsi2 _absvdi2 _addvsi3 _addvdi3 _subvsi3 _subvdi3 _mulvsi3 _mulvdi3 _negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2 _ctzsi2 _ctzdi2 _popcount_tab *** lib2funcs='_muldi3 _negdi2 _lshrdi3 _ash *** 65,70 --- 83,103 _powixf2 _powitf2 _mulsc3 _muldc3 _mulxc3 _multc3 _divsc3 _divdc3 _divxc3 _divtc3' + if [ "$LIB2LITERAL_CONVERSION_MODES" ]; then + for func in $swfloatfuncs; do + sifunc=`echo $func | sed -e 's/XX/si/'` + lib2funcs="$lib2funcs $sifunc:$sifunc:4" + done + for func in $dwfloatfuncs; do + difunc=`echo $func | sed -e 's/XX/di/'` + tifunc=`echo $func | sed -e 's/XX/ti/'` + lib2funcs="$lib2funcs $difunc:$difunc:4 $tifunc:$difunc:8" + done + else + lib2funcs="$lib2funcs `echo $swfloatfuncs | sed -e 's/XX/si/g'`" + lib2funcs="$lib2funcs `echo $dwfloatfuncs | sed -e 's/XX/di/g'`" + fi + # Disable SHLIB_LINK if shared libgcc not enabled. if [ "@enable_shared@" = "no" ]; then SHLIB_LINK="" *** fi *** 145,152 # Remove any objects from lib2funcs and LIB2_DIVMOD_FUNCS that are # defined as optimized assembly code in LIB1ASMFUNCS. for name in $LIB1ASMFUNCS; do ! lib2funcs=`echo $lib2funcs | sed -e 's/^'$name' //' \ ! 
-e 's/ '$name' / /' \ -e 's/ '$name'$//'` LIB2_DIVMOD_FUNCS=`echo $LIB2_DIVMOD_FUNCS | sed -e 's/^'$name' //' \ -e 's/ '$name' / /' \ --- 178,185 # Remove any objects from lib2funcs and LIB2_DIVMOD_FUNCS that are # defined as optimized assembly code in LIB1ASMFUNCS. for name in $LIB1ASMFUNCS; do ! lib2funcs=`echo $lib2funcs | sed -e 's/^'$name'[ :]//' \ ! -e 's/ '$name'[ :]/ /' \ -e 's/ '$name'$//'` LIB2_DIVMOD_FUNCS=`echo $LIB2_DIVMOD_FUNCS | sed -e 's/^'$name' //' \ -e 's/ '$name' / /' \ *** for ml in $MULTILIBS; do *** 248,263 # for name in $lib2funcs; do if [ "$libgcc_s_so" ]; then out="libgcc/${dir}/${name}${objext}" outS="libgcc/${dir}/${name}_s${objext}" echo $outS: $libgcc2_c_dep ! echo " $gcc_s_compile" $flags -DL$name -c '$(srcdir)/libgcc2.c' \ -o $outS echo $out: $libgcc2_c_dep ! echo
Re: bootstrap of trunk fails for x86-64
On Thu, 29 Jun 2006, Andreas Jaeger wrote: > Current svn does not build, it fails for me with: > build/genpreds: Internal error: RTL check: expected elt 0 type 'e' or 'u', > have 's' (rtx match_code) in write_match_code_switch, at genpreds.c:546 > > Roger, is this is a result of your changes? Grr! Sorry again for the second "typo" breakage in what should have been a very simple patch. Although I can't reproduce the above failure on my x86_64 box (or any of the other systems I tested this patch on), the RTL check is diagnosing a real problem that I'm sorry I zoned on myself. Fixed with the following patch. Committed to mainline as obvious as revision 115076, after a full 3-stage bootstrap of the gcc/ directory on x86_64-unknown-linux-gnu. My apologies yet again for the breakage, and especially to Mark. 2006-06-29 Roger Sayle <[EMAIL PROTECTED]> * genpreds.c (write_match_code_switch): Correctly use XSTR instead of XEXP to extract the operands of a MATCH_CODE rtx. Index: genpreds.c === *** genpreds.c (revision 115074) --- genpreds.c (working copy) *** write_predicate_expr (rtx exp) *** 543,550 static void write_match_code_switch (rtx exp) { ! const char *codes = (const char *) XEXP (exp, 0); ! const char *path = (const char *) XEXP (exp, 1); const char *code; fputs (" switch (GET_CODE (", stdout); --- 543,550 static void write_match_code_switch (rtx exp) { ! const char *codes = XSTR (exp, 0); ! const char *path = XSTR (exp, 1); const char *code; fputs (" switch (GET_CODE (", stdout); Roger --
Re: Splay Tree
Hi Ian, On Sun, 9 Jul 2006, Ian Blanes wrote: > I've been recently looking at the splay tree implementation. I've noticed > that this revision http://gcc.gnu.org/viewcvs?view=rev&revision=106584 > does a strange splaying. It does a bottom up splaying but starting at the > root (not the same that top down). It works but performs worst. Interesting. Richard Guenther's post for the above change, which was based upon a previous proposal by Brian Makin, suggested that this patch reduced the compile-time of his tramp3d benchmark by 1%. http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00294.html I appreciate that you've also benchmarked "typical" splay tree usage during a GCC bootstrap. Any chance that you could repeat your experiment but instead evaluate the behaviour of splay tree usage during a tramp3d compilation: http://www.suse.de/~rguenther/tramp3d/ Whilst your analysis demonstrates that splay_tree_splay is behaving curiously, I'm also interested in resolving the reported performance benefits of the different implementations. Thanks in advance, Roger --
Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
Hi Denis, On Mon, 31 Jul 2006, Jim Wilson wrote: > At the moment, there is probably no one who understands this code as > well as you do, so you may not get much help from others. In my defence, I'm struggling to get up to speed with all of the issues. The first and obvious comments are that patches should be submitted to [EMAIL PROTECTED] rather than gcc@gcc.gnu.org; that we're currently in stage 3, which is a code freeze restricted to regression fixes, so missed-optimization improvements such as yours can't be committed until gcc 4.2 has branched, which means reviewers tend not to review complex patches until then; and finally that you really should take a look at GCC's coding standards documentation, as the formatting of your submission is just all wrong. In the meantime, however, I have been running your example code, and it does indeed find better correct multipliers than the algorithm currently used by GCC. As mentioned by Jim, the current GCC algorithm is based upon the paper by Torbjorn Granlund and Peter Montgomery, "Division by Invariant Integers using Multiplication", PLDI-94. http://swox.se/~tg/divcnst-pldi94.pdf Another excellent reference describing division by integer constant algorithms is Henry S. Warren Jr.'s "Hacker's Delight", Addison-Wesley publishers, 2003. The strange thing is that all three of GCC, the paper and the book agree on a preferred multiplier, which differs from the one selected by your code. Following through both sets of correctness proofs is tedious, but it looks like Granlund's work attempts to find multipliers whose residual error is less than 1/n, which is sufficient but not necessary. Your code relies on round-to-zero semantics of truncating division, so that provided the residual is less than 1, we'll always truncate to the correct result. One issue is that Granlund's derivation starts with signed division and is then extended/tweaked to handle unsigned division. Your work, on the other hand, starts its derivation with unsigned division. So currently I'm trying to find the catch. For which divisions are 33-bit operations required? Your proof looks reasonable and the code does the correct thing for the divisions in your examples, though the coding style issues prevent it being used in the compiler without being cleaned up. There's also a return clause where it gives up after failing to find a suitable multiplier, where falling back to the existing code might be a better compromise. I've not played with your algorithm enough to determine how often it triggers. Congratulations on the impressive work! You'll forgive me if it takes a while to understand why your method is better than previous works, i.e. what it is they overlooked (or that you've overlooked). To be honest all this discrete math makes my head hurt. Indeed, when it's appreciated that it's not the smallest multiplier that we'd prefer, but the cheapest (using shifts/adds etc...) and that choose_multiplier is potentially intertwined with synth_mult, the world starts spinning and I need to reach for the headache tablets. Roger --
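For readers unfamiliar with the technique, here is a worked example of the classical approach (following Hacker's Delight, not Denis's new algorithm): unsigned division by 7 on a 32-bit target is one of the cases that needs a 33-bit multiplier, handled by an extra add-and-shift fix-up after taking the high half of the product.

  /* Unsigned division by 7 without a divide instruction.  The magic
     multiplier 0x24924925 is the low 32 bits of the 33-bit constant
     ceil(2^35/7); the fix-up below accounts for the missing 33rd bit.  */
  unsigned int
  div7 (unsigned int n)
  {
    unsigned int t
      = (unsigned int) (((unsigned long long) n * 0x24924925u) >> 32);
    return (t + ((n - t) >> 1)) >> 2;
  }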
RE: Issue generating GCC coverage report since r14-1625-geba3565ce6d766
Hi Martin, It's great to hear from you. My apologies for the inconvenience. I believe that the problem has been solved by Jakub's patch: https://gcc.gnu.org/git/?p=gcc.git;a=commit;f=gcc/config/i386/i386.md;h=43a3252c42af12ad90082e4088ea58eecd0bf582 I strongly suspect that the problem was that my patch was emitting "(const_int 0)" as an instruction into the RTL stream, which I'd misunderstood to be recognized as a no-op by the middle-end. This isn't the case and the correct idiom is to (also) use: emit_note (NOTE_INSN_DELETED); DONE; I can easily believe that this unintended behaviour is/was interfering with your code coverage scripts (I should study your posted results). I hope this explains things. Please let me know whether things really are fixed (or not). Cheers, Roger -- > -Original Message- > From: Martin Jambor > Sent: 16 June 2023 13:51 > To: GCC Mailing List > Cc: Roger Sayle > Subject: Issue generating GCC coverage report since r14-1625-geba3565ce6d766 > > Hello, > > we try to build coverage info for GCC for our testsuite and upload it to > https://gcc.opensuse.org/gcc-lcov/ every weekend. But since patch > r14-1625-geba3565ce6d766 (Add support for stc and cmc instructions in > i386.md) the generation broke down. However, I don't think there is something > necessarily wrong with that particular commit, at least I don't see anything > suspicious. > > I inherited the generating script from Martin Liška and have not really looked > much into it much, but it simply does the following after a fresh GCC master > checkout (I added the --disable-multilib and reduced the number of languages > to > reproduce this more quickly): > > > ../src/configure --prefix=/home/mjambor/gcc/mine/inst --enable- > languages=c,c++ --disable-bootstrap --enable-host-shared --enable-coverage=opt > --disable-multilib > make -j64 && make -j64 -k check > find gcc/testsuite/ -name '*.gcda' -exec rm -rvf {} \; # I don't know why > the > script does this > lcov -d . --capture --output-file gcc.info > > > and this last step, since the commit, when processing file > ./gcc/insn-attrtab.gcda > fails with error: > > geninfo: ERROR: mismatched end line for _Z12get_attr_isaP8rtx_insn at > /home/mjambor/gcc/mine/src/gcc/config/i386/i386.md:5776: 5776 -> 8963 > (use "geninfo --ignore-errors mismatch ..." to bypass this error) > > I tried looking briefly into the perl sources of lcov and geninfo but I am > afraid I > don't have the necessary knowledge of the language and the tool or the time to > properly debug this. So I am inclined to simply add --ignore-errors mismatch > to > lcov options, which avoids the issue, and be done with it. Nevertheless, I > thought > I'd mention this here in case anyone here has any ideas what can be going > wrong. > > Thanks, > > Martin
Re: [RFC] fold Reorganization Plan
On Sat, 12 Feb 2005, Nathan Sidwell wrote: > I question if it is better to fold early. As I've said before, I think > the optimizations that fold performs should be turned into a proper SSA > optimization phase% -- that can be repeatedly applied as necessary. As for a proper tree-ssa optimization pass, I believe Andrew Pinski's proposed tree-combiner pass which attempts to merge/combine consecutive tree statements and check whether the result can be folded, would fit your description. However, the utility of early fold to the GCC compiler is much greater than simply compile-time evaluating expressions with constant operands. One of the reasons that fold is so fast is that it can rely on the fact that all of a tree's operands have already been folded. In fact, much of the middle-end makes or can make use of the assumptions that constant operands to binary operators appear second, that NOP_EXPRs only appear where needed, that NEGATE_EXPR never wraps another NEGATE_EXPR and that "x+x" and "2*x" are represented the same way. For example, left to its own devices a front-end would produce a potentially unbounded tree for "!!...x". Whilst it might be desirable to reproduce such ugliness in source code analysis tools, the fact that GCC currently only has to work with minimal trees (compact by construction) improves both compile time and memory usage. Another example is that some front-ends (cough, g++, cough) love calling save_expr at any opportunity, even on constant integers and string literals. If it wasn't for the middle-end/fold only creating such tree nodes as necessary, many analyses/transformations would be significantly complicated. As several front-end people have suggested, calling fold whilst constructing parse trees shouldn't be necessary (as shown by the shining examples of g77 and GNAT). In reality, many of the transformations performed by fold (most? as I expect expressions with constant operands are actually fairly rare at the source level) are purely to tidy up the inefficiencies or incorrect tree representations constructed by the front-ends. If it isn't clear from the name fold_buildN, these interfaces are designed to apply the invariants expected when building trees. In the further flung future, the buildN interfaces may become deprecated and even further still trees might be marked read-only (i.e. const) outside of the middle-end (allowing static trees and more middle-end controlled tree sharing). Bwa-ha-ha-ha-ha (evil laugh)! Roger --
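As a small illustration of the invariants mentioned above (purely hypothetical code, using the fold_buildN interface rather than any real front-end), both calls below return trees that are already in canonical form, so later passes need only match a single shape.

  /* Hypothetical front-end fragment.  */
  static tree
  example_build (tree x)
  {
    /* fold_build2 returns the canonical "x + 1", with the constant
       as the second operand.  */
    tree sum = fold_build2 (PLUS_EXPR, integer_type_node,
                            build_int_cst (integer_type_node, 1), x);

    /* Folding a double negation collapses straight back to x, so a
       NEGATE_EXPR never wraps another NEGATE_EXPR.  */
    tree neg = fold_build1 (NEGATE_EXPR, integer_type_node,
                            fold_build1 (NEGATE_EXPR, integer_type_node, x));

    return fold_build2 (PLUS_EXPR, integer_type_node, sum, neg);
  }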
Re: [RFC] fold Reorganization Plan
On Tue, 15 Feb 2005, Andrew Haley wrote: > There's one other thing to bear in mind when considering the way that > fold relates to language front ends. We've seen problems in Java > where fold() returned tree nodes that Java didn't recognize: one time, > for example, it returned a BITFIELD_EXPR. I see the current java front-end's inability to lower standard tree nodes into JVM bytecodes as a failing of the gcj implementation rather than an intrinsic problem in the constant folder. The operator BIT_FIELD_REF, for example, can be seen as a canonical representation of a shift-and-mask or a mask-and-shift operation. Hence, should the tree-ssa analysis passes prefer to see "(x & 12) >> 2" or "(x >> 2) & 3" as BIT_FIELD_REF (x, 2, 2), it should be a relatively trivial task for jcf-write.c to convert BIT_FIELD_REF (x, 2, 2) back into an ishr/iand sequence. Exactly the same problems were claimed with jcf-write's inability to handle ABS_EXPR, MIN_EXPR, MAX_EXPR, NOP_EXPR, RSHIFT_EXPR, etc... The design decision made by gcj was, rather than to provide a JVM back-end target, to instead side-step GCC's RTL expansion and perform the lowering to Java bytecodes itself. As such, any RTL expansion functionality that failed to get implemented in gcj would appear to be a shortcoming of the java front-end, i.e. a sorry(), rather than a bug or problem in fold or the middle-end. My understanding is that it's this failure of the java folks to finish implementing an expand_expr replacement for JVM bytecodes that's the primary motivating factor for the new "gcjx" front-end :) Roger --
Re: GCC 4.3.0 Status Report (2007-09-04)
This is an optimization pass which leads to dramatically better code on at least one SPEC benchmark. Ian, Roger, Diego, would one of you care to review this? My concern is that as formulated, conditional store elimination is not always a win. Transforming if (cond) *p = x; into tmp = *p; if (cond) tmp = x; *p = tmp; on its own, effectively transforms a conditional write to memory into an unconditional write to memory. On many platforms, even x86, this is a pessimization. For example, the "Intel Architecture Optimization Manual", available at ftp://download.intel.com/design/PentiumII/manuals/24281603.PDF in section 3.5.5 "Write Allocation Effects", actually recommends the inverse transformation. On page 3-21 they show how the "Sieve of Eratosthenes" benchmark can be sped up on Pentium class processors by transforming the line array[j] = 0; into the equivalent if (array[j] != 0) array[j] = 0; i.e. by introducing conditional stores. The significant observation with Michael Matz's extremely impressive 26% improvement on 456.hmmer is the interaction between this transformation and other passes, which allows the conditional store to be hoisted out of a critical loop. By reading the value into a "tmp" before the loop, conditionally storing to the register tmp in the loop, then unconditionally writing the result back afterwards, we dramatically reduce the number of memory writes, rather than increase them as when this transformation is applied in isolation. I think the correct fix is not to apply this transformation everywhere, but to correctly identify those loop cases where it helps and perform the loop transformation there. i.e. conditional induction variable identification, hoisting and sinking need to be improved instead of pessimizing code to a simpler form that allows our existing flawed passes to trigger. I do very much like the loop-restricted version of this transformation, and its impressive impact on HMMER (whose author Sean Eddy is a good friend). Perhaps Mark might give revised versions of this patch special dispensation to be applied in stage 3. I'd not expect any correctness issues/bugs, just performance trade-offs that need to be investigated. Perhaps we should even apply this patch as is during stage 2, and allow the potential non-loop performance degradations to be addressed as follow-up patches and therefore regression fixes suitable for stage 3? Congratulations again to Michael for this impressive performance improvement. Roger -- Roger Sayle, Ph.D. OpenEye Scientific Software, Suite #D, 9 Bisbee Court, Santa Fe, New Mexico, 87508.
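For concreteness, a sketch of the loop case where the transformation pays off (hypothetical code, not the hmmer kernel itself, and valid only assuming *p does not alias a[]): the load is hoisted above the loop, the conditional store becomes a store to a register inside it, and a single unconditional write-back follows the loop.

  /* Before: a conditional store to memory on every iteration.  */
  void
  before (int *p, const int *a, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      if (a[i] > *p)
        *p = a[i];
  }

  /* After: the load is hoisted, the conditional store targets a
     register, and one unconditional store follows the loop.  */
  void
  after (int *p, const int *a, int n)
  {
    int i, tmp = *p;
    for (i = 0; i < n; i++)
      if (a[i] > tmp)
        tmp = a[i];
    *p = tmp;
  }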