Re: [PATCH] Fix PR29519 Bad code on MIPS with -fnon-call-exceptions

2006-10-25 Thread Roger Sayle

Hi Andrew,

On Wed, 25 Oct 2006, Andrew Haley wrote:
> I must admit to being a little perplexed by this.
>
> We have an unsafe optimization that causes bad code to be generated on
> at least one platform.  However, we want to continue to perform this
> unsafe optimization on our release branch until we are sure that
> removing it doesn't cause performance regressions.  And, perhaps, if
> removing the optimization does cause performance regressions we won't
> remove it, preferring bad code to reduced performance.
>
> Is that a fair summary?  Perhaps I'm misunderstanding what you wrote.


The issue is that there are two possible fixes to this problem...

This wrong code regression was introduced by the recent bug-fix patch
to address an aspect of PR middle-end/28690, a missed optimization
regression affecting gcc 4.2 and 4.3.  That change then exposed a
latent bug in the RTL optimizers that has been there since 2002.

The ultra-conservative approach would be to back out the patch that
exposed the bug, and simply accept the performance degradation.

The alternative is to progress towards "our ideal world" by actually
fixing the problem rather than working around the issue.  Unfortunately,
moving forward is itself not without risk, both to the tree's stability
and to the performance impact of removing an optimization that's been
there for years.  The fixes trade off one performance issue against
the possibility of another.


In my professional judgement as a middle-end maintainer, the appropriate
course is to move mainline forward to better evaluate the risks and
tradeoffs of the proposed changes.  Then depending upon the outcome,
decide which of the two solutions should be backported to the release
branches.  Given the expected release timings, it makes sense to avoid
"acting in haste" and to make the right decision.

If we're in a rush, I'd push for reverting the PR28690 patch from the
release branch, returning us to a known state and accepting the
performance loss.  But if Richard Sandiford's and David Daney's patch
works out, we can correct the wrong-code issue without the performance
loss.


Once explained, I'd expect most maintainers would make precisely the
same call?

Roger
--



Re: [PATCH] Fix PR29519 Bad code on MIPS with -fnon-call-exceptions

2006-10-25 Thread Roger Sayle

On Wed, 25 Oct 2006, Richard Sandiford wrote:
> Roger Sayle <[EMAIL PROTECTED]> writes:
> > Once explained, I'd expect most maintainers would make precisely the
> > same call?
>
> I suppose the counter-argument is that we shouldn't ship 4.2 in its
> current state.  We should either back out the patch that made
> REG_POINTER more prominent or go with the fix for this PR.

I repeat that it's not a question of whether it gets fixed, but of
which fix is the more suitable.

I think it's very safe to assume that we have at least a few days to
evaluate our options on mainline rather than make a rushed decision.
With 108 serious regressions including 12 P1s, of course we shouldn't
ship 4.2 in its current state.  We've not yet had a release candidate,
or a timeline for release.  I fully intend to select which fix to
backport to the release branch long before the release's due date.

My counter-counter argument would be that it's the rush to backport
changes to the release branch that has caused this problem.  If the
PR 28690 fix had had longer on mainline, such that we'd identified its
side-effect of breaking libgcj on MIPS, it wouldn't have been applied
to the branch before we'd fully understood which other changes were
also required.

The 28690 fix was fully tested on IA-32 and PowerPC, and there was no
fault in applying it; its fallout was unforeseeable.  Nor is there
any fault of the MIPS maintainers and testers for not identifying
the issue earlier.  Indeed, you came up with a corrective fix very
quickly.  But once again, this change is not 100% safe (very few
changes to the compiler ever are).  There is a very real possibility
that disabling this optimization could have a significant performance
impact on non-MIPS platforms.  Indeed, the fact that David has seen a
large number of code generation differences means that I'd like to get
a better handle on its effects in benchmarks.



Perhaps we should discuss this further on IRC, or open up the thread
to general discussion and comment.  Back when I first started contributing
to GCC, it was not uncommon to be expected to bootstrap RTL patches on
five or six targets, including simulators, and to present SPEC figures.
Several of my large changes took many months to get approved, and for all
that the statistics on how frequently mainline bootstrap was broken were
abysmal.  Even with any amount of contributor and reviewer testing, it's
inevitable that bugs slip through.  In a software engineering project
as large as GCC, the combinatorial number of interactions between
different components makes it impossible for any individual or group
of maintainers to predict how changes will play with each other.

My belief/position is that rather than become paralysed by this
complexity, the work-flow needs to be adapted to accommodate it.
One of the contributing factors to the egcs/FSF split was poor
reviewer response time.  Hence, my vote is to be more accepting and
lenient with patches, instead relying on the power of GCC's community
to support patch validation.  Very few contributors have access to
MIPS hardware, or Alpha hardware, or SPEC, or nullstone, etc... but
our project benefits from the fact that one of the major "unseen"
contributions is that folks regularly contribute their time to build,
test and benchmark.  This is the open-source principle that "given
sufficient eyes, all bugs are shallow".


Rather than be (what I consider) unreasonable and require that David
run SPEC on at least both PowerPC and IA-32 to verify there's no
slow-down or quantify how severe it is, I have the option to provisionally
approve patches for mainline, and then look at Diego's, Andreas's and
Richi's automatic benchmarking, and the results of the autotesters.  I
honestly would like to have the ability to look at a patch and say with
100% confidence and certainty, and without reservation that it should be
applied to the 4.0 branch.  However, waiting 48 or 72 hours to obtain
more feedback not only "covers my arse", but should be common sense
and "best practice".


Now I completely agree that there is a real downside to this approach.
The strategy leads to the additional complexity of large numbers of
patches in flight, applied to different branches and in different
orders.  Fortunately for us, we have excellent engineering practices
in place to keep track of this.  Bugzilla (and the bug masters)
continually maintains an up-to-date list of which bugs are present
on which branches.  A release manager can instantly see which bugs
still affect an upcoming release, *and* see that they may have been
fixed on mainline.

It's also no additional overhead to the contributor.  Our rules for
applying patches to the branches require that each patch must be
tested against the branch to which it is applied.

Re: [PATCH] Fix PR29519 Bad code on MIPS with -fnon-call-exceptions

2006-10-25 Thread Roger Sayle

On Wed, 25 Oct 2006, David Daney wrote:
> The patch is fully tested and ready to go for the 4.2 branch.

The last thing I want is for this fix to get delayed whilst we argue
over patch testing/approval policy.  This fix addresses the known
wrong-code issue, and at worst may replace it with a missed-optimization
opportunity.  Hence although we don't know for sure whether this is
better than reverting the patch that caused the regression, it's
certainly better than where 4.2 is now, and keeps us sync'd with
mainline.

Ok for the 4.2 branch.


Sorry for the confusion.  Hopefully, there'll be no more surprises.

Roger
--



Re: [PING] fwprop in 4.3 stage 1?

2006-10-31 Thread Roger Sayle

Hi Paolo,

On Mon, 30 Oct 2006, Paolo Bonzini wrote:
> Given that Roger started the ball rolling, by approving Steven's
> -fcse-skip-blocks patch, I'll ping the discussion...
>
> http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01066.html

I believe the appropriate next step is to freshen these patches against
mainline, and re-bootstrap and regression test them.  Not only do
the patches you cite date back to last May, but they haven't been
revised to reflect the reviewer comments they received.  For example,
Joseph Myers pointed out they contained the wrong FSF address.

I might be getting confused, but I also remember that supporting
patches were committed to the tree, to address some of the performance
issues with these patches, so benchmarking should look better now.


I foresee no problems in getting the fwprop pass merged into mainline
this week.  One detail I would like resolved, however, is confirmation
that you and Steven Bosscher are co-ordinating your efforts.
Presumably, adding fwprop is part of the agreed upon game-plan, and
not something that will complicate Steven's CSE efforts.

Roger
--



Re: GCSE again: bypass_conditional_jumps -vs- commit_edge_insertions - problem with ccsetters?

2006-11-01 Thread Roger Sayle


On Tue, 31 Oct 2006, Dave Korn wrote:
> Tracking down a gcse bug while unrolling a loop where the count is
> known to be one, I've narrowed the problem down to the actions of
> commit_edge_insertions and bypass_conditional_jumps, and I could use
> a little help in appreciating the reasoning behind what they're /trying/
> to do.

Sorry for the delay, but I only read the "gcc" list infrequently and
have only just seen your question.


> Is the bug perhaps because the bypass code notes that there are two code
> paths from BB#2 to BB#7, notices that the indirect one depends on a
> condition that it knows is true, but doesn't realise that that doesn't
> mean they are equivalent to each other?

After reading through your analysis I suspect this is precisely the
problem.  The jump bypassing code is making the assumption that there
can temporarily be two (or more) edges between the same pair of basic
blocks, the distinction being that one edge has an instruction inserted
on it and the other doesn't.

I suspect that somewhere this is falling foul of some other CFG-related
optimization that says if both branches of a conditional jump go to the
same label, then the conditional jump is redundant and can be replaced
by an unconditional jump.  Once the CFG believes that BB#2 has only one
successor, commit_edge_insertions is free to place things there.

Your investigation and understanding is flawless so far.  We now need
to figure out how these edges are getting merged.  If this is a
side-effect of recent CFG/RTL related changes, jump bypassing might
need to be restructured/rewritten to avoid using the insn-on-edge
functionality.


Steven Bosscher might even have plans for reorganizing jump bypassing
already as part of his CSE/GCSE overhaul?

Roger
--



Re: Unsure about a new warning in mainline

2007-01-14 Thread Roger Sayle

Hi Manuel and Paolo,

On Sun, January 14, 2007 3:59 pm, Paolo Carlini wrote:
>>> By the way, "new" also wrt current 4_2-branch, in the sense that the
>>> latter doesn't emit *any* overflow warning for the snippet in 30465,
>>> with explicit -Wconversion too...

I think I can explain everything.  The first issue is that although the C
front-end has issued these warnings for years, a recent change to the C++
front-end means that it now emits them too.  This is why Paolo hadn't seen
them before, but from the C/middle-end perspective the code is unchanged.

The second issue is that there are two different overflow-related
warnings.  The first indicates that an overflow has occurred in an
expression; the second that the overflowed result is used in a context
required to be a compile-time constant.  The latter is more serious
than the former, as an expression that overflows isn't considered a
"constant" according to the/some standard.

Manuel's confusion here is that recent changes mean that we now only emit
one warning of the first kind in an expression.  Previously, for suitably
nasty expressions we'd generate multiple such warnings.  For example,
in "- - - - - - INT_MIN", we'd formerly issue one "overflow warning" per
negation.  However, we still continue to generate one warning for any
overflow in an expression, and another if the result of that expression is
used in a context required to be a constant.

Hopefully, these differences can be seen by compiling the problematic code
given in PR 30465 with "gcc" then with "g++".

I agree with Paolo that this is a change for C++, and should at least be
documented in changes.html, and probably be tweaked to avoid warning in
system headers.  However, it's also odd that in this respect C has had
stricter/better diagnostics than C++ for some time.

Specifically, for PR 30465, "((T)1 << 31) - 1" is potentially undefined
when T is a 32-bit signed type, but well-defined if T is unsigned or
wider than 32 bits.
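
To make the two kinds of warning concrete, here's a small example of
my own (not the PR's testcase) that should trigger each:

#include <limits.h>

int f (void)
{
  return - - - - - - INT_MIN;     /* one warning: overflow in expression */
}

enum { e = ((int) 1 << 31) - 1 }; /* additionally warns: overflow in a
                                     context requiring a constant */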

I hope this helps.

Roger
--



Re: [patch] fold-const.c: Reorganize fold - Part 1/n

2005-03-02 Thread Roger Sayle

I should probably also add that if any of the branches scheduled for
merge prior to 4.1 contain/require changes to "fold", perhaps for
additional transformations they need, I'll be glad to review/approve
those specific changes now, ahead of their branch's planned merge dates
to allow them to synchronize fold-const.c prior to/during the current
"reconstruction".  This should minimize or eliminate the conflict
headaches for pending branch maintainers (for this source file at
least).

I hope this seems reasonable to everyone.  During this stage of
reorganization there are no external/visible API changes, so source
files other than fold-const.c are/will be unaffected.

Mark, are you
OK with this single source file exception/tweak to the current
plan/rules?  I'd expect few, if any, of the development branches
contain any changes to fold, and they're welcome to maintain those
changes locally until the prescribed time, but I thought I'd make
the offer anyway.

Roger
--



Re: A headache with fold_ternary and CALL_EXPR.

2005-03-03 Thread Roger Sayle

On Thu, 3 Mar 2005, Kazu Hirata wrote:
> If we want to change fold_builtin to take operands like op0, op1, and
> op2, I would have to change so many things that depend on
> fold_builtin.  (Note that the subroutines of fold_builtin also depend
> on fold_builtin, in the sense that they need the original tree, the
> one whose TREE_CODE is CALL_EXPR, to be available.)
>
> So the question is: Is it worth changing all these?  Or should I stop
> folding builtins in fold_ternary?

I don't think this is as bad as it first appears.  Most of the
individual fold_builtin_foo functions in builtins.c take the
decomposed CALL_EXPR, i.e. the fndecl, the arglist and the type.
I'm fairly sure those that take the full tree "exp" are relatively
easily tweaked to take fndecl, arglist and type instead.

Following on from the current strategy (game plan?) for fold, I think
we should do something identical with fold_builtin.  We keep the
external interface to fold_builtin the same, but underneath the
surface we change fold_builtin_1 to take the individual components.

For the time being we could probably move the:

  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
    return targetm.fold_builtin (exp, ignore);

from fold_builtin_1 to fold_builtin which has the full tree.  Once
the transition of fold_builtin_1 to fndecl/arglist/type is complete
we can export it outside of builtins.c, and call this API from your
new fold_ternary.


I appreciate it's more work, but it keeps with the philosophy for
this reorganization.
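
For concreteness, a minimal sketch of what I have in mind (the exact
signatures here are illustrative, not final):

/* External interface unchanged: decompose the CALL_EXPR here.  */
tree
fold_builtin (tree exp, bool ignore)
{
  tree fndecl = get_callee_fndecl (exp);
  tree arglist = TREE_OPERAND (exp, 1);
  tree type = TREE_TYPE (exp);

  /* Moved up from fold_builtin_1, which no longer sees "exp".  */
  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
    return targetm.fold_builtin (exp, ignore);

  return fold_builtin_1 (fndecl, arglist, type, ignore);
}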

> Thoughts?

Likewise?

Roger
--



Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary

2005-03-07 Thread Roger Sayle

Hi Alex and Mark,

On 7 Mar 2005, mark at codesourcery dot com wrote:
> Yes, I understand.  You still need to take it up with Roger, though.

My apologies to both of you for being curiously/annoyingly silent on
this issue.  I've been getting up to speed on the internals of the
C++ parser, in order to come up with a correct solution to this problem.

For example, I believe that Alex's proposed solution to PR c++/19199
isn't an appropriate fix.  It's perfectly reasonable for fold to convert
a C++ COND_EXPR into a MIN_EXPR or MAX_EXPR, as according to the C++
front-end all three of these tree nodes are valid lvalues.  Hence it's
not this transformation in fold that's in error.

It also doesn't help with the problematic C++ extension:

(a >? b) = c;

which uses MIN_EXPR/MAX_EXPR as a valid lvalue in C++ without any help
from "fold".


In all fairness, though, I suspect most of the middle-end is not
currently set up to handle the implications of this.  For example, in several
places we perform transformations that assume MIN_EXPR and MAX_EXPR
are commutative, but this is only true for lvalue MIN_EXPR and lvalue
MAX_EXPR.  For rvalue MIN_EXPR and rvalue MAX_EXPR, the semantics need
to specify a reference to the first operand is returned for values
comparing equal.  I also suspect that the gimplification process may
not be preserving the lvalue-ness during lowering.

Simply disabling the COND_EXPR -> MIN_EXPR/MAX_EXPR transformation is
also likely to be a serious performance penalty, especially on targets
that support efficient sequences for "min" and "max".  Clearly, it's
always safe to only perform this when one side or the other is an
rvalue, but I would expect taking the maximum of two user-defined
variables to be a fairly common case.


My current thinking of how best to solve this falls into one of three
plans.


The first is to see if the C++ parser provides enough information
to know whether we're parsing an lvalue or an rvalue.  In these
cases, the C++ front-end can avoid calling "fold" on the COND_EXPR it's
creating in build_conditional_expr.  This only impacts the C++ front-end
and doesn't impact performance of C/Fortran/Java code that may usefully
take advantage of min/max.

If the parser can't be used to provide useful context, we could
potentially postpone the calling of "fold" for C++ COND_EXPRs during
parsing, but call "fold" on COND_EXPRs while lowering to gimple, where
we'll always know whether we're expecting lvalues/rvalues.

An improvement on this approach is for the C++ front-end never to create
COND_EXPR, MIN_EXPR or MAX_EXPR as the target of MODIFY_EXPRs, and lower
them itself upon creation.  By forbidding them in "generic", support for
COND_EXPRs as lvalues can be removed from language-independent
gimplification, fold, non_lvalue, etc... greatly simplifying the compiler.

Finally, the third approach would be to introduce new tree codes for
LCOND_EXPR, LMIN_EXPR and LMAX_EXPR that reflect the subtly different
semantics of these operators when used as lvalues.  If these were then
created by the C++ front-end when an lvalue is potentially required, it
would provide a convenient mechanism for the middle-end to distinguish
which transformations and implementation/lowering strategies are
applicable.



My final comment is that lowering (a ? b : c) = d by using a pointer
temporary runs into problems with bitfields "b" and "c", and with
marking objects addressable when they need not be.
i.e.

t = a ? &b : &c;
*t = d;

runs into problems.  A better approach is to use a temporary for "d".
i.e.

t = d;
if (a)
  b = t;
else
  c = t;

If C++ semantics don't allow for a new temporary (i.e. a SAVE_EXPR) in
this case, then we can do far worse than to duplicate "d", i.e.

if (a)
  b = d;
else
  c = d;


Sorry again for my hesitancy in replying, but I'm desperately trying
to get up to speed on the options available to the C++ front-end before
recommending the correct long-term fix.  Does anyone have opinions on
each of the above approaches?  [I predict the tree-ssa folks will argue
we disallow COND_EXPRs as lvalues, and the C++ folks will argue that
knowing whether an expression appears before "=" is difficult (requiring
deep lookahead) at parse time.]

"Stuck-in-the-middle"-end!

Roger
--



Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary

2005-03-07 Thread Roger Sayle

On Mon, 7 Mar 2005, Mark Mitchell wrote:
> Roger Sayle wrote:
> > For example, I believe that Alex's proposed solution to PR c++/19199
> > isn't an appropriate fix.  It's perfectly reasonable for fold to convert
> > a C++ COND_EXPR into a MIN_EXPR or MAX_EXPR, as according to the C++
> > front-end all three of these tree nodes are valid lvalues.  Hence it's
> > not this transformation in fold that's in error.
>
> Only sort-of: you added the support for MIN_EXPR/MAX_EXPR as lvalues in
> build_modify, but not the same kind of back-to-COND_EXPR transformation
> elsewhere.

I truly hope you're not trying to suggest that it was me that introduced
the concept of MIN_EXPR and MAX_EXPR as lvalues into the C++ front-end:

cvs annotate -r1.1 cp/tree.c
1.1  (law  11-Aug-97): int
1.1  (law  11-Aug-97): real_lvalue_p (ref)
1.1  (law  11-Aug-97):  tree ref;
1.1  (law  11-Aug-97): {
...
1.1  (law  11-Aug-97):   switch (TREE_CODE (ref))
1.1  (law  11-Aug-97): {
...
1.1  (law  11-Aug-97): case MAX_EXPR:
1.1  (law  11-Aug-97): case MIN_EXPR:
1.1  (law  11-Aug-97):   return (real_lvalue_p (TREE_OPERAND (ref, 0))
1.1  (law  11-Aug-97):           && real_lvalue_p (TREE_OPERAND (ref, 1)));


I'm just a guy trying to sort out the mess.  Clearly, a fraction of the
compiler currently supports the use of COND_EXPR, MIN_EXPR and MAX_EXPR
as lvalues and another fraction doesn't.  This issue predates CVS revision
1.1 (a.k.a. the dawn of time).

The very simple decision that we face is whether to change the compiler
consistently to allow lvalue COND_EXPRs everywhere, or to consistently
disallow them everywhere.  Most of this decision making to date has
been based on how easy C++ is to parse/how well constructed the C++
parser is.  Disabling compiler optimizations to work around a few
instances of this problem isn't a "fix".

Getting us out of this mess has only become possible with the advent
of tree-ssa, and tree walking passes between parsing and RTL expansion.


Roger
--



Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary

2005-03-07 Thread Roger Sayle

On Mon, 7 Mar 2005, Mark Mitchell wrote:
> Roger Sayle wrote:
> > I truly hope you're not trying to suggest that it was me that introduced
> > the concept of MIN_EXPR and MAX_EXPR as lvalues into the C++ front-end:
>
> I thought you were the person who introduced changes to fold that caused
> it to generate these expressions when the GNU source extension did not
> appear.  Anyhow, who did what to whom isn't important. :-)

I do appreciate the smiley, but I still feel unfairly maligned by such
accusations.  Once again, fold-const.c CVS version 1.1 converts
COND_EXPRs into MIN_EXPRs and MAX_EXPRs, though admittedly under the
control of "pedantic_lvalues".  I suspect that with the various
rearrangements of GCC's ANSI/ISO warnings since then, the distinction
got lost.

I agree "who did what to whom isn't important", but it is important
(to me) that neither subject nor object was me! :-)   I'm sure you'll
agree that its bad for the community, if the handful of people fixing
problems get blamed for causing them.  I've certainly been fixing
bugzilla PRs under the assumption that there'd be no stigma attached
to fixing wrong-code regressions.  If I only fixed problems caused by
me, I'd have an easy life.



> Actually, I think it's always been fixable.  It's not a question of
> extra passes; it's just a question of ordering.  We could have done fold
> at the end of a statement (rather than while building up a statement)
> even back in GCC 2.8, and that would have fixed the problem.

Depends upon your definition of a "pass".  fold isn't recursive, i.e.
it only rearranges the top-level node of the tree assuming that all of
its subtrees/operands have already been folded.  Hence, to perform
"fold" at the end of a statement would still require an additional
post-order traversal of the entire tree.  Whether it's done per
statement, per function or per compilation unit isn't usually
significant in the
definition of what constitutes a pass.


I still believe the issue is resolvable for 4.0, but I just haven't
completely figured out the safest, least-intrusive change to do it yet.

Roger
--



Re: [Bug c++/19199] [3.3/3.4/4.0/4.1 Regression] Wrong warning about returning a reference to a temporary

2005-03-07 Thread Roger Sayle

On Mon, 7 Mar 2005, Richard Henderson wrote:
> On Mon, Mar 07, 2005 at 08:55:14AM -0700, Roger Sayle wrote:
> > For rvalue MIN_EXPR and rvalue MAX_EXPR, the semantics need
> > to specify a reference to the first operand is returned for values
> > comparing equal.
>
> NOT true.  And the docs explicitly say so.

Which docs?!  There's currently *no* documentation for MIN_EXPR
or MAX_EXPR in c-tree.texi.  Indeed the only references to these
within the entire docs subdirectory is for calling REAL_ARITHMETIC.
And the commentary above MIN_EXPR and MAX_EXPR in tree.def also has
absolutely nothing to say on the matter.  Even the descriptions of
the extensions ">?" and "<?" are silent on the subject.  The closest
thing to documentation is this comment in the source:

/* In C++ a ?: expression can be an lvalue, so put the
   operand which will be used if they are equal first
   so that we can convert this back to the
   corresponding COND_EXPR.  */


And as an example of why this is relevant, see bugzilla PR c++/7503,
particularly the example in comment #1.


As has been described earlier on this thread, GCC has folded the C++
source "(a >= b ? a : b) = c" into "MAX_EXPR (a,b) = c" and equivalently
"(a > b ? a : b) = c" into "MAX_EXPR (b,a) = c" since the creation of
current CVS.  The correct behavior of this code relies on (and has always
relied on) the fact that the middle-end prescribes a behaviour for
equal comparison.
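
To make that reliance concrete, consider this illustration of mine,
in the spirit of the PR c++/7503 example:

int a = 1, b = 1, c = 5;
(a >= b ? a : b) = c;  /* folded to "MAX_EXPR (a, b) = c"; when a == b
                          the reference returned must be to the first
                          operand "a", or the wrong variable would be
                          assigned.  */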


In balance, to support your side of the argument, ever since the same
creation of current CVS, the middle-end has considered MIN_EXPR and
MAX_EXPR to be commutative operations.


We're screwed if we do and screwed if we don't.  Basically the lvalue
forms of these tree nodes have different algebraic properties to the
rvalue forms.


I'm looking forward to any clarification you can make to your comments.

Roger
--



Re: expand_binop misplacing results?

2005-03-21 Thread Roger Sayle

Hi DJ,

On Mon, 21 Mar 2005, DJ Delorie wrote:
> 2005-03-21  DJ Delorie  <[EMAIL PROTECTED]>
>
>   * optabs.c (expand_binop): Make sure the first subword's result
>   gets stored.

This is OK for mainline, provided that you bootstrap and regression
test it somewhere.  Thanks.  You're quite right that callers can't
rely on expand_binop placing the result in target; however, most
backends and RTL expansion sequences try hard to honor the request.
There does appear to be a bug in the "i == 0" case, which has probably
never been an issue as most targets are either able to place the
result of the addition/subtraction in the requested destination or
provide their own adddi3/addti3 expanders.

Thanks for finding/fixing this.  This might be a candidate for backporting
to the GCC 4.0 branch if we can find a target/testcase that triggers a
problem.

Roger
--



Re: [PATCH] Cleanup fold_rtx, 1/n

2005-04-13 Thread Roger Sayle

On Wed, 13 Apr 2005, Steven Bosscher wrote:
> My gcov attachment to PR19721 shows that the code does not
> trigger on amd64 either.

Hi Paolo and Steven,

I'm still reading through the patch, but I think the wider issue
of using profiling to determine which of GCC's transformations are
worthwhile and which are not deserves some broader discussion.

I admit that fold_rtx is perhaps a special case, as we're trying
to eliminate/reduce the significant amount of compile-time spent
in CSE.  More generally, I'm uncomfortable with using profile
techniques that determine whether a piece of code ever triggers
to decide whether it has benefit or not.  Such protocols are
inherently dependent upon the source code corpus being used.
In this case the comments in PR19271 suggest that you're using
571 source files from GCC's own cc1.

I suspect that in this entire "coverage suite", there isn't a
single "complex" or "vector" operation, and the total number of
floating point operations is severely under-represented compared
to SPECfp for example.  We know that cc1 doesn't link against
the host's libm, for example, so none of GCC's math builtins
would be triggered.

A slightly better approach would be to include the dejagnu
testsuite in the coverage analysis, as the preferred policy of
adding a new testcase to trigger each transformation/optimization
might help here.  Of course, this policy isn't rigorously enforced,
so I'd still be uncomfortable removing code based simply on the
fact that we couldn't get it to trigger.


Of course, there are no clear black-and-whites in such arguments
and everything is a compromise.  For example, even if well-written
pattern matching code takes little extra time when it never matches
(e.g. unreachable cases in switch statements), the additional
code leads to larger/slower binaries and presents a maintenance
overhead.  Alternatively, code that never triggers can't cause
any bugs (but may bit-rot until it does trigger at some point in
the future).


My personal preference remains that these RTL optimizations should
be moved from fold_rtx to simplify_rtx when possible, and that those
which depend upon CSE'ing different functional forms be left.  For
example, one of the "???" comments in your patch recommends that that
transformation be moved to combine.c, which is probably as true today
as it was when it was first written.


Thoughts?  Opinions?


I'm not trying to be a road block on the path of progress, but I'd
just like to get a feel for where we intend to draw the line with
these protocols.  If taken to extremes, these policies can clearly
be dangerous (if none of these cc1 files contains K&R prototypes,
perhaps we could drop parser support for pre-ANSI C, etc...).

Roger
--
Roger Sayle, E-mail: [EMAIL PROTECTED]
OpenEye Scientific Software, WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833



My opinions on tree-level and RTL-level optimization

2005-04-16 Thread Roger Sayle

I seem to have inadvertently annoyed Steven Bosscher on IRC,
so by way of an apology and explanation I thought I'd post my current
opinion and thinking on the optimizations performed by GCC's middle-end,
both at the tree-level in the tree-ssa optimizers and at the
RTL-level in the existing RTL optimizers.  The intention is
(i) to explain my current position, understanding and misunderstandings,
and (ii) for people to disagree and/or convince me otherwise.


My current opinion is that for the medium-term, let's arbitrarily
say prior to GCC 5.x, GCC will continue to need to perform optimizations
at both the tree-level and the RTL-level.  The tree-level I see as
fundamentally responsible for high-level and machine-independent
transformations, and the RTL optimizers for the lower-level
machine-dependent transformations.

It's unlikely that RTL, or an instruction-based representation of
a program, will ever go away.  And the RTL passes corresponding to
instruction selection (combine), register allocation (reload) and
scheduling will almost certainly be RTL-level tasks/passes.

But with the advent of the tree-ssa infrastructure, the other RTL
"optimization" passes, notably CSE, GCSE, loop, if-conversion and
jump bypassing, may not be long for this world.  And they certainly
won't be preserved in their current incarnations.


The difficulty here is that by expressing a program as low-level
insns, we expose optimization opportunities not visible as trees.
The RTL expanders and backend's pattern expanders inadvertently
introduce redundancies and inefficiencies, or, more commonly, by
providing machine-specific idioms open up code transformations,
for example with SUBREGs, that just didn't occur previously.  For
example, expand_mult may produce a sequence of shifts and additions
for an integer multiplication.  If there are two integer
multiplications by a constant in a basic block, CSE will be able
to identify shared terms between their shift/add sequences that
aren't visible to the tree-level multiplications.
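
For instance, in an illustration of my own (the exact shift/add
sequences chosen by expand_mult will vary by target and cost model):

void f (int x, int *p, int *q)
{
  *p = x * 10;  /* one possible expansion: t = (x << 3) + (x << 1) */
  *q = x * 20;  /* one possible expansion: ((x << 3) + (x << 1)) << 1 */
}

At the RTL level, CSE can notice that the second sequence's
subexpression is already available in t; at the tree level all it
sees is two unrelated MULT_EXPRs.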

One might argue that the shifts/adds should be exposed at the
tree-level.  But not only does this require detailed knowledge
of the instructions available on a machine, but it would also
obfuscate the multiplication from the tree-ssa optimizers, such
as iv-opts, etc...  Instead, representing the multiplication at
the tree-level as a MULT_EXPR, propagating constants into it,
and letting expand do its thing gets the best of both worlds.


This doesn't constrain us to keep all of GCC's original RTL
optimizers though.  Unlike gcc 3.x and earlier, we now expect
tree-ssa to do most of the heavy lifting.  One goal is to have
the initial RTL that we generate be pretty close to its final
form, perhaps requiring only minor clean-ups.

I foresee a world in which RTL transformations become far more
local.  Perhaps the RTL-level loop optimizers will disappear
completely.  Although RTL expansion may introduce new loops,
these tend to be rare, and the expanders have all the information
they need to hoist/sink invariant expressions and unroll/peel
themselves.  Certainly there should be no need for RTL-level
loop optimizations to do loop unrolling or other large scale
reorganization.

Similarly, CSE shouldn't need to process more than a single
basic block, and GCSE shouldn't need to move anything other
than simple expressions.  The quality of alias analysis at
the RTL-level shouldn't be an issue.


This brave new world also has a profound impact on how backends
are written.  Previously, because all optimization, with the
exception of "fold" and "expand", was performed at the RTL-level,
backends played tricks to virtualize their hardware to the
middle-end in an attempt to allow the RTL optimizers to understand
what was going on.  Virtual high-level patterns would be introduced
for the benefit of the RTL optimizers, only to be split later.  The
problem with this is that by splitting/lowering after the RTL
optimizers any inefficiencies that are introduced are too late
to be cleaned up.  Perhaps one of the most notorious examples of
this is the handling of DImode on IA-32.  The i386.c backend
pretends that it has native DImode support prior to reload.
Whilst in gcc 3.x this was essential: constant propagation and
constant folding wouldn't work with highparts and lowparts, and,
for example, the loop analysis would be unable to do anything
with a loop using a "long long" counter once the loop exit test
had been decomposed into highpart and lowpart subtests.  Unfortunately,
the late splitting leads to terrible register allocation, and numerous
dead loads/stores to/from memory.  SPECint's crafty suffers.

With tree-ssa, however, these design decisions can and need to be
re-evaluated.  Now that a "long long" for-loop will be reversed at
tree-level, the x86 backend is far better off decomposing DImode
early, and getting the benefit of RTL optimizers to eliminate the
dead/load stores etc...

One side-effect of such a move is that although we genera

Re: My opinions on tree-level and RTL-level optimization

2005-04-17 Thread Roger Sayle

On Sat, 16 Apr 2005, Richard Kenner wrote:
> > Although, RTL expansion may introduce new loops, these tend to be
> > rare, and the expanders have all the information they need to
> > hoist/sink invariant expressions and unroll/peel themselves.
>
> I disagree.  In order to make the proper decisions about merging givs
> and chosing which giv should represent a biv, you have to know a lot
> about the valid addressing modes on the machine and this isn't something
> the tree level optimizers should have to deal with.
> ...
>
> > Simiarly, CSE shouldn't need to process more than a single basic
> > blocks,
>
> Again, not clear.  Certainly the costly stuff I put in ages ago to
> walk through comparisons and around loops needs to go, but there's
> no reason to tie CSE to a basic block: it can operate until the next
> label, like it does now.  Admittedly, the number of CSE opportunities
> won't be great, but why restrict them to a basic block?
>
> > and GCSE shouldn't need to move anything other than simple
> > expressions.
>
> Why would we need a GCSE at the RTL level at all?  I'd guess the number
> of wins it would produce would be very small.
>
> > The quality of alias analysis at the RTL-level shouldn't be an issue.
>
> Here I disagree the strongest!  Instruction scheduling is rapidly
> becoming one of the most critical optimizations and must be done at
> the RTL level.  The quality of instruction scheduling depends quite
> heavily on the quality of the aliasing information.


Thanks for your feedback.  I agree with all your points.  I had
greatly underestimated the importance of RTL alias analysis, especially
with respect to scheduling.  The compile-time hog issues with CSE are
primarily due to deep path-following (AFAIU).  Simple/cheap tricks
such as extended basic blocks are clearly a win, but their benefit
depends on whether we keep GCSE: if we don't, then EBBs of course;
if we do, the EBBs are probably subsumed by GCSE.  And it's very true
that at the RTL-level addressing modes have always been an Achilles'
heel.  Whilst induction variable selection is very dependent upon
target parameters, as is prefetching, it's not yet clear whether
uglifying tree-ssa or
continuing to improve RTL loop analysis is the better long-term strategy.

I take it from your comments, that you are in the camp that believes
that "the sun has not yet set" on the need for RTL optimizers. :-)
The intent of my post was to defend what was seen as my pro-RTL
stance; it's interesting to see that the loudest feedback is that
I'm underestimating/down-playing the importance of the RTL optimizers.

Much appreciated,

Roger
--



Re: My opinions on tree-level and RTL-level optimization

2005-04-18 Thread Roger Sayle

On Mon, 18 Apr 2005, Paolo Bonzini wrote:
> Roger proposed lowering 64-bit arithmetic to 32-bit in tree-ssa!  How
> would you do it?  Take
>
>  long long a, b, c;
>  c = a + b;
>
> Would it be
>
>  c = ((int)a + (int)b)
>  + ((int) (a >> 32) + (int) (b >> 32)
> + ((unsigned int) a < (unsigned int) b)) << 32;
>
> Or will you introduce new tree codes and uglifying tree-ssa?
> Seriously...


I think you may have misinterpreted or over-interpreted
what I meant by performing optimizations/lowering, earlier vs.
later.  When I suggested the i386.c backend should lower DImode
operations earlier, my intention was to lower during RTL expansion
instead of where it currently happens after reload.  Whilst doing
it in tree-ssa would technically also be "doing it earlier", this
isn't really what I had in mind.

The fact that DImode moves aren't lowered, and that the high and
low parts of a DImode register can't be independently live/dead
(for register allocation) is the reason for PR17236, and why GCC
generates poorer code than Intel on something as simple as "a * b".
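
A minimal example of the kind of code affected (my illustration; see
PR 17236 for the real testcase):

long long mul (long long a, long long b)
{
  return a * b;  /* on IA-32, lowered only after reload, so the DImode
                    halves can't be allocated independently and dead
                    loads/stores to memory remain */
}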


Not that it would be impossible to do this with trees.  It's not
uncommon for representations to be lowered, for example high-level
trees to low-level trees.  RTL has always done this: addressof
could be seen as a lowering pass, as can combine, reload, regstack
and the process of splitting, where the RTL representation afterwards
is a more accurate description of the generated code than the RTL
representation before.
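
For reference, a 64-bit addition lowered to 32-bit operations at
expand time would look something like the following C sketch (note
that the carry must compare the partial sum against one of the
addends):

unsigned int lo = (unsigned int) a + (unsigned int) b;
unsigned int carry = lo < (unsigned int) a;
unsigned int hi = (unsigned int) (a >> 32)
                  + (unsigned int) (b >> 32) + carry;
c = ((unsigned long long) hi << 32) | lo;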

Roger
--



Re: GCC 4.0, Fast Math, and Acovea

2005-04-30 Thread Roger Sayle

On Fri, 29 Apr 2005, Scott Robert Ladd wrote:
> I've been down (due to illness) for a couple of months, so I don't know
> if folk here are aware of something I discovered about GCC 4.0 on AMD64:
> -ffast-math is "broken" on AMD64/x86_64.

Hi Scott,

I was wondering if you could do some investigating for me...

The change in GCC was made following the observation that given
operands in SSE registers, it was actually faster on x86_64 boxes
to call the optimized SSE implementations of intrinsics in libm,
than to shuffle the SSE registers to x87 registers (via memory),
invoke the x87 intrinsic, and then shuffle the result back from
the x87 registers to SSE registers (again via memory).

See the threads at
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01877.html
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02119.html

(Your benchmarking with acovea 4 is even quoted in
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02154.html)


Not only are the recent libm implementations faster than x87 intrinsics,
but they are also more accurate (in terms of ulp).

This helps explain why tbp reported that gcc is faster than icc 8.1
on Opteron, but slower on IA-32 (contradicting your observations).


Of course, the decision to disable x87 intrinsics with (the default)
-fpmath=sse on x86_64 is predicated on a number of requirements.  These
include that the mathematical intrinsics are implemented in libm using
fast SSE implementations with arguments and results being passed and
returned in SSE registers (the TARGET64 ABI).  If this isn't the case,
then you'll see the slowdowns you're seeing.  Could you investigate if
this is the case?  For example, which OS and version are you using?

And what code is being generated for:

#include <math.h>

double test(double a, double b) {
  return sin(a*b);
}


One known source of problems is old system headers for <math.h>, where
even on x86_64 targets and various -fpmath=foo options the header files
insist on using x87 intrinsics, forcing the compiler to shuffle registers
by default.  As pointed out previously, -D__NO_MATH_INLINES should cure
this.

Thanks in advance,

Roger
--
Roger Sayle, E-mail: [EMAIL PROTECTED]
OpenEye Scientific Software, WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833



Re: GCC-3.3.6 release status

2005-04-30 Thread Roger Sayle

On 30 Apr 2005, Gabriel Dos Reis wrote:
> There were 36 PRs open against it (as of this morning, 6am, GMT-05).
> I went through all of them and they appeared to be either very minor,
> or impossible to fix in 3.3.6 without major restructuring (typically,
> they were bugs fixed in 3.4.x or 4.0.x).  Two of them were pretty
> simple to handle.  Others were closed as "fixed in 3.4.x, won't fix
> in 3.3.6".  Except this:
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19579
> for which I would appreciate inputs from both the author and Roger.

My apologies for the delay.  Yes, this should be safe for the 3.3
release branch, as mentioned in the original approval:
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01714.html.

The cause of my delay was to investigate whether the code on the
gcc-3_3-branch was significantly different or whether there was
some other reason why this fix hadn't yet been committed there.
If I get a few hours, and nobody beats me to it, I'll try bootstrapping
and regression testing a backport of Jakub's fix myself.

Roger
--



Re: On fold_indirect_ref

2005-05-15 Thread Roger Sayle

On Sun, 15 May 2005, Richard Guenther wrote:
> What can be done about this issues?  First, we can use VIEW_CONVERT_EXPR
> unconditionally in fold_indirect_ref and only break some optimizations
> (like temp1.C).  Second, we can declare fold_indirect_ref being unsafe
> about types and deal with this in its callers where we possibly know
> about if we're dealing with lvalues or rvalues, using either NOP_EXPRs
> or VIEW_CONVERT_EXPRs as appropriate.  We could ease this with providing
> wrappers around fold_indirect_ref (or a flag) - but checking if we're
> (possibly) dealing with a lhs is not possible, so uses may remain
> unsafe.

Exactly which optimization do we miss by changing:

  /* *&p => p */
- if (lang_hooks.types_compatible_p (type, optype))
+ if (type == optype)
return op;

This is the least intrusive and most correct change.  Normally, in
fold-const.c the invariant is (or at least should be) that

gcc_assert (TREE_TYPE (t) == TREE_TYPE (fold (t)));

Returning a tree with any type other than "type" from fold_indirect_ref
will potentially screw something up.  Probably the reason that fold_unary
doesn't call fold_indirect_ref is precisely because it is unsafe.


If there are significant optimizations that require playing fast and
loose with types, then these really should be moved (back) out
of fold-const.c, to gimplify.c where the tree-ssa passes have a less
strict type system.  Creating a gimplify_indirect_ref, or better still
both a gimplify_lhs_indirect_ref and a gimplify_rhs_indirect_ref would
allow these.

Not only, as you mention in your post, can't

float x;
*(int *)&x = 0x8000;

validly be reduced to (int)x = 0x8000, because of the need
for a cast on the lhs, but similarly, this transformation is just
as unsafe for the RHS, where

return *(int*)&x;

really isn't equivalent to (int)x, even when the front-end's language
hooks allow assignment/conversion from floats to ints and ints to
floats.  Interpreting a float's bits as an integer isn't the same as
implicitly casting the float to an integer.
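
The difference is easy to demonstrate (my example):

#include <string.h>

int as_bits (float x)   /* reinterpretation, i.e. VIEW_CONVERT_EXPR */
{
  int i;
  memcpy (&i, &x, sizeof i);
  return i;             /* for x = 1.0f, returns 0x3f800000 */
}

int as_value (float x)  /* conversion, i.e. a truncating cast */
{
  return (int) x;       /* for x = 1.0f, returns 1 */
}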

Likewise

int y;
return *(char*)&y;

isn't always the same as (char)y, depending upon the endianness of
the machine, even if ints can be assigned to chars by the front-end.


In the above cases, the usual casting operators NOP_EXPR and CONVERT_EXPR
are the wrong things to use.  If these transformations are to be
performed at all, on both LHS and RHS, they must use VIEW_CONVERT_EXPR.
At the moment, it looks like if ever "optype != type" we're assuming
that the callers implicitly perform fold_convert or equivalent, which is
clearly the wrong form of conversion to use.  The issue is that if ever
fold returns a type other than its argument's, there's an ambiguity as to
which form of conversion to use to correct it.  In these cases, the
assumption that fold_convert can be used is incorrect.

You mentioned on IRC that many of the optimizers aren't expecting
VIEW_CONVERT_EXPRs, in which case it really is best to leave these
indirect references through casts of ADDR_EXPRs in their original form.

Roger
--



Re: On fold_indirect_ref

2005-05-15 Thread Roger Sayle

On Sun, 15 May 2005, Richard Guenther wrote:
> > Exactly which optimization do we miss by changing:
> >
> >   /* *&p => p */
> > - if (lang_hooks.types_compatible_p (type, optype))
> > + if (type == optype)
> > return op;
>
> I don't know - maybe stripping sign casts.  But if we use equality
> comparison here we can as well use STRIP_TYPE_NOPS instead of
> STRIP_NOPS - but the patch doing so caused some optimization regressions.

You're confusing the types of the pointers vs. the types of what they
point to.

The call to STRIP_NOPS at the top of fold_indirect_ref is stripping
the conversions between pointer types. Both your recent change to
STRIP_TYPE_NOPS (and its reversion) and your suggestion above about
STRIP_SIGN_NOPS are red herrings.

These calls are responsible for eliminating the casts between
"char*", "const char*", "int*", "unsigned int*", "float*" etc.
In the C front-end, for example, these are all equivalent types,
typically all with mode Pmode and with (irrelevant?) signedness.

The NOPs that you are trying to strip are for the pointed-to
type: i.e. you strip a conversion from T1* to T2* if T1
and T2 have the same mode, not if the pointers T1* and T2*
have the same mode.


If you really know of a missed optimization, rather than just
your suspicion above about the potential optimization opportunity
for signedness changes, then you can try something more aggressive
such as:

/* *&p => p */
if (TYPE_MODE (type) == TYPE_MODE (optype)
    && TYPE_MODE (type) != BLKmode)
  return fold_convert (type, op);

But this is only safe if the operand is known to be on the RHS
of an expression, and not used as an lvalue.  Unfortunately, testing
"maybe_lvalue_p (op)" here doesn't help, as op has to have been an
lvalue to be the argument of an ADDR_EXPR.  In gimplify.c, you might
be able to get away without the fold_convert, but I would like to
comment on what/when type conversions are useless in tree-ssa.

Hence, if there are any missed optimizations, for my original suggestion
these would have to be caught at gimplification (or in fold_stmt), with
a mode test like the one above, when the context of the indirect_ref
expression is known.


I hope this clears up the STRIP_NOPS confusion.

Roger
--



Re: Sine and Cosine Accuracy

2005-05-29 Thread Roger Sayle

On Thu, 26 May 2005, Scott Robert Ladd wrote:
> I prefer breaking out the hardware intrinsics from
> -funsafe-math-optimizations, such that people can compile to use their
> hardware *without* the other transformations implicit in the current
> collective.
>
> If someone can explain how this hurts anything, please let me know.


I apologise for coming into this argument late.  I'll admit that I
haven't even caught up on the entire thread, but an interesting
relevant article that may or may not have already been mentioned is:

http://web.archive.org/web/20040409144725/http://www.naturalbridge.com/floatingpoint/intelfp.html

[My bookmark 404'd, so I had to use the wayback machine to find it!]

The crux of the issue is that only two GCC targets have ever supported
trigonometric functions in hardware: the x87 coprocessor on IA-32 systems
and the 68881 co-processor on m68k systems.  Of these two, GCC builtin
support has only ever been added for the i386 backend, and, as mentioned
in the article above, the FSIN and FCOS functions produce results well
outside the 1ulp allowed by the relevant standards, even for arguments
in the range [0...2PI].  As such, the reason why hardware support for
these intrinsics is considered part of flag_unsafe_math_optimizations,
is that, for some applications, they are exactly that, "unsafe".

Admittedly on many IA-32 systems there's little difference between
using FSIN vs calling the OS libm's sin function, as glibc and
Microsoft's runtimes (for example) themselves use the x87 intrinsics.
GCC, however, is not to know this and assumes that the user might
provide a high-precision library, such as Lefevre's perfect 0.5ulp
implementation.  [It's nice to see him join this argument! :)]


In this instance, "unsafe" math need only be different.  If the
compile-time evaluation, the run-time library and the hardware
intrinsic could potentially return different results, then the
transformation is potentially unsafe, even if the change is to return
a more accurate result.  Tests such as:

double x = 2.0;
if (sin(x) != sin(2.0))
  abort();

and

double (*fptr)(double) = &sin;
if (sin(x) != fptr(x))
  abort();

should continue to work as expected.  This might also explain some
of your accuracy results.  Even if GCC can determine a way of
evaluating an expression that results in less loss of precision,
e.g. "(x + 1.0) - 1.0" -> "x", it can't do so unless allowed to
be "unsafe".

Sorry if all this has been mentioned before.  Your own results
show that you get different results using x87 hardware intrinsics,
and this alone classifies their use as "unsafe" in GCC terminology,
i.e. they may potentially produce different results.

Roger
--



Re: ifcvt.c question

2005-05-30 Thread Roger Sayle

On Sun, 29 May 2005, Steven Bosscher wrote:
> I don't understand what lines 2728 to 2741 are for.  Could someone give
> an example of when we can have a then_bb that has no successors, but
> still ends in something that satisfies simplejump_p()?  What kind of
> strange indirect jump would that be?  And shouldn't the check just use
> computed_jump_p() if that is what it is looking for??

For the benefit of the list, I did some archeology to determine when
this functionality was written and how it evolved to be so broken now.

The mistake was a bug introduced by Michael Meissner back in 2000.
The functionality was originally introduced by Richard Henderson to
allow predication of then blocks that end in noreturn calls.

2000-05-31  Richard Henderson  <[EMAIL PROTECTED]>

* ifcvt.c (merge_if_block): Be prepared for JOIN to have no
remaining edges.
(find_if_block): Allow THEN with no outgoing edges.
* flow.c (merge_blocks_nomove): Remove a barrier not following
a jump as well.

posted: http://gcc.gnu.org/ml/gcc-patches/2000-05/msg01711.html


Unfortunately, this exposed a bug in the d30v port that caused a
regression in gcc.c-torture/execute/920428-1.c.  As I suggested on
IRC, this was a then-block that ended in a non-local goto, or more
precisely an indirect jump.

Michael tried to resolve this problem by excluding those blocks that
ended in indirect jumps.  His mistake was to assume that the way to
eliminate indirect jumps was to limit this code to simple jumps!
Doh!  This inadvertently disabled RTH's optimization, and the
optimization's code has been dead ever since.

2000-08-19  Michael Meissner  <[EMAIL PROTECTED]>

* ifcvt.c (find_if_block): Do not assume that a THEN block has any
instructions in it before checking for indirect jumps.

* ifcvt.c (find_if_block): Do not consider a THEN block that ends
in an indirect jump as a potential for conditional execution.

* d30v.h (d30v_init_expanders): Don't declare here.
* d30v-protos.h (d30v_init_expanders): Declare here with a valid
prototype.

posted: http://gcc.gnu.org/ml/gcc-patches/2000-08/msg00736.html


RTH should probably have added a testcase that checked whether this
code was being optimized, so we'd know when it regressed.  I also can't
immediately see why a block that ends in an indirect jump shouldn't
be considered for conditional execution, and he might well have been
papering over another problem in the compiler (esp. the D30V backend).

Of course now that support for D30V has been removed from CVS, it's
difficult to diagnose whether this really was just a backend problem,
and whether we should just return the code to a form similar to the
one RTH originally contributed.

I hope this answers your question.

Roger
--



Re: PR 23046. Folding predicates involving TYPE_MAX_VALUE/TYPE_MIN_VALUE

2005-08-05 Thread Roger Sayle

Hi Diego,

On Fri, 5 Aug 2005, Diego Novillo wrote:
> In PR 23046 we ICE inside tree-vrp.c because fold() does not
> realize that for
>
> enum enumtype { ENUM1, ENUM2 } x;
>
> the predicate 'if (x > 1)' is always false.  This causes VRP to
> create the impossible range [2, 1] for that predicate.
>
> While it would be trivial for VRP to paper over this problem, the
> real fix should be in fold().  I looked at the logic that detects
> these cases and it is fairly convoluted (fold-const.c:9174).
>
> I'm wondering why doesn't fold() just use TYPE_MAX_VALUE/TYPE_MIN_VALUE
> if they're available?

I'm trying to remember/find the conclusions of the many discussions
on the ranges of enumeral types that we've had in the past.

One of the explanations describing the problems in this area was
at http://gcc.gnu.org/ml/gcc-patches/2004-05/msg02002.html
There was also a huge thread (flamewar?) at the time over the issue
which began at http://gcc.gnu.org/ml/gcc/2004-08/msg01129.html

One of the fundamental problems/contentious issues was whether GCC
guarantees that enumeral variables are always truncated and sign-
or zero-extended when assigned from integers, or whether the
optimizers can take advantage of the fact that the stored
representation is always in its enumerated range.
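
The contentious case looks something like this (my illustration):

enum enumtype { ENUM1, ENUM2 };

int f (int i)
{
  enum enumtype x = (enum enumtype) i;  /* may hold 5 if enums are
                                           just truncated ints */
  return x > 1;  /* may fold() assume the stored value is in [0, 1]
                    and rewrite this to 0? */
}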

A related discussion can also be found in the thread following
http://gcc.gnu.org/ml/gcc-patches/2004-07/msg00968.html


Perhaps the issue may now be closer to resolution than it was last
time around.  But the uneasy "detente" is that the middle-end currently
treats enumerated types as having the widths of their machine modes.
Perhaps things have changed significantly since the last time this was
discussed.  I have no vested interest in one side vs. the other, other
than that the semantics need to be consistent, lest the same kinds of tree-ssa
bugs resurface again.

I hope this helps.  My personal advice, assuming that you're looking
for an easy life, is to place the seemingly unnecessary consistency
checks in VRP.  Some of the cases discussed in the above threads might
make interesting tests for the VRP code.  I'll admit some of this "lore"
should be documented, but the issue has never been resolved to everyone's
satisfaction, so we keep with the less than ideal "status quo".

Roger
--



Re: Question about merging two instructions.

2005-08-21 Thread Roger Sayle

On Sun, 21 Aug 2005, Leehod Baruch wrote:
> *** 329,334 
> --- 328,341 
>  GET_MODE (SUBREG_REG (x)),
>  SUBREG_BYTE (x));
>   return op0 ? op0 : x;
> +   }
> +   if (code == SET)
> +   {
> + op0 = simplify_replace_rtx (XEXP (x, 0), old_rtx, new_rtx);
> + op1 = simplify_replace_rtx (XEXP (x, 1), old_rtx, new_rtx);
> + if ((op0 == XEXP (x, 0)) && (op1 == XEXP (x, 1)))
> +   return x;
> + return gen_rtx_SET (VOIDmode, op0, op1);
> }
> break;
>
> This way, when the replacement succeeds and the result instruction
> is recognizable, I can eliminate the extension.
>
> Does it seem like a good change?
> Does anyone have a better solution?

This approach seems reasonable.  The current structure of the code
in simplify_replace_rtx is intended to handle RTL expressions rather
than patterns, so normally it would be passed just SET_SRC (pat),
instead of the whole set.

The reason why no-one has wanted/required simplify_replace_rtx to
work on SETs is that all current passes tend to be cautious about
changes to the LHS of an assignment.  Performing substitution and
simplifications in the RHS/SET_SRC always produces a semantically
equivalent instruction, but blindly replacing subexpressions in
the LHS/SET_DEST can significantly change its meaning.

Hence, when combine/GCSE/CSE/reload and other RTL optimization
passes modify the target/destination of an assignment, they only
perform the transformations they expect or can guarantee are safe
(building the SET themselves), but are happy to let simplify_replace_rtx
do its thing on the source expression.


To summarise, the change above is not unreasonable and I'd be
happy to allow this change to simplify-rtx.c, but I'd be more
cautious about where and why it was used.  For example, if you're
sure nothing bad can happen in the LHS, it might be reasonable
to instead place this code in a simplify_replace_set() function,
as the SET (and perhaps support for PARALLEL) should only ever
occur at the top-level, which can then call the recursive
simplify_replace_rtx.
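
A minimal sketch of the kind of wrapper I mean (this function doesn't
exist in the tree; the name and structure are illustrative only):

    static rtx
    simplify_replace_set (rtx pat, rtx old_rtx, rtx new_rtx)
    {
      /* Substitute and simplify only within the source expression;
         the destination is deliberately left alone.  */
      rtx src = simplify_replace_rtx (SET_SRC (pat), old_rtx, new_rtx);

      if (src == SET_SRC (pat))
        return pat;
      return gen_rtx_SET (VOIDmode, SET_DEST (pat), src);
    }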

I hope this helps.

Roger
--



Re: Question about merging two instructions.

2005-08-22 Thread Roger Sayle

On Sun, 21 Aug 2005, Leehod Baruch wrote:

> >>(insn 1 0 2 0 (set (reg/v:Xmode r)
> >>(sign_extend:Xmode (op:Ymode (...
> >>(insn 2 1 3 0 (set (lhs) (rhs)))
>
> 1. Can you please give me an example of something bad that can happen to
> the LHS.  Maybe I'm missing something here.

(set (reg:Xmode r) (sign_extend:Xmode (reg:Ymode p)))
(set (subreg:Ymode (reg:Xmode r) 0) (reg:Ymode q))

would be transformed (by replacing all uses of "reg r" with its
definition on both LHS and RHS) into:

(set (reg:Ymode p) (reg:Ymode q))


Originally, r's high part would be set to the signbit of p and r's low
part would be set to the value of q.  After the transformation, we now
overwrite the operand p with the value q, which isn't quite the same
thing.


> 2. After calling simplify_replace_rtx I try to recognize the instruction.
> Is this being cautious or is it unnecessary?

Except for register-register moves, all synthesized instructions need to
be rerecognized, especially after "RTL simplification".


> 3. Isn't it reasonable to expect that every instance of old_rtx will be
> replaced by new_rtx even if it can't be simplified?
> This is what I understand from the function's documentation.
> But actually every expression that can't be simplified is not replaced.

Every instance of old_rtx should be replaced by new_rtx.  You may be
getting confused by the code to reduce memory usage.  If no replacement
occurs within any of the operands/subtrees of a tree, then we return the
original tree.  The invariant in the recursion is that if a substitution
has been made anywhere in the tree, it returns a newly allocated RTX.
Simplification of this newly allocated RTX will itself return a newly
allocated RTX.

Hence, testing whether the return value of simplify_replace_rtx
matches its original first argument is a way of determining whether
any substitution has been made (whether it was subsequently simplified
or not).


The one caveat to this is that simplify_replace_rtx is less robust
to unrecognized RTL codes than replace_rtx, i.e. it won't traverse
UNSPECs or other non-unary/non-binary/non-comparison expressions.
This can/should probably be fixed by tweaking the "default:" case
to match the GET_RTX_FORMAT loop in replace_rtx.  Note this isn't
a simple cut'n'paste, as replace_rtx destructively overwrites its
input expression, whilst simplify_replace_rtx returns a different
tree if anything changed.
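
For reference, an untested sketch of the sort of "default:" case I have
in mind, handling only 'e' operands for brevity:

    default:
      {
        const char *fmt = GET_RTX_FORMAT (code);
        rtx newx = x;
        int i;

        for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
          if (fmt[i] == 'e')
            {
              rtx op = simplify_replace_rtx (XEXP (x, i), old_rtx, new_rtx);
              if (op != XEXP (x, i))
                {
                  /* Copy before the first modification, preserving the
                     non-destructive invariant described above.  */
                  if (newx == x)
                    newx = shallow_copy_rtx (x);
                  XEXP (newx, i) = op;
                }
            }
        return newx;
      }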

Roger
--



Re: Bug in builtin_floor optimization

2005-08-23 Thread Roger Sayle

On Mon, 22 Aug 2005, Dale Johannesen wrote:
> There is some clever code in convert_to_real that converts
>
> double d;
> (float)floor(d)
>
> to
>
> floorf((float)d)
> ...
>
> Comments? Should I preserve the buggy behavior with -ffast-math?

Good catch.  This is indeed a -ffast-math (or more precisely a
flag_unsafe_math_optimizations) transformation.  I'd prefer to
keep these transformations with -ffast-math, as Jan described them
as significantly helping SPEC's mesa when they were added.
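
For a concrete illustration of the unsafeness (my own example, not one
from the PR), consider a double just below an integer boundary:

    #include <math.h>

    int
    main (void)
    {
      double d = nextafter (2.0, 0.0);  /* the largest double below 2.0 */
      float a = (float) floor (d);      /* floor (1.999...) == 1.0 -> 1.0f */
      float b = floorf ((float) d);     /* (float) d rounds up to 2.0f,
                                           so floorf returns 2.0f */
      return a != b;                    /* 1: the two forms disagree */
    }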

My one comment is that we should try to make sure that we continue
to optimize the common safe case (even without -ffast-math):

float x, y;
x = floor(y);

i.e. that (float)floor((double)y) is the same as floorf(y).
Hmm, it might be good to know the relative merits of the safe vs.
unsafe variants.  If the majority of the benefit is from the "safe"
form, I wouldn't be opposed to removing the "unsafe" form completely,
if people think it's an optimization "too far".

Thanks for investigating this.

Roger
--



Re: fold_build1 (NOP_EXPR, ...) vs. fold_build1 (CONVERT_EXPR, ...)

2005-12-01 Thread Roger Sayle

On Thu, 1 Dec 2005, Richard Guenther wrote:
> It looks like it is safe to exchange both of them (the first one for sure)
> to fold_convert (...) due to the fact that fold_unary handles NOP_EXPR
> the same way as CONVERT_EXPR apart from cases that look like oversights,
> ...
> In fact, I remember a plan of merging NOP_EXPR and CONVERT_EXPR.


Doh!  I follow gcc-patches more closely than the gcc list, so I saw
your post there first and replied without cross-posting.  For those
interested in the topic, my thoughts are at:
http://gcc.gnu.org/ml/gcc-patches/2005-12/msg00124.html

Roger
--



Re: Problem with gcc.c-torture/execute/960608-1.c on dataflow

2005-12-07 Thread Roger Sayle

Hi Dan,

On Wed, 7 Dec 2005, Daniel Berlin wrote:
> This is *already* wrong, AFAICT, because reg:QI 58 is uninitialized, and
> we are trying to use it's value.  Why do we do this?

This may have been a rhetorical question, but the middle-end's RTL
expanders are generating uses of uninitialized registers when writing
to bitfields of a structure that has been allocated to a register.

In your example, the middle-end has decided that the struct "flags"
should be placed in (reg:QI 58).  Then when the first statement
decides to initialize the single bit corresponding to field c to zero,
it uses the idiom of (set (reg:QI 58) (and:QI (reg:QI 58) (const_int)))
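
In source form, a hypothetical reduction of the situation looks
something like:

    void
    bar (void)
    {
      struct { unsigned char c : 1; unsigned char d : 1; } flags;

      flags.c = 0;      /* expands to an AND of the whole QImode register,
                           reading it while it is still uninitialized */
      flags.d = 1;
    }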

The possible fixes to this are for the middle-end either to generate
code to initialize bitfield-containing structures when they're stored
in registers, i.e. something like (set (reg:QI 58) (const_int 0)) at
the start of the function, or to emit some form of nop_set or clobber
to make it explicit that the contents of register QI:58 are initially
undefined but may be used before becoming defined.

Both of these should allow the middle-end's optimizers to do a better
job, for example, the and's and or's to set QI:58 to a specific value
should ideally be reduced to a single (set (reg:QI 58) (const_int foo)).

The potential concern with "initialization" is that this may result
in additional overhead, if the RTL optimizers can't determine that
an initialization is dead when the result is unused/clobbered.  I've
no idea whether we've another target-independent mechanism to inform
dataflow that a structure or some of its fields are used uninitialized.


This should explain why the middle-end is doing what it's doing.
As for how best to "fix" this behaviour, I'll leave to someone else.

Roger
--



[RFC/RFT] Should we hoist FP constants on x87?

2005-12-27 Thread Roger Sayle

One significant (philosophical) difference between the floating point
code generated by GCC vs that generated by commercial compilers for
IA-32, is the decision whether or not to hoist floating point constants
on the x87.  Or phrased equivalently, whether to allocate an x87 stack
register to hold compile-time FP constants over basic block boundaries.


Consider the following code:

double a[10];
double b[10];

void foo(int n)
{
  int i;
  for (i=0; i<n; i++)
    a[i] = b[i]*3.14159;
}

* gcse.c (want_to_gcse_p): On STACK_REGS targets, look through
constant pool references to identify stack mode constants.
* loop.c (constant_pool_constant_p): New predicate to check
whether operand is a floating point constant in the pool.
(scan_loop): Avoid hoisting constants from the constant pool
on STACK_REGS targets.
(load_mems): Likewise.


Index: gcse.c
===
*** gcse.c  (revision 108834)
--- gcse.c  (working copy)
*** static basic_block current_bb;
*** 1184,1189 
--- 1184,1197 
  static int
  want_to_gcse_p (rtx x)
  {
+ #ifdef STACK_REGS
+   /* On register stack architectures, don't GCSE constants from the
+  constant pool, as the benefits are often swamped by the overhead
+  of shuffling the register stack between basic blocks.  */
+   if (IS_STACK_MODE (GET_MODE (x)))
+ x = avoid_constant_pool_reference (x);
+ #endif
+
switch (GET_CODE (x))
  {
  case REG:
Index: loop.c
===
*** loop.c  (revision 108834)
--- loop.c  (working copy)
*** find_regs_nested (rtx deps, rtx x)
*** 977,982 
--- 977,991 
return deps;
  }

+ /* Check whether this is a constant pool constant.  */
+ bool
+ constant_pool_constant_p (rtx x)
+ {
+   x = avoid_constant_pool_reference (x);
+   return GET_CODE (x) == CONST_DOUBLE;
+ }
+
+
  /* Optimize one loop described by LOOP.  */

  /* ??? Could also move memory writes out of loops if the destination address
*** scan_loop (struct loop *loop, int flags)
*** 1228,1233 
--- 1237,1248 
  if (GET_MODE_CLASS (GET_MODE (SET_DEST (set))) == MODE_CC
  && CONSTANT_P (src))
;
+ #ifdef STACK_REGS
+ /* Don't hoist constant pool constants into stack regs. */
+ else if (IS_STACK_MODE (GET_MODE (SET_SRC (set)))
+  && constant_pool_constant_p (SET_SRC (set)))
+   ;
+ #endif
  /* Don't try to optimize a register that was made
 by loop-optimization for an inner loop.
 We don't know its life-span, so we can't compute
*** load_mems (const struct loop *loop)
*** 10830,10835 
--- 10845,10857 
  && SCALAR_FLOAT_MODE_P (GET_MODE (mem)))
loop_info->mems[i].optimize = 0;

+ #ifdef STACK_REGS
+   /* Don't hoist constant pool constants into stack registers.  */
+   if (IS_STACK_MODE (GET_MODE (mem))
+   && constant_pool_constant_p (mem))
+   loop_info->mems[i].optimize = 0;
+ #endif
+
/* If this MEM is written to, we must be sure that there
 are no reads from another MEM that aliases this one.  */
if (loop_info->mems[i].optimize && written)


Roger
--



Re: Is fold_binary_expr being too picky here?

2006-01-12 Thread Roger Sayle

Hi Diego,

On Thu, 12 Jan 2006, Diego Novillo wrote:
> In fold_binary() we are asserting:
>
>   gcc_assert (IS_EXPR_CODE_CLASS (kind)
>   && TREE_CODE_LENGTH (code) == 2
>   && op0 != NULL_TREE
>   && op1 != NULL_TREE);
...
> DEFTREECODE (OMP_SINGLE, "omp_single", tcc_statement, 2)

Interesting!

If tcc_statement is the best tree_code_class for OMP_SINGLE, then I think
it's reasonable for fold_binary to be more forgiving.  For example, the
least intrusive fix is to change the assert to:

  gcc_assert (IS_EXPR_CODE_CLASS (kind) && TREE_CODE_LENGTH (code) == 2);
  if (op0 == NULL_TREE || op1 == NULL_TREE)
return NULL_TREE;

If there are useful high-level optimizations of OMP_SINGLE that could
be applied by fold, then this issue needs a little more thought, so that
fold_binary can be restructured based upon TREE_CODE_CLASS to avoid any
potential NULL pointer dereferences.

Anyone feel strongly for or against the above change?  I'd prefer not
to have to bloat the trees we use with non-NULL operands, just to work
around the sanity checks we have.  The types of error caught by this
assertion should be extremely rare.

Thoughts?

Roger
--



Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Roger Sayle

On Fri, 20 Jan 2006, Kenneth Zadeck wrote:
> I would like permission to revert Zdenek's patch for a few days.  There
> is nothing wrong with Zdenek's patch, per se, but it exercises a part of
> df that should work but does not.


I'm going to make an executive decision on this one, to restore bootstrap
immediately.  I agree that Zdenek's patch should be reverted for a few days
without prejudice.  Both patch submitters followed the rules of bootstrap
and regression test, but a bad (unforeseeable) interaction occurred
between the two.  It turns out that it was Kenny's DF changes that were at
fault, containing paths that weren't fully tested with his changes
alone.  Zdenek's patch is innocent, but reliant on this problematic
functionality.

Normally, we'd allow 48 hours of shouting before fixing the tree.
But I think we agree that in this case "the mid-air collision" needs
to be resolved by Kenny, and his request for a short-term work around
carries some weight.  With no shame on the killloop merge, I suspect we
should back it out, as it is mostly new functionality, whilst Kenny's
patch also contained bug-fixes, and is therefore probably the more stable
of the two bits when applied independently.

I hope this is satisfactory to everyone.  The judge's decision is final
modulo an over-ruling by Mark or the SC.  This gets the tree back to
bootstrap land, and allows the rest of GOMP to be merged etc... at a
busy point in the GCC lifecycle.

The clock is ticking for Kenny.  I propose a reverse 48 hour rule
where we reinstate Zdenek's patch on Monday, either by fixing DF
by then or by reverting the DF changes.  i.e. swap one of the clashing
patches for the other.

My apologies to everyone for any inconvenience.  Many thanks to Dan
and H-P for investigating the problem on IRC.

Roger
--
Roger Sayle, E-mail: [EMAIL PROTECTED]
OpenEye Scientific Software, WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833



Re: Mainline bootstrap failure (revision 110017)

2006-01-20 Thread Roger Sayle

On Fri, 20 Jan 2006, Zdenek Dvorak wrote:
> I propose the following workaround instead, that also restores
> bootstrap.  It changes the way loop-iv uses df to more conservative one,
> that does not seem to cause problems.

That's what I like to see... options.  Yes, this is OK for mainline,
please hold off on the reversion (or if necessary reapply with this
change).

Many thanks,

Roger
--



Re: [RFC/RFT] PR/25890 and PR/25905

2006-01-24 Thread Roger Sayle

On Mon, 23 Jan 2006, Paolo Bonzini wrote:
> 2006-01-23  Paolo Bonzini  <[EMAIL PROTECTED]>
>
>   PR rtl-optimization/25890
>   PR rtl-optimization/25905
>   * combine.c (expand_compound_operation, expand_field_assignment):
>   Fail if the bitfield's final position is out of bounds.

This is OK for mainline.  Though we'd also do well to fix some of
the underlying code that's causing this suspicious RTL.

Thanks,

Roger
--



Inconsistency in ix86_binary_operator_ok?

2006-02-28 Thread Roger Sayle

I've a IA-32 backend question on the intended behaviour of the functions
ix86_binary_operator_ok and ix86_fixup_binary_operands.

My confusion is that these functions currently allow arithmetic
operations of the form "reg = op(mem,immed)" even though this
shape isn't supported by the x86 ISA.  For example, the following
simple test code,

int foo(int x)
{
  return x | 4;
}

generates the following RTL in combine:

(insn 11 4 16 0 (parallel [
(set (reg:SI 60)
(ior:SI (mem/f:SI (reg/f:SI 16 argp) [2 x+0 S4 A32])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) 209 {*iorsi_1} (nil)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

which is then later fixed up by postreload.


My first question is whether this is intentional or an oversight?

The reason I ask/noticed this was that I'm working on an RTL patch to
canonicalize shifts vs. arithmetic operations, such that we always
perform the shift first.  This should be a big win on ARM, but will
also allow us to simplify code such as:

x ^= 8;
x >>= 2;
x ^= 1;

where by reordering the arithmetic operations relative to the
shifts, we can catch more simplifications in combine.
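
For instance, once the shift is performed first, the three operations
above should collapse to a single shift and xor (my worked example,
since xor distributes over the bits discarded by the shift):

    x = (x >> 2) ^ 3;   /* ((x ^ 8) >> 2) ^ 1 == (x >> 2) ^ 2 ^ 1 */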

The catch is that currently i386.md allows combine to first
combine commutative arithmetic operations with memory, creating
reg = xor(mem,8); reg = rshift(reg,2), but then inconsistently
won't allow a memory as the first operand in the non-commutative
shift operation, i.e. we're not allowed to create reg = rshift(mem,2).

Clearly there's an inconsistency.  The options are either to
allow reg = op(mem,immed) for non-commutative operators to also be
fixed up by postreload, if the current behaviour is intentional and serves
some useful purpose, or alternatively to change ix86_binary_operator_ok
so that we only allow valid instructions at this point.

Many thanks in advance,

Roger
--



Re: Regression introduced by your change

2006-03-02 Thread Roger Sayle

On Thu, 2 Mar 2006, Jeffrey A Law wrote:
> Is causing 961206-1.c to regress at -O1 and -O2 on i686-pc-linux-gnu
> and possibly other platforms.

Doh!  This doesn't fail on x86_64-unknown-linux-gnu, where I developed
that patch, but I am now seeing those failures on i686-pc-linux-gnu.

> OUCH.  Instead of testing bit 31, we test bit 7.
> It seems to me that we need to emit the shift in the mode of
> the comparison's first operand.  Then convert the result of the
> shift into the desired mode via a subreg or something along
> those lines.
>
> Thoughts?

Sorry for the breakage.  I'll have a fix before the sun goes down,
that performs the shift in the correct mode, then appropriately
sign extends, zero extends or truncates if necessary.

Many thanks for analyzing this failure.  Sorry again.

Roger
--



Re: [RFC] removal of the convert callback

2006-03-28 Thread Roger Sayle

On Mon, 27 Mar 2006, [UTF-8] Rafael Esp??ndola wrote:

> I have put toghether a hack to remove the convert callback. It consists in
> 1) adding a lang prefix to the implementations of convert (cxx_, c_,
>gfc_, etc)
> 2) defining a convert2 function that is basically a copy of c_convert
> 3) converting the calls to convert to calls to the lang specif
>implementations or convert2

I've a number of patches to clean-up the use of "convert" in GCC
myself.  If you'll give me a week or so to submit/commit them to
the tree, it'll make your life much easier.  My patches aim to
remove all uses of "convert" from libbackend.a (simplifying the
interface between the front-end and the middle-end).  Once this
is done, your task of renaming the functions gets much simpler.

The trick is that not all of the "convert" functions behave
identically, which is why basing convert2 on c_convert fails
for g++, which really needs cp_convert.  Instead of a new
convert2, the middle-end should be using the existing fold_convert
which performs the common language-independent conversions.
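
i.e. the clean-up largely amounts to substitutions of the form (a
sketch):

    /* Before: dispatches through the front-end's "convert" langhook.  */
    t = convert (type, expr);

    /* After: language-independent, safe anywhere in libbackend.a.  */
    t = fold_convert (type, expr);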

Finally, I suspect that some front-ends, probably java, fortran
and treelang, only define "convert" for use by the middle-end,
which means once the middle-end is cleaned up these functions
can be removed from their respective front-ends entirely.

I hope this makes sense.  Sorry for not noticing your post
earlier.  I tend to read the gcc list less frequently than
gcc-patches.

Roger
--



[RFC] Ignore TREE_CONSTANT_OVERFLOW in integer_zerop

2006-04-01 Thread Roger Sayle

Following some of my recent middle-end clean-ups, I believe that
we're now at the point where incrementally the middle-end can
start ignoring the TREE_OVERFLOW bits on constant tree nodes.
As a step in this direction, the patch below removes the
TREE_CONSTANT_OVERFLOW checks from integer_zerop, integer_onep,
and friends in tree.c.  This patch bootstraps and regression
tests on i686-pc-linux-gnu (including Ada) with no new failures.

The major true user of TREE_CONSTANT_OVERFLOW is the C front-end,
which doesn't use any of these functions to determine whether it
should issue a diagnostic when an overflowed integer is used in
a context requiring a compile-time constant.

Over the years, this overflow testing in the middle-end has caused
numerous bugs, the most recent being last week's PR26859.  Removing
this test cleans up the semantics of integer constants a little.
In the unlikely event that any problems are discovered by routines
relying on these functions checking for overflow, the trivial
fix is to rewrite the callers as:

if (integer_zerop (t)
&& ! TREE_CONSTANT_OVERFLOW (t))
  ...


There is now a stronger requirement on fold-const.c and friends that
they be extra careful preserving TREE_CONSTANT_OVERFLOW for the
C front-end.  Optimizations such as "0 * x" -> "0", where we
test for zero using integer_zerop, have been checked to make sure
that we return the original zero, with the overflow bits set as
appropriate.

Once this patch is approved there's some follow-up clean-up that
can be done, removing the duplication in the many "local" functions
that test for zero but couldn't previously use integer_zerop due
to the historical overflow semantics.


Presumably everyone agrees that this evolution is a good thing.
The contention is whether everyone agrees that we're ready for
such a step.  Is such a transition safe for stage 3 mainline,
and/or would front-ends prefer some time to double check that
this won't cause problems on conformance tests not part of GCC's
testsuite?

Thoughts?



2006-04-01  Roger Sayle  <[EMAIL PROTECTED]>

* tree.c (integer_zerop): Ignore TREE_CONSTANT_OVERFLOW.
(integer_onep): Likewise.
(integer_all_onesp): Likewise.
(integer_pow2p): Likewise.
(integer_nonzerop): Likewise.
(real_zerop): Likewise.
(real_onep): Likewise.
(real_twop): Likewise.
(real_minus_onep): Likewise.
(int_size_in_bytes): Likewise.
(host_integerp): Likewise.


Index: tree.c
===
*** tree.c  (revision 112378)
--- tree.c  (working copy)
*** integer_zerop (tree expr)
*** 1209,1215 
STRIP_NOPS (expr);

return ((TREE_CODE (expr) == INTEGER_CST
-  && ! TREE_CONSTANT_OVERFLOW (expr)
   && TREE_INT_CST_LOW (expr) == 0
   && TREE_INT_CST_HIGH (expr) == 0)
  || (TREE_CODE (expr) == COMPLEX_CST
--- 1209,1214 
*** integer_onep (tree expr)
*** 1226,1232 
STRIP_NOPS (expr);

return ((TREE_CODE (expr) == INTEGER_CST
-  && ! TREE_CONSTANT_OVERFLOW (expr)
   && TREE_INT_CST_LOW (expr) == 1
   && TREE_INT_CST_HIGH (expr) == 0)
  || (TREE_CODE (expr) == COMPLEX_CST
--- 1225,1230 
*** integer_all_onesp (tree expr)
*** 1250,1257 
&& integer_zerop (TREE_IMAGPART (expr)))
  return 1;

!   else if (TREE_CODE (expr) != INTEGER_CST
!  || TREE_CONSTANT_OVERFLOW (expr))
  return 0;

uns = TYPE_UNSIGNED (TREE_TYPE (expr));
--- 1248,1254 
&& integer_zerop (TREE_IMAGPART (expr)))
  return 1;

!   else if (TREE_CODE (expr) != INTEGER_CST)
  return 0;

uns = TYPE_UNSIGNED (TREE_TYPE (expr));
*** integer_pow2p (tree expr)
*** 1303,1309 
&& integer_zerop (TREE_IMAGPART (expr)))
  return 1;

!   if (TREE_CODE (expr) != INTEGER_CST || TREE_CONSTANT_OVERFLOW (expr))
  return 0;

prec = (POINTER_TYPE_P (TREE_TYPE (expr))
--- 1300,1306 
&& integer_zerop (TREE_IMAGPART (expr)))
  return 1;

!   if (TREE_CODE (expr) != INTEGER_CST)
  return 0;

prec = (POINTER_TYPE_P (TREE_TYPE (expr))
*** integer_nonzerop (tree expr)
*** 1341,1347 
STRIP_NOPS (expr);

return ((TREE_CODE (expr) == INTEGER_CST
-  && ! TREE_CONSTANT_OVERFLOW (expr)
   && (TREE_INT_CST_LOW (expr) != 0
   || TREE_INT_CST_HIGH (expr) != 0))
  || (TREE_CODE (expr) == COMPLEX_CST
--- 1338,1343 
*** real_zerop (tree expr)
*** 1434,1440 
STRIP_NOPS (expr);

return ((TREE_CODE (expr) == REAL_CST
-  && ! TREE_CONSTANT_OVERFLOW (expr)
   && REAL_VALUES_EQUAL (

Re: [RFC] Ignore TREE_CONSTANT_OVERFLOW in integer_zerop

2006-04-02 Thread Roger Sayle

On Sun, 2 Apr 2006, Eric Botcazou wrote:
> > 2006-04-01  Roger Sayle  <[EMAIL PROTECTED]>
> >
> > * tree.c (integer_zerop): Ignore TREE_CONSTANT_OVERFLOW.
> >   [...]
> > (int_size_in_bytes): Likewise.
> > (host_integerp): Likewise.
>
> Is this an oversight?

Doh.  Indeed.  The last two lines of the ChangeLog should read

(int_size_in_bytes): Ignore TREE_OVERFLOW.
(host_integerp): Likewise.

As one aspect of the series I've been investigating is to unify these
two flags, I tend to think of them as synonymous.  The only difference
in semantics is that the C/C++ front-ends use both to track whether
a diagnostic has been emitted.  It turns out that this conceptual
difference can be handled entirely in the C family front-ends, and
at the same time improve the diagnostic warnings that are emitted.

Indeed, many of the uses of TREE_OVERFLOW or TREE_CONSTANT_OVERFLOW
currently test the wrong one, but the vast majority of writes are
to clear these flags to prevent them screwing things up.


Good catch.  Thanks.

Roger
--



Re: [RFH] negate_expr_p bug?

2006-04-03 Thread Roger Sayle

On Mon, 3 Apr 2006, Richard Guenther wrote:
> negate_expr_p currently contains
> ...
> where it looks bogus to simply return true for signed types but
> unset flag_trapv.
> ...
> which is bogus as it should read || (!flag_trapv && flag_wrapv) - no?
> I hit this with a patch to fold A +- CST to A -+ CST for negative CST,
> which tells me it is ok to negate -INT_MAX.

I suspect the problem is in the use of negate_expr_p.  This predicate
is designed for integer constants to check whether it is reasonable to
evaluate NEGATE_EXPR at compile-time.

Hence, fold will/can convert:

int foo()
{
  return -INT_MIN;
}

into "return INT_MIN;" during tree-ssa, provided that we don't rely
on trapping math.  i.e. we take advantage of the undefined behaviour
of signed types, with or without flag_wrapv.

The real question is whether your new transformation is using
negate_expr_p correctly.  It sounds as though you may be introducing
an overflow that wasn't there before, in which case you need to be
testing flag_wrapv && !flag_trapv in the calling clause.


negate_expr_p (t) is simply an efficient version of

temp = fold_build1 (NEGATE_EXPR, type, t);
return TREE_CODE (temp) != NEGATE_EXPR;

and its logic precisely mirrors the equivalent code in negate_expr.
Perhaps some of the logic in fold_negate_const is more appropriate:

temp = fold_negate_const (arg0, type);
return !TREE_OVERFLOW (temp);


I hope this helps.

Roger
--



Re: [RFH] negate_expr_p bug?

2006-04-03 Thread Roger Sayle

On Mon, 3 Apr 2006, Richard Guenther wrote:
>
>  || (TREE_CODE (type) == INTEGER_TYPE
>&& (TREE_CODE (arg1) == INTEGER_CST
>|| TYPE_UNSIGNED (type)
>|| (flag_wrapv && !flag_trapv)))
>
> Does this sound reasonable?

Yes, this sounds reasonable.  It was not the negate_expr_p call
that's causing your problems but the overly restrictive guard
on this transformation.  Let me know the results of a bootstrap
and regression test in case that points out something I've missed.

Roger
--



Re: Relying on precise integer calculation with double

2006-04-06 Thread Roger Sayle

On Thu, 6 Apr 2006, Daniel Bratell wrote:
> that it would be safe to use that flag unless I relied on extreme
> precision or behaviour related to NaN or +-Inf, and that normal simple
> arithmetics should still give the same result.

Unfortunately, with -ffast-math simple arithmetic no longer gives
precisely the same result.  Converting the division into multiplication
by the reciprocal sometimes results in numbers that can't precisely be
represented, such as 1/7.  Of course, in this case you are relying on
"extreme precision" as you describe it above.  As pointed out by
Andrew, the result is 0.99999999999999994448884876874217298 which
should be close enough to one for most -ffast-math applications.

The real problem is the source line:

>if(diff_days/7.0 != (int)(diff_days/7.0))

Here you're using C's default FP->int conversion which is truncation
towards zero.  The fact that 0.99999... is less than 1 means the result
is zero.  Instead, you should probably use "rint" or "round" or "lround"
to return the nearest integer.
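
For example, assuming diff_days holds an exactly-representable whole
number of days, a sketch of a more robust divisibility test:

    #include <math.h>

    long q = lround (diff_days / 7.0);
    int is_multiple = (diff_days == 7.0 * q);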

Roger
--



Repository Write Access: A privilege or a right?

2006-04-09 Thread Roger Sayle

I've a quick question regarding the policy for granting/sponsoring
CVS/subversion write access.  There seems to have been a recent
change in thinking (or difference of opinions) on the pre-requisites
for allowing a new contributor modification rights to GCC's source
repository.

My understanding (which is admittedly mostly personal and historical)
is that someone is granted write access once they have demonstrated
an understanding of the rules and policies for submissions to GCC.
I'd submitted ten or twenty patches to GCC, before RTH offered to
sponsor my write access.  Admittedly this is a bit extreme, but the
rule of thumb that I currently use is to sponsor someone's write access
once they've submitted a patch that doesn't require changes (correct,
following GNU style, documentation changes, new testcase(s), with
a ChangeLog entry, properly tested listing target triples).

There does however appear to be another school of thought that
suggests that write access should be immediate, following the
acceptance of FSF copyright assignments, possibly implicitly by
becoming a employee of a "blanket-assignment" company.

In the interests of consistency I was wondering whether there was
any official advice or guidelines on this aspect of GCC protocol?


Increasingly, there have been examples of patches getting checked-in
without maintainer approval, or without posting to gcc-patches, etc...
I'd assumed that one reason for the rite-of-passage of GCC CVS
access was that no-one would be able to delete/corrupt the GCC
repository without first having read contribute.html.  One reason
for requiring a "sponsor" is that a maintainer is required to
vouch for the behaviour/aptitude of a new "Write After Approval"
contributor.  And in theory breaches of the regulations should
reflect not only on the violator but also on his/her sponsor's
judgement.  I also suspect that the ultimate recourse of the
SC is to revoke someone's write access.


Unfortunately, this aspect of GCC protocol isn't clearly documented,
and I base my current actions on experience of legacy policies.
Admittedly, it'd be easier to relax "admission requirements" and
thereby reduce the number of commits that have to be made on behalf
of those without access.  Perhaps the effectiveness of the "three
stage system" at stabilizing the tree has reduced the need for the
earlier methods used to ensure/improve the quality of contributions.
There's also the new/recent complication that companies/groups now
maintain their own GCC branches, perhaps requiring write access even
for programmers who don't contribute to FSF GCC.


What do the other maintainers, or the SC, think?

Roger
--



Re: Status of SEE and Autovectorization patches?

2006-05-03 Thread Roger Sayle

Hi Mark,

On Tue, 2 May 2006, Mark Mitchell wrote:
> Roger, I know that you reviewed the SEE patches.  Is there anything
> more than needs to be done to get them committed, in your view?

As far as I'm aware, we're still just waiting for the Haifa folks to
commit them to mainline.  There are a number of harmless code generation
changes on some platforms, but as far as we know no known performance
regressions.  And the Intel folks are keen to submit follow-up patches
for x86_64 to extend/enhance this framework once it's in the tree.
And importantly there's a single command-line option/variable that
can enable/disable this pass should there be any adverse events.

However, my growing concern at the moment is the response time of
the Haifa folks.  If it's going to go into the tree in stage 3, I'd
feel more comfortable if the patch submitters/contributors looked
like they could respond to queries/issues within 24 or 48 hours.  The
current prolonged silence/inaction doesn't inspire great confidence.

Perhaps you could make the executive decision, that if it isn't in
the tree by June (or some date of your choosing), that it'll just
have to wait for 4.3?  My apologies for taking so long to review
these changes, and I and the GCC maintainership should take some of
the responsibility for the delays and current inconvenience.


It looks from the e-mail addresses that possibly the primary developer
of SEE, Leehod, is no longer at IBM but has returned to academia, with
Mircea pinging the patches.  This could explain things, but is pure
speculation.  Maybe someone could volunteer to update the posted patches
for mainline and make the minor style corrections requested in the final
review/approval?

I've Cc'd Leehod and Mircea on this e-mail, so perhaps they'll be
able to better explain the current status themselves.

Roger
--



Re: Status of SEE and Autovectorization patches?

2006-05-05 Thread Roger Sayle

Hi Mircea,

On Fri, 5 May 2006, Mircea Namolaru wrote:
> > That certainly does suggest a bug in the SEE patches.  They needn't do
> > anything useful on IA32/AMD64, but they should presumably either (a) not
> > cause a bootstrap failure on these architectures, or (b) be disabled on
> > these architectures.
>
> Agree. I will check the bootstrapping on x86. (a) seems preferable but
> if not feasible in a short time frame, it will be (b).

Given that this is more than a bootstrap problem with non-default flags,
but testsuite regressions for gfortran and SPEC failures on a primary
platform, I think this falls under GCC's 48 hour rule.  This simply
formalizes your phrase "short time frame" above, and means that if you're
unlikely to come up with a solution to these problems in the next day
or two, then you/we should simply disable -fsee from being turned on by
default at -O3.

I appreciate your efforts to actually correct the deficiencies in SEE,
which is indeed preferable, but for regression breakage in stage3, it's
often better to simply band-aid the problem as quickly as possible, even
if you're close to a fix, as a courtesy to other developers.

Roger
--



Re: Revert patch for MIPS TImode functions for 4.1.1

2006-05-19 Thread Roger Sayle

Hi Mark and Richard,

On Fri, 19 May 2006, Mark Mitchell wrote:
> Roger, would you please revert your MIPS MIN_UNITS_PER_WORD change
> for MIPS on the GCC 4.1 branch?
>
> (My brain failed to digest the fact that the patch was on 4.1 as well as
> on mainline, perhaps in part because there doesn't seem to be a PR;
> Richard indicated to me that he would locate or open one now.)

Please can you explicitly confirm that we want to reintroduce PR
target/22209 as a regression from gcc 4.1.0, and break the build
of libgfortran on all 64-bit MIPS platforms, including IRIX?

The MIPS backend still claims that TImode is scalar_mode_supported_p,
and it's therefore possible to write simple C and C++ testcases that
will regress with this change.  I know my testing of such a reversion
will reintroduce at least 4800 new failures in the dejagnu testsuite,
for no reported improvement vs gcc 4.1.0.  It's a lesser of two evils
issue, and it's only further development on mainline, and not a user
reported problem, that's revealed the potential fallout in 4.1.0.

The tradeoff is between unlinkable code in the default build on the
affected platforms, vs. unlinkable code when explicitly using
the -msoft-float flag, which isn't tested by the testsuite.

I'm happy to revert the changes at a moment's notice, especially if
that's what the release manager and MIPS maintainers want, but I
just want to be certain that everyone's aware of the tradeoffs, so
that whatever the outcome, it's not my fault.

I'll start a bootstrap and regression test of the reversion against
the 4.1 branch straight away.  It's not clear if recent backports have
introduced additional dependencies, but these machines aren't the
fastest on the planet, and they're still behind on evaluating the recent
set of mainline patches/changes.

Roger
--



Re: Revert patch for MIPS TImode functions for 4.1.1

2006-05-19 Thread Roger Sayle

Hi Richard,

On Fri, 19 May 2006, Richard Sandiford wrote:
> Mark Mitchell <[EMAIL PROTECTED]> writes:
> > (My brain failed to digest the fact that the patch was on 4.1 as well as
> > on mainline, perhaps in part because there doesn't seem to be a PR;
> > Richard indicated to me that he would locate or open one now.)
>
> Opened as 27681.  (And Roger: sorry for all the hassle this patch has
> caused you.  A good deed and all that...)

Indeed, no good deed ever goes unpunished.  In fact, isn't it the MIPS
backend's use of the GOFAST libraries which is one of the major blockers
of adding -msoft-float tests to the testsuite?  :-)  My very strong
recommendation is to add a "target-soft-float" predicate to the dejagnu
testsuite's "require" predicates, that'll allow us to test -msoft-float
functionality on most platforms, by explicitly disabling the handful of
problematic non-libgcc soft FP targets.  This will both prevent this sort
of regression in the future, and actually allow us to (automatedly)
confirm that your patch has improved things.

The MIN_UNITS_PER_WORD trick is used on multiple platforms, including
PowerPC, IA-64 and s390, so without such tests we've no way of knowing
that your recent fix isn't also required on them in addition to MIPS.
See, for example, http://gcc.gnu.org/ml/gcc/2003-03/msg01365.html
which was the Ulrich's s390 change that inspired my problematic fix
for mips.h.

Roger
--



Re: Revert patch for MIPS TImode functions for 4.1.1

2006-05-19 Thread Roger Sayle

On Fri, 19 May 2006, Mark Mitchell wrote:
> > No, you can invoke it via using the attribute mode(TI)
> Sure, but I'm not worried about that case.

That would be the only class of C or C++ failures that I could easily
construct by hand.  Although the RTL optimizers will introduce TImode
moves and temporary registers, it looks like they won't introduce any
arithmetic/logical operations without explicit pattern support from
the backend.
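
i.e. something along the lines of (an illustrative sketch; the function
name is mine):

    typedef int TItype __attribute__ ((mode (TI)));

    TItype
    div128 (TItype a, TItype b)
    {
      return a / b;     /* requires __divti3 from libgcc */
    }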

Hence, it's only things like configure scripts that identify the
presence of __int128_t and __uint128_t, and then explicitly use
128-bit operations in their source code that will be affected.
This is the source of the libgfortran failure, where the runtime
library detects these types are available, and uses them.  It's
not a problem in the gfortran compiler itself, which treats
INTEGER*16 exactly the same way that C/C++ treat __int128_t.

If libstdc++-v3's configure checked for __int128_t and provided
a specialized STL instantiation, it would exhibit the same issue.

Roger
--



Re: Revert patch for MIPS TImode functions for 4.1.1

2006-05-21 Thread Roger Sayle

On Fri, 19 May 2006, Mark Mitchell wrote:
> I'm evaluating the options.  It would be helpful if someone has time to
> apply and test Richard's patch on 4.1, as that would let us know whether
> that option is viable as well.

I've bootstrapped and regression tested a backport of Richard's patch
against the gcc-4_1-branch on mips-sgi-irix6.5 with no new failures.
His mainline change also has no testsuite problems on the same machine.

The patch applies cleanly, with the exception of some mklibgcc.in
hunks, due to the fact that the _floatun* symbols were added to 4.2
and aren't available in 4.1.x libgcc, and that the LIB2FUNCS_EXCLUDE
functionality isn't on the branch.  For the record, the final
mklibgcc.in changes that I tested are attached to this e-mail.

I hope this helps.

Roger
--
Index: mklibgcc.in
===
*** mklibgcc.in (revision 113936)
--- mklibgcc.in (working copy)
***
*** 17,22 
--- 17,23 
  # LIB2ADDEHSTATIC
  # LIB2ADDEHSHARED
  # LIB2ADDEHDEP
+ # LIB2LITERAL_CONVERSION_MODES
  # LIBUNWIND
  # LIBUNWINDDEP
  # SHLIBUNWIND_LINK
*** echo 'dirs = libgcc'
*** 54,63 
  echo
  
  # Library members defined in libgcc2.c.
  lib2funcs='_muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3
!   _cmpdi2 _ucmpdi2 _floatdidf _floatdisf _fixunsdfsi _fixunssfsi
!   _fixunsdfdi _fixdfdi _fixunssfdi _fixsfdi _fixxfdi _fixunsxfdi
!   _floatdixf _fixunsxfsi _fixtfdi _fixunstfdi _floatditf _clear_cache
_enable_execute_stack _trampoline __main _absvsi2 _absvdi2 _addvsi3
_addvdi3 _subvsi3 _subvdi3 _mulvsi3 _mulvdi3 _negvsi2 _negvdi2 _ctors
_ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2 _ctzsi2 _ctzdi2 _popcount_tab
--- 55,81 
  echo
  
  # Library members defined in libgcc2.c.
+ 
+ # The floating-point conversion routines that involve a single-word integer.
+ # XX stands for the integer mode.
+ swfloatfuncs=
+ for mode in sf df xf; do
+   swfloatfuncs="$swfloatfuncs _fixuns${mode}XX"
+ done
+ 
+ # Likewise double-word routines.
+ dwfloatfuncs=
+ for mode in sf df xf tf; do
+   dwfloatfuncs="$dwfloatfuncs _fix${mode}XX _fixuns${mode}XX"
+   dwfloatfuncs="$dwfloatfuncs _floatXX${mode}"
+ done
+ 
+ # Entries of the form <object>:<func>:<size> indicate that libgcc2.c
+ # should be compiled with L<func> defined and with LIBGCC2_UNITS_PER_WORD
+ # set to <size>.  <object> is the name of the associated object file.
+ 
  lib2funcs='_muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3
!   _cmpdi2 _ucmpdi2 _floatdidf _clear_cache
_enable_execute_stack _trampoline __main _absvsi2 _absvdi2 _addvsi3
_addvdi3 _subvsi3 _subvdi3 _mulvsi3 _mulvdi3 _negvsi2 _negvdi2 _ctors
_ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2 _ctzsi2 _ctzdi2 _popcount_tab
*** lib2funcs='_muldi3 _negdi2 _lshrdi3 _ash
*** 65,70 
--- 83,103 
_powixf2 _powitf2 _mulsc3 _muldc3 _mulxc3 _multc3 _divsc3 _divdc3
_divxc3 _divtc3'
  
+ if [ "$LIB2LITERAL_CONVERSION_MODES" ]; then
+   for func in $swfloatfuncs; do
+ sifunc=`echo $func | sed -e 's/XX/si/'`
+ lib2funcs="$lib2funcs $sifunc:$sifunc:4"
+   done
+   for func in $dwfloatfuncs; do
+ difunc=`echo $func | sed -e 's/XX/di/'`
+ tifunc=`echo $func | sed -e 's/XX/ti/'`
+ lib2funcs="$lib2funcs $difunc:$difunc:4 $tifunc:$difunc:8"
+   done
+ else
+   lib2funcs="$lib2funcs `echo $swfloatfuncs | sed -e 's/XX/si/g'`"
+   lib2funcs="$lib2funcs `echo $dwfloatfuncs | sed -e 's/XX/di/g'`"
+ fi
+ 
  # Disable SHLIB_LINK if shared libgcc not enabled.
  if [ "@enable_shared@" = "no" ]; then
SHLIB_LINK=""
*** fi
*** 145,152 
  # Remove any objects from lib2funcs and LIB2_DIVMOD_FUNCS that are
  # defined as optimized assembly code in LIB1ASMFUNCS.
  for name in $LIB1ASMFUNCS; do
!   lib2funcs=`echo $lib2funcs | sed -e 's/^'$name' //' \
!  -e 's/ '$name' / /' \
   -e 's/ '$name'$//'`
LIB2_DIVMOD_FUNCS=`echo $LIB2_DIVMOD_FUNCS | sed -e 's/^'$name' //' \
   -e 's/ '$name' / /' \
--- 178,185 
  # Remove any objects from lib2funcs and LIB2_DIVMOD_FUNCS that are
  # defined as optimized assembly code in LIB1ASMFUNCS.
  for name in $LIB1ASMFUNCS; do
!   lib2funcs=`echo $lib2funcs | sed -e 's/^'$name'[ :]//' \
!  -e 's/ '$name'[ :]/ /' \
   -e 's/ '$name'$//'`
LIB2_DIVMOD_FUNCS=`echo $LIB2_DIVMOD_FUNCS | sed -e 's/^'$name' //' \
   -e 's/ '$name' / /' \
*** for ml in $MULTILIBS; do
*** 248,263 
#
  
for name in $lib2funcs; do
  if [ "$libgcc_s_so" ]; then
out="libgcc/${dir}/${name}${objext}"
outS="libgcc/${dir}/${name}_s${objext}"
  
echo $outS: $libgcc2_c_dep
!   echo "  $gcc_s_compile" $flags -DL$name -c '$(srcdir)/libgcc2.c' \
-o $outS
  
echo $out: $libgcc2_c_dep
!   echo 

Re: bootstrap of trunk fails for x86-64

2006-06-29 Thread Roger Sayle

On Thu, 29 Jun 2006, Andreas Jaeger wrote:
> Current svn does not build, it fails for me with:
> build/genpreds: Internal error: RTL check: expected elt 0 type 'e' or 'u',
> have 's' (rtx match_code) in write_match_code_switch, at genpreds.c:546
>
> Roger, is this is a result of your changes?

Grr!  Sorry again for the second "typo" breakage in what should have been
a very simple patch.  Although I can't reproduce the above failure on my
x86_64 box (or any of the other systems I tested this patch on), the RTL
check is diagnosing a real problem that I'm sorry I missed myself.

Fixed with the following patch.  Committed to mainline as obvious as
revision 115076, after a full 3-stage bootstrap of the gcc/ directory
on x86_64-unknown-linux-gnu.

My apologies yet again for the breakage, and especially to Mark.



2006-06-29  Roger Sayle  <[EMAIL PROTECTED]>

* genpreds.c (write_match_code_switch): Correctly use XSTR instead
of XEXP to extract the operands of a MATCH_CODE rtx.


Index: genpreds.c
===
*** genpreds.c  (revision 115074)
--- genpreds.c  (working copy)
*** write_predicate_expr (rtx exp)
*** 543,550 
  static void
  write_match_code_switch (rtx exp)
  {
!   const char *codes = (const char *) XEXP (exp, 0);
!   const char *path = (const char *) XEXP (exp, 1);
const char *code;

fputs ("  switch (GET_CODE (", stdout);
--- 543,550 
  static void
  write_match_code_switch (rtx exp)
  {
!   const char *codes = XSTR (exp, 0);
!   const char *path = XSTR (exp, 1);
const char *code;

fputs ("  switch (GET_CODE (", stdout);


Roger
--



Re: Splay Tree

2006-07-09 Thread Roger Sayle

Hi Ian,

On Sun, 9 Jul 2006, Ian Blanes wrote:
> I've been recently looking at the splay tree implementation. I've noticed
> that this revision http://gcc.gnu.org/viewcvs?view=rev&revision=106584
> does a strange splaying. It does a bottom up splaying but starting at the
> root (not the same as top down). It works but performs worse.

Interesting.  Richard Guenther's post for the above change, which was
based upon a previous proposal by Brian Makin, suggested that this
patch reduced the compile-time of his tramp3d benchmark by 1%.
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00294.html

I appreciate that you've also benchmarked "typical" splay tree usage
during a GCC bootstrap.  Any chance that you could repeat your experiment,
but instead evaluate the behaviour of splay tree usage during a tramp3d
compilation:  http://www.suse.de/~rguenther/tramp3d/

Whilst your analysis demonstrates that splay_tree_splay is behaving
curiously, I'm also interested in reconciling the reported performance
benefits of the different implementations.

Thanks in advance,

Roger
--



Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()

2006-08-02 Thread Roger Sayle

Hi Denis,

On Mon, 31 Jul 2006, Jim Wilson wrote:
> At the moment, there is probably no one who understands this code as
> well as you do, so you may not get much help from others.

In my defence, I'm struggling to get up to speed with all of the
issues.  The first and obvious comments are that patches should be
submitted to [EMAIL PROTECTED] rather than gcc@gcc.gnu.org;
we're currently in stage 3, which is a code freeze for regression
fixes, and therefore missed-optimization improvements such as yours
can't be committed until gcc 4.2 has branched, which means reviewers
tend not to review complex patches until then; and finally you really
should take a look at GCC's coding standards documentation, as
the formatting of your submission is just all wrong.

In the meantime, however, I have been running your example code,
and it does indeed find better correct multipliers than the algorithm
currently used by GCC.

As mentioned by Jim, the current GCC algorithm is based upon the
paper by Torbjorn Granlund and Peter Montgomery, "Division by
Invariant Integers using Multiplication", PLDI-94.
http://swox.se/~tg/divcnst-pldi94.pdf

Another excellent reference describing division by integer constant
algorithms is Henry S. Warren Jr.'s "Hacker's Delight", Addison-Wesley
publishers, 2003.

The strange thing is that all three of GCC, the paper and the book
agree on the preferred multiplier, which differs from the one
selected by your code.  Following through both sets of correctness
proofs is tedious, but it looks like Granlund's work attempts to
find multipliers whose residual error is less than 1/n, which is
sufficient but not necessary.  Your code relies on the round-to-zero
semantics of truncating division, so that provided the residual is
less than 1, we'll always truncate to the correct result.

One issue is that Granlund's derivation starts with signed division
and is then extended/tweaked to handle unsigned division.  Your
work on the other hand, starts its derivation with unsigned division.


So currently I'm trying to find the catch.  For which divisions
are 33-bit operations required?  Your proof looks reasonable and
the code does the correct thing for the divisions in your examples,
though the coding style issues prevent it being used in the compiler
without being cleaned up.  There's also a return clause where it gives
up after failing to find a suitable multiplier, where falling back
to the existing code might be a better compromise.  I've not played
with your algorithm enough to determine how often it triggers.
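
To make the 33-bit case concrete, unsigned division by 7 is the classic
example: the ceiling multiplier needs 33 bits, so it gets split into a
32-bit highpart multiply plus a shift/add fixup, roughly (a sketch
following Hacker's Delight, not GCC's actual output):

    unsigned int
    udiv7 (unsigned int x)
    {
      /* 0x24924925 is the low 32 bits of the 33-bit multiplier
         0x124924925 == ceil (2^35 / 7).  */
      unsigned int t = ((unsigned long long) x * 0x24924925u) >> 32;
      return (t + ((x - t) >> 1)) >> 2;
    }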


Congratulations for the impressive work!  You'll forgive me if it
takes a while to understand why your method is better than previous
works, i.e. what it is they overlooked (or that you've overlooked).
To be honest all this discrete math makes my head hurt.  Indeed,
when it's appreciated that it's not the smallest multiplier that we'd
prefer, but the cheapest (using shifts/adds etc...), and that
choose_multiplier is potentially intertwined with synth_mult, the
world starts spinning and I need to reach for the headache tablets.

Roger
--



RE: Issue generating GCC coverage report since r14-1625-geba3565ce6d766

2023-06-16 Thread Roger Sayle


Hi Martin,
It's great to hear from you.  My apologies for the inconvenience.
I believe that the problem has been solved by Jakub's patch:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;f=gcc/config/i386/i386.md;h=43a3252c42af12ad90082e4088ea58eecd0bf582

I strongly suspect that the problem was that my patch was emitting
"(const_int 0)" as an instruction into the RTL stream, which I'd
misunderstood to be recognized as a no-op by the middle-end.  This isn't
the case and the correct idiom is to (also) use:
emit_note (NOTE_INSN_DELETED); DONE;

I can easily believe that this unintended behaviour is/was interfering
with your code coverage scripts (I should study your posted results).

I hope this explains things.  Please let me know if things really are
fixed (or not).
Cheers,
Roger
--

> -Original Message-
> From: Martin Jambor 
> Sent: 16 June 2023 13:51
> To: GCC Mailing List 
> Cc: Roger Sayle 
> Subject: Issue generating GCC coverage report since r14-1625-geba3565ce6d766
> 
> Hello,
> 
> we try to build coverage info for GCC for our testsuite and upload it to
> https://gcc.opensuse.org/gcc-lcov/ every weekend.  But since patch
> r14-1625-geba3565ce6d766 (Add support for stc and cmc instructions in
> i386.md) the generation broke down.  However, I don't think there is something
> necessarily wrong with that particular commit, at least I don't see anything
> suspicious.
> 
> I inherited the generating script from Martin Liška and have not really
> looked into it much, but it simply does the following after a fresh GCC
> master checkout (I added the --disable-multilib and reduced the number of
> languages to reproduce this more quickly):
> 
> 
>   ../src/configure --prefix=/home/mjambor/gcc/mine/inst --enable-
> languages=c,c++ --disable-bootstrap --enable-host-shared --enable-coverage=opt
> --disable-multilib
>   make -j64 && make -j64 -k check
>   find gcc/testsuite/ -name '*.gcda' -exec rm -rvf {} \;  # I don't know why 
> the
> script does this
>   lcov -d . --capture --output-file gcc.info
> 
> 
> and this last step, since the commit, when processing file
> ./gcc/insn-attrtab.gcda fails with error:
> 
>   geninfo: ERROR: mismatched end line for _Z12get_attr_isaP8rtx_insn at
> /home/mjambor/gcc/mine/src/gcc/config/i386/i386.md:5776: 5776 -> 8963
>   (use "geninfo --ignore-errors mismatch ..." to bypass this error)
> 
> I tried looking briefly into the perl sources of lcov and geninfo but I am
> afraid I don't have the necessary knowledge of the language and the tool or
> the time to properly debug this.  So I am inclined to simply add
> --ignore-errors mismatch to lcov options, which avoids the issue, and be
> done with it.  Nevertheless, I thought I'd mention this here in case anyone
> here has any ideas what can be going wrong.
> 
> Thanks,
> 
> Martin



Re: [RFC] fold Reorganization Plan

2005-02-12 Thread Roger Sayle

On Sat, 12 Feb 2005, Nathan Sidwell wrote:
> I question if it is better to fold early.  As I've said before, I think
> the optimizations that fold performs should be turned into a proper SSA
> optimization phase -- that can be repeatedly applied as necessary.

As for a proper tree-ssa optimization pass, I believe Andrew Pinski's
proposed tree-combiner pass, which attempts to merge/combine consecutive
tree statements and check whether the result can be folded, would fit
your description.


However, the utility of early fold to the GCC compiler is much greater
than simply compile-time evaluating expressions with constant operands.
One of the reasons that fold is so fast is that it can rely on the fact
that all of a tree's operands have already been folded.  In fact, much
of the middle-end makes or can make use of the assumptions that constant
operands to binary operators appear second, that NOP_EXPRs only appear
where needed, that NEGATE_EXPR never wraps another NEGATE_EXPR and that
"x+x" and "2*x" are represented the same way.

For example, left to its own devices a front-end would produce a
potentially unbounded tree for "!!!!...!!x".  Whilst it
might be desirable to reproduce such ugliness in source code analysis
tools, the fact that GCC currently only has to work with minimal trees
(compact by construction) both improves compile-time and memory usage.

Another example is that some front-ends (cough, g++, cough) love calling
save_expr at any opportunity, even on constant integers and string
literals.  If it wasn't for the middle-end/fold only creating such
tree nodes as necessary, many analyses/transformations would be
significantly complicated.


As several front-end people have suggested, calling fold whilst
constructing parse trees shouldn't be necessary (as shown by the
shining examples of g77 and GNAT).  In reality, many of the
transformations performed by fold (most? as I expect expressions with
constant operands are actually fairly rare at the source level)
are purely to tidy up the inefficiencies or incorrect tree representations
constructed by the front-ends.


If it isn't clear from the name fold_buildN, these interfaces are
designed to apply the invariants expected when building trees.  In
the further flung future, the buildN interfaces may become deprecated
and even further still trees might be marked read-only (i.e. const)
outside of the middle-end (allowing static trees and more middle-end
controlled tree sharing).  Bwa-ha-ha-ha-ha (evil laugh)!

Roger
--



Re: [RFC] fold Reorganization Plan

2005-02-15 Thread Roger Sayle

On Tue, 15 Feb 2005, Andrew Haley wrote:
> There's one other thing to bear in mind when considering the way that
> fold relates to language front ends.  We've seen problems in Java
> where fold() returned tree nodes that Java didn't recognize: one time,
> for example, it returned a BITFIELD_EXPR.

I see the current java front-end's inability to lower standard tree
nodes into JVM bytecodes as a failing of the gcj implementation
rather an intrinsic problem in the constant folder.  The operator
BIT_FIELD_REF, for example, can be seen as a canonical representation
of a shift-and-mask or a mask-and-shift operation.  Hence, should
the tree-ssa analysis passes prefer to see "(x & 12) >> 2" or
"(x >> 2) & 3" as BIT_FIELD_REF (x, 2, 2), it should be a relatively
trivial task for jcf-write.c to convert BIT_FIELD_REF (x, 2, 2) back
into an ishr/iand sequence.

Exactly the same problems were claimed with jcf-write's inability
to handle ABS_EXPR, MIN_EXPR, MAX_EXPR, NOP_EXPR, RSHIFT_EXPR, etc...

The design decision made by gcj was, rather than providing a
JVM back-end target, to instead side-step GCC's RTL expansion
and perform the lowering to Java bytecodes itself.  As such, any
RTL expansion functionality that failed to get implemented in gcj
would appear to be a shortcoming of the java front-end, i.e. a
sorry(), rather than a bug or problem in fold or the middle-end.


My understanding is that it's this failure of the java folks to finish
implementing an expand_expr replacement for JVM bytecodes that's the
primary motivating factor for the new "gcjx" front-end :)

Roger
--



Re: GCC 4.3.0 Status Report (2007-09-04)

2007-09-09 Thread Roger Sayle



> This is an optimization pass which leads to dramatically better code on
> at least one SPEC benchmark.  Ian, Roger, Diego, would one of you care
> to review this?

My concern is that as formulated, conditional store elimination is not
always a win.


Transforming

    if (cond)
      *p = x;

into

    tmp = *p;
    if (cond)
      tmp = x;
    *p = tmp;

on its own, effectively transforms a conditional write to memory into an
unconditional write to memory.  On many platforms, even x86, this is a
pessimization.  For example, the "Intel Architecture Optimization Manual",
available at ftp://download.intel.com/design/PentiumII/manuals/24281603.PDF
in section 3.5.5 "Write Allocation Effects", actually recommends the inverse
transformation.  On page 3-21 they show how the "Sieve of Eratosthenes"
benchmark can be sped up on Pentium class processors by transforming the
line


    array[j] = 0;

into the equivalent

    if (array[j] != 0)
      array[j] = 0;

i.e. by introducing conditional stores.


The significant observation with Michael Matz's extremely impressive 26%
improvement on 456.hmmer is the interaction between this transformation and
other passes, which allows the conditional store to be hoisted out of a
critical loop.  By reading the value into a "tmp" before the loop,
conditionally storing to the register tmp in the loop, then unconditionally
writing the result back afterwards, we dramatically reduce the number of
memory writes, rather than increase them as when this transformation is
applied in isolation.
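
In other words, the profitable shape is the loop form (a sketch, with
illustrative names):

    int tmp = *p;                 /* single read before the loop */
    for (i = 0; i < n; i++)
      if (a[i] > tmp)
        tmp = a[i];               /* conditional store targets a register */
    *p = tmp;                     /* single unconditional write afterwards */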



I think the correct fix is not to apply this transformation everywhere, but
to correctly identify those loop cases where it helps and perform the loop
transformation there.  i.e. conditional induction variable identification,
hoisting and sinking need to be improved instead of pessimizing code to a
simpler form that allows our existing flawed passes to trigger.



I do very much like the loop-restricted version of this transformation, and
its impressive impact on HMMER (whose author Sean Eddy is a good friend).
Perhaps Mark might give revised versions of this patch special dispensation
to be applied in stage 3.  I'd not expect any correctness issues/bugs, just
performance trade-offs that need to be investigated.  Perhaps we should even
apply this patch as is during stage 2, and allow the potential non-loop
performance degradations to be addressed as follow-up patches and therefore
regression fixes suitable for stage 3?


Congratulations again to Michael for this impressive performance 
improvement.


Roger
--
Roger Sayle, Ph.D.
OpenEye Scientific Software,
Suite #D, 9 Bisbee Court,
Santa Fe, New Mexico, 87508.