On Wed, 25 Oct 2006, Richard Sandiford wrote:
> Roger Sayle <[EMAIL PROTECTED]> writes:
> > Once explained, I'd expect most maintainers would make precisely the
> > same call?
>
> I suppose the counter-argument is that we shouldn't ship 4.2 in its
> current state.  We should either back out the patch that made
> REG_POINTER more prominent or go with the fix for this PR.

I repeat that it's not an issue of whether it gets fixed or not, but
of which is the more suitable fix.

I think it's very safe to assume that we have at least a few days to
evaluate our options on mainline rather than make a rush decision.
With 108 serious regressions including 12 P1s, of course we shouldn't
ship 4.2 in its current state.  We've not yet had a release candidate,
or a timeline for release.  I fully intend to select which fix to
backport to the release branch long before the release's due date.

My counter-counter argument would be that it's the rush to backport
changes to the release branch that has caused this problem.  If the
PR 28690 fix had spent longer on mainline, such that we'd identified its
side-effect of breaking libgcj on MIPS, it wouldn't have been applied
to the branch before we'd fully understood which other changes were
also required.

The 28690 fix was fully tested on IA-32 and PowerPC and there was no
fault in applying that fix; its fallout was unforeseeable.  There's
also no fault on the MIPS maintainers and testers for not identifying
the issue earlier.  Indeed, you came up with a corrective fix very
quickly.  But once again, this change is not 100% safe (very few
changes to the compiler ever are).  There is a very real possibility
that disabling this optimization could have a significant performance
impact on non-MIPS platforms.  Indeed, the fact that David's seen a
large number of code generation differences means I'd like to get a
better handle on its effects on benchmarks.



Perhaps we should discuss this further on IRC, or open up the thread
to general discussion and comment.  Back when I first started contributing
to GCC, it was not uncommon to be expected to bootstrap RTL patches on
five or six targets, including simulators, and to present SPEC figures.
Several of my large changes took many months to get approved, and for all
that the statistics on how frequently mainline bootstrap was broken were
abysmal.  Even with any amount of contributor and reviewer testing, it's
inevitable that bugs slip through.  In a software engineering project
as huge as GCC, the combinatorial number of interactions between
different components makes it impossible for an individual, or any
group of maintainers, to predict how changes will play with each other.

My belief/position is that rather than become paralysed by this
complexity, the work-flow needs to be adapted to accommodate it.
One of the contributing factors for the egcs-FSF split was poor
reviewer response time.  Hence, my vote is to be more accepting and
lenient of patches, instead relying on the power of GCC's community
to support patch validation.  Very few contributors have access to
MIPS hardware, or alpha hardware, or SPEC, or nullstone, etc... but
our project benefits from the fact that one of the major "unseen"
contributions is that folks regularly contribute their time to build,
test and benchmark.  This is the open-source principle that "given
sufficient eyes, all bugs are shallow".


Rather than be (what I consider) unreasonable and require that David
run SPEC on at least both PowerPC and IA-32 to verify there's no
slow-down or quantify how severe it is, I have the option to provisionally
approve patches for mainline, and then look at Diego's, Andreas's and
Richi's automatic benchmarking, and the results of the autotesters.  I
honestly would like to have the ability to look at a patch and say with
100% confidence and certainty, and without reservation, that it should be
applied to the 4.0 branch.  However, waiting 48 or 72 hours to obtain
more feedback not only "covers my arse", but should be common sense
and "best practice".


Now I completely agree that there is a real downside to this approach.
The strategy leads to the additional complexity of large numbers of
patches in flight, applied to different branches and in different
orders.  Fortunately for us, we have excellent engineering practices
in place to keep track of this.  Bugzilla (and the bug masters)
continually maintains an up-to-date list of which bugs are present
on which branches.  A release manager can instantly see which bugs
still affect an upcoming release, *and* whether they have already
been fixed on mainline.

It's also no additional overhead to the contributor.  Our rules for
applying patches to the branches require that each patch must be
tested against the branch to which it is to be applied.  Even without
the complexity of pending fixes waiting to be backported, there's always
a significant divergence between branches that requires retesting
that changes are still safe.  Some contributors obviously have the
computing power to bootstrap and regression test against the 4.0,
4.1, 4.2 branches and mainline simultaneously, and can commit to
all parts of the tree in quick succession.  I personally use a
pipelined approach, where only after bootstrapping, regression testing
and committing to one branch do I kick off the bootstrap and
regression testing for the previous branch.
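
(For what it's worth, here's a rough, purely illustrative sketch of
what that pipeline looks like in practice.  The build directories are
hypothetical, they're assumed to be configured already, and the commit
step stays manual once I've compared the results against a clean
baseline.)

    #!/usr/bin/env python
    # Illustrative sketch of the pipelined bootstrap/test cycle
    # described above.  Build directories are hypothetical and assumed
    # to be configured already; committing is deliberately left manual.
    import subprocess
    import sys

    BUILD_DIRS = [
        "/scratch/build/mainline",
        "/scratch/build/gcc-4_2-branch",
        "/scratch/build/gcc-4_1-branch",
    ]

    def run(cmd, cwd):
        """Run a command in cwd; return True if it exits successfully."""
        print(">>> %s (in %s)" % (" ".join(cmd), cwd))
        return subprocess.call(cmd, cwd=cwd) == 0

    for builddir in BUILD_DIRS:
        # Bootstrap first; only run the testsuite once that succeeds.
        if not run(["make", "bootstrap"], builddir):
            sys.exit("bootstrap failed in %s; stopping" % builddir)
        if not run(["make", "-k", "check"], builddir):
            sys.exit("testsuite problems in %s; inspect first" % builddir)
        # Compare the results against a clean baseline and commit by
        # hand before kicking off the previous branch.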


Finally, before I finish the retrospective part of this e-mail, I'll
point out that this isn't a sudden, recent, unilateral policy decision,
but purely a crystallization of the prescribed GCC work-flow outlined
in contributing.html, which has been refined over many years.  As a
reviewer I have more comfort and faith in patches that have had more
testing than those that haven't.  Given that different branches have
differing stability/certainty requirements, it makes sense that it's
easier to get a patch approved for mainline and/or 4.3 than it is for
4.1, 4.0 or 3.4.  Having said that, should a contributor provide (i) an
obviously safe patch that (ii) has been tested everywhere, then I'm more
than happy to approve it immediately wherever it's needed.


So now to be constructive and work forward...

It sounds as though people aren't happy with this approach.  Certainly,
the complaints appear to be much louder than the praise.  I'm not an
unreasonable man, so I'm happy to change/evolve if that makes things
easier.  It goes without saying that it's always possible for a
contributor to wait for another reviewer to comment or provide a
second opinion.


The obvious question is what can be done to learn from and prevent
repeating the problems of the past.  I'd suggest that absolutely no-one
could have foreseen that the PR28690 fix would adversely affect
MIPS/libgcj.  I'm not sure that applying it immediately to the release
branches would improve things.  Likewise, for David's current fix, we
know it'll affect primary platforms that haven't been tested; we just
don't know how.

Even in the case of your and Eric's mid-air collision over PR28243,
although the current policy led to an unintentional duplication of
effort, as you both worked on independent fixes without appreciating
that the problems were related, the tree remained stable, and at worst
we had two fixes for the problem in the tree rather than one.

Better yet, although your fixes were a better long-term solution
(i.e. PR25514 was a superset of PR28243), you'll remember that your
PR25514 fix introduced the failure PR27736 on HPPA.  The good news
was that the policy of applying first to mainline revealed this fallout,
so that the decision to apply both the PR25514 and PR27736 fixes to
previous branches was an informed one.


One area in which we probably could improve is to make better use of
the "depends on" and "blocks" fields in bugzilla.  Although these
three PRs (25514, 27736 and 28243) are all related, and some actually
interdependent, we currently don't track that in bugzilla.  It seems
a great shame to go to the effort of identifying 27736 as fallout of
25514 if we don't record that somewhere.  That way, should anyone
decide to backport one, they'll be aware of the dependencies.



Given the facts as I understand them above, I believe that the policy
of mainline testing has helped considerably in both instances.  If
anything, we've been too hasty in backporting fixes to the stable
release branches before they've proved themselves on mainline.  Short
of requiring all patches to be tested on HPPA and MIPS, it's unclear
whether there's a better way, though I'm very open to suggestions.



I think one source of the problem is that the standards that I hold
myself to may potentially be unrealistic for many contributors.  I
personally wouldn't consider applying one of my own fixes to a release
branch until it had been scrutinized on mainline.  Indeed the CVS logs
will show I've never committed to a branch anything that hadn't been on
mainline for at least 24 hours.  Even then I've still been bitten by
unforeseen problems (and my apologies to Mark for these).  However,
just because this is my personal credo doesn't mean I should impose
it on anyone else.  I personally can't fathom how frequently changes
are applied to the tree that break bootstrap everywhere.  This sort
of thing should be impossible given our current guidelines.

Almost certainly, my paranoia comes from my involvement working with
GCC's RTL optimizers.  Once upon a time, you couldn't commit a change
to combine, reload or elsewhere without being almost certain to break
a cc0 target, or some obscure backend.  I think improved practices
have gone a long way to advance us from those dark days.   But even
so, our large number of regressions and the ever-lengthening stages
two and three reveal that more should be done to improve quality.


So what can I do differently?  And how should that help?

Roger
--
