On Wed, 25 Oct 2006, Richard Sandiford wrote:
> Roger Sayle <[EMAIL PROTECTED]> writes:
> > Once explained, I'd expect most maintainers would make precisely the
> > same call?
>
> I suppose the counter-argument is that we shouldn't ship 4.2 in its
> current state.  We should either back out the patch that made
> REG_POINTER more prominent or go with the fix for this PR.
I repeat that it's not an issue of whether it gets fixed or not, but of which is the more suitable fix. I think it's very safe to assume that we have at least a few days to evaluate our options on mainline rather than make a rushed decision.

With 108 serious regressions, including 12 P1s, of course we shouldn't ship 4.2 in its current state. We've not yet had a release candidate, or a timeline for release. I fully intend to select which fix to backport to the release branch long before the release's due date.

My counter-counter-argument would be that it's the rush to backport changes to the release branch that has caused this problem. If the PR 28690 fix had spent longer on mainline, such that we'd identified its side-effect of breaking libgcj on MIPS, it wouldn't have been applied to the branch before we'd fully understood which other changes were also required. The 28690 fix was fully tested on IA-32 and PowerPC and there was no fault in applying that fix; its fallout was unforeseeable. Nor is there any fault on the part of the MIPS maintainers and testers for not identifying the issue earlier. Indeed, you came up with a corrective fix very quickly.

But once again, this change is not 100% safe (very few changes to the compiler ever are). There is a very real possibility that disabling this optimization could have a significant performance impact on non-MIPS platforms. Indeed, the fact that David's seen a large number of code generation differences means that I'd like to get a better handle on its effects through benchmarking. Perhaps we should discuss this further on IRC, or open up the thread to general discussion and comment.

Back when I first started contributing to GCC, it was not uncommon to be expected to bootstrap RTL patches on five or six targets, including simulators, and to present SPEC figures. Several of my large changes took many months to get approved, and for all that, the statistics on how frequently mainline bootstrap was broken were abysmal. Even with any amount of contributor and reviewer testing, it's inevitable that bugs slip through. In a software engineering project as huge as GCC, the combinatorial number of interactions between different components makes it impossible for an individual, or any group of maintainers, to predict how changes will play with each other. My belief/position is that rather than become paralysed by this complexity, the work-flow needs to be adapted to accommodate it.

One of the contributing factors in the egcs-FSF split was poor reviewer response time. Hence, my vote is to be more accepting and lenient of patches, relying instead on the power of GCC's community to support patch validation. Very few contributors have access to MIPS hardware, or Alpha hardware, or SPEC, or nullstone, etc., but our project benefits from the fact that one of the major "unseen" contributions is that folks regularly contribute their time to build, test and benchmark. This is the open-source principle that "given sufficient eyes, all bugs are shallow".

Rather than be (what I consider) unreasonable and require that David run SPEC on at least both PowerPC and IA-32 to verify there's no slow-down, or to quantify how severe it is, I have the option to provisionally approve patches for mainline, and then look at Diego's, Andreas's and Richi's automatic benchmarking, and the results of the autotesters. I honestly would like to have the ability to look at a patch and say with 100% confidence and certainty, and without reservation, that it should be applied to the 4.0 branch.
However, waiting 48 or 72 hours to obtain more feedback not only "covers my arse", but should be common sense and "best practice".

Now I completely agree that there is a real downside to this approach. The strategy leads to the additional complexity of large numbers of patches in flight, applied to different branches and in different orders. Fortunately for us, we have excellent engineering practices in place to keep track of this. Bugzilla (and the bug masters) continually maintains an up-to-date list of which bugs are present on which branches. A release manager can instantly see which bugs still affect an upcoming release, *and* whether they have already been fixed on mainline.

It's also no additional overhead to the contributor. Our rules for applying patches to the branches require that each patch be tested against the branch to which it is to be applied. Even without the complexity of pending fixes waiting to be backported, there's always a significant divergence between branches that requires retesting that changes are still safe. Some contributors obviously have the computing power to bootstrap and regression-test against the 4.0, 4.1 and 4.2 branches and mainline simultaneously, and can commit to all parts of the tree in quick succession. I personally use a pipelined approach, where only after bootstrapping, regression testing and committing to one branch do I then kick off the bootstrap and regression testing for a previous branch.
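For illustration only, here's a rough sketch of that pipelined cycle in Python; the branch list, directory layout and commit step are hypothetical and not any script we actually ship, and the real work is just the usual configure, "make bootstrap" and "make -k check" run per tree:

    # Rough sketch of the pipelined test cycle described above.  The
    # branch names and directory layout are purely illustrative.
    import subprocess

    # Newest first: only move on to an older branch once the newer one
    # has bootstrapped, passed the testsuite and been committed to.
    BRANCHES = ["trunk", "gcc-4_2-branch", "gcc-4_1-branch", "gcc-4_0-branch"]

    def bootstrap_and_test(srcdir, builddir):
        """Run the usual configure / bootstrap / testsuite cycle."""
        for cmd in ([srcdir + "/configure"],
                    ["make", "bootstrap"],
                    ["make", "-k", "check"]):
            subprocess.run(cmd, cwd=builddir, check=True)

    for branch in BRANCHES:
        srcdir = "/home/roger/src/" + branch      # hypothetical layout
        builddir = "/home/roger/build/" + branch
        bootstrap_and_test(srcdir, builddir)
        # Commit by hand at this point, then kick off the cycle for the
        # next (older) branch only once this one is known to be clean.
        print(branch + ": clean; commit and move on to the next branch")

(In practice one compares testsuite results against a known-good baseline rather than trusting exit codes; the point here is only the ordering.)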
Finally, before I finish the retrospective part of this e-mail, I'll point out that this isn't a sudden, recent, unilateral policy decision, but purely a crystallization of the prescribed GCC work-flow outlined in contributing.html, which has been refined over many years. As a reviewer I have more comfort and faith in patches that have had more testing than in those that haven't. Given that different branches have differing stability/certainty requirements, it makes sense that it's easier to get a patch approved for mainline and/or 4.3 than it is for 4.1, 4.0 or 3.4. Having said that, should a contributor provide (i) an obviously safe patch that's (ii) been tested everywhere, then I'm more than happy to approve it immediately wherever it's needed.

So now to be constructive and work forward... It sounds as though people aren't happy with this approach. Certainly, the complaints appear to be much louder than the praise. I'm not an unreasonable man, so I'm happy to change/evolve if that makes things easier. It goes without saying that it's always possible for a contributor to wait for another reviewer to comment or provide a second opinion.

The obvious question is what can be done to learn from, and prevent repeating, the problems of the past. I'd suggest that absolutely no-one could have foreseen that the PR28690 fix would adversely affect MIPS/libgcj, and I'm not sure that applying it immediately to the release branches would improve things. Likewise, for David's current fix, we know it'll affect primary platforms that haven't been tested; we just don't know how.

Even in the case of your and Eric's mid-air collision over PR28243: although the current policy led to an unintentional duplication of effort, as you both worked on independent fixes without appreciating that the problems were related, the tree remained stable, and at worst we had two fixes for the problem in the tree rather than one. Moreover, although your fixes were the better longer-term solution (i.e. PR25514 was a superset of PR28243), you'll remember that your PR25514 fix introduced the failure PR27736 on HPPA. The good news was that the policy of applying first to mainline revealed this fallout, so that the decision to apply both the PR25514 and PR27736 fixes to previous branches was an informed one.

One area in which we probably could improve is to make better use of the "depends on" and "blocks" fields in bugzilla. Although these three PRs (25514, 27736 and 28243) are all related, and some are actually interdependent, we currently don't track that in bugzilla. It seems a great shame to go to the effort of identifying 27736 as fallout of 25514 if we don't record that somewhere. That way, should anyone decide to backport one, they'll be aware of the dependencies.

Given the facts as I understand them above, I believe the policy of mainline testing has helped considerably in both instances. If anything, we've been too hasty in backporting fixes to the stable release branches before they've proved themselves. Short of requiring all patches to be tested on HPPA and MIPS, it's unclear whether there's a better way, though I'm very open to suggestions.

I think one source of the problem is that the standards I hold myself to may be unrealistic for many contributors. I personally wouldn't consider applying one of my own fixes to a release branch until it had been scrutinized on mainline. Indeed, the CVS logs will show that I've never committed to a branch anything that hadn't been on mainline for at least 24 hours. Even then I've still been bitten by unforeseen problems (and my apologies to Mark for those). However, just because this is my personal credo doesn't mean I should impose it on anyone else.

I personally can't fathom how frequently changes are applied to the tree that break bootstrap everywhere; this sort of thing should be impossible given our current guidelines. Almost certainly, my paranoia comes from my work on GCC's RTL optimizers. Once upon a time, you couldn't commit a change to combine, reload or elsewhere without being almost certain to break a cc0 target or some obscure backend. I think improved practices have gone a long way to move us on from those dark days. But even so, our large number of regressions and the ever-lengthening stages two and three reveal that more should be done to improve quality.

So what can I do differently? And how should that help?

Roger
--