On 8 May 2014, at 5:40, Glyph Lefkowitz <gl...@twistedmatrix.com> wrote:
>
> On May 7, 2014, at 7:07 AM, HawkOwl <hawk...@atleastfornow.net> wrote:
>
>> Hi everyone,
>
> Hi HawkOwl,
>
>> I’m sure that some of you have been following the past seven or so weeks of
>> Twisted 14.0 release shenanigans, and this email hopes to explain what went
>> wrong,
>
> Given that there does not appear to be a 14.0 final, shouldn't this be "what
> is still going wrong"? This is more like a death rattle, not a post mortem
> ;-).

Pre-post-mortem! :)

>
>> what we can do better next time, and where we can go from here.
>
> Thanks so much for doing this. I'm sorry the 14.0 release process has been a
> tough one, and that its toughness has been partially my fault.
>
> However, I'm glad that this has provoked some reflection and discussion. The
> fact that you've done such a thorough analysis almost makes a challenging
> release cycle worth it :).
>
>> Problem 1: Twisted 14.0.0pre1 had a regression. This was not noticed in the
>> prerelease stage because it was not marked as a regression, where the RM
>> does a check for open regressions on the milestone.
>
> When you say it was "not noticed in the prerelease stage", do you just mean
> it didn't show up before the pre-release was made?
>
> Also, in the future, can you always include specific links to the tickets
> involved in the problems encountered? I'm not exactly sure which regressions
> we're talking about in pre1.

This regression was https://twistedmatrix.com/trac/ticket/6926 - i.e. that all our docs would be wrong.

>
>> What we can do better next time: Tickets that are regressions need to be
>> marked as regressions and applied to the release milestone. If you think it
>> might be a regression - even slightly - mark it as such, and comment that
>> you are not sure. It’s easier to find the ticket later and decide it is not
>> actually a regression than have to abort a release because it’s come up
>> after a prerelease.
>
> At the same time, I feel like I should stress that this, by itself, was not a
> huge problem. Specifically, rolling a second pre-release is okay. It's a
> bit unfortunate that the regression was not tagged in advance of the release,
> but discovering issues and fixing them is exactly what the pre-release
> process is for.
>
>> Problem 2: The fix for the regression was not merged into pre1, the release
>> was rerolled from trunk. This meant some pyOpenSSL and TLS improvements got
>> into the 14.0 release from pre2 onwards, but introduced new regressions.
>> What we can do better next time: Do not reroll from trunk to get bug fixes -
>> merge them into the release branch.
>
> Another problem here, that I can take full blame for, was that the
> communication involved was fragmented and not terribly consistent. HawkOwl
> would ask a question on IRC, I would give an answer, then a couple of hours
> later someone else would give an apparently contradictory answer to a
> follow-up question. I don't think that we were actually disagreeing all that
> much, but at a number of points, it became a game of telephone. Also, I'd
> sometimes ask a question about the release process, and someone would tell me
> something they thought HawkOwl had said or a guess as to what might come
> next, which I took to be the actual plan.
>
> Particularly, I was very confused at various points as to whether the next
> prerelease was going to have things backported, which things were going to be
> backported, or whether we were re-rolling from trunk. I think that,
> similarly, HawkOwl was very confused as to what I _wanted_ to happen.
>
> In the future, when we're communicating about the release process, we should
> probably try harder than usual to have all the discussion in a persistent
> forum so that it's obvious where the state of things is.
> Maybe that means
> the mailing list, maybe the release ticket, but IRC has proven to be a
> particularly inappropriate and unreliable channel for this kind of discussion.
>
> If we _do_ have a discussion on IRC, following the precedent that some more
> responsible members of the community have set, and copying a summary or
> trimmed transcript of the relevant conclusions into the ticket or to the list
> should be a requirement.
>
> To get a head start on this, I have put a link to this very discussion on the
> ticket. <https://twistedmatrix.com/trac/ticket/7039#comment:23>
>
> And a final point on communication: on release branches, sensible commit
> messages are particularly important. On most branches, individual commit
> messages can be a bit less than helpful because they're eventually all
> bundled up into a squash commit (hopefully one day a proper merge commit)
> with its own useful commit message. That commit message can fill in any gaps
> left by unhelpful individual commits.
>
> On release branches, however, every individual commit has release
> implications, so explaining why things are being done is extra important.
> For example, this sequence of events is confusing:
> <https://twistedmatrix.com/trac/changeset/42616>
> <https://twistedmatrix.com/trac/changeset/42617>. Which merge is being
> reverted? (I can kinda guess it's the immediately preceding commit, but...)
> Did a build fail or something? Which build? Were some commits merged
> incorrectly? Not hypothetical questions, by the way, I am seriously
> wondering what happened there :-).

That was me screwing up the merge of 7097 - which was causing conflicts and all sorts of weirdness.

>
>> Problem 3: The fixes for the regressions were finished after some delay,
>> since the fixes had to be written and reviewed. This introduced delays into
>> the 14.0 release cycle.
>> What we can do better next time: Rather than fix regressions introduced, the
>> ticket that introduced them should be reverted.
>
> Yup.
>
>> Problem 4: The fixes for the regressions did not merge cleanly with the
>> release branch. Some 35+ tickets were merged between pre1 and the release of
>> the regression fix into trunk.
>
> The fact that PyCon was happening at the same time definitely did not help.
> For what it's worth, I _really_ tried as hard as I could to finish that stuff
> before the sprints. But 14.0 probably should have just come out before then
> anyway :-).
>
>> What we can do better next time: Bug fixes should be based off the release
>> branch, not trunk. This reduces the likelihood of code churn or unknown
>> dependencies causing problems during the merge.
>
> This was one of the aforementioned problems with communication.
>
>> Problem 5: There was mixed communication whether one of the regression fixes
>> was to be introduced in 14.0 or in a bug fix release (14.0.1).
>> What we can do better: If a fix is intended for merging in to a prerelease,
>> it should be raised on the mailing list, so that there is more visibility
>> for its intentions.
>
> There should probably also be a comment on the release ticket.
>
>> Problem 6: I personally made several mistakes along the way - from screwing
>> up svn merges to interpreting the “abort the release and incorporate the
>> bugfix” to apply the initial regression fix. Since the TLS changes were
>> topical, I decided that having them out ASAP would be better than not.
>
> Again: communication, communication, communication. I didn't know about any
> screwed-up SVN merges and wasn't super clear on when releases were aborted.
> I would have tried to help more if I knew about the issues with the release
> branch as they were occurring.

The merge problems were why we have 4 14.0 release branches, remember? :)

>
>> What we can do better: Improved docs/automation to reduce the margin for RM
>> error, and better automation to make a new release to get out important
>> features really easy.
>
> The release process _is_ getting easier and easier, but sometimes we still
> act like it's really hard and thereby introduce additional complexity and
> difficulties.
>
>> These are the major problems which I have identified - I’m sure there’s
>> plenty more, and I would like people to list them if I have not - even if
>> they make me look like an idiot ;). We can learn from it, I’m sure.
>>
>> So, this leaves where to from now. I see a few options, with my estimates
>> for work and risk that it’ll explode:
>>
>> 1 - Most work, high risk - Work on making the regression fixes merge cleanly
>> with 14.0.0pre5. This is a big-ish task with room for error, since there was
>> some underlying code churn.
>
> Just to be clear, "the regression" that we're talking about is
> <https://twistedmatrix.com/trac/ticket/7097>, right?

Yes.

>
>> 2 - Some work, medium risk - Release 14.0.0pre5 as 14.0 final,
>
> I would most prefer this option. Embarrassing as the errors in the message
> fixed by 7097 are, I think it's acceptable to say that this is not a
> particularly meaningful regression. For me personally it stretches the
> definition of "regression" a little bit, because it's information about new
> functionality, not a change or break in old functionality. And emitting a
> new warning is (pretty much by definition) never a "regression" because part
> of our compatibility policy contract is that your code has to be tolerant to
> warnings being emitted.
>
> To be fair, it stretches the definition, but it still technically adheres to
> it. Importing twisted's TLS support without service_identity installed is a
> supported thing, it used to do something "correct", it's moved to do
> something "incorrect" because there is incorrect text emitted. Still, if I
> had to classify it without input from anyone else I'd probably call it a "new
> bug".
>
> Critically, users' applications won't be broken by this.
> They'll see some
> ugly or possibly incorrect text which will be fixed in an update which will
> hopefully follow on pretty quickly. Not to mention that there's an easy fix
> for this by installing the relevant dependency.

Now that I’ve slept on it, I’m thinking #2 might actually be the best way forward.

>
>> and I (or another RM if I’m no longer trusted ;) )
>
> Honestly, at this point, I trust you a bit more with the release process. Up
> until this point, you've had only easy successes, which (as you can see!) is
> a little dangerous ;-). An experience of a failure that you have clearly
> articulated the reasons for strikes me as a very useful skill-building
> exercise.
>

Hopefully a skill I won’t have to use again, but… ;)

>> initiate the 14.1 release immediately.
>
> More releases are always better!
>

True!

>> 3 - Least work, highish risk - Scrap 14.0, begin the 14.1 release
>> immediately, since 14.0 tags become 14.1 tags, and we just have to hope that
>> there are no regressions in the 39 tickets fixed between pre1 and now. This
>> may introduce issues (since 14.0 is an un-release, and there are questions
>> about what this does to our deprecation windows).
>
> I think that trying to cram in more features to 14.0 got us into a mess in
> the first place, so throwing our hands up at this point and trying to
> shepherd 39 _more_ features into this release, potentially delaying things
> even longer, does not strike me as a good idea.
>
>> If I am to be honest, I much prefer option #3, but I would like opinions
>> from other developers, before I go causing more problems than I already have
>> :)
>
> I can see why #3 is tempting, but trunk has got a lot of churn on it right
> now and I'm relieved we didn't attempt to re-roll post-PyCon despite the
> merge difficulties.
>
> More than I'd prefer option 2 though, I'd prefer that everyone interested
> weigh in and we make a decision quickly so that the release process doesn't
> drag on further; I should reiterate that I still trust our glorious release
> manager HawkOwl to make this decision and be responsible for it, so I'm
> providing input but I'm not giving any orders here.

Agreed. I’m going to give this another work day for people to weigh in on. Otherwise, I will go with option #2, get pre5-as-14.0 out the door, cut a 14.1 prerelease, and get that ball rolling. Now that I’ve had some rest between worrying about how much I’ve screwed up the release, that seems like the best way forward :)

But for now, I’m off to play Ingress in the rain before work! :)

- hawkie

>
> -glyph
>
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python