On 8 May 2014, at 5:40, Glyph Lefkowitz <gl...@twistedmatrix.com> wrote:

> 
> On May 7, 2014, at 7:07 AM, HawkOwl <hawk...@atleastfornow.net> wrote:
> 
>> Hi everyone,
> 
> Hi HawkOwl,
> 
>> I’m sure that some of you have been following the past seven or so weeks of 
>> Twisted 14.0 release shenanigans, and this email hopes to explain what went 
>> wrong,
> 
> Given that there does not appear to be a 14.0 final, shouldn't this be "what 
> is still going wrong"?  This is more like a death rattle, not a post mortem 
> ;-).

Pre-post-mortem! :)

> 
>> what we can do better next time, and where we can go from here.
> 
> Thanks so much for doing this.  I'm sorry the 14.0 release process has been a 
> tough one, and that its toughness has been partially my fault.
> 
> However, I'm glad that this has provoked some reflection and discussion.  The 
> fact that you've done such a thorough analysis almost makes a challenging 
> release cycle worth it :).
> 
>> Problem 1: Twisted 14.0.0pre1 had a regression. This was not noticed in the 
>> prerelease stage because it was not marked as a regression, where the RM 
>> does a check for open regressions on the milestone.
> 
> When you say it was "not noticed in the prerelease stage", do you just mean 
> it didn't show up before the pre-release was made?
> 
> Also, in the future, can you always include specific links to the tickets 
> involved in the problems encountered?  I'm not exactly sure which regressions 
> we're talking about in pre1.

This regression was https://twistedmatrix.com/trac/ticket/6926 - i.e. that all 
our docs would be wrong.

> 
>> What we can do better next time: Tickets that are regressions need to be 
>> marked as regressions and applied to the release milestone. If you think it 
>> might be a regression - even slightly - mark it as such, and comment that 
>> you are not sure. It’s easier to find the ticket later and decide it is not 
>> actually a regression than have to abort a release because it’s come up 
>> after a prerelease.
> 
> At the same time, I feel like I should stress that this, by itself, was not a 
> huge problem.  Specifically, rolling a second pre-release is okay.  It's a 
> bit unfortunate that the regression was not tagged in advance of the release, 
> but discovering issues and fixing them is exactly what the pre-release 
> process is for.
> 
>> Problem 2: The fix for the regression was not merged into pre1, the release 
>> was rerolled from trunk. This meant some pyOpenSSL and TLS improvements got 
>> into the 14.0 release from pre2 onwards, but introduced new regressions.
>> What we can do better next time: Do not reroll from trunk to get bug fixes - 
>> merge them into the release branch. 
> 
> Another problem here, that I can take full blame for, was that the 
> communication involved was fragmented and not terribly consistent.  HawkOwl 
> would ask a question on IRC, I would give an answer, then a couple of hours 
> later someone else would give an apparently contradictory answer to a 
> follow-up question.  I don't think that we were actually disagreeing all that 
> much, but at a number of points, it became a game of telephone.  Also, I'd 
> sometimes ask a question about the release process, and someone would tell me 
> something they thought HawkOwl had said or a guess as to what might come 
> next, which I took to be the actual plan.
> 
> Particularly, I was very confused at various points as to whether the next 
> prerelease was going to have things backported, which things were going to be 
> backported, or whether we were re-rolling from trunk.  I think that, 
> similarly, HawkOwl was very confused as to what I _wanted_ to happen.
> 
> In the future, when we're communicating about the release process, we should 
> probably try harder than usual to have all the discussion in a persistent 
> forum so that it's obvious where the state of things is.  Maybe that means 
> the mailing list, maybe the release ticket, but IRC has proven to be a 
> particularly inappropriate and unreliable channel for this kind of discussion.
> 
> If we _do_ have a discussion on IRC, we should follow the precedent that some 
> more responsible members of the community have set: copying a summary or 
> trimmed transcript of the relevant conclusions into the ticket or to the list 
> should be a requirement.
> 
> To get a head start on this, I have put a link to this very discussion on the 
> ticket. <https://twistedmatrix.com/trac/ticket/7039#comment:23>
> 
> And a final point on communication: on release branches, sensible commit 
> messages are particularly important.  On most branches, individual commit 
> messages can be a bit less than helpful because they're eventually all 
> bundled up into a squash commit (hopefully one day a proper merge commit) 
> with its own useful commit message.  That commit message can fill in any gaps 
> left by unhelpful individual commits.
> 
> On release branches, however, every individual commit has release 
> implications, so explaining why things are being done is extra important.  
> For example, this sequence of events is confusing: 
> <https://twistedmatrix.com/trac/changeset/42616> 
> <https://twistedmatrix.com/trac/changeset/42617>.  Which merge is being 
> reverted?  (I can kinda guess it's the immediately preceding commit, but...) 
> Did a build fail or something?  Which build?  Were some commits merged 
> incorrectly?  Not hypothetical questions, by the way, I am seriously 
> wondering what happened there :-).

That was me screwing up the merge of 7097 - which was causing conflicts and all 
sorts of weirdness.

> 
>> Problem 3: The fixes for the regressions were finished after some delay, 
>> since the fixes had to be written and reviewed. This introduced delays into 
>> the 14.0 release cycle.
>> What we can do better next time: Rather than fix regressions introduced, the 
>> ticket that introduced them should be reverted.
> 
> Yup.
> 
>> Problem 4: The fixes for the regressions did not merge cleanly with the 
>> release branch. Some 35+ tickets were merged between pre1 and the release of 
>> the regression fix into trunk.
> 
> The fact that PyCon was happening at the same time definitely did not help.  
> For what it's worth, I _really_ tried as hard as I could to finish that stuff 
> before the sprints.  But 14.0 probably should have just come out before then 
> anyway :-).
> 
>> What we can do better next time: Bug fixes should be based off the release 
>> branch, not trunk. This reduces the likelihood of code churn or unknown 
>> dependencies causing problems during the merge.
> 
> This was one of the aforementioned problems with communication.
> 
>> Problem 5: There was mixed communication whether one of the regression fixes 
>> was to be introduced in 14.0 or in a bug fix release (14.0.1).
>> What we can do better: If a fix is intended for merging in to a prerelease, 
>> it should be raised on the mailing list, so that there is more visibility 
>> for its intentions.
> 
> There should probably also be a comment on the release ticket.
> 
>> Problem 6: I personally made several mistakes along the way - from screwing 
>> up svn merges to interpreting the “abort the release and incorporate the 
>> bugfix” to apply the initial regression fix. Since the TLS changes were 
>> topical, I decided that having them out ASAP would be better than not.
> 
> Again: communication, communication, communication.  I didn't know about any 
> screwed-up SVN merges and wasn't super clear on when releases were aborted.  
> I would have tried to help more if I knew about the issues with the release 
> branch as they were occurring.

The merge problems were why we have 4 14.0 release branches, remember? :)

> 
>> What we can do better: Improved docs/automation to reduce the margin for RM 
>> error, and better automation to make a new release to get out important 
>> features really easy.
> 
> The release process _is_ getting easier and easier, but sometimes we still 
> act like it's really hard and thereby introduce additional complexity and 
> difficulties.
> 
>> These are the major problems which I have identified - I’m sure there’s 
>> plenty more, and I would like people to list them if I have not - even if 
>> they make me look like an idiot ;). We can learn from it, I’m sure.
>> 
>> So, this leaves where to from now. I see a few options, with my estimates 
>> for work and risk that it’ll explode:
>> 
>> 1 - Most work, high risk - Work on making the regression fixes merge cleanly 
>> with 14.0.0pre5. This is a big-ish task with room for error, since there was 
>> some underlying code churn.
> 
> Just to be clear, "the regression" that we're talking about is 
> <https://twistedmatrix.com/trac/ticket/7097>, right?

Yes.

> 
>> 2 - Some work, medium risk - Release 14.0.0pre5 as 14.0 final,
> 
> I would most prefer this option.  Embarrassing as the errors in the message 
> fixed by 7097 are, I think it's acceptable to say that this is not a 
> particularly meaningful regression.  For me personally it stretches the 
> definition of "regression" a little bit, because it's information about new 
> functionality, not a change or break in old functionality.  And emitting a 
> new warning is (pretty much by definition) never a "regression" because part 
> of our compatibility policy contract is that your code has to be tolerant to 
> warnings being emitted.
> 
> To be fair, it stretches the definition, but it still technically adheres to 
> it.  Importing twisted's TLS support without service_identity installed is a 
> supported thing, it used to do something "correct", it's moved to do 
> something "incorrect" because there is incorrect text emitted.  Still, if I 
> had to classify it without input from anyone else I'd probably call it a "new 
> bug".
> 
> Critically, users' applications won't be broken by this.  They'll see some 
> ugly or possibly incorrect text which will be fixed in an update which will 
> hopefully follow on pretty quickly.  Not to mention that there's an easy fix 
> for this by installing the relevant dependency.

Now that I’ve slept on it, I’m thinking #2 might actually be the best way 
forward.

> 
>> and I (or another RM if I’m no longer trusted ;) )
> 
> Honestly, at this point, I trust you a bit more with the release process.  Up 
> until this point, you've had only easy successes, which (as you can see!) is 
> a little dangerous ;-).  An experience of a failure that you have clearly 
> articulated the reasons for strikes me as a very useful skill-building 
> exercise.
> 

Hopefully a skill I won’t have to use again, but… ;)

>> initiate the 14.1 release immediately.
> 
> More releases are always better!
> 

True!

>> 3 - Least work, highish risk - Scrap 14.0, begin the 14.1 release 
>> immediately. The 14.0 tags become 14.1 tags, and we just have to hope that 
>> there are no regressions in the 39 tickets fixed between pre1 and now. This 
>> may introduce issues (since 14.0 is an un-release, and there are questions 
>> about what this does to our deprecation windows).
> 
> I think that trying to cram in more features to 14.0 got us into a mess in 
> the first place, so throwing our hands up at this point and trying to 
> shepherd 39 _more_ features into this release, potentially delaying things 
> even longer, does not strike me as a good idea.
> 
>> If I am to be honest, I much prefer option #3, but I would like opinions 
>> from other developers, before I go causing more problems than I already have 
>> :)
> 
> I can see why #3 is tempting, but trunk has got a lot of churn on it right 
> now and I'm relieved we didn't attempt to re-roll post-PyCon despite the 
> merge difficulties.
> 
> Even more than option 2, though, I'd prefer that everyone interested 
> weigh in and we make a decision quickly so that the release process doesn't 
> drag on further; I should reiterate that I still trust our glorious release 
> manager HawkOwl to make this decision and be responsible for it, so I'm 
> providing input but I'm not giving any orders here.

Agreed.

I’m going to give this another work day for people to weigh in on. Otherwise, I 
will go with option #2, get pre5-as-14.0 out the door, cut a 14.1 prerelease, 
and get that ball rolling. Now that I’ve had some rest between worrying about 
how much I’ve screwed up the release, that seems like the best way forward :)

But for now, I’m off to play Ingress in the rain before work! :)

- hawkie

> 
> -glyph
> 
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
