On May 7, 2014, at 7:07 AM, HawkOwl <hawk...@atleastfornow.net> wrote:

> Hi everyone,

Hi HawkOwl,

> I’m sure that some of you have been following the past seven or so weeks of 
> Twisted 14.0 release shenanigans, and this email hopes to explain what went 
> wrong,

Given that there does not appear to be a 14.0 final, shouldn't this be "what is 
still going wrong"?  This is more like a death rattle, not a post mortem ;-).

> what we can do better next time, and where we can go from here.

Thanks so much for doing this.  I'm sorry the 14.0 release process has been a 
tough one, and that its toughness has been partially my fault.

However, I'm glad that this has provoked some reflection and discussion.  The 
fact that you've done such a thorough analysis almost makes a challenging 
release cycle worth it :).

> Problem 1: Twisted 14.0.0pre1 had a regression. This was not noticed in the 
> prerelease stage because it was not marked as a regression, where the RM does 
> a check for open regressions on the milestone.

When you say it was "not noticed in the prerelease stage", do you just mean it 
didn't show up before the pre-release was made?

Also, in the future, can you always include specific links to the tickets 
involved in the problems encountered?  I'm not exactly sure which regressions 
we're talking about in pre1.

> What we can do better next time: Tickets that are regressions need to be 
> marked as regressions and applied to the release milestone. If you think it 
> might be a regression - even slightly - mark it as such, and comment that you 
> are not sure. It’s easier to find the ticket later and decide it is not 
> actually a regression than have to abort a release because it’s come up after 
> a prerelease.

At the same time, I feel like I should stress that this, by itself, was not a 
huge problem.  Specifically, rolling a second pre-release is okay.  It's a bit 
unfortunate that the regression was not tagged in advance of the release, but 
discovering issues and fixing them is exactly what the pre-release process is 
for.

> Problem 2: The fix for the regression was not merged into pre1, the release 
> was rerolled from trunk. This meant some pyOpenSSL and TLS improvements got 
> into the 14.0 release from pre2 onwards, but introduced new regressions.
> What we can do better next time: Do not reroll from trunk to get bug fixes - 
> merge them into the release branch. 

Another problem here, that I can take full blame for, was that the 
communication involved was fragmented and not terribly consistent.  HawkOwl 
would ask a question on IRC, I would give an answer, then a couple of hours 
later someone else would give an apparently contradictory answer to a follow-up 
question.  I don't think that we were actually disagreeing all that much, but 
at a number of points, it became a game of telephone.  Also, I'd sometimes ask 
a question about the release process, and someone would tell me something they 
thought HawkOwl had said or a guess as to what might come next, which I took to 
be the actual plan.

Particularly, I was very confused at various points as to whether the next 
prerelease was going to have things backported, which things were going to be 
backported, or whether we were re-rolling from trunk.  I think that, similarly, 
HawkOwl was very confused as to what I _wanted_ to happen.

In the future, when we're communicating about the release process, we should 
probably try harder than usual to have all the discussion in a persistent forum 
so that it's obvious where the state of things is.  Maybe that means the 
mailing list, maybe the release ticket, but IRC has proven to be a particularly 
inappropriate and unreliable channel for this kind of discussion.

If we _do_ have a discussion on IRC, then copying a summary or trimmed 
transcript of the relevant conclusions into the ticket or to the list should be 
a requirement, following the precedent that some more responsible members of 
the community have set.

To get a head start on this, I have put a link to this very discussion on the 
ticket. <https://twistedmatrix.com/trac/ticket/7039#comment:23>

And a final point on communication: on release branches, sensible commit 
messages are particularly important.  On most branches, individual commit 
messages can be a bit less than helpful because they're eventually all bundled 
up into a squash commit (hopefully one day a proper merge commit) with its own 
useful commit message.  That commit message can fill in any gaps left by 
unhelpful individual commits.

On release branches, however, every individual commit has release implications, 
so explaining why things are being done is extra important.  For example, this 
sequence of events is confusing: 
<https://twistedmatrix.com/trac/changeset/42616> 
<https://twistedmatrix.com/trac/changeset/42617>.  Which merge is being 
reverted?  (I can kinda guess it's the immediately preceding commit, but...) 
Did a build fail or something?  Which build?  Were some commits merged 
incorrectly?  Not hypothetical questions, by the way, I am seriously wondering 
what happened there :-).

> Problem 3: The fixes for the regressions were finished after some delay, 
> since the fixes had to be written and reviewed. This introduced delays into 
> the 14.0 release cycle.
> What we can do better next time: Rather than fix regressions introduced, the 
> ticket that introduced them should be reverted.

Yup.

> Problem 4: The fixes for the regressions did not merge cleanly with the 
> release branch. Some 35+ tickets were merged between pre1 and the release of 
> the regression fix into trunk.

The fact that PyCon was happening at the same time definitely did not help.  
For what it's worth, I _really_ tried as hard as I could to finish that stuff 
before the sprints.  But 14.0 probably should have just come out before then 
anyway :-).

> What we can do better next time: Bug fixes should be based off the release 
> branch, not trunk. This reduces the likelihood of code churn or unknown 
> dependencies causing problems during the merge.

This was one of the aforementioned problems with communication.

> Problem 5: There was mixed communication whether one of the regression fixes 
> was to be introduced in 14.0 or in a bug fix release (14.0.1).
> What we can do better: If a fix is intended for merging in to a prerelease, 
> it should be raised on the mailing list, so that there is more visibility for 
> its intentions.

There should probably also be a comment on the release ticket.

> Problem 6: I personally made several mistakes along the way - from screwing 
> up svn merges to interpreting the “abort the release and incorporate the 
> bugfix” to apply the initial regression fix. Since the TLS changes were 
> topical, I decided that having them out ASAP would be better than not.

Again: communication, communication, communication.  I didn't know about any 
screwed-up SVN merges and wasn't super clear on when releases were aborted.  I 
would have tried to help more if I knew about the issues with the release 
branch as they were occurring.

> What we can do better: Improved docs/automation to reduce the margin for RM 
> error, and better automation to make a new release to get out important 
> features really easy.

The release process _is_ getting easier and easier, but sometimes we still act 
like it's really hard and thereby introduce additional complexity and 
difficulties.

> These are the major problems which I have identified - I’m sure there’s 
> plenty more, and I would like people to list them if I have not - even if 
> they make me look like an idiot ;). We can learn from it, I’m sure.
> 
> So, this leaves where to from now. I see a few options, with my estimates for 
> work and risk that it’ll explode:
> 
> 1 - Most work, high risk - Work on making the regression fixes merge cleanly 
> with 14.0.0pre5. This is a big-ish task with room for error, since there was 
> some underlying code churn.

Just to be clear, "the regression" that we're talking about is 
<https://twistedmatrix.com/trac/ticket/7097>, right?

> 2 - Some work, medium risk - Release 14.0.0pre5 as 14.0 final,

I would most prefer this option.  Embarrassing as the errors in the message 
fixed by 7097 are, I think it's acceptable to say that this is not a 
particularly meaningful regression.  For me personally it stretches the 
definition of "regression" a little bit, because it's information about new 
functionality, not a change or break in old functionality.  And emitting a new 
warning is (pretty much by definition) never a "regression" because part of our 
compatibility policy contract is that your code has to be tolerant to warnings 
being emitted.

To be fair, it stretches the definition, but it still technically adheres to 
it.  Importing twisted's TLS support without service_identity installed is a 
supported thing, it used to do something "correct", it's moved to do something 
"incorrect" because there is incorrect text emitted.  Still, if I had to 
classify it without input from anyone else I'd probably call it a "new bug".

Critically, users' applications won't be broken by this.  They'll see some ugly 
or possibly incorrect text which will be fixed in an update which will 
hopefully follow on pretty quickly.  Not to mention that there's an easy fix 
for this by installing the relevant dependency.

> and I (or another RM if I’m no longer trusted ;) )

Honestly, at this point, I trust you a bit more with the release process.  Up 
until this point, you've had only easy successes, which (as you can see!) is a 
little dangerous ;-).  An experience of a failure that you have clearly 
articulated the reasons for strikes me as a very useful skill-building exercise.

> initiate the 14.1 release immediately.

More releases are always better!

> 3 - Least work, highish risk - Scrap 14.0, begin the 14.1 release 
> immediately. since 14.0 tags become 14.1 tags, and we just have to hope that 
> there’s no regressions in the 39 tickets fixed between pre1 and now. This may 
> introduce issues (since 14.0 is an un-release, and there are questions about 
> what this does to our deprecation windows).

I think that trying to cram in more features to 14.0 got us into a mess in the 
first place, so throwing our hands up at this point and trying to shepherd 39 
_more_ features into this release, potentially delaying things even longer, 
does not strike me as a good idea.

> If I am to be honest, I much prefer option #3, but I would like opinions from 
> other developers, before I go causing more problems than I already have :)

I can see why #3 is tempting, but trunk has got a lot of churn on it right now 
and I'm relieved we didn't attempt to re-roll post-PyCon despite the merge 
difficulties.

More than I prefer option 2, though, I'd prefer that everyone interested weigh 
in and we make a decision quickly so that the release process doesn't drag on 
further; I should reiterate that I still trust our glorious release manager 
HawkOwl to make this decision and be responsible for it, so I'm providing input 
but I'm not giving any orders here.

-glyph

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python