Good point. The SHOULD is there, and after discussions with the ASF
folks on JIRAs and the members@a.o list - if there is a good reason, we
can shorten it. A critical security fix is definitely a good reason for
it (it happened with Log4J, for example), but the community (us) might
decide to release earlier.

Providers have had a similar issue. I would even extend this not only
to RCs but also to "follow-up" releases when we discover bugs "very
quickly" after a release. While we get quite a lot of testing from
contributors (as usual, big thanks!), we sometimes have cases (like
this wave in November) where we had quite a big common.sql refactor
and some errors were found **just** after releasing.

Also, in the case of providers, it is crucial that such fixes are
released **just** before releasing Airflow and generating
constraints. There is no need for "sequential" voting - it should
be perfectly safe to release providers, regenerate constraints and
release Airflow right after.

So on top of the issue described by Ash, I think we should introduce a
small - non-blocking but considered - dependency: (fixing buggy)
Providers -> Airflow, and also speed up the Providers release in cases
where we find a rather serious - but easy to fix - problem just before
releasing Airflow (such cases are rare, but there have already been a
few times when this would have been useful).

I also think we should not **just** use RC1 timing.

Especially when we are close or just about to finish the voting time,
we should give at least a few people a chance to test and review those
changes. I think we should give at least 24 hrs from releasing the new
RC to allow that (unless we have a REALLY good reason - e.g. a critical
bugfix).

It is tempting to speed it up even more, but we have already been
called out by the ASF board for rushing a release when it was not
really justified: we released 1.10.8
https://airflow.apache.org/announcements/#feb-7-2020 because a
Werkzeug release broke our release. Thread here:
https://lists.apache.org/thread/f4ktxwqpy25olbspgncsh2lg3oqq4gfv
Shorter vote thread here:
https://lists.apache.org/thread/5wp2zfx5qs5rr8ggct341bx398qj8v5z

I vaguely remember (but unfortunately cannot quickly find it - maybe
others can) that the Apache Board complained that the decision to do
it "immediately" was not really justified back then, because there
was no need for immediacy.

Summarizing: I am all for speeding up follow-up releases, but I think
we should also leave some (much shorter) room/time for voting - at
least giving someone who is asleep a chance to raise a concern.

So my proposal:

1) Yep, let's shorten it.
2) But if there are fewer than 24 hrs left, let's leave at least 24 hrs.
3) Let's also apply it to bugfix releases of providers (when they might
fix a buggy provider just before the release).

And if we are ok with that - and yeah, if this is 24 hrs - we could
apply it to the current release and ask for lazy consensus on this
policy in parallel. I think 24 hrs should be enough for the lazy
consensus and the release vote running in parallel.
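
To make the timing concrete, here is a rough sketch of how the end of a
follow-up vote could be computed under this proposal. It is only an
illustration, not existing release tooling: the function name, the
constants and the example dates are all made up, and it does not model
the requirement for 3 binding votes, which stays as it is.

```python
from datetime import datetime, timedelta

# Assumptions for illustration only (not part of any Airflow tooling):
FULL_VOTE = timedelta(hours=72)      # usual window, counted from the RC1 vote start
MIN_FOLLOW_UP = timedelta(hours=24)  # proposed minimum for any follow-up RC


def follow_up_vote_end(rc1_vote_start: datetime, new_rc_vote_start: datetime) -> datetime:
    """Earliest time a follow-up RC vote could close under the proposal.

    The vote runs until 72 hrs after the RC1 vote started, but never less
    than 24 hrs after the new RC vote started.
    """
    return max(rc1_vote_start + FULL_VOTE, new_rc_vote_start + MIN_FOLLOW_UP)


# Hypothetical dates, chosen only to show both branches of the rule.
rc1 = datetime(2022, 11, 27, 12, 0)

# RC1 cancelled after 24 hrs: the RC2 vote runs for the remaining 48 hrs.
early = follow_up_vote_end(rc1, rc1 + timedelta(hours=24))

# New RC cut after the 72 hrs already elapsed: the 24 hr minimum applies.
late = follow_up_vote_end(rc1, rc1 + timedelta(hours=73))

print(early)  # 2022-11-30 12:00:00
print(late)   # 2022-12-01 13:00:00
```

The first case matches Ash's "cancel RC1 after 24 hours" example; the
second differs from his original proposal only by the 24 hr minimum.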



J.

On Wed, Nov 30, 2022 at 11:23 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> To follow up on why I think this is worthwhile to adopt:
>
> It essentially comes down to attempting to reduce the workload and effort on 
> the release manager (which is already a pretty hidden and in some ways 
> thankless job!)
>
> When we discover a last minute bug like we did here it's quite stressful for 
> us as RMs, because it means another three days elapsed time with the release 
> "hanging over" our heads, with no guarantee that it won't happen again right 
> at the end of the vote next time. (My mind goes back to the 1.9.0 release 
> where we made it to an RC8! Thankfully we've gotten a lot lot better since 
> then. Read: automated more)
>
> By adopting this sort of policy it will mean that we can more easily fix 
> things discovered during the release process and have higher quality .0 
> releases (the last.. 2? 3? minor releases have all needed a .1 follow up 
> almost straight away afterwards)
>
> So by allowing shorter follow-on votes we can fix things quicker and 
> hopefully as a result we end up publishing higher-quality releases.
>
> One option on the a) b) questions might be to allow short vote for the next 
> two releases, and then examine if it helped or not once 2.6 is out (the next 
> release).
>
> On Nov 30 2022, at 9:57 am, Ephraim Anierobi <ephraimanier...@apache.org> 
> wrote:
>
> +1 to both (a) and (b) since RC2 up is usually a few commits
>
> On 2022/11/30 09:47:26 Ash Berlin-Taylor wrote:
> > Hi All,
> >
> > We've just had a case where an 11th hour bug was found on 2.5.0rc2 (well technically, 
> > 12:01 as the vote time had finished, but we hadn't closed it yet/wouldn't 
> > have released anyway) https://github.com/apache/airflow/issues/28002.
> > The fix is easy (it's a two line change, plus a bit of tidy up) but we have 
> > to prepare a new RC and have a new vote on it. The "annoyance" is that we 
> > currently have a 72 hour vote window.
> > The reason for a long vote window is to give people in various time zones 
> > the chance to engage which makes sense, especially in the case of a big 
> > release where there might be a lot to test or look over. But I think this 
> > is less needed in the case of follow up RCs, where the changes should only 
> > ever be small on top of the already voted-RC.
> > The ASF policies say: "Release votes SHOULD remain open for at least 72 
> > hours." Which means that we as a project can change it if we decide it is 
> > appropriate.
> > To be more concrete about what I propose we adopt:
> > Voting periods for subsequent RCs (i.e. RC2 and above) of a release can be 
> > reduced to be 72 hours since the start of the vote for RC1. The 
> > requirement for 3 new binding votes remains. This should only be used when 
> > the difference between the RCs is small (as judged by the release manager).
> > To give an example:
> > In this case (2.5.0rc2 vote), if we cancelled the vote and had an RC3, 
> > since the 72 hour voting period has already elapsed the vote for RC3 would 
> > only run until the three binding votes are received.
> > or another case;
> > If we cancel a vote for RC1 after 24 hours, then the vote for RC2 would 
> > only need to run for 48 hours, rather than the "usual" 72.
> > This is important enough that I think it needs a vote, and shouldn't be 
> > something we use lazy consensus to achieve.
> > So the questions:
> > a) Do you think this is a policy we should formally adopt?
> > b) Can we use this for the about-to-be-voted on 2.5.0rc3?
> >
> > -ash