Comments inline.

On Thu, Jan 21, 2021 at 6:19 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> On Thu, Jan 21, 2021 at 8:11 AM Sutou Kouhei <k...@clear-code.com> wrote:
> >
> > Hi,
> >
> > I'm not sure how much this change will improve our release
> > process but I'm OK with this try.
> >
> > Here are technical blockers for this try:
> >
> > * Java packaging: WIP: https://github.com/apache/arrow/pull/9155
> >   * It takes 10m+.
> >   * It may fail because a release manager needs to prepare a
> >     local environment to do this.
>
> Preferably we should dockerize this step as well.
>
> > * GLib source archive preparation:
> >   https://github.com/apache/arrow/blob/master/dev/release/source/build.sh
> >   * It takes 1m+.
> >   * It is unlikely to fail because most tasks are done in Docker.
> >     But it means that a release manager needs to prepare Docker.
>
> I had multiple failures during this step before containerization;
> since then it has never failed.
>
> > There are still some small tasks(*) to build the source archive
> > but they aren't blockers.
> >
> > (*) https://github.com/apache/arrow/blob/master/dev/release/02-source.sh#L84-L97
> >
> > We can avoid GLib source archive preparation by dropping
> > support for GNU Autotools. They are used on CentOS 7 and
> > Ubuntu 16.04. We can use an alternative build system (Meson) on
> > CentOS 7. We'll drop support for Ubuntu 16.04 soon. (Ubuntu
> > 16.04's EOL is 2021-04.)
> >
> > > I'll start a new rc, it'll be done in 12 hours
> >
> > From my past experience as release manager, here are the
> > time-consuming tasks:
> >
> > 1. Fixing nightly builds
> >    * Generally, we always have failing builds.
> >    * I needed 2~3 days for this.
> >    * I'm still working on this even when I'm not a release manager.
> >
> > 2. Build source including Java packages preparation
> >    * I always failed this with some problems and retried
> >      multiple times.
>
> I experienced the same and each iteration takes 10+ minutes.
> >    * For example: https://issues.apache.org/jira/browse/ARROW-5764
> >      [Java] Failed to build document with OpenJDK 11
> >      (This is not fixed yet.)
> >    * I can't go to the next step until this task is completed.
> >
> > 3. Building binary packages
> >    * I just need to wait 1~2 hours.
>
> It usually took around 3 hours. Appveyor was the slowest component
> here because it offered no parallelization, so we had to wait for 4
> wheel builds, each taking around 50 minutes.
> This is the first release where we build the Windows wheels on GitHub
> Actions; now the overall time to build the binaries is just a bit
> above one hour.
>
> >    * We'll be able to speed this up by using a cache such as
> >      ccache for C++ in Crossbow tasks: 1~2 hours -> 10~20 minutes
>
> We always create new branches, so it would require a tricky workaround
> to utilize the GitHub Actions cache plugin, see the cache scope at
> https://github.com/actions/cache#cache-scopes
>
> >    * Generally, this doesn't fail because nightly builds are fixed.
> >
> > 4. Downloading built binary packages and uploading binary packages
> >    * It takes 1~2 hours because we have many files.
>
> Downloading takes 10-15 minutes on a 500Mbit/s network with a single
> thread.
> I tried to parallelize it before, but quickly hit the GitHub API abuse
> limit, see
> https://docs.github.com/en/rest/overview/resources-in-the-rest-api#abuse-rate-limits
>
> Uploading binaries is the slowest part of the process; it takes around
> 2 hours even though we upload the binaries concurrently. Bintray also
> tends to reject requests so I need to restart the uploading script
> multiple times before completion. Occasionally I switch to a cellular
> network to make the uploading process slower but more stable.

These hours add up. And a big reason you have been the release manager
for the last 10 releases is that it's too much of a commitment for most
other PMC members to sign onto.
But strip these away and the release manager can be more of a "manager"
and just make sure that the work gets done by someone.

> > 5. Verifying RC before starting vote
> >    * I can start source verification while building binary packages.
> >    * It takes 1~2 hours.
> >    * Generally, I find some problems and fix them with the first RC.
> >    * Most problems are caused by an outdated verification script.
> >    * It takes +0.5-1 hour per problem.
> >    * I'm still working on this even when I'm not a release manager.
>
> This caused the current release to take more time.
>
> > This proposal will defer costs of 3., 4. and part of 5.
> > 1. still exists because we can't keep green nightly builds
> > for now.

I see it as a mix of deferring and decentralizing those costs. For
example, currently the Homebrew formula is already outside of the
release process, it's a post-release task. On several occasions, we've
had to add a patch step to the formula after the release to fix some
issue that only happens in the Homebrew environment. That's fine, it has
no bearing on the release, and those in the community who care to ensure
that we have an up-to-date formula take care of it. Likewise with conda.
This proposal would move wheels et al. to have the same status as those
packages.

I get the impression that the two of you (Kou and Krisztián) aren't
envisioning much benefit to this proposal because you think "well I
already spend all this time fixing the Linux packages/Python wheels, and
I'll still have to do that even if the vote is only on the source". That
may be true because you are stakeholders who care about ensuring those
packages exist. What the proposal entails is that you don't also have to
own all of the other packages at release time.

> > > It also solves questions such as "Why should the Rust
> > > release be blocked just because we're having a problem
> > > building Python wheels on macOS?"
> >
> > It solves the question only when the problem is only related
> > to packaging. If we have a non-packaging problem such as
> > integration test failure, our release will be blocked.

Indeed. One of my points with this proposal is that we have too many
potential release blockers, and as the project grows, we only add more
blockers (packaging systems, platforms, languages). It would help if we
could scale back the number of truly blocking issues.

> > I still think that continuous (nightly) release verification
> > needs to be implemented and maintained. If we keep green
> > release verification, we'll always be able to cut an RC
> > without problems.

I agree completely! And this is not mutually exclusive with my proposal.

> I would like this approach more. If we could simulate the release
> process and its verification on a nightly basis then we shouldn't have
> any major surprises.

In my ideal world, cutting a release is a formality, and release
verification never fails because we essentially release and verify every
night. We should work towards that.

However, we should also acknowledge that after every release, we collect
a set of JIRAs that would improve our release verification, we do some
of them but not all of them, and then we add more after the next
release. And although there certainly have been improvements made, I
don't think the overall cost of doing a release has gone down since I
joined the project two years ago. So while I would like us to pursue
technical improvements, I'm less optimistic that we can code our way out
of these problems. Sometimes the most efficient solution to a problem is
to redefine it.
> >
> > Thanks,
> > --
> > kou
> >
> > In <CAOCv4hg_usTK-4WvNDyRtTEUW6BiS7wtN3s=HOVa=p4cfgb...@mail.gmail.com>
> >   "[Proposal] Modify release process to vote only on source release" on Tue, 19 Jan 2021 15:16:20 -0800,
> >   Neal Richardson <neal.p.richard...@gmail.com> wrote:
> >
> > > Hi all,
> > > Over the past year, there's been a lot of discussion around the
> > > challenges we face as a project in doing releases. Because they are
> > > costly to do, we don't do them often; because we don't do them
> > > often, they become even costlier.
> > >
> > > There are only a small number of people (PMC members with GPG keys
> > > registered with ASF) who could possibly be release manager, and
> > > because of the amount of time required (I saw Krisztián say on the
> > > 3.0 release thread something like "I'll start a new rc, it'll be
> > > done in 12 hours"), even fewer people could be expected to take on
> > > the burden. Indeed, this is Krisztián's 10th release in a row as
> > > release manager, and over the course of the project, 2/3 of all
> > > release candidates have been made by just 2 people.
> > >
> > > I'd like to propose a change to our release procedure: instead of
> > > having the release candidate vote include Python wheels, Linux
> > > system packages, or any other binary packages, we should only vote
> > > on the source release. Binary artifacts would be produced as
> > > post-release tasks, using the official source release.
> > >
> > > This would greatly reduce the time and effort it takes to produce a
> > > release candidate--tar, sign, and upload, that's it--and it would
> > > remove a bunch of points of failure from the release-candidate
> > > making process (timeouts, CI flakiness, etc.). It would also mean
> > > fewer release-blocking issues--we still have to fix the packaging
> > > builds, but doing so can happen in parallel with the verification
> > > process.
> > > If we found problems in the packaging scripts, fixes could either
> > > be applied as patch steps to the binary artifact build scripts, or
> > > if fixes can be produced quickly, we collect them and cut another
> > > (cheap) release candidate. Right now, our only option is the
> > > latter, which makes for a slow, stressful release process where
> > > there are so many places where a simple issue can block the whole
> > > release or set us back an additional week (a full day to produce a
> > > release candidate plus another three to vote).
> > >
> > > If we went this direction, we could still choose to vote separately
> > > on binary packages like wheels, though I'm not sure that's worth
> > > the effort. Many of the packages that people use (conda, homebrew,
> > > CRAN, etc.) are already "unofficial" releases because they're
> > > packaged by someone else, and I don't think the distinction is
> > > meaningful to our users.
> > >
> > > To be clear, this doesn't reduce the general maintenance burden of
> > > the project. We still have to monitor nightly builds, fix packaging
> > > scripts that break, and deal with CI service interruptions. This
> > > change would just reduce the burden on the release manager and
> > > allow us to spread more broadly the costs of packaging and
> > > releasing. It also solves questions such as "Why should the Rust
> > > release be blocked just because we're having a problem building
> > > Python wheels on macOS?"
> > >
> > > There are also other things we could do that would, on a technical
> > > level, improve our ability to make releases more efficiently. Andy
> > > Grove's change in the use of maven in the release process will
> > > help, as would a number of CI/CD improvements. I view these as
> > > complementary to this proposal, which is a governance question with
> > > technical/logistical implications.
> > >
> > > Thoughts?
> > >
> > > Neal