Re: Pulsar CI congested, master branch build broken

Lari Hotari Tue, 30 Aug 2022 05:39:29 -0700

Pulsar CI continues to be congested, and the build queue is long.

I strongly advise everyone to use "personal CI" to mitigate the long delay in 
CI feedback. You can simply open a PR to your own personal fork of 
apache/pulsar to run the builds in your "personal CI". There are more details 
in the previous email in this thread.

Some updates:

There has been a discussion with Gavin McDonald from ASF infra on the-asf Slack 
about getting usage reports from GitHub to support the investigation. The Slack 
thread is the same one mentioned in the previous email, 
https://the-asf.slack.com/archives/CBX4TSBQ8/p1661512133913279 . Gavin already 
requested the usage report in the GitHub UI, but it produced invalid results.

I made a change to mitigate a source of additional GitHub Actions overhead. 
In the past, each cherry-picked commit to a maintenance branch of Pulsar has 
triggered a lot of workflow runs. 

The solution for cancelling duplicate builds automatically is to add this 
block to the workflow definition:
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
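
For illustration, here is a minimal sketch of where that block sits in a 
workflow file. This is not one of the actual Pulsar workflows: the workflow 
name, trigger, job and build step are placeholders assumed for the example. 
Only the concurrency block is the same as the one above.

```yaml
# Hypothetical workflow file; everything except the concurrency block is assumed.
name: Example CI

on:
  push:
    branches:
      - branch-2.10

# Runs are grouped by workflow name and git ref. When a new run starts in a
# group, any run already in progress in that group is cancelled.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Placeholder for the real build and test steps.
      - run: echo "build steps go here"
```

Because github.ref is part of the group name, pushes to different branches 
don't cancel each other. Only runs of the same workflow on the same branch 
compete.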

I added this to all maintenance branch GitHub Actions workflows:

branch-2.10 change:
https://github.com/apache/pulsar/commit/5d2c9851f4f4d70bfe74b1e683a41c5a040a6ca7
branch-2.9 change:
https://github.com/apache/pulsar/commit/3ea124924fecf636cc105de75c62b3a99050847b
branch-2.8 change:
https://github.com/apache/pulsar/commit/48187bb5d95e581f8322a019b61d986e18a31e54
branch-2.7 change:
https://github.com/apache/pulsar/commit/744b62c99344724eacdbe97c881311869d67f630

branch-2.11 already contains the necessary config for cancelling duplicate 
builds.

The benefit of the above change is that when multiple commits are cherry-picked 
to a branch at once, only the build of the last commit eventually runs. The 
builds for the intermediate commits get cancelled. The obvious tradeoff is 
that we no longer find out whether one of the earlier commits breaks the 
build. That's the cost we need to pay. In any case, our build is so flaky that 
it's hard to determine whether a failed build is caused only by a flaky test 
or is an actual failure. Because of this, we don't lose anything by cancelling 
builds. It's more important to save build resources. In the maintenance 
branches for 2.10 and older, the average total build time consumed is around 
20 hours, which is a lot.

At this time, the overhead of maintenance branch builds doesn't seem to be the 
source of the problems. There must be some other issue, possibly related to 
exceeding a usage quota. Hopefully we get the CI slowness issue solved ASAP.

BR,

Lari

On 2022/08/26 12:00:20 Lari Hotari wrote:
> Hi,
> 
> GitHub Actions builds have been piling up in the build queue in the last few 
> days.
> I posted on bui...@apache.org 
> https://lists.apache.org/thread/6lbqr0f6mqt9s8ggollp5kj2nv7rlo9s and created 
> INFRA ticket https://issues.apache.org/jira/browse/INFRA-23633 about this 
> issue.
> There's also a thread on the-asf slack, 
> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661512133913279 . 
> 
> It seems that our build queue is finally getting picked up, but it would be 
> great to see if we hit quota and whether that is the cause of pauses. 
> 
> Another issue is that the master branch broke after merging 2 conflicting 
> PRs. 
> The fix is in https://github.com/apache/pulsar/pull/17300 . 
> 
> Merging PRs will be slow until we have these 2 problems solved and existing 
> PRs rebased over the changes. Let's prioritize merging #17300 before pushing 
> more changes.
> 
> I'd like to point out that a good way to get build feedback before sending a 
> PR, is to run builds on your personal GitHub Actions CI. The benefit of this 
> is that it doesn't consume the shared quota and builds usually start 
> instantly.
> There are instructions in the contributors guide about this. 
> https://pulsar.apache.org/contributing/#ci-testing-in-your-fork
> You simply open PRs to your own fork of apache/pulsar to run builds on your 
> personal GitHub Actions CI.
> 
> BR,
> 
> Lari
