Re: Re-running failed flaky builds in refactored Pulsar CI GitHub Actions workflow

Lari Hotari Thu, 21 Apr 2022 02:46:01 -0700

I have made a fix to the problem described below.
Please review https://github.com/apache/pulsar-test-infra/pull/33 .


After this change is merged, closing and reopening a PR can be used to pick up 
the most recent changes from the master branch, and "/pulsarbot 
rerun-failure-checks" will be able to rerun the failed jobs.
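
To illustrate the problem described below, here is a minimal sketch of one way 
to avoid picking up stale builds that share the same head commit sha. It is not 
the code in the pull request above. The function name latest_failed_runs and 
the sample data are hypothetical, and the dictionary keys are assumed to follow 
the workflow run objects returned by the GitHub REST API.

```python
from typing import Dict, List


def latest_failed_runs(runs: List[dict], head_sha: str) -> List[dict]:
    """Return failed runs for head_sha, keeping only the newest run per workflow.

    Closing and reopening a PR starts new runs with the same head sha, so
    filtering by sha alone would also match the older runs.
    """
    newest: Dict[int, dict] = {}
    for run in runs:
        if run["head_sha"] != head_sha:
            continue
        current = newest.get(run["workflow_id"])
        # run_number increases with every new run of the same workflow
        if current is None or run["run_number"] > current["run_number"]:
            newest[run["workflow_id"]] = run
    return [
        run
        for run in newest.values()
        if run["status"] == "completed" and run["conclusion"] == "failure"
    ]


# Hypothetical sample: the PR was closed and reopened once, so each workflow
# has two runs attached to the same head commit sha.
RUNS = [
    {"id": 101, "workflow_id": 1, "run_number": 7, "head_sha": "abc123",
     "status": "completed", "conclusion": "failure"},
    {"id": 102, "workflow_id": 1, "run_number": 8, "head_sha": "abc123",
     "status": "completed", "conclusion": "failure"},
    {"id": 201, "workflow_id": 2, "run_number": 3, "head_sha": "abc123",
     "status": "completed", "conclusion": "failure"},
    {"id": 202, "workflow_id": 2, "run_number": 4, "head_sha": "abc123",
     "status": "completed", "conclusion": "success"},
]

if __name__ == "__main__":
    for run in latest_failed_runs(RUNS, "abc123"):
        print(run["id"])
```

With this sample data, filtering by head sha and failure alone would match 
runs 101, 102 and 201, while the sketch selects only run 102.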

-Lari

On 2022/04/01 14:34:02 Lari Hotari wrote:
> I have now realized that my advice to close & reopen PRs to pick up master 
> branch changes is problematic. It will cause issues with "/pulsarbot 
> rerun-failure-checks". The script currently looks up the builds to restart 
> by the PR's head commit sha. If closing and reopening is used to start new 
> PR build jobs, all build jobs will have the same head commit sha attached to 
> them. When checking for failed builds, the script will also find old 
> builds with the same head commit sha and restart them too.
> 
> Please rebase your PR (or merge master branch changes into it) to pick up 
> changes from master. Don't close & reopen PRs as I had advised earlier, since 
> it causes problems: the wrong builds get rerun, which adds to the 
> build queue.
> 
> -Lari
> 
> 
> 
> On 2022/04/01 08:38:54 Lari Hotari wrote:
> > Hi all,
> > 
> > There's a small limitation in re-running failed jobs (builds that fail 
> > because of flaky tests) in the refactored Pulsar CI workflow which combines 
> > multiple jobs into a single workflow.
> > 
> > The limitation is that you need to wait for all jobs to complete before 
> > failed jobs can be re-run.
> > Yesterday there was an issue with GitHub Actions and the build queue was 
> > several hours long. When there's enough build capacity and no build queue, 
> > the new workflow finishes in about 1 hour 20 minutes.
> > 
> > Re-running failed jobs can be requested by commenting "/pulsarbot 
> > rerun-failure-checks" on the PR. This won't do anything if any of the jobs 
> > in the workflow is still executing.
> > 
> > Another source of confusion has been the new test reporting, which shows 
> > all test results and test failures as checks and annotations in the GitHub UI.
> > 
> > Here's an example:
> > https://github.com/apache/pulsar/pull/14805/checks?check_run_id=5777139002
> > 
> > There's a limitation in GitHub Actions: the test reports get attached 
> > to the first workflow when a PR triggers more than one workflow. We still 
> > have multiple workflows, and the test reports get attached to the "CI - CPP, 
> > Python Tests" workflow. Failed tests will show up as red check marks. When 
> > a test is retried, it might have succeeded in a later attempt, but 
> > the check still shows as failed. This won't prevent merging the PR. Please 
> > keep this small detail in mind when interpreting the build results.
> > 
> > The test reports are very verbose at the moment. This is a problem when 
> > checking the PR build results in the GitHub Mobile app. I have created a PR 
> > to reduce the test reporting in the GitHub Actions UI: 
> > https://github.com/apache/pulsar/pull/14959
> > 
> > Please let me know if any other questions or problems have come up with 
> > the refactored Pulsar CI GitHub Actions workflow.
> > 
> > -Lari
> > 
>
