Unfortunately, the previous change had issues restarting more than one build job. That problem has now been resolved in https://github.com/apache/pulsar-test-infra/pull/34 . I have already merged the change, so please do post-merge reviews.
"/pulsarbot rerun-failure-checks" should work now. I'm sorry for the inconvenience that it caused when it wasn't working for all cases. Please let me know if there are any remaining issues. -Lari On 2022/04/21 09:45:37 Lari Hotari wrote: > I have made a fix to the problem described below. > Please review https://github.com/apache/pulsar-test-infra/pull/33 . > > After this change is merged, closing and reopening PRs could be used to pick > up most recent change from the master branch and "/pulsarbot > rerun-failure-checks" will be able to rerun the failed jobs. > > -Lari > > On 2022/04/01 14:34:02 Lari Hotari wrote: > > I now realized that my advice to close & reopen PRs to pick up master > > branch changes is problematic. This will cause issues with "/pulsarbot > > rerun-failure-checks". The script currently looks for the build to restart > > with the PR's head commit sha. If closing and reopening is used to start > > new PR build jobs, all build jobs will have the same head commit sha > > attached to them. When checking for that failed builds, the script will > > find also old builds with the same head commit sha and also restart them. > > > > Please rebased your PR (or merge master branch changes to it) to pick up > > changes from master. Don't close & reopen PRs as I had advised earlier > > since it causes problems. The wrong builds will be run and that adds up in > > the build queue. > > > > -Lari > > > > > > > > On 2022/04/01 08:38:54 Lari Hotari wrote: > > > Hi all, > > > > > > There's a small limitation in re-running failed jobs (builds that fail > > > because of flaky tests) in the refactored Pulsar CI workflow which > > > combines multiple jobs into a single workflow. > > > > > > The limitation is that you need to wait for all jobs to complete before > > > failed jobs can be re-run. > > > Yesterday there was some issue with GitHub Actions and the build queue > > > was several hours long. 
When there's enough build capacity and no build > > > queue, the new workflow finishes in about 1 hour 20 minutes. > > > > > > Re-running failed jobs can be requested by commenting "/pulsarbot > > > rerun-failure-checks" on the PR. This won't do anything if one of the > > > jobs in the workflow is still executing. > > > > > > Another confusion has been the new test reporting, which shows all test > > > results and test failures as checks and annotations in the GitHub UI. > > > > > > Here's an example: > > > https://github.com/apache/pulsar/pull/14805/checks?check_run_id=5777139002 > > > > > > There's a limitation in GitHub Actions that the test reports get attached > > > to the first workflow when a PR triggers more than one workflow. We still > > > have multiple workflows and the test reports get attached to the "CI - > > > CPP, Python Tests" workflow. Failed tests will show up as red check marks > > > and in the case of retries, the test might have succeeded in a later > > > attempt, but the check shows as failed. This won't prevent merging the > > > PR. Please keep this small detail in mind when interpreting the build > > > results. > > > > > > The test reports are very verbose at the moment. This is a problem when > > > checking the PR build results on GitHub Mobile app. I have created a PR > > > to reduce test reporting to GitHub Actions UI in this PR: > > > https://github.com/apache/pulsar/pull/14959 > > > > > > Please let me know if there are any other questions or problems that have > > > come up with the new refactored Pulsar CI GitHub Actions workflow. > > > > > > -Lari > > > > > >
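
For anyone curious about the head commit sha problem described in the 2022/04/01 14:34 message of this thread, here is a rough Python sketch. It is not the actual pulsar-test-infra script, and I am not claiming it is what pull/33 or pull/34 do. The WorkflowRun type, the field names, the workflow names and the two selection functions are all made up for illustration. It only shows why matching on head sha alone also picks up old runs after a close & reopen, and one possible way to keep only the newest run per workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical, simplified view of a CI workflow run (illustration only)."""
    run_id: int
    workflow: str    # hypothetical workflow name
    head_sha: str    # head commit sha of the PR when the run was started
    conclusion: str  # "success" or "failure"
    created_at: str  # ISO 8601 UTC timestamp; same-format strings sort by time


def select_failed_naive(runs, head_sha):
    """Match on head sha only: old runs with the same sha are selected too."""
    return [r for r in runs if r.head_sha == head_sha and r.conclusion == "failure"]


def select_failed_latest(runs, head_sha):
    """Keep only the newest run per workflow for the sha, then pick failures."""
    latest = {}
    for r in runs:
        if r.head_sha != head_sha:
            continue
        current = latest.get(r.workflow)
        if current is None or r.created_at > current.created_at:
            latest[r.workflow] = r
    return [r for r in latest.values() if r.conclusion == "failure"]


# Made-up data: the PR was closed & reopened, so runs 101/102 and 201/202
# share the same head sha. Run 301 belongs to a different commit.
runs = [
    WorkflowRun(101, "workflow-a", "abc123", "failure", "2022-04-01T08:00:00Z"),
    WorkflowRun(102, "workflow-b", "abc123", "success", "2022-04-01T08:00:00Z"),
    WorkflowRun(201, "workflow-a", "abc123", "failure", "2022-04-01T12:00:00Z"),
    WorkflowRun(202, "workflow-b", "abc123", "success", "2022-04-01T12:00:00Z"),
    WorkflowRun(301, "workflow-a", "def456", "failure", "2022-04-01T09:00:00Z"),
]

naive_ids = [r.run_id for r in select_failed_naive(runs, "abc123")]
latest_ids = [r.run_id for r in select_failed_latest(runs, "abc123")]
print(naive_ids)   # [101, 201]: the stale run 101 would be restarted as well
print(latest_ids)  # [201]: only the newest failed run is selected
```

With the naive selection, the stale run from before the close & reopen gets restarted too, which is the kind of extra load on the build queue mentioned in the thread.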