With the new GitHub Actions CI workflow, you may see a red failure mark even though no job needs to be rerun. In these cases the red mark comes from a failed test report, usually caused by a flaky test.
The new Pulsar CI workflow renders JUnit XML test reports and integrates them into the GitHub UI. This has several benefits; for one, test failures are shown directly in the PR review.

You will see red failure marks without a failed job when a flaky test fails but later passes in a retry. The failed test result is recorded in a test report, but there's no need to rerun failed jobs. This doesn't block merging; the failure shows up so that it can be inspected.

This can be confusing at first, since everyone is used to rerunning jobs whenever a red failure mark is shown in the PR. It might appear that "/pulsarbot rerun-failure-checks" is broken. That's not the case. Usually the issue is that there is no failed job, or that the workflow containing the failed job is still executing. A failed job can only be rerun after the entire workflow has completed. That's explained in an earlier message in this thread.

With test reports, there's an additional source of confusion: GitHub Actions has a bug where test reports get attached to a random workflow when multiple workflows are executing. It's a known issue and it will be resolved once GitHub fixes the bug. Here's a link to one of the reports about the GitHub Actions bug:
https://github.community/t/github-actions-status-checks-created-on-incorrect-check-suite-id/16685

Please let me know if you have trouble with the new Pulsar CI GitHub Actions workflow and let's try to resolve the issues together. I'll try to find a place to document the details mentioned in this email thread.

-Lari

On 2022/04/01 14:34:02 Lari Hotari wrote:
> I now realize that my advice to close & reopen PRs to pick up master branch
> changes is problematic. This will cause issues with "/pulsarbot
> rerun-failure-checks". The script currently looks for the build to restart
> with the PR's head commit sha. If closing and reopening is used to start new
> PR build jobs, all build jobs will have the same head commit sha attached to
> them. When checking for failed builds, the script will also find old
> builds with the same head commit sha and restart them too.
>
> Please rebase your PR (or merge master branch changes into it) to pick up
> changes from master. Don't close & reopen PRs as I had advised earlier, since
> it causes problems. The wrong builds will be run and that adds up in the
> build queue.
>
> -Lari
>
>
>
> On 2022/04/01 08:38:54 Lari Hotari wrote:
> > Hi all,
> >
> > There's a small limitation in re-running failed jobs (builds that fail
> > because of flaky tests) in the refactored Pulsar CI workflow, which combines
> > multiple jobs into a single workflow.
> >
> > The limitation is that you need to wait for all jobs to complete before
> > failed jobs can be re-run.
> > Yesterday there was some issue with GitHub Actions and the build queue was
> > several hours long. When there's enough build capacity and no build queue,
> > the new workflow finishes in about 1 hour 20 minutes.
> >
> > Re-running failed jobs can be requested by commenting "/pulsarbot
> > rerun-failure-checks" on the PR. This won't do anything if one of the jobs
> > in the workflow is still executing.
> >
> > Another source of confusion has been the new test reporting, which shows all
> > test results and test failures as checks and annotations in the GitHub UI.
> >
> > Here's an example:
> > https://github.com/apache/pulsar/pull/14805/checks?check_run_id=5777139002
> >
> > There's a limitation in GitHub Actions that the test reports get attached
> > to the first workflow when a PR triggers more than one workflow. We still
> > have multiple workflows and the test reports get attached to the "CI - CPP,
> > Python Tests" workflow. Failed tests will show up as red check marks and, in
> > the case of retries, the test might have succeeded in a later attempt, but
> > the check shows as failed. This won't prevent merging the PR. Please keep
> > this small detail in mind when interpreting the build results.
> >
> > The test reports are very verbose at the moment. This is a problem when
> > checking the PR build results in the GitHub Mobile app. I have created a PR
> > to reduce the test reporting in the GitHub Actions UI:
> > https://github.com/apache/pulsar/pull/14959
> >
> > Please let me know if there are any other questions or problems that have
> > come up with the new refactored Pulsar CI GitHub Actions workflow.
> >
> > -Lari
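
As a side note on two points from the messages above (builds are matched by the PR's head commit sha, and a failed job can only be rerun once its whole workflow has completed), here is a minimal Python sketch of that selection logic. It is not the actual pulsarbot script. The function names and sample data are hypothetical, and the dict keys (`head_sha`, `workflow_id`, `run_number`, `status`, `conclusion`) are assumed to mirror the fields of a GitHub Actions workflow run. The second function shows one possible way to skip older builds that share the same sha; that filtering is an assumption for illustration, not something the script is known to do.

```python
from typing import Dict, List, Optional


def failed_runs_for_sha(runs: List[dict], head_sha: str) -> List[dict]:
    """Match failed, completed runs purely by head commit sha.

    After a close & reopen, old and new builds share the same sha,
    so this also returns the old builds.
    """
    return [
        run for run in runs
        if run["head_sha"] == head_sha
        and run["status"] == "completed"
        and run["conclusion"] == "failure"
    ]


def latest_failed_runs_for_sha(runs: List[dict], head_sha: str) -> List[dict]:
    """Hypothetical refinement: only consider the newest run of each workflow."""
    latest: Dict[int, dict] = {}
    for run in runs:
        if run["head_sha"] != head_sha:
            continue
        current: Optional[dict] = latest.get(run["workflow_id"])
        if current is None or run["run_number"] > current["run_number"]:
            latest[run["workflow_id"]] = run
    # A workflow that is still executing is skipped: nothing can be rerun yet.
    return [
        run for run in latest.values()
        if run["status"] == "completed" and run["conclusion"] == "failure"
    ]


# Made-up sample data: runs 1 and 2 belong to the same workflow and the same
# sha, as would happen after closing and reopening a PR.
SAMPLE_RUNS = [
    {"id": 1, "workflow_id": 10, "run_number": 5, "head_sha": "abc",
     "status": "completed", "conclusion": "failure"},
    {"id": 2, "workflow_id": 10, "run_number": 6, "head_sha": "abc",
     "status": "completed", "conclusion": "failure"},
    {"id": 3, "workflow_id": 11, "run_number": 2, "head_sha": "abc",
     "status": "in_progress", "conclusion": None},
    {"id": 4, "workflow_id": 12, "run_number": 9, "head_sha": "def",
     "status": "completed", "conclusion": "failure"},
]
```

With this sample data, matching by sha alone selects both the old and the new build of workflow 10, while the in-progress workflow 11 is never selected because it has not completed. This is why rebasing (which produces a new head commit sha) is the safer way to pick up changes from master.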