One more comment: you should take `/pulsarbot run-failure-checks` into consideration. It's now triggered by any actors and signals a rerun on behalf of @codelipenghui. Following your proposal I suggest this manner should be restricted also. And it actually means that our committers should be more actively handling PRs.
Best, tison. tison <wander4...@gmail.com> 于2022年9月15日周四 17:22写道: > Hi Lari, > > Thanks for starting this discussion. The overall proposal looks good and > it's really great that you can spend some time on such a significant > infrastructure. > > One comment here is that we can start with all "authorized" users to > trigger the CI in the committer group instead of introducing a new concept > "reviewer" - it will be another topic to discuss and I generally prefer > more committership to encourage participation instead of a complicated > membership structure. > > Besides, a quick fixup for reducing traffic is setting "Fork pull request > workflows from outside collaborators" option[1] as "Require approval for > all outside collaborators". This is provided out-of-the-box by GitHub and > requires NO development[2]. Although it doesn't restrict people who are > already apache org members but are not Pulsar committers, I believe the > trust level is acceptable. An INFRA team member will be asked to perform > the settings change if we want this. > > Best, > tison. > > [1] https://github.com/apache/pulsar/settings/actions > [2] > https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks > > > Lari Hotari <lhot...@apache.org> 于2022年9月15日周四 16:36写道: > >> Hi all, >> >> The GitHub Actions based Pulsar CI has been experiencing issues for >> multiple weeks. The condition is currently better, but the resource >> shortage issue remains. CI builds will take a long time to complete even >> after many optimizations have been made. >> >> There's a long email thread with some details about the past issues: >> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf >> >> I have filed an issue to GitHub support about the CI issues over a week >> ago, and I finally received an answer a few hours ago. However the >> GitHub support person didn't reply to my questions at all, but instead >> suggested that there's a beta program where it's possible to pay for >> more resources. That solution isn't suitable for our case, since it >> doesn't seem to be possible to assign GitHub Actions Runner VM resources >> only for a specific Apache project. I'll follow up with GitHub support, >> but >> I don't expect that to resolve our problems in the near term. We need >> to make changes in our CI resource consumption. >> >> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov >> suggested: "Apache Spark project requires that all PRs are executed in >> the contributor's GHA quota. Maybe Pulsar can do the same ?!" >> >> The Apache Spark contributing guide contains details about this in the >> "Pull request" section, https://spark.apache.org/contributing.html . >> >> "Before creating a pull request in Apache Spark, it is important to >> check if tests can pass on your branch because our GitHub Actions >> workflows automatically run tests for your pull request/following >> commits and every run burdens the limited resources of GitHub Actions in >> Apache Spark repository. " >> >> In Pulsar, we will need to do the same. As a solution to this, Tison >> suggested that we would not run all tests for the PR unless there's a >> "ready-to-test" label on the PR. >> >> I think this is a good suggestion. We could extend the existing >> "pulsarbot" to help with the automation. >> >> A reviewer could comment "/pulsarbot ready-to-test" on the PR and >> pulsarbot would add the label and also restart the CI workflow to make >> it proceed and run the tests. >> pulsarbot would check for authorized users. One simple >> approach would be to add a file ".pulsarci.yaml" in apache/pulsar >> repository with the relevant information: >> >> committer_github_ids: >> - committer1 >> - committer2 >> ... >> >> ready_to_test: >> authorized_github_ids: >> - userid1 >> - userid2 >> ... >> >> We would have a script to synchronize all Pulsar committers to this file >> peridiotically (manual step after there's a new committer). ASF provides >> public json files for project members at >> https://whimsy.apache.org/public/public_ldap_projects.json , however the >> mapping to github user names seems to be missing. That could be done >> with a custom script since ASF LDAP contains the github username. >> >> All Pulsar committers would have access. In addition, there could be other >> users that are authorized for using "/pulsarbot ready-to-test". >> >> This solution would also require changes in the GitHub Actions workflows >> so that the workflow is failed in an early step unless there's a >> ready-to-test label for the PR. >> >> With the above solution, we would be able to cut the amount of >> unnecessary builds and get the excessive resource consumption issue >> under control. The PR authors would be instructed to run initial PR >> builds in their own fork and the reviewer should check that this is done >> before approving the PR for testing with "/pulsarbot ready-to-test". >> >> I would suggest proceeding quickly on this matter without separate PIPs >> or votes. We could follow the Apache lazy consensus >> (https://community.apache.org/committers/lazyConsensus.html) principle >> and make this happen if there aren't objections in the next 72 hours. >> The improvement suggestions to this proposal would obviously be taken >> into account and if someone objects, we wouldn't have reached lazy >> consensus and we wouldn't proceed. >> >> -Lari >> >> >> 1 - >> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8 >> >