Hi Lari, This proposal LGTM. But I have some questions about the details.
In short, IIUC, each contributor should: 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to 2. Paste the link of the same PR in contributor’s fork to the PR in Apache repo Then a committer should run `/pulsarbot ready-to-test` after the PR in contributor's private repo passed all tests, right? Thanks, Yunze > On Sep 15, 2022, at 17:54, Lari Hotari <lhot...@apache.org> wrote: > > Thanks for the comment. > > The question isn't about trusting PRs. > The CI resource consumption problem is also caused by current committer PRs. > That's why it > is necessary to handle all PRs in the same way. > The benefit of the proposed solution is that we could decide to run some > light checks automatically before the step that requires the "ready-to-check" > label. > > After I sent the proposal, I found out that the Pulsar committer information > including the GitHub > user names is available in JSON format at > https://whimsy.apache.org/roster/committee/pulsar.json . > Pulsarbot can use this information for authorizing users who have access to > "/pulsarbot ready-to-test". > > I agree that we can skip adding a separate reviewer role, let's > simply use https://whimsy.apache.org/roster/committee/pulsar.json as the > source of truth > for authorization. > > -Lari > > On 2022/09/15 09:22:18 tison wrote: >> Hi Lari, >> >> Thanks for starting this discussion. The overall proposal looks good and >> it's really great that you can spend some time on such a significant >> infrastructure. >> >> One comment here is that we can start with all "authorized" users to >> trigger the CI in the committer group instead of introducing a new concept >> "reviewer" - it will be another topic to discuss and I generally prefer >> more committership to encourage participation instead of a complicated >> membership structure. >> >> Besides, a quick fixup for reducing traffic is setting "Fork pull request >> workflows from outside collaborators" option[1] as "Require approval for >> all outside collaborators". This is provided out-of-the-box by GitHub and >> requires NO development[2]. Although it doesn't restrict people who are >> already apache org members but are not Pulsar committers, I believe the >> trust level is acceptable. An INFRA team member will be asked to perform >> the settings change if we want this. >> >> Best, >> tison. >> >> [1] https://github.com/apache/pulsar/settings/actions >> [2] >> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks >> >> >> Lari Hotari <lhot...@apache.org> 于2022年9月15日周四 16:36写道: >> >>> Hi all, >>> >>> The GitHub Actions based Pulsar CI has been experiencing issues for >>> multiple weeks. The condition is currently better, but the resource >>> shortage issue remains. CI builds will take a long time to complete even >>> after many optimizations have been made. >>> >>> There's a long email thread with some details about the past issues: >>> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf >>> >>> I have filed an issue to GitHub support about the CI issues over a week >>> ago, and I finally received an answer a few hours ago. However the >>> GitHub support person didn't reply to my questions at all, but instead >>> suggested that there's a beta program where it's possible to pay for >>> more resources. That solution isn't suitable for our case, since it >>> doesn't seem to be possible to assign GitHub Actions Runner VM resources >>> only for a specific Apache project. I'll follow up with GitHub support, but >>> I don't expect that to resolve our problems in the near term. We need >>> to make changes in our CI resource consumption. >>> >>> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov >>> suggested: "Apache Spark project requires that all PRs are executed in >>> the contributor's GHA quota. Maybe Pulsar can do the same ?!" >>> >>> The Apache Spark contributing guide contains details about this in the >>> "Pull request" section, https://spark.apache.org/contributing.html . >>> >>> "Before creating a pull request in Apache Spark, it is important to >>> check if tests can pass on your branch because our GitHub Actions >>> workflows automatically run tests for your pull request/following >>> commits and every run burdens the limited resources of GitHub Actions in >>> Apache Spark repository. " >>> >>> In Pulsar, we will need to do the same. As a solution to this, Tison >>> suggested that we would not run all tests for the PR unless there's a >>> "ready-to-test" label on the PR. >>> >>> I think this is a good suggestion. We could extend the existing >>> "pulsarbot" to help with the automation. >>> >>> A reviewer could comment "/pulsarbot ready-to-test" on the PR and >>> pulsarbot would add the label and also restart the CI workflow to make >>> it proceed and run the tests. >>> pulsarbot would check for authorized users. One simple >>> approach would be to add a file ".pulsarci.yaml" in apache/pulsar >>> repository with the relevant information: >>> >>> committer_github_ids: >>> - committer1 >>> - committer2 >>> ... >>> >>> ready_to_test: >>> authorized_github_ids: >>> - userid1 >>> - userid2 >>> ... >>> >>> We would have a script to synchronize all Pulsar committers to this file >>> peridiotically (manual step after there's a new committer). ASF provides >>> public json files for project members at >>> https://whimsy.apache.org/public/public_ldap_projects.json , however the >>> mapping to github user names seems to be missing. That could be done >>> with a custom script since ASF LDAP contains the github username. >>> >>> All Pulsar committers would have access. In addition, there could be other >>> users that are authorized for using "/pulsarbot ready-to-test". >>> >>> This solution would also require changes in the GitHub Actions workflows >>> so that the workflow is failed in an early step unless there's a >>> ready-to-test label for the PR. >>> >>> With the above solution, we would be able to cut the amount of >>> unnecessary builds and get the excessive resource consumption issue >>> under control. The PR authors would be instructed to run initial PR >>> builds in their own fork and the reviewer should check that this is done >>> before approving the PR for testing with "/pulsarbot ready-to-test". >>> >>> I would suggest proceeding quickly on this matter without separate PIPs >>> or votes. We could follow the Apache lazy consensus >>> (https://community.apache.org/committers/lazyConsensus.html) principle >>> and make this happen if there aren't objections in the next 72 hours. >>> The improvement suggestions to this proposal would obviously be taken >>> into account and if someone objects, we wouldn't have reached lazy >>> consensus and we wouldn't proceed. >>> >>> -Lari >>> >>> >>> 1 - >>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8 >>> >>