Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Lari Hotari Thu, 15 Sep 2022 04:20:23 -0700

> In short, IIUC, each contributor should:
> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to 
> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache 
> repo
> 
> Then a committer should run `/pulsarbot ready-to-test` after the PR in
> contributor's private repo passed all tests, right?


Exactly. One small detail: It should be the PR author's responsibility to 
follow up and request for a review and an approval after the tests pass.
If there are later changes in the PR after the "ready-to-test" label has been 
added, we could simply let the Pulsar CI handle the builds.

-Lari

On 2022/09/15 10:11:53 Yunze Xu wrote:
> Hi Lari,
> 
> This proposal LGTM. But I have some questions about the details.
> 
> In short, IIUC, each contributor should:
> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to 
> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache 
> repo
> 
> Then a committer should run `/pulsarbot ready-to-test` after the PR in
> contributor's private repo passed all tests, right?
> 
> Thanks,
> Yunze
> 
> 
> 
> 
> > On Sep 15, 2022, at 17:54, Lari Hotari <lhot...@apache.org> wrote:
> > 
> > Thanks for the comment.
> > 
> > The question isn't about trusting PRs.
> > The CI resource consumption problem is also caused by current committer 
> > PRs. That's why it
> > is necessary to handle all PRs in the same way.
> > The benefit of the proposed solution is that we could decide to run some 
> > light checks automatically before the step that requires the 
> > "ready-to-check" label.
> > 
> > After I sent the proposal, I found out that the Pulsar committer 
> > information including the GitHub
> > user names is available in JSON format at 
> > https://whimsy.apache.org/roster/committee/pulsar.json .
> > Pulsarbot can use this information for authorizing users who have access to 
> > "/pulsarbot ready-to-test". 
> > 
> > I agree that we can skip adding a separate reviewer role, let's
> > simply use https://whimsy.apache.org/roster/committee/pulsar.json as the 
> > source of truth
> > for authorization.
> > 
> > -Lari
> > 
> > On 2022/09/15 09:22:18 tison wrote:
> >> Hi Lari,
> >> 
> >> Thanks for starting this discussion. The overall proposal looks good and
> >> it's really great that you can spend some time on such a significant
> >> infrastructure.
> >> 
> >> One comment here is that we can start with all "authorized" users to
> >> trigger the CI in the committer group instead of introducing a new concept
> >> "reviewer" - it will be another topic to discuss and I generally prefer
> >> more committership to encourage participation instead of a complicated
> >> membership structure.
> >> 
> >> Besides, a quick fixup for reducing traffic is setting "Fork pull request
> >> workflows from outside collaborators" option[1] as "Require approval for
> >> all outside collaborators". This is provided out-of-the-box by GitHub and
> >> requires NO development[2]. Although it doesn't restrict people who are
> >> already apache org members but are not Pulsar committers, I believe the
> >> trust level is acceptable. An INFRA team member will be asked to perform
> >> the settings change if we want this.
> >> 
> >> Best,
> >> tison.
> >> 
> >> [1] https://github.com/apache/pulsar/settings/actions
> >> [2]
> >> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
> >> 
> >> 
> >> Lari Hotari <lhot...@apache.org> 于2022年9月15日周四 16:36写道：
> >> 
> >>> Hi all,
> >>> 
> >>> The GitHub Actions based Pulsar CI has been experiencing issues for
> >>> multiple weeks. The condition is currently better, but the resource
> >>> shortage issue remains. CI builds will take a long time to complete even
> >>> after many optimizations have been made.
> >>> 
> >>> There's a long email thread with some details about the past issues:
> >>> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> >>> 
> >>> I have filed an issue to GitHub support about the CI issues over a week
> >>> ago, and I finally received an answer a few hours ago. However the
> >>> GitHub support person didn't reply to my questions at all, but instead
> >>> suggested that there's a beta program where it's possible to pay for
> >>> more resources. That solution isn't suitable for our case, since it
> >>> doesn't seem to be possible to assign GitHub Actions Runner VM resources
> >>> only for a specific Apache project. I'll follow up with GitHub support, 
> >>> but
> >>> I don't expect that to resolve our problems in the near term. We need
> >>> to make changes in our CI resource consumption.
> >>> 
> >>> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> >>> suggested: "Apache Spark project requires that all PRs are executed in
> >>> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> >>> 
> >>> The Apache Spark contributing guide contains details about this in the
> >>> "Pull request" section, https://spark.apache.org/contributing.html .
> >>> 
> >>> "Before creating a pull request in Apache Spark, it is important to
> >>> check if tests can pass on your branch because our GitHub Actions
> >>> workflows automatically run tests for your pull request/following
> >>> commits and every run burdens the limited resources of GitHub Actions in
> >>> Apache Spark repository. "
> >>> 
> >>> In Pulsar, we will need to do the same. As a solution to this, Tison
> >>> suggested that we would not run all tests for the PR unless there's a
> >>> "ready-to-test" label on the PR.
> >>> 
> >>> I think this is a good suggestion. We could extend the existing
> >>> "pulsarbot" to help with the automation.
> >>> 
> >>> A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> >>> pulsarbot would add the label and also restart the CI workflow to make
> >>> it proceed and run the tests.
> >>> pulsarbot would check for authorized users. One simple
> >>> approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> >>> repository with the relevant information:
> >>> 
> >>> committer_github_ids:
> >>>  - committer1
> >>>  - committer2
> >>>  ...
> >>> 
> >>> ready_to_test:
> >>>  authorized_github_ids:
> >>>    - userid1
> >>>    - userid2
> >>>    ...
> >>> 
> >>> We would have a script to synchronize all Pulsar committers to this file
> >>> peridiotically (manual step after there's a new committer). ASF provides
> >>> public json files for project members at
> >>> https://whimsy.apache.org/public/public_ldap_projects.json , however the
> >>> mapping to github user names seems to be missing. That could be done
> >>> with a custom script since ASF LDAP contains the github username.
> >>> 
> >>> All Pulsar committers would have access. In addition, there could be other
> >>> users that are authorized for using "/pulsarbot ready-to-test".
> >>> 
> >>> This solution would also require changes in the GitHub Actions workflows
> >>> so that the workflow is failed in an early step unless there's a
> >>> ready-to-test label for the PR.
> >>> 
> >>> With the above solution, we would be able to cut the amount of
> >>> unnecessary builds and get the excessive resource consumption issue
> >>> under control. The PR authors would be instructed to run initial PR
> >>> builds in their own fork and the reviewer should check that this is done
> >>> before approving the PR for testing with "/pulsarbot ready-to-test".
> >>> 
> >>> I would suggest proceeding quickly on this matter without separate PIPs
> >>> or votes. We could follow the Apache lazy consensus
> >>> (https://community.apache.org/committers/lazyConsensus.html) principle
> >>> and make this happen if there aren't objections in the next 72 hours.
> >>> The improvement suggestions to this proposal would obviously be taken
> >>> into account and if someone objects, we wouldn't have reached lazy
> >>> consensus and we wouldn't proceed.
> >>> 
> >>> -Lari
> >>> 
> >>> 
> >>> 1 -
> >>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> >>> 
> >> 
> 
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Reply via email to