Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Lari Hotari Sun, 18 Sep 2022 23:17:42 -0700

I'll now merge the changes to master. The contributor docs will be updated in a 
separate pull request. Current in progress pull requests will get the changes 
when they are updated.


-Lari

On 2022/09/16 12:43:44 Lari Hotari wrote:
> On 2022/09/16 10:09:51 PengHui Li wrote:
> > After I go through all the comments here.
> > Do we really need a new label?
> 
> Good suggestion. It was also suggested yesterday by Matteo that a PR approval 
> should be sufficient. I have modified the solution in 
> https://github.com/apache/pulsar/pull/17693 so that either a PR approval or 
> the "ready-to-test" label is required for running tests in apache/pulsar.
> 
> I have retained the ready-to-test label solution since there might be cases 
> where the reviewer might want to choose to run the tests in apache/pulsar 
> repository before approving the PR. This is to ensure that we don't change 
> the meaning of the PR approval to trigger tests to run.
> 
> In the first phase, I won't be adding "/pulsarbot ready-to-test" at all. 
> Instead, we'd use use PR approval + "/pulsarbot rerun-failure-checks" to 
> trigger the build pipeline after an approval. The label can be added manually 
> in GitHub UI if that approach is used.
> 
> I hope we are ready to merge https://github.com/apache/pulsar/pull/17693 on 
> Monday. I'm confident that everyone will be much happier with the revised CI 
> where it's possible to get almost instant CI feedback without hours of delays 
> that slows down PR processing and merging.
> 
> -Lari
> 
> On 2022/09/16 10:09:51 PengHui Li wrote:
> > Thanks, Lari
> > 
> > After I go through all the comments here.
> > Do we really need a new label?
> > 
> > It looks like if a committer thinks we should trigger the CI in the Pulsar
> > repo
> > 
> > - Passed in the fork repo
> > - No request change for this PR.
> > - ...
> > 
> > Just run the "/pulsarbot ready-to-test" or "/pulsarbot trigger-ci".
> > I think it makes sense that have a committer take a look at the PR first
> > and then trigger the CI, approval is not required.
> > 
> > During the PR review, the author could also push many commits to address
> > the comment.
> > After the comments have been addressed, we can trigger the CI again.
> > 
> > Maybe I missed something about the label approach.
> > 
> > Thanks,
> > Penghui
> > 
> > 
> > On Fri, Sep 16, 2022 at 5:45 PM Lari Hotari <[email protected]> wrote:
> > 
> > > I have created a draft PR for making the changes in Pulsar CI:
> > > https://github.com/apache/pulsar/pull/17693
> > >
> > > I'm looking forward to further practical improvements. I'd like to remind
> > > everyone that we must make this change to address the CI slowness. After
> > > this change, the experience of Pulsar CI will improve for everyone.
> > >
> > > In addition to the above PR, the contributor guide and Pulsarbot changes
> > > will be needed. I expect that we would be able to complete the changes
> > > during next week.
> > >
> > > -Lari
> > >
> > > On 2022/09/15 08:36:01 Lari Hotari wrote:
> > > > Hi all,
> > > >
> > > > The GitHub Actions based Pulsar CI has been experiencing issues for
> > > > multiple weeks. The condition is currently better, but the resource
> > > > shortage issue remains. CI builds will take a long time to complete even
> > > > after many optimizations have been made.
> > > >
> > > > There's a long email thread with some details about the past issues:
> > > > https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> > > >
> > > > I have filed an issue to GitHub support about the CI issues over a week
> > > > ago, and I finally received an answer a few hours ago. However the
> > > > GitHub support person didn't reply to my questions at all, but instead
> > > > suggested that there's a beta program where it's possible to pay for
> > > > more resources. That solution isn't suitable for our case, since it
> > > > doesn't seem to be possible to assign GitHub Actions Runner VM resources
> > > > only for a specific Apache project. I'll follow up with GitHub support,
> > > but
> > > > I don't expect that to resolve our problems in the near term. We need
> > > > to make changes in our CI resource consumption.
> > > >
> > > > In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> > > > suggested: "Apache Spark project requires that all PRs are executed in
> > > > the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> > > >
> > > > The Apache Spark contributing guide contains details about this in the
> > > > "Pull request" section, https://spark.apache.org/contributing.html .
> > > >
> > > > "Before creating a pull request in Apache Spark, it is important to
> > > > check if tests can pass on your branch because our GitHub Actions
> > > > workflows automatically run tests for your pull request/following
> > > > commits and every run burdens the limited resources of GitHub Actions in
> > > > Apache Spark repository. "
> > > >
> > > > In Pulsar, we will need to do the same. As a solution to this, Tison
> > > > suggested that we would not run all tests for the PR unless there's a
> > > > "ready-to-test" label on the PR.
> > > >
> > > > I think this is a good suggestion. We could extend the existing
> > > > "pulsarbot" to help with the automation.
> > > >
> > > > A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> > > > pulsarbot would add the label and also restart the CI workflow to make
> > > > it proceed and run the tests.
> > > > pulsarbot would check for authorized users. One simple
> > > > approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> > > > repository with the relevant information:
> > > >
> > > > committer_github_ids:
> > > >   - committer1
> > > >   - committer2
> > > >   ...
> > > >
> > > > ready_to_test:
> > > >   authorized_github_ids:
> > > >     - userid1
> > > >     - userid2
> > > >     ...
> > > >
> > > > We would have a script to synchronize all Pulsar committers to this file
> > > > peridiotically (manual step after there's a new committer). ASF provides
> > > > public json files for project members at
> > > > https://whimsy.apache.org/public/public_ldap_projects.json , however the
> > > > mapping to github user names seems to be missing. That could be done
> > > > with a custom script since ASF LDAP contains the github username.
> > > >
> > > > All Pulsar committers would have access. In addition, there could be
> > > other
> > > > users that are authorized for using "/pulsarbot ready-to-test".
> > > >
> > > > This solution would also require changes in the GitHub Actions workflows
> > > > so that the workflow is failed in an early step unless there's a
> > > > ready-to-test label for the PR.
> > > >
> > > > With the above solution, we would be able to cut the amount of
> > > > unnecessary builds and get the excessive resource consumption issue
> > > > under control. The PR authors would be instructed to run initial PR
> > > > builds in their own fork and the reviewer should check that this is done
> > > > before approving the PR for testing with "/pulsarbot ready-to-test".
> > > >
> > > > I would suggest proceeding quickly on this matter without separate PIPs
> > > > or votes. We could follow the Apache lazy consensus
> > > > (https://community.apache.org/committers/lazyConsensus.html) principle
> > > > and make this happen if there aren't objections in the next 72 hours.
> > > > The improvement suggestions to this proposal would obviously be taken
> > > > into account and if someone objects, we wouldn't have reached lazy
> > > > consensus and we wouldn't proceed.
> > > >
> > > > -Lari
> > > >
> > > >
> > > > 1 -
> > > https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> > > >
> > >
> > 
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Reply via email to