Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Zixuan Liu Thu, 15 Sep 2022 08:59:40 -0700

Hi Lari,

This is a good idea, I agree with that.


Once a committer adds a "ready-to-test" label to a PR, the
contributor can run the Pulsar CI.

Thanks,
Zixuan

On Thu, Sep 15, 2022 at 23:30, Lari Hotari <lhot...@apache.org> wrote:

> On 2022/09/15 15:09:59 Yubiao Feng wrote:
> > Hi Lari:
> >
> > That is really a good approach.
> > I think it would also be useful to add a button to cancel the running
> > task: after submitting a PR, a user may find other problems that need
> > to be fixed, and could then cancel the task themselves.
>
> Thanks for the feedback Yubiao,
>
> As explained in the proposal, we currently have a resource shortage and we
> have to cut GitHub Actions usage under the apache/pulsar project. When
> users run the majority of test runs in their own forks, it won't impact
> the apache/pulsar project. Users have full access to cancel builds in
> their own forks. There's a cancel button available.
>
> > > we can start with all "authorized" users to trigger the CI
> >
> > I think all contributors need to get permission. If only committers have
> > permission, this will hurt the enthusiasm of community contributors.
> > Almost all PR submissions come from mature contributors, and they will
> > follow the rules to save resources.
>
> Committers are required for reviewing and merging PRs. I think this is
> well aligned with that.
> There's no reason to be hurt. Things will be better for everyone when
> everyone uses their own fork to run tests for PRs and only when the PR is
> reviewed, we proceed to run tests in apache/pulsar project.
>
> -Lari
>
> >
> > Thanks
> > Yubiao Feng
> >
> > On Thu, Sep 15, 2022 at 4:36 PM Lari Hotari <lhot...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > The GitHub Actions based Pulsar CI has been experiencing issues for
> > > multiple weeks. The condition is currently better, but the resource
> > > shortage issue remains. CI builds will take a long time to complete
> > > even after many optimizations have been made.
> > >
> > > There's a long email thread with some details about the past issues:
> > > https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> > >
> > > I have filed an issue to GitHub support about the CI issues over a week
> > > ago, and I finally received an answer a few hours ago. However the
> > > GitHub support person didn't reply to my questions at all, but instead
> > > suggested that there's a beta program where it's possible to pay for
> > > more resources. That solution isn't suitable for our case, since it
> > > doesn't seem to be possible to assign GitHub Actions Runner VM
> > > resources only for a specific Apache project. I'll follow up with
> > > GitHub support, but I don't expect that to resolve our problems in
> > > the near term. We need to make changes in our CI resource consumption.
> > >
> > > In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> > > suggested: "Apache Spark project requires that all PRs are executed in
> > > the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> > >
> > > The Apache Spark contributing guide contains details about this in the
> > > "Pull request" section, https://spark.apache.org/contributing.html .
> > >
> > > "Before creating a pull request in Apache Spark, it is important to
> > > check if tests can pass on your branch because our GitHub Actions
> > > workflows automatically run tests for your pull request/following
> > > commits and every run burdens the limited resources of GitHub Actions
> > > in Apache Spark repository."
> > >
> > > In Pulsar, we will need to do the same. As a solution to this, Tison
> > > suggested that we would not run all tests for the PR unless there's a
> > > "ready-to-test" label on the PR.
> > >
> > > I think this is a good suggestion. We could extend the existing
> > > "pulsarbot" to help with the automation.
> > >
> > > A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> > > pulsarbot would add the label and also restart the CI workflow to make
> > > it proceed and run the tests.
> > > pulsarbot would check for authorized users. One simple
> > > approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> > > repository with the relevant information:
> > >
> > > committer_github_ids:
> > >   - committer1
> > >   - committer2
> > >   ...
> > >
> > > ready_to_test:
> > >   authorized_github_ids:
> > >     - userid1
> > >     - userid2
> > >     ...
> > >
> > > We would have a script to synchronize all Pulsar committers to this
> > > file periodically (a manual step after there's a new committer). ASF
> > > provides public JSON files for project members at
> > > https://whimsy.apache.org/public/public_ldap_projects.json , however
> > > the mapping to GitHub user names seems to be missing. That could be
> > > done with a custom script since ASF LDAP contains the GitHub username.
> > >
> > > All Pulsar committers would have access. In addition, there could be
> > > other users that are authorized for using "/pulsarbot ready-to-test".
> > >
> > > This solution would also require changes in the GitHub Actions
> > > workflows so that the workflow fails in an early step unless there's
> > > a ready-to-test label for the PR.
> > >
> > > With the above solution, we would be able to cut the number of
> > > unnecessary builds and get the excessive resource consumption issue
> > > under control. The PR authors would be instructed to run initial PR
> > > builds in their own fork and the reviewer should check that this is
> > > done before approving the PR for testing with "/pulsarbot ready-to-test".
> > >
> > > I would suggest proceeding quickly on this matter without separate PIPs
> > > or votes. We could follow the Apache lazy consensus
> > > (https://community.apache.org/committers/lazyConsensus.html) principle
> > > and make this happen if there aren't objections in the next 72 hours.
> > > The improvement suggestions to this proposal would obviously be taken
> > > into account and if someone objects, we wouldn't have reached lazy
> > > consensus and we wouldn't proceed.
> > >
> > > -Lari
> > >
> > >
> > > 1 - https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> > >
> >
>
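The gating rule discussed in the quoted proposal (an authorized user
comments "/pulsarbot ready-to-test", the label is added, and the workflow
only proceeds when the label is present) can be illustrated with a small
Python sketch. This is not pulsarbot's actual implementation. The function
names (is_authorized, handle_comment, should_run_full_ci) are hypothetical,
and the config is shown as an already-parsed dictionary that mirrors the
proposed ".pulsarci.yaml" layout with the placeholder IDs from the proposal.

```python
# Hypothetical in-memory form of the proposed ".pulsarci.yaml" file,
# as it might look after parsing. The IDs are the proposal's placeholders.
CONFIG = {
    "committer_github_ids": ["committer1", "committer2"],
    "ready_to_test": {"authorized_github_ids": ["userid1", "userid2"]},
}

READY_COMMAND = "/pulsarbot ready-to-test"
READY_LABEL = "ready-to-test"


def is_authorized(config: dict, github_id: str) -> bool:
    """Committers and explicitly listed users may approve a PR for testing."""
    committers = config.get("committer_github_ids", [])
    extra = config.get("ready_to_test", {}).get("authorized_github_ids", [])
    return github_id in committers or github_id in extra


def handle_comment(config: dict, labels: set, author: str, body: str) -> set:
    """Return the PR's labels after processing one PR comment."""
    if body.strip() == READY_COMMAND and is_authorized(config, author):
        return labels | {READY_LABEL}
    return set(labels)


def should_run_full_ci(labels: set) -> bool:
    """Early workflow step: proceed only when the label is present."""
    return READY_LABEL in labels


# A comment from an unlisted user leaves the PR unlabeled.
labels = handle_comment(CONFIG, set(), "contributor9", READY_COMMAND)
print(should_run_full_ci(labels))  # False

# The same comment from a listed committer adds the label.
labels = handle_comment(CONFIG, labels, "committer1", READY_COMMAND)
print(should_run_full_ci(labels))  # True
```

In a real setup the label check would live in an early step of the GitHub
Actions workflow and the comment handling in the bot; both are collapsed
into plain functions here only to show the decision logic.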
