One more comment: we should also take `/pulsarbot run-failure-checks` into
consideration. It can currently be triggered by anyone, and it triggers a
rerun on behalf of @codelipenghui. Following your proposal, I suggest that
this command should be restricted as well. This does mean that our
committers will need to handle PRs more actively.

Best,
tison.


tison <wander4...@gmail.com> wrote on Thu, Sep 15, 2022 at 17:22:

> Hi Lari,
>
> Thanks for starting this discussion. The overall proposal looks good, and
> it's really great that you can spend time on such significant
> infrastructure.
>
> One comment: we can start by limiting the "authorized" users who can
> trigger the CI to the committer group, instead of introducing a new
> "reviewer" concept. That would be a separate topic to discuss, and I
> generally prefer granting more committerships to encourage participation
> over a complicated membership structure.
>
> Besides, a quick fix for reducing traffic is to set the "Fork pull request
> workflows from outside collaborators" option[1] to "Require approval for
> all outside collaborators". GitHub provides this out of the box, and it
> requires NO development[2]. Although it doesn't restrict people who are
> already Apache org members but not Pulsar committers, I believe that
> trust level is acceptable. If we want this, we will need to ask an INFRA
> team member to change the setting.
>
> Best,
> tison.
>
> [1] https://github.com/apache/pulsar/settings/actions
> [2] https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
>
>
> Lari Hotari <lhot...@apache.org> wrote on Thu, Sep 15, 2022 at 16:36:
>
>> Hi all,
>>
>> The GitHub Actions based Pulsar CI has been experiencing issues for
>> several weeks. The situation has improved, but the resource shortage
>> remains, and CI builds still take a long time to complete even after
>> many optimizations.
>>
>> There's a long email thread with some details about the past issues:
>> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
>>
>> I filed an issue with GitHub support about the CI problems over a week
>> ago, and I finally received an answer a few hours ago. However, the
>> GitHub support person didn't answer my questions at all. Instead, they
>> suggested a beta program where it's possible to pay for more resources.
>> That solution isn't suitable for our case, since it doesn't seem to be
>> possible to assign GitHub Actions Runner VM resources to a single Apache
>> project only. I'll follow up with GitHub support, but I don't expect
>> that to resolve our problems in the near term. We need to reduce our CI
>> resource consumption.
>>
>> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
>> suggested: "Apache Spark project requires that all PRs are executed in
>> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
>>
>> The Apache Spark contributing guide contains details about this in the
>> "Pull request" section, https://spark.apache.org/contributing.html .
>>
>> "Before creating a pull request in Apache Spark, it is important to
>> check if tests can pass on your branch because our GitHub Actions
>> workflows automatically run tests for your pull request/following
>> commits and every run burdens the limited resources of GitHub Actions in
>> Apache Spark repository."
>>
>> In Pulsar, we will need to do the same. As a solution, Tison suggested
>> that we not run the full test suite for a PR unless the PR has a
>> "ready-to-test" label.
>>
>> I think this is a good suggestion. We could extend the existing
>> "pulsarbot" to help with the automation.
>>
>> A reviewer could comment "/pulsarbot ready-to-test" on the PR, and
>> pulsarbot would add the label and restart the CI workflow so that it
>> proceeds to run the tests. Before acting, pulsarbot would check that the
>> commenter is an authorized user. One simple approach would be to add a
>> ".pulsarci.yaml" file to the apache/pulsar repository with the relevant
>> information:
>>
>> committer_github_ids:
>>   - committer1
>>   - committer2
>>   ...
>>
>> ready_to_test:
>>   authorized_github_ids:
>>     - userid1
>>     - userid2
>>     ...
>>
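A minimal sketch of the check pulsarbot could perform, assuming the .pulsarci.yaml content above has already been loaded into a dict with a YAML library. The command regex, the function names, and the choice to apply one allowlist to every command (including `/pulsarbot run-failure-checks`, which tison suggests restricting as well) are illustrative assumptions, not existing pulsarbot behavior.

```python
import re
from typing import Optional

# Hypothetical parsed form of the proposed .pulsarci.yaml (placeholder ids).
CONFIG = {
    "committer_github_ids": ["committer1", "committer2"],
    "ready_to_test": {"authorized_github_ids": ["userid1", "userid2"]},
}

# A comment line of the form "/pulsarbot <command>", e.g. "/pulsarbot ready-to-test".
COMMAND_RE = re.compile(r"^/pulsarbot[ \t]+([\w-]+)[ \t]*\r?$", re.MULTILINE)


def parse_command(comment_body: str) -> Optional[str]:
    """Return the first pulsarbot command found in a comment, or None."""
    match = COMMAND_RE.search(comment_body)
    return match.group(1) if match else None


def is_authorized(config: dict, github_id: str) -> bool:
    """Committers plus explicitly listed users are allowed."""
    allowed = set(config.get("committer_github_ids", []))
    allowed.update(config.get("ready_to_test", {}).get("authorized_github_ids", []))
    # GitHub logins are case-insensitive, so compare lower-cased names.
    return github_id.lower() in {name.lower() for name in allowed}


def handle_comment(config: dict, author: str, body: str) -> str:
    """Return "ignore", "reject", or the command the bot should execute."""
    command = parse_command(body)
    if command is None:
        return "ignore"
    if not is_authorized(config, author):
        return "reject"
    return command
```

For example, `handle_comment(CONFIG, "userid1", "/pulsarbot ready-to-test")` yields `"ready-to-test"`, and the bot would then add the label and restart the workflow. The same call from an unlisted user yields `"reject"`.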
>> We would have a script to synchronize all Pulsar committers to this file
>> periodically (a manual step after a new committer is added). ASF
>> provides public JSON files for project members at
>> https://whimsy.apache.org/public/public_ldap_projects.json .
>> However, the mapping to GitHub user names seems to be missing. That
>> could be handled with a custom script, since ASF LDAP contains the
>> GitHub username.
>>
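A sketch of the synchronization step. It assumes public_ldap_projects.json has the layout `{"projects": {"<name>": {"members": [...]}}}`. It also assumes the ASF-id-to-GitHub-username mapping has already been fetched from ASF LDAP into a dict (that lookup is not shown). The function names are hypothetical. Committers without a known GitHub username are returned separately so they can be resolved by hand.

```python
import json
from typing import Dict, List, Tuple


def committer_github_ids(
    projects_json: str, project: str, asf_to_github: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Map a project's ASF committer ids to GitHub ids.

    Returns (github_ids, unmapped_asf_ids).
    """
    data = json.loads(projects_json)
    # Assumed layout: {"projects": {"<name>": {"members": [...], ...}}}
    asf_ids = data["projects"][project]["members"]
    github_ids = sorted({asf_to_github[a] for a in asf_ids if a in asf_to_github})
    unmapped = sorted(a for a in asf_ids if a not in asf_to_github)
    return github_ids, unmapped


def render_committers_yaml(github_ids: List[str]) -> str:
    """Render the committer_github_ids section of .pulsarci.yaml."""
    lines = ["committer_github_ids:"]
    lines.extend(f"  - {gh_id}" for gh_id in github_ids)
    return "\n".join(lines) + "\n"
```

The rendered section would replace the existing `committer_github_ids` block in .pulsarci.yaml. The `ready_to_test` section is maintained by hand.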
>> All Pulsar committers would have access. In addition, there could be other
>> users that are authorized for using "/pulsarbot ready-to-test".
>>
>> This solution would also require changes to the GitHub Actions workflows
>> so that a workflow fails in an early step unless the PR has a
>> ready-to-test label.
>>
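One way to implement the early failing step, sketched in Python: read the event payload that GitHub Actions writes to the file named by `GITHUB_EVENT_PATH`, then check the PR's labels. Treating non-PR events (such as pushes to master) as always allowed is an assumption. The labels in the payload reflect the PR at the time of the triggering event, and a re-run may reuse that original payload. A real implementation might therefore need to query the GitHub API for the current labels instead.

```python
import json
import os
import sys

READY_LABEL = "ready-to-test"


def should_run_tests(event: dict, label: str = READY_LABEL) -> bool:
    """Decide from a GitHub Actions event payload whether tests may run."""
    pull_request = event.get("pull_request")
    if pull_request is None:
        # Assumption: non-PR events (e.g. pushes to master) are not gated.
        return True
    return any(item.get("name") == label for item in pull_request.get("labels", []))


def main() -> int:
    """Return 0 to continue the workflow, 1 to fail it early."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
        event = json.load(f)
    if should_run_tests(event):
        print(f"'{READY_LABEL}' label present (or not a PR); continuing.")
        return 0
    print(f"PR lacks the '{READY_LABEL}' label; skipping the test jobs.", file=sys.stderr)
    return 1
```

A workflow step would run this and exit with `main()`'s return value. A non-zero status fails the job before the expensive test jobs start.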
>> With the above solution, we would be able to cut the number of
>> unnecessary builds and get the excessive resource consumption under
>> control. PR authors would be instructed to run the initial PR builds in
>> their own forks, and the reviewer should check that this has been done
>> before approving the PR for testing with "/pulsarbot ready-to-test".
>>
>> I would suggest proceeding quickly on this matter without separate PIPs
>> or votes. We could follow the Apache lazy consensus principle
>> (https://community.apache.org/committers/lazyConsensus.html) and make
>> this happen if there are no objections in the next 72 hours.
>> Suggestions for improving this proposal would of course be taken into
>> account, and if anyone objects, we wouldn't have reached lazy consensus
>> and wouldn't proceed.
>>
>> -Lari
>>
>>
>> 1 -
>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
>>
>
