Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Yunze Xu Thu, 15 Sep 2022 04:45:21 -0700

Good explanation! I agree.

Thanks,
Yunze





> On Sep 15, 2022, at 19:42, Lari Hotari <[email protected]> wrote:
> 
>> I’m a little confused about what will CI do in this case? I think the 
>> “ready-to-test” label should
>> be removed in this case because the new code might not pass the tests. I 
>> thought the author
>> should request committers to add this label again after the tests passed in 
>> his own repo.
> 
> This is possible. I'd rather postpone it since it would require more
> effort to implement and perhaps just be an unnecessary burden.
> 
> The reviewer shouldn't be doing "/pulsarbot ready-to-test" for PRs
> that aren't ready.
> The PR author can continue to iterate the test runs in the fork until
> PR comments have been addressed.
> It's technically possible to have the same branch in PRs in
> apache/pulsar and your own fork. When the branch gets updated, the
> builds in the forked repository would run. That's why I think it's
> better to keep on iterating on the PR until it's really ready for
> final
> testing in apache/pulsar repository.
> 
> -Lari
> 
> On Thu, Sep 15, 2022 at 2:27 PM Yunze Xu <[email protected]> wrote:
>> 
>>> If there are later changes in the PR after the "ready-to-test" label has 
>>> been added, we could simply let the Pulsar CI handle the builds.
>> 
>> I’m a little confused about what will CI do in this case? I think the 
>> “ready-to-test” label should
>> be removed in this case because the new code might not pass the tests. I 
>> thought the author
>> should request committers to add this label again after the tests passed in 
>> his own repo.
>> 
>> Thanks,
>> Yunze
>> 
>> 
>> 
>> 
>>> On Sep 15, 2022, at 19:20, Lari Hotari <[email protected]> wrote:
>>> 
>>>> In short, IIUC, each contributor should:
>>>> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork 
>>>> to
>>>> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache 
>>>> repo
>>>> 
>>>> Then a committer should run `/pulsarbot ready-to-test` after the PR in
>>>> contributor's private repo passed all tests, right?
>>> 
>>> Exactly. One small detail: It should be the PR author's responsibility to 
>>> follow up and request for a review and an approval after the tests pass.
>>> If there are later changes in the PR after the "ready-to-test" label has 
>>> been added, we could simply let the Pulsar CI handle the builds.
>>> 
>>> -Lari
>>> 
>>> On 2022/09/15 10:11:53 Yunze Xu wrote:
>>>> Hi Lari,
>>>> 
>>>> This proposal LGTM. But I have some questions about the details.
>>>> 
>>>> In short, IIUC, each contributor should:
>>>> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork 
>>>> to
>>>> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache 
>>>> repo
>>>> 
>>>> Then a committer should run `/pulsarbot ready-to-test` after the PR in
>>>> contributor's private repo passed all tests, right?
>>>> 
>>>> Thanks,
>>>> Yunze
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Sep 15, 2022, at 17:54, Lari Hotari <[email protected]> wrote:
>>>>> 
>>>>> Thanks for the comment.
>>>>> 
>>>>> The question isn't about trusting PRs.
>>>>> The CI resource consumption problem is also caused by current committer 
>>>>> PRs. That's why it
>>>>> is necessary to handle all PRs in the same way.
>>>>> The benefit of the proposed solution is that we could decide to run some 
>>>>> light checks automatically before the step that requires the 
>>>>> "ready-to-check" label.
>>>>> 
>>>>> After I sent the proposal, I found out that the Pulsar committer 
>>>>> information including the GitHub
>>>>> user names is available in JSON format at 
>>>>> https://whimsy.apache.org/roster/committee/pulsar.json .
>>>>> Pulsarbot can use this information for authorizing users who have access 
>>>>> to
>>>>> "/pulsarbot ready-to-test".
>>>>> 
>>>>> I agree that we can skip adding a separate reviewer role, let's
>>>>> simply use https://whimsy.apache.org/roster/committee/pulsar.json as the 
>>>>> source of truth
>>>>> for authorization.
>>>>> 
>>>>> -Lari
>>>>> 
>>>>> On 2022/09/15 09:22:18 tison wrote:
>>>>>> Hi Lari,
>>>>>> 
>>>>>> Thanks for starting this discussion. The overall proposal looks good and
>>>>>> it's really great that you can spend some time on such a significant
>>>>>> infrastructure.
>>>>>> 
>>>>>> One comment here is that we can start with all "authorized" users to
>>>>>> trigger the CI in the committer group instead of introducing a new 
>>>>>> concept
>>>>>> "reviewer" - it will be another topic to discuss and I generally prefer
>>>>>> more committership to encourage participation instead of a complicated
>>>>>> membership structure.
>>>>>> 
>>>>>> Besides, a quick fixup for reducing traffic is setting "Fork pull request
>>>>>> workflows from outside collaborators" option[1] as "Require approval for
>>>>>> all outside collaborators". This is provided out-of-the-box by GitHub and
>>>>>> requires NO development[2]. Although it doesn't restrict people who are
>>>>>> already apache org members but are not Pulsar committers, I believe the
>>>>>> trust level is acceptable. An INFRA team member will be asked to perform
>>>>>> the settings change if we want this.
>>>>>> 
>>>>>> Best,
>>>>>> tison.
>>>>>> 
>>>>>> [1] https://github.com/apache/pulsar/settings/actions
>>>>>> [2]
>>>>>> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
>>>>>> 
>>>>>> 
>>>>>> Lari Hotari <[email protected]> 于2022年9月15日周四 16:36写道：
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> The GitHub Actions based Pulsar CI has been experiencing issues for
>>>>>>> multiple weeks. The condition is currently better, but the resource
>>>>>>> shortage issue remains. CI builds will take a long time to complete even
>>>>>>> after many optimizations have been made.
>>>>>>> 
>>>>>>> There's a long email thread with some details about the past issues:
>>>>>>> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
>>>>>>> 
>>>>>>> I have filed an issue to GitHub support about the CI issues over a week
>>>>>>> ago, and I finally received an answer a few hours ago. However the
>>>>>>> GitHub support person didn't reply to my questions at all, but instead
>>>>>>> suggested that there's a beta program where it's possible to pay for
>>>>>>> more resources. That solution isn't suitable for our case, since it
>>>>>>> doesn't seem to be possible to assign GitHub Actions Runner VM resources
>>>>>>> only for a specific Apache project. I'll follow up with GitHub support, 
>>>>>>> but
>>>>>>> I don't expect that to resolve our problems in the near term. We need
>>>>>>> to make changes in our CI resource consumption.
>>>>>>> 
>>>>>>> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
>>>>>>> suggested: "Apache Spark project requires that all PRs are executed in
>>>>>>> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
>>>>>>> 
>>>>>>> The Apache Spark contributing guide contains details about this in the
>>>>>>> "Pull request" section, https://spark.apache.org/contributing.html .
>>>>>>> 
>>>>>>> "Before creating a pull request in Apache Spark, it is important to
>>>>>>> check if tests can pass on your branch because our GitHub Actions
>>>>>>> workflows automatically run tests for your pull request/following
>>>>>>> commits and every run burdens the limited resources of GitHub Actions in
>>>>>>> Apache Spark repository. "
>>>>>>> 
>>>>>>> In Pulsar, we will need to do the same. As a solution to this, Tison
>>>>>>> suggested that we would not run all tests for the PR unless there's a
>>>>>>> "ready-to-test" label on the PR.
>>>>>>> 
>>>>>>> I think this is a good suggestion. We could extend the existing
>>>>>>> "pulsarbot" to help with the automation.
>>>>>>> 
>>>>>>> A reviewer could comment "/pulsarbot ready-to-test" on the PR and
>>>>>>> pulsarbot would add the label and also restart the CI workflow to make
>>>>>>> it proceed and run the tests.
>>>>>>> pulsarbot would check for authorized users. One simple
>>>>>>> approach would be to add a file ".pulsarci.yaml" in apache/pulsar
>>>>>>> repository with the relevant information:
>>>>>>> 
>>>>>>> committer_github_ids:
>>>>>>> - committer1
>>>>>>> - committer2
>>>>>>> ...
>>>>>>> 
>>>>>>> ready_to_test:
>>>>>>> authorized_github_ids:
>>>>>>>  - userid1
>>>>>>>  - userid2
>>>>>>>  ...
>>>>>>> 
>>>>>>> We would have a script to synchronize all Pulsar committers to this file
>>>>>>> peridiotically (manual step after there's a new committer). ASF provides
>>>>>>> public json files for project members at
>>>>>>> https://whimsy.apache.org/public/public_ldap_projects.json , however the
>>>>>>> mapping to github user names seems to be missing. That could be done
>>>>>>> with a custom script since ASF LDAP contains the github username.
>>>>>>> 
>>>>>>> All Pulsar committers would have access. In addition, there could be 
>>>>>>> other
>>>>>>> users that are authorized for using "/pulsarbot ready-to-test".
>>>>>>> 
>>>>>>> This solution would also require changes in the GitHub Actions workflows
>>>>>>> so that the workflow is failed in an early step unless there's a
>>>>>>> ready-to-test label for the PR.
>>>>>>> 
>>>>>>> With the above solution, we would be able to cut the amount of
>>>>>>> unnecessary builds and get the excessive resource consumption issue
>>>>>>> under control. The PR authors would be instructed to run initial PR
>>>>>>> builds in their own fork and the reviewer should check that this is done
>>>>>>> before approving the PR for testing with "/pulsarbot ready-to-test".
>>>>>>> 
>>>>>>> I would suggest proceeding quickly on this matter without separate PIPs
>>>>>>> or votes. We could follow the Apache lazy consensus
>>>>>>> (https://community.apache.org/committers/lazyConsensus.html) principle
>>>>>>> and make this happen if there aren't objections in the next 72 hours.
>>>>>>> The improvement suggestions to this proposal would obviously be taken
>>>>>>> into account and if someone objects, we wouldn't have reached lazy
>>>>>>> consensus and we wouldn't proceed.
>>>>>>> 
>>>>>>> -Lari
>>>>>>> 
>>>>>>> 
>>>>>>> 1 -
>>>>>>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Reply via email to