Re: GitHub Actions Concurrency Limits for Apache projects

Jarek Potiuk Tue, 27 Oct 2020 12:54:25 -0700

Hello everyone,

The queues have become unbearable during the last two days. This is not
sustainable long-term. I have lost a bit of hope that any kind of
optimization will help, but we are still trying :)

We are just about to merge and verify the PR that implements this "limited
matrix tests before approval" solution. We implemented it with Tobiasz, who
volunteered to help, and once it works we will try to apply it to Apache
Beam as well. When it works we will be happy to share the solution with
everyone.

You can read more on how it works (with screenshot) here:
https://github.com/apache/airflow/pull/11828#issuecomment-717485938

We could not implement an automatically triggered workflow run due to
limitations of GitHub Actions (you cannot re-run a successful workflow via
the API), but we came up with something even more flexible:

1) PRs before approval only run one default combination of matrix tests.
This in our case will save 50%-60% of build time for most PRs.
2) Once a PR gets approved, it gets the "okay to test" label and a comment
in the PR: "The PR is ready to run all tests! Please rebase it to latest
master or ask committer to re-run it". It also gets an "in-progress" check
in the PR, which turns the green "merge" button into a gray one to avoid
accidental merges. A committer can still decide to merge at this point (for
small, low-risk changes).
3) Once the PR gets rebased or re-run, it runs the full-matrix tests and
everything follows as usual.
4) We also have special treatment for the case that Allen mentioned
earlier: the "small" "doc-only" PRs. After approval, they immediately get
the "okay to merge" label, and the bot adds the comment "The PR is ready
to be merged. No tests are needed!."
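
To make the four steps above a bit more concrete, here is a rough Python
sketch of the decision logic only. It is not the code from the PR: the matrix
contents, the file suffixes used to detect "doc-only" changes, and the names
FULL_MATRIX, DOC_SUFFIXES, is_doc_only, select_matrix and plan are all
assumptions made up for illustration. Only the two label names come from the
description above. The sketch also assumes that the "default" value of each
matrix component is simply the first one listed.

```python
from itertools import product

# Hypothetical matrix; the real Airflow matrix is not listed in this thread.
FULL_MATRIX = {
    "python": ["3.6", "3.7", "3.8"],
    "backend": ["postgres", "mysql", "sqlite"],
}

# Assumption: a PR is "doc-only" when every changed file has one of these suffixes.
DOC_SUFFIXES = (".md", ".rst")


def is_doc_only(changed_files):
    """Return True when the PR touches at least one file and all are docs."""
    return bool(changed_files) and all(
        name.endswith(DOC_SUFFIXES) for name in changed_files
    )


def select_matrix(full_matrix, approved):
    """Full matrix for approved PRs, else the first value of each component."""
    if approved:
        return {key: list(values) for key, values in full_matrix.items()}
    return {key: values[:1] for key, values in full_matrix.items()}


def plan(changed_files, approved, full_matrix=FULL_MATRIX):
    """Return (label, combinations) that the next run of the PR would use."""
    # Step 4: approved doc-only PRs need no tests at all.
    if approved and is_doc_only(changed_files):
        return "okay to merge", []
    # Steps 1-3: default combination before approval, full matrix after it.
    matrix = select_matrix(full_matrix, approved)
    keys = list(matrix)
    combos = [
        dict(zip(keys, values))
        for values in product(*(matrix[key] for key in keys))
    ]
    return ("okay to test" if approved else None), combos
```

With this hypothetical 3x3 matrix, an unapproved PR would run a single
combination and an approved one would run all nine on its next rebase or
re-run. The actual saving depends on how many jobs sit outside the matrix,
which this sketch does not model.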

Again - once we find it working, I am happy to describe how to add it to
your GitHub Actions workflows and share such information with all other
projects using GitHub Actions.

J.


On Fri, Oct 23, 2020 at 5:29 PM Jarek Potiuk <[email protected]>
wrote:

> Started working on this mini-solution for limiting non-approved
> matrix builds.
>
> I am working on it with a colleague of mine -  Tobiasz - who worked on
> Apache Beam infrastructure, so we might test it on two projects.
>
> I will let you know about the progress.
>
> Mini-design doc here:
>
> https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#
>
> J.
>
>
> On Thu, Oct 22, 2020 at 10:03 PM Jarek Potiuk <[email protected]>
> wrote:
>
>>
>>
>> I believe this problem cannot be really handled by one project, but I
>> have a proposal.
>>
>> I looked at the common pattern we have in the ASF projects and I think
>> there is a way that we can help each other.
>>
>> I think most of the problems come from many PRs submitted that run a
>> matrix of tests before committers even have time to take a look at them. We
>> discussed how we can approach it and I think I have a proposal that we can
>> all adopt in the ASF projects. Something that will be easy to implement and
>> will not impact the process we have. I would love to hear your thoughts
>> about it - before I start implementing it :).
>>
>> My proposal is to create a GitHub Action that makes it possible to run only
>> a subset of "matrix" tests for PRs that are not yet approved by committers.
>> This should be possible using the current GitHub Actions workflows and API.
>> It boils down to:
>> * If a PR is not approved, only a subset of the matrix (the default value
>> for each matrix component) is run
>> * the committers can see the "green" mark of test passing and make a
>> review
>> * once the PR gets approved, automatically a new "full matrix" check is
>> triggered
>> * all future approved PR pushes run the "full matrix" check
>>
>> I think that might significantly reduce the strain on GA jobs we run, and
>> it should very naturally fit in the typical PR workflow for ASF projects.
>> But I am only guessing now, so I would love to hear what you think:
>>
>> I am willing (together with my colleagues) to implement this action and
>> add it to Apache Airflow to check it. Together with the
>> "cancel-workflow-action" that I developed and that we deployed at Apache
>> Airflow and Apache Beam, I think that might help to keep the CI "pressure"
>> much lower - regardless of whether any of the projects manage to get their
>> credit sponsors. I think I can have a working Action/implementation done
>> over the weekend.
>>
>> More details about the proposal here:
>> https://lists.apache.org/thread.html/r6f6f1420aa6346c9f81bf9d9fff8816e860e49224eb02e25d856c249%40%3Cdev.airflow.apache.org%3E
>>
>> J.
>>
>> On Mon, Oct 19, 2020 at 5:28 PM Jarek Potiuk <[email protected]>
>> wrote:
>>
>>> Yep. We still continuously optimize it and we are reaching out to get
>>> funding for self-hosted runners. And I think it would be great to see that
>>> happening. I am happy to help anyone who needs some help there - I've been
>>> already helping Apache Beam with their GitHub Actions settings.
>>>
>>> On Mon, Oct 19, 2020 at 6:12 AM Greg Stein <[email protected]> wrote:
>>>
>>>> This is some great news, Jarek.
>>>>
>>>> Given that GitHub build minutes are shared, we need more of this kind of
>>>> work from our many communities.
>>>>
>>>> Thanks,
>>>> Greg
>>>> InfraAdmin, ASF
>>>>
>>>>
>>>> On Sun, Oct 18, 2020 at 2:32 PM Jarek Potiuk <[email protected]>
>>>> wrote:
>>>>
>>>> > Hello Allen,
>>>> >
>>>> > I'd really love to give Yetus a try and see how it can actually make
>>>> > our approach better.
>>>> >
>>>> > I just merged the change I planned (finally we got to that), which
>>>> > implements the final optimisation that you mentioned. In the case of a
>>>> > single .md file change we got the build time down to about 1 minute,
>>>> > most of it being GitHub Actions "workflow" overhead.
>>>> >
>>>> > We got the incremental pre-commit tests down to ~25s.
>>>> >
>>>> > Build here: https://github.com/potiuk/airflow/pull/128/checks. As you
>>>> > can see here:
>>>> >
>>>> > https://github.com/potiuk/airflow/pull/128/checks?check_run_id=1268353637#step:7:98
>>>> >
>>>> > in this case we run only the relevant static checks:
>>>> >
>>>> >    - "No-tabs checker"
>>>> >    - "Add license for all md files"
>>>> >    - "Add TOC for md files."
>>>> >    - "Check for merge conflicts"
>>>> >    - "Detect Private Key"
>>>> >    - "Fix End of Files"
>>>> >    - "Trim Trailing Whitespace"
>>>> >    - "Check for language that we do not accept as community",
>>>> >
>>>> > All the other checks, image building, and all the extra checks are
>>>> > skipped (automatically as pre-commit determined them irrelevant).
>>>> >
>>>> > All this, while we keep really comprehensive tests and optimisation of
>>>> > image building for all the "serious steps". I tried to explain the
>>>> > philosophy and some basic assumptions behind our CI in
>>>> > https://github.com/apache/airflow/blob/master/CI.rst#ci-environment
>>>> > - and I'd love to try to see how this plays together with the Yetus
>>>> > tool.
>>>> >
>>>> > Would it be possible to work together with the Yetus team on trying to
>>>> > see how it can help to further optimise and possibly simplify the setup
>>>> > we have? I'd love to get some cooperation on those. I am nearly done
>>>> > with all the optimisations I planned, and we have for years (long
>>>> > before my tenure) been among the top-3 Apache projects when it comes to
>>>> > CI-time use, so that might be a good one if we can pull together some
>>>> > improvements.
>>>> >
>>>> >
>>>> > J.
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Oct 14, 2020 at 4:41 PM Jarek Potiuk <
>>>> [email protected]>
>>>> > wrote:
>>>> >
>>>> > > Exactly - > dialectic vs. dislectic for example.
>>>> > >
>>>> > > On Wed, Oct 14, 2020 at 4:40 PM Jarek Potiuk <
>>>> [email protected]>
>>>> > > wrote:
>>>> > >
>>>> > >> And really sorry about yatus vs. yetus - I am slightly dialectic and
>>>> > >> when things are not in the dictionary, I tend to do many mistakes. I
>>>> > >> hope it's not something that people can take as a sign of being
>>>> > >> "worse", but if you felt offended by that - apologies.
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> On Wed, Oct 14, 2020 at 4:34 PM Jarek Potiuk <
>>>> [email protected]>
>>>> > >> wrote:
>>>> > >>
>>>> > >>> Hey Allen,
>>>> > >>>
>>>> > >>> I would be super happy if you could help us to do it properly at
>>>> > >>> Airflow - would you like to work with us and get the yatus
>>>> > >>> configuration that would work for us? I am super happy to try it.
>>>> > >>> Maybe you could open a PR with some basic yatus implementation to
>>>> > >>> start with and we could work together to get it simplified? I would
>>>> > >>> love to learn how to do it.
>>>> > >>>
>>>> > >>> J
>>>> > >>>
>>>> > >>>
>>>> > >>> On Wed, Oct 14, 2020 at 3:37 PM Allen Wittenauer
>>>> > >>> <[email protected]> wrote:
>>>> > >>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> > On Oct 13, 2020, at 11:04 PM, Jarek Potiuk
>>>> > >>>> > <[email protected]> wrote:
>>>> > >>>> > This is a logic that we have to implement regardless - whether we
>>>> > >>>> > use yatus or pre-commit (please correct me if I am wrong).
>>>> > >>>>
>>>> > >>>>         I'm not sure about yatus, but for yetus, for the most part,
>>>> > >>>> yes, one would likely need to implement custom rules in the
>>>> > >>>> personality to exactly duplicate the overly complicated and
>>>> > >>>> over-engineered airflow setup.  The big difference is that one
>>>> > >>>> wouldn't be starting from scratch. The difference engine is already
>>>> > >>>> there. The file filter is already there. Full build vs. PR handling
>>>> > >>>> is already there. etc etc etc
>>>> > >>>>
>>>> > >>>> > For all others, this is not a big issue because in total all
>>>> > >>>> > other pre-commits take 2-3 minutes at best. And if we find that we
>>>> > >>>> > need to optimize it further we can simply disable the
>>>> > >>>> > '--all-files' switch for pre-commit and they will only run on the
>>>> > >>>> > latest commit-changed files (pre-commit will only run the tests
>>>> > >>>> > related to those changed files). But since they are pretty fast
>>>> > >>>> > (except pylint/mypy/flake8) we think running them all, for now, is
>>>> > >>>> > not a problem.
>>>> > >>>>
>>>> > >>>>         That's what everyone thinks until they start aggregating
>>>> > >>>> the time across all changes...
>>>> > >>>>
>>>> > >>>>
>>>> > >>>
>>>> > >>> --
>>>> > >>>
>>>> > >>> Jarek Potiuk
>>>> > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>> > >>>
>>>> > >>> M: +48 660 796 129 <+48660796129>
>>>> > >>> [image: Polidea] <https://www.polidea.com/>
>>>> > >>>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
