Re: GitHub Actions Concurrency Limits for Apache projects

Jarek Potiuk Tue, 27 Oct 2020 13:17:26 -0700

BTW. We are even a bit stuck with self-hosted runners - we just secured some
funds, but on closer inspection it is awfully dangerous to run
self-hosted runners for public repositories on GitHub, and the official GitHub
documentation says we should not do it:


https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories

So we are a bit stuck now and, honestly, I am not sure we have any viable
option. I think some help and guidance from Infra is necessary.
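One possible partial mitigation for the risk described in that documentation is to never route pull requests from forks to a self-hosted runner. The sketch below assumes the standard `pull_request` webhook payload (`pull_request.head.repo.full_name` and `pull_request.base.repo.full_name`); the runner labels and helper names are hypothetical. It is only an illustration: it does not make self-hosted runners safe for public repositories, since code from same-repository branches or approved PRs still runs on the runner.

```python
from typing import Any, Mapping

# Hypothetical runner labels; use whatever labels your runners actually have.
TRUSTED_RUNNER = "self-hosted"
HOSTED_RUNNER = "ubuntu-latest"


def is_fork_pull_request(event: Mapping[str, Any]) -> bool:
    """Return True if a pull_request event payload comes from a fork of the base repository."""
    pr = event.get("pull_request")
    if pr is None:
        # Not a pull_request event (e.g. a push to the main repository).
        return False
    head_repo = (pr.get("head") or {}).get("repo") or {}
    base_repo = (pr.get("base") or {}).get("repo") or {}
    if not head_repo:
        # head.repo can be null (e.g. the fork was deleted); treat it as untrusted.
        return True
    return head_repo.get("full_name") != base_repo.get("full_name")


def choose_runner(event: Mapping[str, Any]) -> str:
    """Send fork PRs to GitHub-hosted runners and everything else to the self-hosted pool."""
    return HOSTED_RUNNER if is_fork_pull_request(event) else TRUSTED_RUNNER
```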

J.


On Tue, Oct 27, 2020 at 9:14 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> I tried to get this info from Infra, but I honestly have no idea -
> maybe someone from the INFRA team could let us know the stats (as was
> the case with Travis previously, when we got some usage stats for all
> projects).
>
> I do agree that it is not sustainable at all. I'd love some clarity on
> that. So far it seems that a few projects started using it months ago
> and all of a sudden we've started to hit the limits.
>
> But I am afraid no one but the Infra team has any stats on it (and maybe
> even they do not).
>
> J.
>
>
> On Tue, Oct 27, 2020 at 9:09 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> How many projects are already using GitHub actions?
>>
>> It seems to be fairly new, and I find it concerning that we are already
>> hitting the limit. If only a few projects are using it currently, then it
>> may be futile to rely on it, because it would inevitably collapse if more
>> projects were to use it.
>> Unless there is some project using up most of the allocated minutes,
>> similarly to what is (was?) happening with Travis.
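If per-project numbers were available (for example from Infra, or from the durations of each project's workflow runs; the data source is an assumption here), checking whether one project dominates is a simple aggregation. The function name and the sample numbers below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def usage_shares(records: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return each project's share of the total CI minutes, largest consumer first.

    `records` is an iterable of (project, minutes) pairs, e.g. one pair per workflow run.
    """
    totals: Dict[str, float] = defaultdict(float)
    for project, minutes in records:
        totals[project] += minutes
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No usage recorded, so shares are undefined.
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(project, minutes / grand_total) for project, minutes in ranked]
```

For instance, `usage_shares([("airflow", 600.0), ("beam", 300.0), ("airflow", 300.0)])` returns `[("airflow", 0.75), ("beam", 0.25)]`.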
>>
>> Alternatively, maybe GitHub actions should be reserved for quick checks
>> and not actual CI pipelines.
>>
>> On 10/27/2020 8:53 PM, Jarek Potiuk wrote:
>> > Hello everyone,
>> >
>> > The queues have become unbearable during the last two days. This is not
>> > sustainable long-term. I have lost some hope that any kind of
>> > optimization will help, but we are trying anyway.
>> >
>> > However, we are still trying :)
>> >
>> > We are just about to merge and verify the PR that implements this
>> > "limited matrix tests before approval" solution. We implemented it with
>> > Tobiasz, who volunteered to help, and once it works we will try to apply
>> > it to Apache Beam as well. When it works, we will be happy to share the
>> > solution with everyone.
>> >
>> > You can read more on how it works (with screenshot) here:
>> > https://github.com/apache/airflow/pull/11828#issuecomment-717485938
>> >
>> > We could not implement an automated workflow run due to limitations of
>> > GitHub Actions (you cannot re-run a successful workflow via the API), but
>> > we came up with something even more flexible:
>> >
>> > 1) PRs before approval only run one default combination of matrix tests.
>> > In our case, this will save 50%-60% of build time for most PRs.
>> > 2) Once a PR gets approved, it gets an "okay to test" label and a comment
>> > in the PR: "The PR is ready to run all tests! Please rebase it to latest
>> > master or ask committer to re-run it". It also gets an "in-progress"
>> > check in the PR, which turns the green "merge" button into a gray one to
>> > avoid accidental merges. But a committer can still decide to merge at
>> > this point (for small, low-risk changes).
>> > 3) Once the PR gets rebased or re-run, it runs the full-matrix tests and
>> > everything follows as usual.
>> > 4) We also have special handling for the case that Allen mentioned
>> > earlier - after approval, "small" "doc-only" PRs immediately get an
>> > "okay to merge" label, and the bot adds the comment "The PR is ready
>> > to be merged. No tests are needed!"
>> >
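The decision in steps 1, 3 and 4 above can be sketched as a small pure function. This is not the Airflow implementation (which lives in the linked PR); it assumes the matrix is given as a dict of axes, that the first value of each axis is its default, and that "approved" and "doc-only" have already been determined. `PullRequestState` and `plan_matrix` are hypothetical names, and step 2 (labels, bot comment, "in-progress" check) is left out.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class PullRequestState:
    approved: bool  # approved by a committer
    doc_only: bool  # only documentation files changed


def plan_matrix(axes: Dict[str, Sequence[str]], pr: PullRequestState) -> List[Dict[str, str]]:
    """Return the matrix combinations to run for a PR (first value of each axis = default)."""
    if pr.approved and pr.doc_only:
        # Step 4: approved doc-only PRs are marked "okay to merge"; no test matrix runs.
        return []
    if not pr.approved:
        # Step 1: before approval, run only the default combination.
        return [{name: values[0] for name, values in axes.items()}]
    # Step 3: approved PRs (after a rebase or re-run) get the full cross product.
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[name] for name in names))]
```

With two hypothetical axes of three values each, an unapproved PR runs 1 combination instead of 9.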
>> > Again - once we find it working, I am happy to describe how to add it to
>> > your GitHub Actions and share this information with all other projects
>> > using GitHub Actions.
>> >
>> > J.
>> >
>> >
>> > On Fri, Oct 23, 2020 at 5:29 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> > wrote:
>> >
>> >> Started working on this mini-solution for limiting non-approved
>> >> matrix builds.
>> >>
>> >> I am working on it with a colleague of mine - Tobiasz - who worked on
>> >> Apache Beam infrastructure, so we might test it on two projects.
>> >>
>> >> I will let you know about the progress.
>> >>
>> >> Mini-design doc here:
>> >>
>> >> https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#
>> >>
>> >> J.
>> >>
>> >>
>> >> On Thu, Oct 22, 2020 at 10:03 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >> wrote:
>> >>
>> >>>
>> >>> I believe this problem cannot really be handled by a single project, but
>> >>> I have a proposal.
>> >>>
>> >>> I looked at the common pattern we have in the ASF projects and I think
>> >>> there is a way that we can help each other.
>> >>>
>> >>> I think most of the problems come from the many PRs submitted that run a
>> >>> matrix of tests before committers even have time to take a look at
>> >>> them. We discussed how we can approach it, and I think I have a proposal
>> >>> that we can all adopt in the ASF projects: something that will be easy
>> >>> to implement and will not impact the process we have. I would love to
>> >>> hear your thoughts about it before I start implementing it :).
>> >>>
>> >>> My proposal is to create a GitHub Action that will allow running only
>> >>> a subset of the "matrix" tests for PRs that are not yet approved by
>> >>> committers. This should be possible using the current GitHub Actions
>> >>> workflows and API. It boils down to:
>> >>> * if a PR is not approved, only a subset of the matrix (the default
>> >>> value for each matrix component) is run
>> >>> * the committers can see the "green" mark of tests passing and make a
>> >>> review
>> >>> * once the PR gets approved, a new "full matrix" check is automatically
>> >>> triggered
>> >>> * all subsequent pushes to an approved PR run the "full matrix" check
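Deciding whether a PR "is approved" needs a concrete rule. One possible rule, sketched below, works on the list returned by GitHub's REST endpoint for listing reviews on a pull request (`GET /repos/{owner}/{repo}/pulls/{pull_number}/reviews`, returned in chronological order): take each reviewer's latest approving, change-requesting or dismissed review, and treat the PR as approved when at least one committer approves and none requests changes. Treating the `OWNER`, `MEMBER` and `COLLABORATOR` values of `author_association` as "committer" is an assumption; ASF projects may need a different check.

```python
from typing import Any, Dict, Iterable, Mapping

# Assumption: committers show up with one of these author_association values.
COMMITTER_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})
DECISIVE_STATES = frozenset({"APPROVED", "CHANGES_REQUESTED", "DISMISSED"})


def is_approved_by_committer(reviews: Iterable[Mapping[str, Any]]) -> bool:
    """Decide approval from review objects listed oldest-first."""
    latest: Dict[str, str] = {}
    for review in reviews:
        # Plain comments (state "COMMENTED") do not change a reviewer's decision.
        if review.get("state") not in DECISIVE_STATES:
            continue
        if review.get("author_association") not in COMMITTER_ASSOCIATIONS:
            continue
        login = (review.get("user") or {}).get("login")
        if login:
            # Later reviews overwrite earlier ones from the same reviewer.
            latest[login] = review["state"]
    states = set(latest.values())
    return "APPROVED" in states and "CHANGES_REQUESTED" not in states
```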
>> >>>
>> >>> I think that might significantly reduce the strain on the GA jobs we
>> >>> run, and it should fit very naturally into the typical PR workflow for
>> >>> ASF projects. But I am only guessing now, so I would love to hear what
>> >>> you think.
>> >>>
>> >>> I am willing (together with my colleagues) to implement this action and
>> >>> add it to Apache Airflow to check it. Together with the
>> >>> "cancel-workflow-action", which I developed and we deployed at Apache
>> >>> Airflow and Apache Beam, I think that might help to keep the CI
>> >>> "pressure" much lower, regardless of whether any of the projects
>> >>> manages to get their credit sponsors. I think I can have a working
>> >>> Action/implementation done over the weekend.
>> >>>
>> >>> More details about the proposal here:
>> >>>
>> >>> https://lists.apache.org/thread.html/r6f6f1420aa6346c9f81bf9d9fff8816e860e49224eb02e25d856c249%40%3Cdev.airflow.apache.org%3E
>> >>>
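The idea behind cancelling superseded runs can be illustrated as follows: among queued or in-progress runs, keep only the newest one per workflow and head branch, and cancel the rest. This is a hypothetical sketch of the idea, not the code of "cancel-workflow-action"; the `WorkflowRun` fields only loosely mirror workflow-run attributes in the GitHub REST API, and the actual cancellation (for example via `POST /repos/{owner}/{repo}/actions/runs/{run_id}/cancel`) is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class WorkflowRun:
    run_id: int
    workflow_id: int
    head_repository: str  # e.g. "someone/airflow"; forks can reuse branch names
    head_branch: str
    status: str           # "queued", "in_progress" or "completed"
    created_at: str       # ISO 8601 UTC timestamps in one format compare correctly as strings


def runs_to_cancel(runs: Iterable[WorkflowRun]) -> List[int]:
    """Return ids of active runs superseded by a newer run of the same workflow and branch."""
    active = [run for run in runs if run.status in ("queued", "in_progress")]
    newest: Dict[Tuple[int, str, str], WorkflowRun] = {}
    for run in active:
        key = (run.workflow_id, run.head_repository, run.head_branch)
        if key not in newest or run.created_at > newest[key].created_at:
            newest[key] = run
    return sorted(
        run.run_id
        for run in active
        if newest[(run.workflow_id, run.head_repository, run.head_branch)].run_id != run.run_id
    )
```

Keying on the head repository as well as the branch avoids cancelling a run from one fork because another fork pushed a branch with the same name.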
>> >>> J.
>> >>>
>> >>> On Mon, Oct 19, 2020 at 5:28 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >>> wrote:
>> >>>
>> >>>> Yep. We are continuously optimizing it, and we are reaching out to get
>> >>>> funding for self-hosted runners. And I think it would be great to see
>> >>>> that happening. I am happy to help anyone who needs some help there -
>> >>>> I've already been helping Apache Beam with their GitHub Actions
>> >>>> settings.
>> >>>>
>> >>>> On Mon, Oct 19, 2020 at 6:12 AM Greg Stein <gst...@gmail.com> wrote:
>> >>>>
>> >>>>> This is some great news, Jarek.
>> >>>>>
>> >>>>> Given that GitHub build minutes are shared, we need more of this kind
>> >>>>> of work from our many communities.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Greg
>> >>>>> InfraAdmin, ASF
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Oct 18, 2020 at 2:32 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hello Allen,
>> >>>>>>
>> >>>>>> I'd really love to give Yetus a try, to see how it can actually make
>> >>>>>> our approach better.
>> >>>>>>
>> >>>>>> I just merged the change I planned (finally we got to that) that
>> >>>>>> implements the final optimisation you mentioned. In the case of a
>> >>>>>> single .md file change, we got the build time down to about 1 minute,
>> >>>>>> most of it being GitHub Actions "workflow" overhead.
>> >>>>>>
>> >>>>>> We brought the incremental pre-commit tests down to ~25s.
>> >>>>>>
>> >>>>>> Build here: https://github.com/potiuk/airflow/pull/128/checks. As you
>> >>>>>> can see here:
>> >>>>>>
>> >>>>>> https://github.com/potiuk/airflow/pull/128/checks?check_run_id=1268353637#step:7:98
>> >>>>>>
>> >>>>>> in this case we run only the relevant static checks:
>> >>>>>>
>> >>>>>>     - "No-tabs checker"
>> >>>>>>     - "Add license for all md files"
>> >>>>>>     - "Add TOC for md files."
>> >>>>>>     - "Check for merge conflicts"
>> >>>>>>     - "Detect Private Key"
>> >>>>>>     - "Fix End of Files"
>> >>>>>>     - "Trim Trailing Whitespace"
>> >>>>>>     - "Check for language that we do not accept as community"
>> >>>>>>
>> >>>>>> All the other checks, image building, and all the extra checks are
>> >>>>>> skipped (automatically, as pre-commit determined them irrelevant).
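The file-based selection described above can be illustrated with a small function: each hook declares a regular expression, and a hook runs only if at least one changed file matches it (pre-commit matches a hook's `files` pattern with `re.search`, in addition to other filters such as `types`). The hook names and patterns below are hypothetical and only loosely follow the list above; they are not Airflow's actual `.pre-commit-config.yaml`.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical hooks and patterns, for illustration only.
HOOK_FILE_PATTERNS: Dict[str, str] = {
    "no-tabs-checker": r"",  # empty pattern matches every file
    "add-license-md": r"\.md$",
    "add-toc-md": r"\.md$",
    "trailing-whitespace": r"",
    "mypy": r"\.py$",
    "pylint": r"\.py$",
    "build-ci-image": r"^(setup\.py|Dockerfile\.ci)$",
}


def relevant_hooks(
    changed_files: Sequence[str], patterns: Dict[str, str] = HOOK_FILE_PATTERNS
) -> List[str]:
    """Return the hooks whose pattern matches at least one changed file, in declaration order."""
    return [
        hook
        for hook, pattern in patterns.items()
        if any(re.search(pattern, path) for path in changed_files)
    ]
```

With these patterns, `relevant_hooks(["docs/README.md"])` selects only the four generic and Markdown hooks, while the Python linters and the image build are skipped.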
>> >>>>>>
>> >>>>>> All this while we keep really comprehensive tests and optimisation
>> >>>>>> of image building for all the "serious steps". I tried to explain the
>> >>>>>> philosophy and some basic assumptions behind our CI in
>> >>>>>>
>> >>>>>> https://github.com/apache/airflow/blob/master/CI.rst#ci-environment
>> >>>>>>
>> >>>>>> and I'd love to see how this plays together with the Yetus tool.
>> >>>>>>
>> >>>>>> Would it be possible to work together with the Yetus team to see how
>> >>>>>> it can help further optimise and possibly simplify the setup we have?
>> >>>>>> I'd love to get some cooperation on this. I am nearly done with all
>> >>>>>> the optimisations I planned, and we have been among the top-3 Apache
>> >>>>>> projects in CI-time use for years (long before my tenure), so we
>> >>>>>> might be a good candidate for pulling together some improvements.
>> >>>>>>
>> >>>>>>
>> >>>>>> J.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Oct 14, 2020 at 4:41 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Exactly - dialectic vs. dyslectic, for example.
>> >>>>>>>
>> >>>>>>> On Wed, Oct 14, 2020 at 4:40 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> And really sorry about yatus vs. yetus - I am slightly dialectic, and
>> >>>>>>>> when things are not in the dictionary, I tend to make many mistakes.
>> >>>>>>>> I hope it's not something that people take as a sign of being
>> >>>>>>>> "worse", but if you felt offended by that - apologies.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Wed, Oct 14, 2020 at 4:34 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hey Allen,
>> >>>>>>>>>
>> >>>>>>>>> I would be super happy if you could help us to do it properly at
>> >>>>>>>>> Airflow - would you like to work with us and get a yatus
>> >>>>>>>>> configuration that would work for us? I am super happy to try it.
>> >>>>>>>>> Maybe you could open a PR with some basic yatus implementation to
>> >>>>>>>>> start with, and we could work together to get it simplified? I would
>> >>>>>>>>> love to learn how to do it.
>> >>>>>>>>>
>> >>>>>>>>> J
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Wed, Oct 14, 2020 at 3:37 PM Allen Wittenauer
>> >>>>>>>>> <a...@effectivemachines.com.invalid> wrote:
>> >>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> On Oct 13, 2020, at 11:04 PM, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>> >>>>>>>>>>> This is logic that we have to implement regardless of whether we use
>> >>>>>>>>>>> yatus or pre-commit (please correct me if I am wrong).
>> >>>>>>>>>>          I'm not sure about yatus, but for yetus, for the most part,
>> >>>>>>>>>> yes, one would likely need to implement custom rules in the
>> >>>>>>>>>> personality to exactly duplicate the overly complicated and
>> >>>>>>>>>> over-engineered Airflow setup. The big difference is that one
>> >>>>>>>>>> wouldn't be starting from scratch. The difference engine is already
>> >>>>>>>>>> there. The file filter is already there. Full build vs. PR handling
>> >>>>>>>>>> is already there. Etc., etc., etc.
>> >>>>>>>>>>
>> >>>>>>>>>>> For all others, this is not a big issue, because in total all other
>> >>>>>>>>>>> pre-commits take 2-3 minutes at most. And if we find that we need
>> >>>>>>>>>>> to optimize it further, we can simply disable the '--all-files'
>> >>>>>>>>>>> switch for pre-commit, and they will only run on the files changed
>> >>>>>>>>>>> in the latest commit (pre-commit will only run the tests related to
>> >>>>>>>>>>> those changed files). But since they are pretty fast (except
>> >>>>>>>>>>> pylint/mypy/flake8), we think running them all, for now, is not a
>> >>>>>>>>>>> problem.
>> >>>>>>>>>>          That's what everyone thinks until they start aggregating the
>> >>>>>>>>>> time across all changes...
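The choice discussed above, between running every hook with `--all-files` and running hooks only on changed files, can be expressed as a helper that builds the `pre-commit run` command line. `--all-files`, `--files` and `--show-diff-on-failure` are existing `pre-commit run` options; the helper itself, and the convention that `None` means "full build", are assumptions for illustration. How the changed files are obtained (for example from `git diff --name-only`) is left out.

```python
from typing import List, Optional, Sequence


def precommit_command(changed_files: Optional[Sequence[str]] = None) -> List[str]:
    """Build a `pre-commit run` invocation.

    None means a full build (e.g. on master): run every hook on every file.
    A list of paths restricts pre-commit to those files, so hooks whose file
    filters match none of them are skipped. An empty list means nothing to run.
    """
    if changed_files is not None and not changed_files:
        return []  # nothing changed, so there is nothing to check
    cmd = ["pre-commit", "run", "--show-diff-on-failure"]
    if changed_files is None:
        cmd.append("--all-files")
    else:
        cmd += ["--files", *changed_files]
    return cmd
```

When the result is non-empty it can be passed to `subprocess.run(cmd, check=True)`.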
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Jarek Potiuk
>> >>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>> >>>>>>>>>
>> >>>>>>>>> M: +48 660 796 129 <+48660796129>
>> >>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>>
>>
>
>
>

