For anyone following this thread - an update on the progress we have made in Airflow on building self-hosted infrastructure for GitHub Actions.
Ash from Airflow is really close to finalizing the work on a nice auto-scaling framework for self-hosted workers, and in parallel we checked what the best value for money is that we can get out of it. I ran some performance analysis to verify my hypothesis (based on earlier experience) that significant optimisations are possible, and I have now finished that analysis for our CI running on the self-hosted runners Ash created.

I did some performance testing and a (very crude) comparison of the "traditional" approach - 2-CPU instances with local SSDs running the tests - against something I had already tried several times in various CI arrangements: high-memory instances (8 CPU, 64 GB RAM) running everything, including the Docker engine, in "tmpfs" - one huge ramdisk. It seems that the 1h 20m of test running can be shortened 8x (!) using this approach (plus parallelising some tests), while at the same time cutting the cost 2x (!). Yep, you heard right: we can have faster builds this way and pay less for them. We should be able to bring the time to run all tests for one combination down from 1h 20m to about 10 minutes. This is possible because Ash and his team did a great job setting up auto-scaling EC2 instance runners on our Amazon EC2 account (we have credits from Amazon to run those jobs, and Astronomer offered a donation to keep it running).

If you are interested, my document is below - open for comments, and I am happy to add you as an editor if you want (just send me your Gmail address privately). It is rather crude - I had no time to put more effort into it due to some significant changes in my company - but it should be easy to compare the values and see the actual improvements we can get. There are likely a few shortcuts, and some of the numbers are "back-of-the-envelope"; we will validate them further when we implement all the optimisations, but the conclusions should be pretty sound.

https://docs.google.com/document/d/1ZZeZ4BYMNX7ycGRUKAXv0s6etz1g-90Onn5nRQQHOfE/edit#
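For the curious, there is no special tooling behind the "everything in tmpfs" trick - it boils down to mounting a ramdisk and pointing the Docker daemon's data-root at it. A rough sketch of what a runner bootstrap could look like (the mount point, size and paths below are illustrative, not the exact values from the document):

    # Sketch: put Docker's storage on a ramdisk (run as root on the runner).
    # Mount point and size are hypothetical examples.
    import json
    import subprocess

    RAMDISK = "/mnt/ramdisk"  # hypothetical mount point
    SIZE = "48g"              # leave some of the 64 GB for the tests themselves

    # Mount a tmpfs of the requested size.
    subprocess.run(["mkdir", "-p", RAMDISK], check=True)
    subprocess.run(
        ["mount", "-t", "tmpfs", "-o", f"size={SIZE}", "tmpfs", RAMDISK],
        check=True,
    )

    # Point the Docker daemon's data-root at the ramdisk and restart it.
    with open("/etc/docker/daemon.json", "w") as f:
        json.dump({"data-root": f"{RAMDISK}/docker"}, f)
    subprocess.run(["systemctl", "restart", "docker"], check=True)

The price you pay is that everything is gone on reboot - but for ephemeral, auto-scaled CI workers that is exactly what we want anyway.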
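And a quick sanity check of the "faster AND cheaper" claim, with made-up hourly prices (the real instance types and prices are in the document): an 8x shorter run more than offsets a several-times-higher hourly rate.

    # Back-of-the-envelope: cost of one test combination, old vs. new setup.
    # The hourly prices below are purely illustrative, NOT the ones from the doc.
    old_price_per_hour = 0.10  # assumed: small 2-CPU instance with local SSD
    new_price_per_hour = 0.50  # assumed: 8-CPU / 64 GB high-memory instance
    old_minutes, new_minutes = 80, 10

    old_cost = old_price_per_hour * old_minutes / 60  # ~0.13
    new_cost = new_price_per_hour * new_minutes / 60  # ~0.08
    print(f"old: {old_cost:.3f} USD, new: {new_cost:.3f} USD")

On these assumed numbers the new setup is ~1.6x cheaper; how close you get to the 2x from the document depends on the actual instance types and pricing used.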
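One more thing, for those following the REST vs. GraphQL question in the quoted thread below: since there is still no GraphQL API for Actions, gathering run statistics means paging through the REST endpoint. Something along these lines (a sketch only - and pass a token, unauthenticated calls are heavily rate-limited):

    # Sketch: pull recent GitHub Actions runs for a repo via the REST API.
    import requests

    def list_runs(owner, repo, token=None, pages=3):
        headers = {"Accept": "application/vnd.github.v3+json"}
        if token:
            headers["Authorization"] = f"token {token}"
        runs = []
        for page in range(1, pages + 1):
            resp = requests.get(
                f"https://api.github.com/repos/{owner}/{repo}/actions/runs",
                headers=headers,
                params={"per_page": 100, "page": page},
            )
            resp.raise_for_status()
            runs.extend(resp.json()["workflow_runs"])
        return runs

    # e.g. list_runs("apache", "airflow", token="<PAT>"); each run carries
    # "created_at", "updated_at", "status" and "conclusion" for rough stats.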
J.

On Fri, Jan 8, 2021 at 10:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> We should be able to make an efficient query via the GraphQL API, right?
>> I found the REST API for Actions to be a little underwhelming.
>
> That was the first thing I checked when we started looking at the stats.
> Unfortunately, last time I checked (and I even opened an issue about it
> with GitHub support) there was no GitHub Actions GraphQL API.
>
> I got a GitHub support answer: "Yeah, we know the GH API does not have
> GraphQL support yet, sorry". I think it has not changed since.
>
>> We have tried to make our builds faster with more caching, but it's not
>> easy since it's an embedded systems project - we need to target a lot of
>> configurations, and most changes impact all builds.
>
> Indeed, I know how much of my time was spent on optimising Airflow's GH
> usage. I think we eventually decreased the usage 10x or more. But it never
> helped, because as things stand anyone - even accidentally - can block all
> the slots in almost no time at all. We have no organisation-wide way to
> prevent this, and that is the problem.
>
> Right now I could:
> a) mine cryptocurrency using PRs to any Apache project
> b) block the queue for everyone
>
> I do not even have to be an Apache committer to do that. It's enough to
> open one well-crafted PR that spins off 180 jobs running for 6 hours. It's
> super-flawed.
>
>> We too would like to take advantage of our own runners, but more for the
>> ability to do Hardware-In-the-Loop testing; we have avoided it for the
>> reasons already mentioned.
>
> Self-hosted runners seem for now to be the only "reasonable" option, but
> the security issues with the current runner do not allow us to do it.
>
>> --Brennan
>
> --
> +48 660 796 129

--
+48 660 796 129