Re: GA again unreasonably slow (again)

Chris Lambertus Mon, 08 Feb 2021 14:01:53 -0800

> On Feb 8, 2021, at 1:51 PM, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> This uses https://github.com/actions/runner/pull/783 to keep untrusted
> users from running code on our hosts (security is based on the actor of
> the commit: committers' PRs and direct pushes are allowed to run builds on
> self-hosted runners), combined with a GitHub Application, AWS Lambda and
> an AWS Auto-Scaling Group.
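
A minimal sketch of the actor-based gate described above. It is not the code
from the runner pull request; the committer list, the WorkflowEvent type and
the fallback to GitHub-hosted runners are all assumptions made for
illustration.

```python
from dataclasses import dataclass

# Hypothetical committer logins; a real setup would load these from
# project metadata rather than hard-coding them.
COMMITTERS = frozenset({"alice", "bob"})


@dataclass(frozen=True)
class WorkflowEvent:
    """Hypothetical, simplified view of the event that triggered a run."""
    event_name: str  # e.g. "push" or "pull_request"
    actor: str       # login of the user who triggered the run


def may_use_self_hosted(event: WorkflowEvent,
                        committers: frozenset = COMMITTERS) -> bool:
    """Allow self-hosted runners only when the actor is a known committer."""
    return event.actor in committers


def pick_runner(event: WorkflowEvent) -> str:
    """Route trusted actors to self-hosted runners, everyone else elsewhere.

    The "github-hosted" fallback is an assumption, not something stated
    in the thread.
    """
    return "self-hosted" if may_use_self_hosted(event) else "github-hosted"
```

With this sketch, a pull request opened by "alice" would be routed to a
self-hosted runner, while one opened by an unknown login would not.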


I’d be interested in additional details on how you’ve implemented Lambda and 
AWS Auto-scaling for this.

-Chris


> 
> On Mon, 8 Feb 2021, 09:58, Antoine Pitrou <anto...@python.org> wrote:
> 
>> 
>> Hi Jarek,
>> 
>> Thank you for the document.  Could you tell us more about the "custom
>> security layer" that you implemented?
>> 
>> Regards
>> 
>> Antoine.
>> 
>> 
>> On 08/02/2021 at 01:44, Jarek Potiuk wrote:
>>> For anyone following this thread - an update on the progress we have made
>>> in Airflow on building self-hosted infrastructure for GitHub Actions.
>>> 
>>> Ash from Airflow is really close to finalizing the work on a nice
>>> auto-scaling framework for self-hosted workers, and we also checked what
>>> the best value for money is that we can get.
>>> 
>>> I've run some analysis on the performance and tested my hypothesis (based
>>> on earlier experience) about the significant optimisations we can get.
>>> 
>>> I've finished my analysis of potential optimizations we can get on our CI
>>> with the self-hosted runners that Ash created. I did some performance
>>> testing and a (very crude) comparison of the "traditional approach"
>>> (2-CPU instances with local SSDs running the tests) with something I have
>>> already tested several times on various CI arrangements: running the
>>> tests on high-memory instances (8 CPUs, 64 GB of memory) with everything
>>> (including the Docker engine) in "tmpfs" - a huge ramdisk.
>>> It seems that 1h 20 minutes of test running can be decreased 8x (!) using
>>> this approach (and parallelising some tests), while at the same time
>>> decreasing the cost 2x (!). Yep. You heard right. We can have faster builds
>>> this way and pay less for them. It seems that we will be able to decrease
>>> the time to run all tests for one combination from 1h 20 minutes to 10
>>> minutes.
>>> This is possible because Ash and his team did a great job setting up
>>> auto-scaling EC2 instance runners on our Amazon EC2 account (we have
>>> credits from Amazon to run those jobs, and Astronomer has also offered a
>>> donation to keep it running). It seems that by utilizing it we can not
>>> only pay less but also get much faster builds.
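
The arithmetic behind "8x faster and 2x cheaper" can be sketched as follows.
Only the 80-minute and 10-minute run times come from the message above; the
hourly prices are hypothetical units chosen to show the break-even, not
actual EC2 prices.

```python
def speedup(old_minutes: float, new_minutes: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_minutes / new_minutes


def cost_per_run(hourly_price: float, minutes: float) -> float:
    """Cost of one test run on an instance billed by time."""
    return hourly_price * minutes / 60


def max_price_ratio(speedup_factor: float, cost_reduction: float) -> float:
    """Largest big/small hourly price ratio that still gives the reduction.

    Cost scales as price * time, so an 8x shorter run stays 2x cheaper as
    long as the faster instance costs at most 8 / 2 = 4 times as much per
    hour.
    """
    return speedup_factor / cost_reduction


factor = speedup(80, 10)               # 1h 20 min down to 10 min
ratio = max_price_ratio(factor, 2.0)   # price ratio allowed for a 2x saving

# Hypothetical prices: the small instance costs 1.0 unit per hour and the
# high-memory instance costs `ratio` times as much.
small_cost = cost_per_run(1.0, 80)
big_cost = cost_per_run(1.0 * ratio, 10)
```

In other words, the claim holds whenever the high-memory instance costs no
more than about four times the hourly price of the small one.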
>>> 
>>> If you are interested - my document is here. Open for comments - happy to
>>> add you as editors if you want (just send me your gmail address privately).
>>> It is rather crude, as I had no time to put more effort into it due to
>>> some significant changes in my company, but it should be easy to compare
>>> the values and see the actual improvements we can get. There are likely a
>>> few shortcuts there and some of the numbers are "back-of-the-envelope",
>>> and we are going to validate them further when we implement all the
>>> optimisations, but the conclusions should be pretty sound.
>>> 
>>> 
>>> https://docs.google.com/document/d/1ZZeZ4BYMNX7ycGRUKAXv0s6etz1g-90Onn5nRQQHOfE/edit#
>>> 
>>> J.
>>> 
>>> 
>>> On Fri, Jan 8, 2021 at 10:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> 
>>>> 
>>>>> We should be able to make an efficient query via GraphQL API right? I
>>>>> found the REST API for actions to be a little underwhelming.
>>>> 
>>>> 
>>>> That was the first thing I checked when we started looking at the stats.
>>>> Unfortunately, the last time I checked (and I even opened an issue about
>>>> it with GitHub support) there was no GitHub Actions GraphQL API.
>>>> 
>>>> I got a GH support answer "Yeah we know GH API does not have
>>>> GraphQL support yet, sorry". I think it has not changed since.
>>>> 
>>>> 
>>>>> We have tried to make our builds faster with more caching but it's not
>>>>> easy since it's an embedded systems project: we need to target a lot of
>>>>> configurations and most changes impact all builds.
>>>>> 
>>>> 
>>>> Indeed, I know how much of my time was spent on optimising Airflow's GH
>>>> usage. I think we eventually decreased the usage 10x or more. But it
>>>> never helped, as long as anyone could, even accidentally, block all the
>>>> slots in almost no time at all. We have no organisation-wide way to
>>>> block this, and this is the problem.
>>>> 
>>>> Right now I could:
>>>> a) mine cryptocurrency using PRs to any Apache project
>>>> b) block the queue for everyone
>>>> 
>>>> I do not even have to be an Apache committer to do that. It's enough if I
>>>> just open one well-crafted PR that spins off 180 jobs that run for 6
>>>> hours. It's super-flawed.
>>>> 
>>>> 
>>>>> 
>>>>> We too would like to take advantage of our own runners, more for the
>>>>> ability to do Hardware In the Loop testing, but have avoided it for
>>>>> the reasons already mentioned.
>>>>> 
>>>> 
>>>> Self-hosted runners seem to be the only "reasonable" option for now, but
>>>> the security issues with the current runner do not allow us to do it.
>>>> 
>>>>> 
>>>>> --Brennan
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> +48 660 796 129
>>>> 
>>> 
>>> 
>>