This uses https://github.com/actions/runner/pull/783 to avoid having untrusted users run code on our hosts (security is based on the actor of the commit: committers' PRs and direct pushes are allowed to run builds on self-hosted runners), combined with a GitHub Application, AWS Lambda and an AWS Auto-Scaling Group to auto-scale the runner instances.
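To give a rough idea of how the Lambda piece fits together, here is a minimal, hypothetical sketch - the webhook event fields, the ASG name and the one-runner-per-queued-job policy are illustrative assumptions, not our actual implementation:

# Hypothetical sketch only - not the actual Airflow/Astronomer implementation.
# Assumes the GitHub App delivers "workflow_job" webhooks to this Lambda via
# API Gateway, and that the runner instances live in one Auto-Scaling Group.
# Webhook signature verification is omitted for brevity.
import json
import os

import boto3

ASG_NAME = os.environ.get("RUNNER_ASG_NAME", "github-actions-runners")  # assumed name
asg = boto3.client("autoscaling")


def handler(event, context):
    payload = json.loads(event["body"])
    action = payload.get("action")

    group = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]
    desired = group["DesiredCapacity"]

    if action == "queued":
        # A job is waiting for a runner - add one instance, up to the ASG maximum.
        desired = min(desired + 1, group["MaxSize"])
    elif action == "completed":
        # A job finished - let the pool shrink back towards the ASG minimum.
        desired = max(desired - 1, group["MinSize"])
    else:
        return {"statusCode": 204}

    asg.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )
    return {"statusCode": 200, "body": json.dumps({"desired": desired})}

Note that the actor-based security gate is not in the Lambda at all - it comes from the patched runner (the PR linked above), which is what decides whether a given PR's jobs may run on our hosts.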
On Mon, 8 Feb 2021 at 09:58, Antoine Pitrou <anto...@python.org> wrote:

> Hi Jarek,
>
> Thank you for the document. Could you tell us more about the "custom security layer" that you implemented?
>
> Regards
>
> Antoine.
>
> On 08/02/2021 at 01:44, Jarek Potiuk wrote:
> > For anyone following this thread - an update on the progress we have made in Airflow on building self-hosted infrastructure for GitHub Actions.
> >
> > Ash from Airflow is really close to finalizing the work on a nice auto-scaling framework for self-hosted workers, but we also checked what is the best value for money we can get.
> >
> > I've run some analysis on the performance and tested my hypothesis (based on earlier experiences) about significant optimisations we can get.
> >
> > I've finished my analysis of potential optimizations we can get on our CI with the self-hosted runners that Ash created. I did some performance testing and a (very crude) comparison of the "traditional approach" - 2-CPU instances with local SSDs running the tests - with something I have already tested several times on various CI arrangements: running tests on high-memory instances (8 CPU, 64 GB memory) with everything (including the docker engine) in "tmpfs" - a huge ramdisk.
> > Seems that 1h 20 minutes of test running can be decreased 8x (!) using this approach (and parallelising some tests), at the same time decreasing the cost 2x (!). Yep. You heard right. We can have faster builds this way and pay less for that. Seems that we will be able to decrease the time to run all tests for one combination from 1h 20 minutes to 10 minutes.
> > This is possible because Ash and his team did a great job setting up auto-scaling EC2 instance runners on our Amazon EC2 account (we have credits from Amazon to run those jobs - also Astronomer offered a donation to keep it running). Seems that by utilizing it we can not only pay less but also get much faster builds.
> >
> > If you are interested - my document is here. Open for comments - happy to add you as editors if you want (just send me your gmail address privately). It is rather crude, I had no time to put more effort into it due to some significant changes in my company, but it should be easy to compare the values and see the actual improvements we can get. There are likely a few shortcuts there and some of the numbers are "back-of-the-envelope", and we are going to validate them further when we implement all the optimisations, but the conclusions should be pretty sound.
> >
> > https://docs.google.com/document/d/1ZZeZ4BYMNX7ycGRUKAXv0s6etz1g-90Onn5nRQQHOfE/edit#
> >
> > J.
> >
> > On Fri, Jan 8, 2021 at 10:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >>> We should be able to make an efficient query via GraphQL API, right? I found the REST API for actions to be a little underwhelming.
> >>
> >> That was the first thing I checked when we started looking at the stats. Unfortunately, last time I checked (and I even opened an issue about it with GitHub support) there was no GitHub Actions GraphQL API.
> >>
> >> I got a GH support answer: "Yeah, we know the GH API does not have GraphQL support yet, sorry". I think it has not changed since.
> >>
> >>> We have tried to make our builds faster with more caching, but it's not easy since it's an embedded systems project: we need to target a lot of configurations and most changes impact all builds.
> >>
> >> Indeed, I know how much of my time was spent on optimising Airflow GH usage. I think we eventually decreased the usage 10x or more. But it never helped, because currently anyone - even accidentally - can block all the slots in almost no time at all. We have no organisation-wide way to prevent this, and this is the problem.
> >>
> >> Right now I could:
> >> a) mine cryptocurrency using PRs to any Apache project
> >> b) block the queue for everyone
> >>
> >> I do not even have to be an Apache committer to do that. It's enough to open one well-crafted PR that spins off 180 jobs that run for 6 hours. It's super-flawed.
> >>
> >>> We too would like to take advantage of our own runners, but more for the ability to do Hardware In the Loop testing, and have avoided it for the reasons already mentioned.
> >>
> >> Self-hosted runners for now seem to be the only "reasonable" option, but the security issues with the current runner are not allowing us to do it.
> >>
> >>> --Brennan
> >>
> >> --
> >> +48 660 796 129
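For anyone wondering what the "everything in tmpfs" trick from the quoted analysis looks like in practice: the idea is simply to mount a large ramdisk and move the Docker daemon's data-root onto it, so image layers, containers and test I/O never touch disk. A rough, hypothetical sketch - the mount point, the 48g size and the systemd restart are assumptions on my part, not the exact setup measured in the document:

# Hypothetical sketch of the "docker engine in tmpfs" setup described above.
# Run as root on a high-memory instance (the comparison used 8 CPU / 64 GB machines);
# this overwrites /etc/docker/daemon.json, so merge with existing settings in real use.
import json
import subprocess

RAMDISK = "/mnt/ramdisk"            # assumed mount point
DOCKER_ROOT = RAMDISK + "/docker"   # Docker data-root moved onto the ramdisk

# 1. Mount a large tmpfs ramdisk.
subprocess.run(["mkdir", "-p", RAMDISK], check=True)
subprocess.run(["mount", "-t", "tmpfs", "-o", "size=48g", "tmpfs", RAMDISK], check=True)

# 2. Point the Docker daemon at the ramdisk and restart it.
subprocess.run(["mkdir", "-p", DOCKER_ROOT], check=True)
with open("/etc/docker/daemon.json", "w") as f:
    json.dump({"data-root": DOCKER_ROOT}, f)
subprocess.run(["systemctl", "restart", "docker"], check=True)

The obvious trade-off is that it needs a lot of RAM - which is why the comparison above used 64 GB instances - but for I/O-heavy test suites it takes the disk out of the equation entirely.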