The preliminary description (still lacking some recent changes and details)
is here:
https://cwiki.apache.org/confluence/display/INFRA/Self-hosted+GitHub+runners
You can also grab Ash, as he mentioned in the comments, if you want
more details on it.
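
To make the actor-based check from my quoted message below a bit more
concrete, here is a minimal sketch of the idea. It is not the code from the
runner PR: the committer set, the WorkflowRun type, the runner_label
function and the "github-hosted" label are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical allowlist; a real setup would load the project's committer
# list rather than hard-code it.
COMMITTERS = frozenset({"committer-a", "committer-b"})


@dataclass(frozen=True)
class WorkflowRun:
    actor: str  # GitHub login that triggered the run
    event: str  # e.g. "push" or "pull_request"


def runner_label(run: WorkflowRun, committers=COMMITTERS) -> str:
    """Decide where a job may run, based only on who triggered it."""
    # Only runs triggered by a known committer may reach our own hosts;
    # everything else stays on the public runners.
    if run.actor in committers:
        return "self-hosted"
    return "github-hosted"
```

With this, runner_label(WorkflowRun("committer-a", "pull_request")) gives
"self-hosted", while a PR from an unknown login stays on the public
runners.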

On Mon, Feb 8, 2021 at 11:01 PM Chris Lambertus <c...@apache.org> wrote:

>
>
> > On Feb 8, 2021, at 1:51 PM, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > This uses https://github.com/actions/runner/pull/783 to keep untrusted
> > users from running code on our hosts (security is based on the actor of
> > the commit: only committers' PRs and direct pushes are allowed to run
> > builds on self-hosted runners), and then a combination of a GitHub
> > Application, AWS Lambda and an AWS Auto-Scaling Group.
>
>
> I’d be interested in additional details on how you’ve implemented Lambda
> and AWS Auto-scaling for this.
>
> -Chris
>
>
> >
> > On Mon, Feb 8, 2021 at 09:58, Antoine Pitrou <anto...@python.org>
> > wrote:
> >
> >>
> >> Hi Jarek,
> >>
> >> Thank you for the document.  Could you tell us more about the "custom
> >> security layer" that you implemented?
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 08/02/2021 at 01:44, Jarek Potiuk wrote:
> >>> For anyone following this thread - an update on the progress we have
> >>> made in Airflow on building self-hosted infrastructure for GitHub
> >>> Actions.
> >>>
> >>> Ash from Airflow is really close to finalizing the work on a nice
> >>> auto-scaling framework for self-hosted workers, and we also checked
> >>> what the best value for money is that we can get.
> >>>
> >>> I've run some analysis on the performance and tested my hypothesis
> >>> (based on earlier experiences) about the significant optimisations we
> >>> can get.
> >>>
> >>> I've finished my analysis of the potential optimizations we can get on
> >>> our CI with the self-hosted runners that Ash created. I did some
> >>> performance testing and a (very crude) comparison of the "traditional
> >>> approach" (2-CPU instances with local SSDs running the tests) with
> >>> something I have already tested several times on various CI
> >>> arrangements: running the tests on high-memory instances (8 CPUs,
> >>> 64 GB of memory) and running everything (including the Docker engine)
> >>> in "tmpfs" - a huge ramdisk.
> >>> It seems that 1h 20 minutes of test running can be decreased 8x (!)
> >>> using this approach (and parallelising some tests), while at the same
> >>> time decreasing the cost 2x (!). Yep. You heard right. We can have
> >>> faster builds this way and pay less for them. It seems that we will be
> >>> able to decrease the time to run all tests for one combination from
> >>> 1h 20 minutes to 10 minutes.
> >>> This is possible because Ash and his team did a great job on setting up
> >>> auto-scaling EC2 instance runners on our Amazon EC2 account (we have
> >>> credits from Amazon to run those jobs, and Astronomer has also offered
> >>> a donation to keep it running). It seems that by utilizing it we can
> >>> not only pay less but also get much faster builds.
> >>>
> >>> If you are interested - my document is here. It is open for comments,
> >>> and I am happy to add you as editors if you want (just send me your
> >>> Gmail address privately). It is rather crude; I had no time to put more
> >>> effort into it due to some significant changes in my company, but it
> >>> should be easy to compare the values and see the actual improvements we
> >>> can get. There are likely a few shortcuts there and some of the numbers
> >>> are "back-of-the-envelope", and we are going to validate them further
> >>> when we implement all the optimisations, but the conclusions should be
> >>> pretty sound.
> >>>
> >>>
> >>> https://docs.google.com/document/d/1ZZeZ4BYMNX7ycGRUKAXv0s6etz1g-90Onn5nRQQHOfE/edit#
> >>>
> >>> J.
> >>>
> >>>
> >>> On Fri, Jan 8, 2021 at 10:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>>>
> >>>>> We should be able to make an efficient query via the GraphQL API,
> >>>>> right? I found the REST API for Actions to be a little underwhelming.
> >>>>
> >>>>
> >>>> That was the first thing I checked when we started looking at the
> >>>> stats. Unfortunately, the last time I checked (and I even opened an
> >>>> issue for that with GitHub support) there was no GitHub Actions
> >>>> GraphQL API.
> >>>>
> >>>> I got a GH support answer "Yeah we know GH API does not have
> >>>> GraphQL support yet, sorry". I think it has not changed since.
> >>>>
> >>>>
> >>>>> We have tried to make our builds faster with more caching, but it's
> >>>>> not easy: since it's an embedded systems project, we need to target a
> >>>>> lot of configurations, and most changes impact all builds.
> >>>>>
> >>>>
> >>>> Indeed, I know how much of my time was spent on optimising Airflow GH
> >>>> usage. I think we eventually decreased the usage 10x or more. But it
> >>>> never helped, for as long as anyone could, even accidentally, block
> >>>> all the slots in almost no time at all. We have no organisation-wide
> >>>> way to block this, and this is the problem.
> >>>>
> >>>> Right now I could:
> >>>> a) mine cryptocurrency using PRs to any Apache project
> >>>> b) block the queue for everone
> >>>>
> >>>> I do not even have to be an Apache committer to do that. It's enough
> >>>> if I just open one well-crafted PR that spins off 180 jobs that run
> >>>> for 6 hours. It's super-flawed.
> >>>>
> >>>>
> >>>>>
> >>>>> We too would like to take advantage of our own runners, but more for
> >>>>> the ability to do Hardware In the Loop testing; we have avoided it
> >>>>> for the reasons already mentioned.
> >>>>>
> >>>>
> >>>> Self-hosted runners seem to be the only "reasonable" option for now,
> >>>> but the security issues with the current runner do not allow us to
> >>>> do it.
> >>>>
> >>>>>
> >>>>> --Brennan
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> +48 660 796 129
> >>>>
> >>>
> >>>
> >>
>
>

-- 
+48 660 796 129
