The preliminary description (still lacking some recent changes and details) is here: https://cwiki.apache.org/confluence/display/INFRA/Self-hosted+GitHub+runners and you can grab Ash, as he mentioned in the comments, if you want more details on it.
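
To give a rough idea of the actor-based check before you read the wiki page, here is a minimal sketch in Python. It is not the code from the runner patch: the names `TRUSTED_ACTORS`, `BuildRequest` and `may_use_self_hosted`, and the event names, are assumptions made up for illustration. It only shows the rule described in the quoted thread below: committers' PRs and direct pushes may run on self-hosted runners, everything else may not.

```python
from dataclasses import dataclass

# Hypothetical allow-list of committer logins (illustration only).
TRUSTED_ACTORS = frozenset({"committer-a", "committer-b"})


@dataclass(frozen=True)
class BuildRequest:
    event: str  # assumed event names: "push" or "pull_request"
    actor: str  # login of the account that triggered the build


def may_use_self_hosted(req: BuildRequest, trusted=TRUSTED_ACTORS) -> bool:
    """Return True only when a trusted committer triggered a push or PR build."""
    if req.event not in ("push", "pull_request"):
        # Any other event type is rejected in this sketch.
        return False
    return req.actor in trusted
```

With this sketch, `may_use_self_hosted(BuildRequest("pull_request", "random-user"))` is rejected, so such a build would have to go to the public runners instead.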
On Mon, Feb 8, 2021 at 11:01 PM Chris Lambertus <c...@apache.org> wrote:

>
> > On Feb 8, 2021, at 1:51 PM, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > This uses https://github.com/actions/runner/pull/783 to not have
> > un-trusted users run code (security is based on the actors of the
> > commit - committers’ PRs and direct pushes are allowed to run builds on
> > self-hosted runners) on our hosts, and then a combination of a Github
> > Application, AWS Lambda and an AWS Auto-Scaling Group
> >
>
> I’d be interested in additional details on how you’ve implemented Lambda
> and AWS Auto-scaling for this.
>
> -Chris
>
> >
> > On Mon, Feb 8, 2021, 09:58 Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> Hi Jarek,
> >>
> >> Thank you for the document. Could you tell us more about the "custom
> >> security layer" that you implemented?
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On Feb 8, 2021 at 01:44, Jarek Potiuk wrote:
> >>> For anyone following this thread - some update from the progress we have
> >>> in Airflow on building self-hosted infrastructure for GitHub actions.
> >>>
> >>> Ash from Airflow is really close to finalizing the work on a nice
> >>> auto-scaling framework for self-hosted workers, but also we checked what
> >>> is the best value for money we can get.
> >>>
> >>> I've run some analysis on the performance and tested my hypothesis (based
> >>> on earlier experiences) of significant optimisations we can get.
> >>>
> >>> I've finished my analysis of potential optimizations we can get on our CI
> >>> with the Self-Hosted runners that Ash created.
> >>> I did some performance testing and (very crude) comparison of
> >>> "traditional approach" with Local SSDs 2 CPU instances running the tests
> >>> with something I already tested several times on various CI
> >>> arrangements - running tests with High-Memory instances (8 CPU, 64 GB Mem)
> >>> and running everything (including docker engine) in "tmpfs" - huge ramdisk.
> >>> Seems that 1h 20 minutes of test running can be decreased 8x (!) using
> >>> this approach (and parallelising some tests) at the same time decreasing
> >>> the cost 2x (!). Yep. You heard right. We can have faster builds this way
> >>> and pay less for that. Seems that we will be able to decrease the time to
> >>> run all tests for one combination to 10 minutes from 1h 20 minutes.
> >>> This is possible because Ash and his team did a great job on setting up
> >>> auto-scaling EC2 instance runners on our Amazon EC2 account (we have
> >>> credits from Amazon to run those jobs - also Astronomer offered donation
> >>> to keep it running). Seems that by utilizing it we can not only pay less
> >>> but also get much faster builds.
> >>>
> >>> If you are interested - my document is here. Open for comments - happy to
> >>> add you as editors if you want (just send me your gmail address in priv).
> >>> It is rather crude, I had no time to put a bit more effort into it due to
> >>> some significant changes in my company, but it should be easy to compare
> >>> the values and see the actual improvements we can get. There are likely a
> >>> few shortcuts there and some of the numbers are "back-of-the-envelope"
> >>> and we are going to validate them even more when we implement all the
> >>> optimisations, but the conclusions should be pretty sound.
> >>>
> >>> https://docs.google.com/document/d/1ZZeZ4BYMNX7ycGRUKAXv0s6etz1g-90Onn5nRQQHOfE/edit#
> >>>
> >>> J.
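
The numbers in the message above can be sanity-checked with a few lines of Python. This is only back-of-the-envelope arithmetic: the 80 minutes, 8x and 2x figures come from the message, while the implied hourly price ratio is an inference of mine that assumes the cost of a run is simply hourly price times run time.

```python
from fractions import Fraction

baseline_minutes = 80  # "1h 20 minutes" of test running
speedup = 8            # "decreased 8x"
cost_reduction = 2     # "decreasing the cost 2x"

# Run time after the speedup.
new_minutes = Fraction(baseline_minutes, speedup)

# Assumption: cost = hourly price * run time, so the faster setup can be
# at most speedup / cost_reduction times more expensive per hour and
# still reach the stated cost reduction.
implied_price_ratio = Fraction(speedup, cost_reduction)

print(f"new run time: {new_minutes} minutes")
print(f"implied hourly price ratio (fast setup / baseline): {implied_price_ratio}")
```

The run time matches the "10 minutes" quoted in the message. The price ratio is not a measured value, just what the two stated factors imply under that assumption.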
> >>>
> >>>
> >>> On Fri, Jan 8, 2021 at 10:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>>>
> >>>>> We should be able to make an efficient query via GraphQL API right? I
> >>>>> found the REST API for actions to be a little underwhelming.
> >>>>
> >>>>
> >>>> That was the first thing I checked when we started looking at the stats.
> >>>> Unfortunately last time that I checked (and I even opened an issue for
> >>>> that to Github support) there was not a Github Actions GraphQL API.
> >>>>
> >>>> I got a GH support answer "Yeah we know GH API does not have
> >>>> GraphQL support yet, sorry". I think it has not changed since.
> >>>>
> >>>>
> >>>>> We have tried to make our builds faster with more caching but it's not
> >>>>> easy since it's an embedded systems project we need to target a lot of
> >>>>> configurations and most changes impact all builds.
> >>>>>
> >>>>
> >>>> Indeed, I know how much of my time was spent on optimising Airflow GH
> >>>> usage. I think we eventually decreased the usage 10x or more. But it
> >>>> never helped, as long as anyone can, even accidentally, block all the
> >>>> slots in almost no time at all. We have no organisation-wide way to
> >>>> block this and this is the problem.
> >>>>
> >>>> Right now I could:
> >>>> a) mine cryptocurrency using PRs to any Apache project
> >>>> b) block the queue for everyone
> >>>>
> >>>> I do not have to be even an Apache committer to do that. It's enough if
> >>>> I just open one PR which is well crafted and spins off 180 jobs that run
> >>>> for 6 hours. It's super-flawed.
> >>>>
> >>>>
> >>>>>
> >>>>> We too would like to take advantage of our own runners but more for the
> >>>>> ability to do Hardware In the Loop testing but have avoided it for the
> >>>>> reasons already mentioned.
> >>>>>
> >>>>
> >>>> Self-hosted runner for now seems to be the only "Reasonable" option but
> >>>> the security issues with the current runner are not allowing us to do it.
> >>>>
> >>>>>
> >>>>> --Brennan
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> +48 660 796 129
> >>>>
> >>>
> >>
> >

--
+48 660 796 129