Cool. Thanks for the explanation :). Happy to brainstorm a bit - looks like you are well prepared. But still there are a few things left which might be difficult or maybe even impossible to handle.
I think if you do this extra hoop and indeed not use at all pull_request_target workflow then as of two weeks you could actually "some" security in place: https://github.blog/2022-03-23-github-actions-secure-self-hosted-runners-specific-workflows/ - we've been discussing it with Zeppelin as well and I started to look at it recently, but in essence, you should be able to make sure that your `pull_request_target" workflow is the only one that can use self-hosted runners and make sure no other workflows can. Otherwise you risk the problem that the users will actually modify any of the existing workflows they have access to (maybe you indeed do not need any) and make them run on self-hosted machines of yours. There are two small problems with using pull_request_target and one huge: 1) By default you get only the main version of both - the workflow and the code that is checked out. If you want to run any actions on the code of the pull request, you need to specifically check out the code of the pull request (likely in a different directory so that you don't override the scripts that you actually run in that workflow). We do it in Airflow already so it's possible. But effectively you have two copies of code that you manage - one that comes with your "pull_request_target" (from main) and one that comes from the user from pull request. 2) The second caveat of the "pull_request_targe" is that if you want to make any changes in the workflow itself, then you either "test in production" or you need to push your changes as `main` branch in your fork to test it which adds a lot of extra hoops. Those are very annoying things but we made it work in Airflow with quite extra effort. 3) The huge problem with using pull_request_target on long running machines is that it opens you up for cryptocurrency mining by bad actors. And there is very little that you can do to protect yourself "with certainty" on a living system with "static machines" - short of going the same protections that cloud vendors do for their computing instances, no more, no less (and that's really the difficult part) Actually, crypto-currency mining is a much bigger problem for "pull_request_target" - if it will at any step execute an unreviewed user's code from the PR (and I cannot imagine any non-trivial CI workflow that would not execute user code in some ways). The possibility of "cryptocurrency abuse" is not "academic" problem - this had actually happened and this is main reason why GitHub introduced "Manual Approval": https://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/. After this change, GitHub Actions will prevent the "pull_request" workflows originating from "new users": all fine there. They protect their own infrastructure. But - if you run the user's code in "pull_request_target" - this protection DOES NOT WORK. Look at this image here: https://pasteboard.co/x2t7T3T3CP2k.png. When new users submit PR, the "pull_request" workflow is indeed protected, But "pull_request_target" is considered safe (as it only runs from main) and it will run immediately, without waiting for approval. So in a sense "pull_request_target" is more sensitive for cryptocurrency mining cases. If your "pull_request_target" workflow at any time will execute code coming from the user - then the "new user" can very easily sneak in something that will continue running in a background and run cryptocurrency mining without any need for approval. If you have a long running machine - this is a really bad scenario. In the case of Airflow we have "ephemeral" machines that get killed after the job is done. They are also "spot instances" so they get killed within 24 hours no matter what. Also we very, very carefully review any changes coming to pull_request_target workflow. Currently, the only code (I think!!) from user we run there is "building Docker image from Dockerfile changeable by the user" and "running python code in Docker to build packages" - both have time limit and docker containers are killed after specified time (and user cannot influence the time) - which makes it totally unattractive for currency mining (but if they escape docker, the machines also will be shutdown relatively quickly). But even with those careful reviews, we already had a few cases where even with the thorough review, I realised just a day or so later that I opened a door for the anonymous user to execute their own code on our self-hosted machines directly. For example here is a fix of such "vulnerability": https://github.com/apache/airflow/commit/197b1390ed039486512436a6457e6961aef41a94 . It looks pretty innocent, but those few lines fix the problem where anonymous/new users could open a PR and execute code and (for example) get a remote shell to our machine. If that was a "long running" one - that would give pretty much full access to mine bitcoin on our build machines and do all kinds of stuff. I personally would never run a long-running machine for CI self-hosted runners. It's far too dangerous IMHO. And really this is the main reason why the problem is that hard - because you have to make a lot of effort to make sure that bad actor's code will not escape from the workflow code into the machines of yours. If you run them continuously, I think ultimately there will always be a way to escape (either via some vulnerability or a bug in your code like in our case). This is why I think to get really "reasonably protected", you need to do both things: * ephemeral machines started on demand controlled by webhooks of github actions * docker-contained user code in short running jobs Other comments: - If you do not need write permission - then it's cool and makes even accidental slips less problematic, because the most you can do is to impact the builds of others. If you do not use artifacts of the CI elsewhere, this is secure. - Running the entire workflow in docker containers makes perfect sense. It does not give full protection but makes it considerably harder to escape J. On Thu, Apr 7, 2022 at 8:52 AM Chesnay Schepler <ches...@apache.org> wrote: > Our thinking goes along these lines: > > One of the main issues seems to be the possibility of contributors > changing the workflow files and having those workflows being run in the PR. > This alone would void several safety measures that one could set up in > the workflow itself. > The main thing missing here on githubs side is some form of (secure) > validation step that a workflow should even run. > > AFAIK you solved this by modifying the runner to check for an allow list > of the person running the workflow. > We intend to not use the pull_request event but either > * pull_request_target > * have a bot trigger workflow_dispatch after some validation (like > checking the diff for changes to certain files/directories) with the > workflow checking out the PR. > Depends a bit on whether there are any surprises in the > pull_request_target option. > > Both of these have the disadvantage of potentially exposing a write > token and secrets. > As we don't need write permissions at all we can both disable write > permissions in the UI and set a restrictive permissions map in the > workflow. > As for secrets, according to the documentation reusable workflows must > opt-in to specific secrets, so the main workflow run for PRs will not do > anything but reuse another workflow which doesn't use opt-in to any > secrets. > > On the topic of secrets, branches and PRs builds are run by different > sets of machines via labels, to prevent cases of someone breaking out of > a PR build and into another branch build. > > We intend to run the entire workflow in docker containers, which run on > a static set of machines. > Setting up ephemeral virtual machines would of course be nicer, but our > machines don't support that. > > For 3rd party actions we, as of right now, only intend to use those > provided by Github (things like checkout or artifact management). Our > pipeline isn't really complex or fancy, just expensive. > Since these actions offer pretty basic functionality there is no need to > use the latest-and-greatest stuff, so we pin them to specific commits > and quite likely don't need to ever touch them. > Chances are that if we need a new feature, by the time that happens it > has already been out in the wild for months. > > 3rd party dependencies are certainly tricky, but it doesn't seem > specific to self-hosted runners. > > On 06/04/2022 23:28, Jarek Potiuk wrote: > > So could you explain what you've done to fulfill that? > > > >> If you want to use GitHub Actions, consider using your own self-hosted > runner, but only if you can afford to build and maintain your own > self-hosted infrastructure (this is not an easy task due to security > limitations of the official GitHub Actions runners). > > What have you planned to make your infrastructure ready for the > > security challenges there ? > > > > Could you please explain your understanding of it and what you've done > > to fulfill it ? > > > > J. > > > > On Wed, Apr 6, 2022 at 10:44 PM Chesnay Schepler <ches...@apache.org> > wrote: > >> I have very much read that. > >> > >> On 06/04/2022 19:22, Jarek Potiuk wrote: > >>> Since you referred Ash's link you probably have not read this: > >>> > >>> However this is not something to tackle lightly, as Infra *will not > manage > >>> or secure your VM* - that is up to you. > >>> > >>> > >>> On Wed, Apr 6, 2022 at 7:21 PM Chesnay Schepler <ches...@apache.org> > wrote: > >>> > >>>> This article also lists self-hosted runners as an option: > >>>> > >>>> > https://cwiki.apache.org/confluence/display/INFRA/GitHub+self-hosted+runners > >>>> > >>>> On 06/04/2022 11:56, Chesnay Schepler wrote: > >>>>>> Did you find some documentation somewhere that we might have said > >>>>> otherwise? > >>>>> > >>>>> We knew that Airflow is using them and thus thought it would be fine. > >>>>> We also had a chat with the Airflow folks and IIRC it also wasn't > >>>>> mentioned. > >>>>> > >>>>> There were several tickets where other projects requested token where > >>>>> no limitation was mentioned: > >>>>> * Arrow; token was provided: > >>>>> https://issues.apache.org/jira/browse/INFRA-19875 > >>>>> * Beam: https://issues.apache.org/jira/browse/INFRA-22840 > >>>>> * Zeppelin: https://issues.apache.org/jira/browse/INFRA-22674 > >>>>> And in fact our own latest request for 2 tokens was also granted in > >>>>> https://issues.apache.org/jira/browse/INFRA-23086. The alarm bells > >>>>> only went off when we requested more tokens. > >>>>> > >>>>> Then we have https://infra.apache.org/self-hosted-runners.html which > >>>>> states /"//Apache permits projects to use self-hosted runners [but > >>>>> does not recommend them]./ > >>>>> / > >>>>> / > >>>>> At last, we have > >>>>> > https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status > >>>>> (admittedly not an official INFRA resource, but it is linked in some > >>>>> INFRA tickets / discussions), which again lists self-hosted runners > as > >>>>> an option (while listing /caveats/)./ > >>>>> / > >>>>> / > >>>>> / > >>>>> TL;DR://There was plenty of information from which one would conclude > >>>>> that self-hosted runners are allowed, and no information to the > contrary. > >>>>> // > >>>>> > >>>>> > >>>>> On 06/04/2022 11:43, Gavin McDonald wrote: > >>>>>> Hi. > >>>>>> > >>>>>> On Wed, Apr 6, 2022 at 11:31 AM Chesnay Schepler<ches...@apache.org > > > >>>>>> wrote: > >>>>>> > >>>>>>> Hello, > >>>>>>> > >>>>>>> Inhttps://issues.apache.org/jira/browse/INFRA-23086 it was > mentioned > >>>>>>> that a security audit of self-hosted runners for github actions is > >>>>>>> being > >>>>>>> conducted at the moment, and that until this is complete no > significant > >>>>>>> number of self-hosted runners can be set up. > >>>>>>> This came as a bit of a surprise to us (the Flink project); we > >>>>>>> wanted to > >>>>>>> complete our migration to github actions within the next 2-3 weeks, > >>>>>>> which is now effectively blocked. > >>>>>>> > >>>>>> I wanted to ask about this part, why was it a surprise? > >>>>>> > >>>>>> Self Hosted Github Runners > >>>>>> has never been approved for general projects use at the moment. Did > you > >>>>>> find > >>>>>> some documentation somewhere that we might have said otherwise? > >>>>>> > >>>>>> We are still evaluating a safe and secure way in which we can deploy > >>>>>> self > >>>>>> hosted runners > >>>>>> at the ASF. Currently Airflow are the only approved project, and > we are > >>>>>> working with Beam > >>>>>> to ensure the same level of security if not better. the result of > this > >>>>>> experiment will determine > >>>>>> when we can open up self hosted runners for all projects. > >>>>>> > >>>>>> 2 to 3 weeks MIGHT be do-able but I'll let you know, still working > with > >>>>>> Beam currently. > >>>>>> > >>>>>> > >>>>>>> I wanted to ask whether there is some form of ETA on when this > audit is > >>>>>>> complete. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Chesnay > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >