Our thinking goes along these lines:

One of the main issues seems to be that contributors can change the workflow files and have those modified workflows run in the PR. This alone would void several safety measures that one could set up in the workflow itself. The main thing missing on GitHub's side is some form of (secure) validation step that decides whether a workflow should run at all.

AFAIK you solved this by modifying the runner to check the person running the workflow against an allow list.
We intend not to use the pull_request event, but either:
* pull_request_target, or
* have a bot trigger workflow_dispatch after some validation (like checking the diff for changes to certain files/directories), with the workflow checking out the PR. Which one we pick depends a bit on whether there are any surprises with the pull_request_target option.
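
To illustrate the bot option, a minimal Python sketch of the diff check such a bot could run. It assumes the bot already has the list of file paths changed by the PR; fetching that list and firing the workflow_dispatch event are left out, and the protected patterns are hypothetical placeholders, not an actual list:

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; a PR touching any of these would not be dispatched automatically.
# Note: in fnmatch, "*" also matches "/", so ".github/*" covers everything below .github/.
PROTECTED_PATTERNS = (
    ".github/*",
    "tools/ci/*",
)


def touches_protected(changed_files, patterns=PROTECTED_PATTERNS):
    """Return the sorted changed paths that match at least one protected pattern."""
    return sorted(
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )


def should_dispatch(changed_files):
    """Dispatch the CI workflow only if no protected file was modified."""
    return not touches_protected(changed_files)


if __name__ == "__main__":
    print(should_dispatch(["flink-core/src/main/java/Foo.java"]))  # True
    print(should_dispatch([".github/workflows/ci.yml"]))  # False
```

If the check passes, the actual trigger would go through GitHub's REST endpoint for creating a workflow dispatch event; which paths end up on the protected list is the part that needs real thought.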

Both of these have the disadvantage of potentially exposing a write token and secrets. As we don't need write permissions at all, we can both disable write permissions in the UI and set a restrictive permissions map in the workflow. As for secrets, according to the documentation reusable workflows must opt in to specific secrets, so the main workflow run for PRs will do nothing but call another, reusable workflow that doesn't opt in to any secrets.
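
A rough Python sketch of how one could audit a PR workflow for exactly these two properties (a read-only permissions map, no secrets handed to the reusable workflow). It assumes the workflow file has already been parsed into a dict, e.g. with a YAML library (not shown); the reusable workflow path is a hypothetical placeholder:

```python
# Values that grant no write access in a `permissions` map.
READ_ONLY_VALUES = {"read", "none"}


def permissions_are_read_only(permissions):
    """Check a workflow- or job-level `permissions` value for write access."""
    if permissions is None:
        # No explicit map: the repository default applies, which we treat as unsafe.
        return False
    if isinstance(permissions, str):
        # Shorthand forms: only "read-all" is acceptable, "write-all" is not.
        return permissions == "read-all"
    # An empty map ({}) disables all permissions and passes trivially.
    return all(value in READ_ONLY_VALUES for value in permissions.values())


def audit_workflow(workflow):
    """Return a list of problems found in a parsed workflow definition."""
    problems = []
    if not permissions_are_read_only(workflow.get("permissions")):
        problems.append("top-level permissions are not read-only")
    for name, job in workflow.get("jobs", {}).items():
        if "permissions" in job and not permissions_are_read_only(job["permissions"]):
            problems.append(f"job {name!r} requests write permissions")
        if "secrets" in job:
            # Covers both explicit secret mappings and `secrets: inherit`.
            problems.append(f"job {name!r} passes secrets to a reusable workflow")
    return problems


# Hypothetical PR entry point: read-only token, delegates to a reusable workflow
# (the path is a placeholder) without passing any secrets.
pr_workflow = {
    "on": {"pull_request_target": {}},
    "permissions": {"contents": "read"},
    "jobs": {"ci": {"uses": "./.github/workflows/ci-no-secrets.yml"}},
}

print(audit_workflow(pr_workflow))  # []
```

Treating a missing permissions map as a failure is deliberately conservative: even with write permissions disabled in the UI, the workflow file then documents the intent on its own.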

On the topic of secrets: branch builds and PR builds are run by different sets of machines (selected via labels), to prevent someone from breaking out of a PR build and into a branch build.
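
A small Python sketch of the label separation, assuming GitHub's rule that a job is only routed to a self-hosted runner carrying every label listed in `runs-on`. The label names are hypothetical; the point is the check that no single machine could satisfy both pools:

```python
# Hypothetical label names; every machine is meant to be registered with exactly
# one of the pool labels, so PR jobs and branch jobs never share a machine.
PR_LABELS = ("self-hosted", "flink-pr")
BRANCH_LABELS = ("self-hosted", "flink-branch")


def runs_on_for(is_pr):
    """Return the `runs-on` labels a job should request for its kind of build."""
    return list(PR_LABELS if is_pr else BRANCH_LABELS)


def can_pick_up(runner_labels, requested):
    """A job only goes to a runner that carries every requested label."""
    return set(requested) <= set(runner_labels)


def pools_are_disjoint(runners):
    """True if no runner could pick up both PR jobs and branch jobs."""
    return not any(
        can_pick_up(labels, PR_LABELS) and can_pick_up(labels, BRANCH_LABELS)
        for labels in runners.values()
    )


runners = {
    "pr-1": ["self-hosted", "linux", "flink-pr"],
    "branch-1": ["self-hosted", "linux", "flink-branch"],
}
print(pools_are_disjoint(runners))  # True
```

A machine accidentally registered with both pool labels would silently merge the two sets, which is what the disjointness check is meant to catch.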

We intend to run the entire workflow in Docker containers, which run on a static set of machines. Setting up ephemeral virtual machines would of course be nicer, but our machines don't support that.

For 3rd party actions, as of right now we only intend to use those provided by GitHub (things like checkout or artifact management). Our pipeline isn't really complex or fancy, just expensive. Since these actions offer pretty basic functionality, there is no need to use the latest-and-greatest versions, so we pin them to specific commits and quite likely won't ever need to touch them. Chances are that if we ever need a new feature, it will already have been out in the wild for months by the time we need it.
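
Pinning is easy to verify mechanically. A minimal Python sketch, assuming the `uses:` references have already been collected from the workflow files; it flags anything not pinned to a full 40-character commit SHA, since tags and branches can be moved while a commit SHA cannot. The SHA in the example is a placeholder, not a real commit:

```python
import re

# A full-length commit SHA: exactly 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(uses_refs):
    """Return the `uses:` references that are not pinned to a full commit SHA."""
    unpinned = []
    for ref in uses_refs:
        if ref.startswith(("./", "docker://")):
            # Local actions/workflows and container images are out of scope here.
            continue
        _, sep, version = ref.partition("@")
        if not sep or not FULL_SHA.fullmatch(version):
            unpinned.append(ref)
    return unpinned


refs = [
    "actions/checkout@v3",  # tag: can be moved, so flagged
    "actions/upload-artifact@" + "0" * 40,  # placeholder SHA, not a real commit
    "./.github/workflows/ci-no-secrets.yml",
]
print(unpinned_actions(refs))  # ['actions/checkout@v3']
```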

3rd party dependencies are certainly tricky, but that doesn't seem specific to self-hosted runners.

On 06/04/2022 23:28, Jarek Potiuk wrote:
So could you explain what you've done to fulfill that?

If you want to use GitHub Actions, consider using your own self-hosted runner, 
but only if you can afford to build and maintain your own self-hosted 
infrastructure (this is not an easy task due to security limitations of the 
official GitHub Actions runners).
What have you planned to make your infrastructure ready for the
security challenges there?

Could you please explain your understanding of it and what you've done
to fulfill it?

J.

On Wed, Apr 6, 2022 at 10:44 PM Chesnay Schepler <ches...@apache.org> wrote:
I have very much read that.

On 06/04/2022 19:22, Jarek Potiuk wrote:
Since you referred to Ash's link, you probably have not read this:

   However this is not something to tackle lightly, as Infra *will not manage
or secure your VM* - that is up to you.


On Wed, Apr 6, 2022 at 7:21 PM Chesnay Schepler <ches...@apache.org> wrote:

This article also lists self-hosted runners as an option:

https://cwiki.apache.org/confluence/display/INFRA/GitHub+self-hosted+runners

On 06/04/2022 11:56, Chesnay Schepler wrote:
Did you find some documentation somewhere that we might have said
otherwise?

We knew that Airflow is using them and thus thought it would be fine.
We also had a chat with the Airflow folks and IIRC it also wasn't
mentioned.

There were several tickets where other projects requested tokens and no
limitation was mentioned:
* Arrow (token was provided):
https://issues.apache.org/jira/browse/INFRA-19875
* Beam: https://issues.apache.org/jira/browse/INFRA-22840
* Zeppelin: https://issues.apache.org/jira/browse/INFRA-22674
And in fact our own latest request for 2 tokens was also granted in
https://issues.apache.org/jira/browse/INFRA-23086. The alarm bells
only went off when we requested more tokens.

Then we have https://infra.apache.org/self-hosted-runners.html, which
states "Apache permits projects to use self-hosted runners [but
does not recommend them]."

Lastly, we have
https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status
(admittedly not an official INFRA resource, but it is linked in some
INFRA tickets / discussions), which again lists self-hosted runners as
an option (while listing caveats).

TL;DR: There was plenty of information from which one would conclude
that self-hosted runners are allowed, and no information to the contrary.


On 06/04/2022 11:43, Gavin McDonald wrote:
Hi.

On Wed, Apr 6, 2022 at 11:31 AM Chesnay Schepler <ches...@apache.org>
wrote:

Hello,

In https://issues.apache.org/jira/browse/INFRA-23086 it was mentioned
that a security audit of self-hosted runners for GitHub Actions is being
conducted at the moment, and that until it is complete no significant
number of self-hosted runners can be set up.
This came as a bit of a surprise to us (the Flink project); we wanted to
complete our migration to GitHub Actions within the next 2-3 weeks,
which is now effectively blocked.

I wanted to ask about this part, why was it a surprise?

Self-hosted GitHub runners have never been approved for general project
use. Did you find some documentation somewhere that we might have said
otherwise?

We are still evaluating a safe and secure way in which we can deploy
self-hosted runners at the ASF. Currently Airflow are the only approved
project, and we are working with Beam to ensure the same level of
security, if not better. The result of this experiment will determine
when we can open up self-hosted runners for all projects.

2 to 3 weeks MIGHT be doable, but I'll let you know; I'm still working
with Beam currently.


I wanted to ask whether there is some form of ETA on when this audit is
complete.

Regards,
Chesnay




