Hello Vladimir, all,

Happy to share our experiences and thoughts. Better late than never, but we discussed this in the apache-airflow private group before and we've put a number of protections in place, so I am happy to share our learnings and practices.

> Gregg,
> Do you have pointers that clarify how actions can modify Apache repositories?
> I strongly believe that Actions are read-only by default.

This is only true for actions that are executed in the context of PRs coming from forks. In this case, access is read-only and it is indeed 'safe' for Apache:
https://docs.github.com/en/free-pro-team@latest/actions/reference/authentication-in-a-workflow#permissions-for-the-github_token

In the Apache projects, everything we do is public by default, so READ access to any artifacts, images, etc. should be perfectly OK.

Any workflows that are run within the repository (direct pushes, PRs inside the repo, but also special kinds of events like 'workflow_run', 'workflow_dispatch' and 'schedule') run with a "READ/WRITE" access token that has pretty much "write all" capabilities. Those workflows, however, only run code that has already been approved or pushed by one of the maintainers, so there is no risk of "anyone in the world" running their unreviewed code there.

Side comment: it would be disastrous if PRs from forks had any way of getting WRITE access. A PR can also modify the workflow itself, therefore anyone in the world could do anything during the build and you could not prevent it. This is one of the reasons why we cannot yet enable self-hosted runners for public repositories to reduce the strain on the 180-slot queue we have for all Apache projects. Here is an explanation from GitHub of why you should not use them for public repositories:
https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories

We are discussing a solution to that with GitHub Actions (Ash, from the Airflow PMC, even created a PR that addresses it: https://github.com/actions/runner/pull/783). We had a meeting planned with the GitHub account for Apache and I hope it will happen in January.

> AFAIK the only way GitHub Action can modify the repository is when the user provides credentials.
> Of course, if somebody generates a personal access token and commits it to a public repository,
> then anyone can use it.

Nope. A GitHub Action can use the GITHUB_TOKEN to perform write operations on anything in the repo. We use that extensively in Apache Airflow (mostly to optimize our CI) to:

* push changes to special orphaned branches in our repo. You can see an example of such a pushed change here: https://github.com/apache/airflow/commit/a5bc781b8bcbe5135f82d9f11531076458064571
* push images to the GitHub registry. We use it to build images once and re-use them in multiple jobs (you can see it for example here: https://github.com/apache/airflow/runs/1621175344?check_suite_focus=true)
* cancel running workflows. We use it to cancel duplicated workflows and optimize GA job queues. There is a whole action I wrote to do just that: https://github.com/marketplace/actions/cancel-workflow-runs (the action is now copied to https://github.com/apache/airflow-cancel-workflow-runs so that we can use it according to the new policy)
* add new checks to a running PR (in-progress/complete/error state for the PR). For example, this check: https://github.com/apache/airflow/pull/13346/checks?check_run_id=1618496728 was added by an action that we run in the 'workflow_run' type of event for the PR (it is safe to run it per-PR because 'workflow_run' only uses the main branch version of the workflow)
* make comments in PRs. For example, we use it to analyze whether the PR needs a full set of tests or only a limited set (https://github.com/apache/airflow/pull/13257#issuecomment-749620034)
* label the PRs depending on the approval status and scope of the changes

Basically, Actions can do pretty much anything that is specified in this table:
https://docs.github.com/en/free-pro-team@latest/actions/reference/authentication-in-a-workflow#permissions-for-the-github_token

> However, by default GitHub Action has
> no write access to the repository

This is only true for PRs coming from forks.

> GitHub generates a temporary token for each execution (it is called GITHUB_TOKEN), however,
> it is NOT available for actions automatically, and it must be mentioned in *.yml file in order to be used.

Yes. Correct. And all the 3rd-party actions that we moved REQUIRE this. This means that you have to put ${{ secrets.GITHUB_TOKEN }} in your workflow to use those actions:

* https://github.com/potiuk/cancel-workflow-runs
* https://github.com/potiuk/get-workflow-origin
* https://github.com/marketplace/actions/github-checks
* https://github.com/TobKed/label-when-approved-action
* https://github.com/JamesIves/github-pages-deploy-action

Basically, any non-trivial action will likely require you to add GITHUB_TOKEN. If you use those actions, you have no choice: you have to add the token. And the token gives "all" access. Currently, there is no way to have 'scoped' tokens. When the action has access to the token, it can do ANYTHING. There are a few exceptions. For example, branch protection might prevent the action from modifying the master branch, but there are ways around it: such an action could modify an existing approved PR (if it is made inside the repo) and merge it. That would be no problem for it.

Now we are getting to the bottom of the problem. The problem is that the author of an action can modify the action without the knowledge of the user. An action that yesterday just made comments can today push commits to your repo, and you will not even know about this change. In most cases people refer to a particular version of the action by its version tag:

    uses: potiuk/cancel-workflow-runs@4_7

The problem with it is that '4_7' is a tag, and a tag can be moved at any time. So yesterday the action pointed to commit A, but today it can point to commit B, and you as a user will not even know that it happened. Yeah.
That means that a scheduled workflow of yours that yesterday just made some comments on your PR might today modify your code, without you as a maintainer making any change to it and without you noticing. This is exactly what Greg is writing about.

That's why in Airflow we introduced the rule that all 'untrusted' 3rd-party actions are always referred to by a full COMMIT_HASH, not a version, and that we always review the code of those actions when we change the hash. This directly follows the advice of GitHub, who explain the why and the how, and also why it has to be the full commit hash and not the short version of it:
https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actions

Quote from that document:

| This means that a compromise of a single action within a workflow can be very significant, as that compromised action would have access to all secrets configured on your repository, and can use the
| GITHUB_TOKEN to write to the repository. Consequently, there is significant risk in sourcing actions from third-party repositories on GitHub. You can help mitigate this risk by following these good practices:

This is what we do in Airflow now:

    uses: apache/airflow-cancel-workflow-runs@953e057dc81d3458935a18d1184c386b0f6b5738 # v4_7

But those practices are very difficult to enforce, remember and review, especially at the 'Apache Security/Infra' level. Even this Sunday I found out that we used two actions referenced by tags: one from Amazon (with GITHUB_TOKEN) and one from Codecov (without GITHUB_TOKEN). They are rather safe, but we should have used commit hashes there; it simply slipped under the radar.

So I perfectly understand why Apache Infra did what they did: they have no mechanisms to enforce best practices. They have to enforce that all actions live within the organisation, plus there is an 'allowed list' of organizations that are also OK.

I hope you will find this explanation useful :)
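
To illustrate the pinning rule described above, here is a minimal sketch of how a check for tag-based references could be automated. It is only an illustration, not a description of any tooling that Airflow or Apache Infra actually uses. The function name find_unpinned_actions and both regular expressions are assumptions made up for this example, and the scan is line-based rather than a real YAML parse, so it will miss unusual formatting and does not treat Docker references specially.

```python
import re

# Hypothetical pattern: matches "uses: owner/repo[/path]@ref" lines in a workflow file.
# Local actions such as "./.github/actions/foo" have no "@ref" and are not matched.
USES_RE = re.compile(
    r"^\s*(?:-\s*)?uses:\s*['\"]?([^@\s'\"]+)@([^\s'\"#]+)", re.MULTILINE
)
# A full commit hash is exactly 40 lowercase hexadecimal characters.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def find_unpinned_actions(workflow_text):
    """Return (action, ref) pairs whose ref is not a full 40-character commit hash."""
    unpinned = []
    for action, ref in USES_RE.findall(workflow_text):
        if not FULL_SHA_RE.fullmatch(ref):
            unpinned.append((action, ref))
    return unpinned


# The two references quoted earlier in this message: one by tag, one by full hash.
example = """
jobs:
  cancel:
    runs-on: ubuntu-latest
    steps:
      - uses: potiuk/cancel-workflow-runs@4_7
      - uses: apache/airflow-cancel-workflow-runs@953e057dc81d3458935a18d1184c386b0f6b5738  # v4_7
"""

for action, ref in find_unpinned_actions(example):
    print(f"not pinned to a full commit hash: {action}@{ref}")
```

A check like this only tells you that a reference is pinned. It does not replace reviewing the code of the action at that commit.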

--
Jarek Potiuk
Polidea | Principal Software Engineer
M: +48 660 796 129