Hello Vladimir, all,

Happy to share our experiences and thoughts, better late than never. We
discussed this earlier in the apache-airflow private group and have since
put a number of protections in place, so I am happy to share our learnings
and practices.

> Gregg,
> Do you have pointers that clarify how actions can modify Apache
> repositories?
> I strongly believe that Actions are read-only by default.


This is only true for actions that are executed in the context of PRs
coming from forks. In this case, access is read-only and it is indeed
'safe' for Apache:
https://docs.github.com/en/free-pro-team@latest/actions/reference/authentication-in-a-workflow#permissions-for-the-github_token
In the Apache projects, everything we do is public by default, so READ
access to any artifacts, images, etc. should be perfectly OK.

Any workflows that are run within the repository (direct pushes, PRs made
inside the repo, but also special kinds of events like 'workflow_run',
'workflow_dispatch' and 'schedule') all run with a "READ/WRITE" access
token that has pretty much "write all" capabilities. Those workflows,
however, only run code that has already been approved or pushed by one of
the maintainers, so there is no risk of "anyone in the world" running
their unreviewed code there.

Side comment: it would be disastrous if PRs from forks had any way of
getting WRITE access. A PR can also modify the workflow itself, so anyone
in the world could do anything during the build and you could not prevent
it. This is one of the reasons why we cannot yet enable self-hosted
runners for public repositories to reduce the strain on the 180-slot queue
we have for all Apache projects. Here is an explanation from GitHub of why
you should not use them for public repositories:
https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
We are discussing a solution to that with GitHub Actions (Ash, from the
Airflow PMC, even created a PR that addresses it:
https://github.com/actions/runner/pull/783). We had a meeting planned with
the GitHub account for Apache and I hope it will happen in January.

> AFAIK the only way GitHub Action can modify the repository is when the
> user provides credentials.
> Of course, if somebody generates a personal access token and commits it
> to a public repository,
> then anyone can use it.


Nope.

A GitHub Action can use the GITHUB_TOKEN to perform write operations on
anything in the repo. We use that extensively in Apache Airflow (mostly to
optimize our CI) to:
* push changes to special orphaned branches in our repo - you can see an
  example of such a pushed change here:
  https://github.com/apache/airflow/commit/a5bc781b8bcbe5135f82d9f11531076458064571
* push images to the GitHub registry - we use it to build images once and
  re-use them in multiple jobs; you can see it for example here:
  https://github.com/apache/airflow/runs/1621175344?check_suite_focus=true
* cancel running workflows - we use it to cancel duplicated workflows and
  optimize GA job queues; there is a whole action I wrote to do just that:
  https://github.com/marketplace/actions/cancel-workflow-runs
  (the action is now copied to
  https://github.com/apache/airflow-cancel-workflow-runs so that we can use
  it according to the new policy)
* add new checks to a running PR (in-progress/complete/error state for the
  PR) - for example, this check:
  https://github.com/apache/airflow/pull/13346/checks?check_run_id=1618496728
  was added by an action that we run in a 'workflow_run' type of event for
  the PR (it is safe to run it per PR because 'workflow_run' only uses the
  main branch version of the workflow)
* make comments in PRs - for example, we use it to analyze whether the PR
  needs a full set of tests or only a limited set:
  https://github.com/apache/airflow/pull/13257#issuecomment-749620034
* label the PRs depending on the approval status and scope of the changes

Basically, Actions can do pretty much anything that is specified in this
table:
https://docs.github.com/en/free-pro-team@latest/actions/reference/authentication-in-a-workflow#permissions-for-the-github_token

>
> However, by default GitHub Action has no write access to the repository

This is only true for PRs coming from forks.

>
> GitHub generates a temporary token for each execution (it is called
> GITHUB_TOKEN), however,
> it is NOT available for actions automatically, and it must be mentioned
> in *.yml file in order to be used.

Yes. Correct. And all the 3rd-party actions that we moved REQUIRE this.
This means that you have to put ${{ secrets.GITHUB_TOKEN }} in your
workflow to use those actions:

* https://github.com/potiuk/cancel-workflow-runs
* https://github.com/potiuk/get-workflow-origin
* https://github.com/marketplace/actions/github-checks
* https://github.com/TobKed/label-when-approved-action
* https://github.com/JamesIves/github-pages-deploy-action

Basically, any non-trivial action will likely require GITHUB_TOKEN. If
you use those actions, you have no choice - you have to add the token. And
the token gives "all" access. Currently, there is no way to have 'scoped'
tokens. When the action has access to the token, it can do ANYTHING.
There are a few exceptions - for example, branch protection might prevent
the action from modifying the master branch, but there are ways around
it - for example, such an action could modify an existing approved PR (if
it is made inside the repo) and merge it. No problem with that.

Now we are getting to the bottom of the problem. The problem is that the
author of an action can modify the action without the knowledge of its
users. An action that yesterday just made comments can today push commits
to your repo, and you will not even know about this change. In most cases
people refer to an action by its version:

uses: potiuk/cancel-workflow-runs@4_7

The problem with it is that '4_7' is a tag, and a tag can be moved at any
time. So yesterday the action pointed to commit A, but today it can point
to commit B, and you as a user will not even know that it happened. That
means that a scheduled workflow of yours that yesterday just made some
comments on your PR might today modify your code, without you as a
maintainer making any change to it and without you noticing.

This is exactly what Gregg is writing about. That's why in Airflow we
introduced the rule that all 'untrusted' 3rd-party actions are always
referred to by a full COMMIT_HASH, not a version, and that we always
review the code of those actions whenever we update the hash. This
directly follows the advice of GitHub, who explain the whys and hows,
including why it has to be the full commit hash and not the short version
of it:
https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actions

Quote from that document:

| This means that a compromise of a single action within a workflow can be
| very significant, as that compromised action would have access to all
| secrets configured on your repository, and can use the GITHUB_TOKEN to
| write to the repository. Consequently, there is significant risk in
| sourcing actions from third-party repositories on GitHub. You can help
| mitigate this risk by following these good practices:

This is what we do in Airflow now:

uses: apache/airflow-cancel-workflow-runs@953e057dc81d3458935a18d1184c386b0f6b5738  # v4_7

But those practices are very difficult to enforce, remember and review -
especially at the 'Apache Security/Infra' level. Even this Sunday I found
out that we used two actions referenced by tags (one in the Amazon domain,
with GITHUB_TOKEN, and one in the Codecov domain, without GITHUB_TOKEN).
They are rather safe, but we should have used commit hashes there; it
simply slipped under the radar.
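
To illustrate how such a review could be partly automated, here is a
minimal sketch in Python. It is not something we run in Airflow; the
function name find_unpinned_actions and the SAMPLE workflow text are made
up for this example. It scans workflow text line by line and reports every
'uses:' reference that is not pinned to a full 40-character commit hash.
It is a plain regex scan rather than a YAML parser, so it assumes that
each 'uses:' entry sits on a single line.

```python
import re

# A reference counts as pinned only if it is a full 40-character commit SHA.
FULL_SHA = re.compile(r"^[0-9a-fA-F]{40}$")
# Matches lines such as "uses: owner/repo@ref" or "- uses: owner/repo@ref".
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s'\"#]+)")


def find_unpinned_actions(workflow_text):
    """Return (line_number, reference) pairs not pinned to a full SHA."""
    unpinned = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_LINE.match(line)
        if not match:
            continue
        reference = match.group(1)
        # Local actions ("./path") and docker images are skipped here.
        if reference.startswith("./") or reference.startswith("docker://"):
            continue
        # Everything after "@" is the ref; it is empty if "@" is missing.
        _, _, ref = reference.partition("@")
        if not FULL_SHA.match(ref):
            unpinned.append((number, reference))
    return unpinned


# Hypothetical workflow fragment built from the two references shown above.
SAMPLE = """\
steps:
  - uses: potiuk/cancel-workflow-runs@4_7
  - uses: apache/airflow-cancel-workflow-runs@953e057dc81d3458935a18d1184c386b0f6b5738  # v4_7
"""

if __name__ == "__main__":
    for number, reference in find_unpinned_actions(SAMPLE):
        print(f"line {number}: {reference} is not pinned to a full commit hash")
```

A check like this only tells you that a reference is pinned; it does not
replace reviewing the code of the action at that commit.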

So I perfectly understand why Apache Infra did what they did - they have
no mechanism to enforce best practices. All they can enforce is that all
actions come from within the organisation, plus an 'allowed list' of
other organizations that are also OK.

I hope you will find this explanation useful.

J.



-- 

Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129
