Hello everyone,

I'd like to propose a new feature in Airflow: allowing users to specify
tasks to skip when triggering a DAG run.

From our own experience, this feature can be very useful for
experimenting, troubleshooting, or re-running existing DAGs, and I
believe it can benefit many Airflow users.

To illustrate the use case, consider the example below (☑ marks the task to skip):
task-a ☐ -> task-b ☑ -> task-c ☐

Suppose we have a DAG containing three tasks. To troubleshoot "task-a"
and "task-c", I want to trigger a manual DAG run and skip "task-b" (to
save time and resources and to focus on the other two tasks). Today, I
have two options to do this:

Option 1: Trigger the DAG, then manually mark "task-b" as `SUCCESS`
Option 2: Remove "task-b" from the DAG file, then trigger the DAG

Neither option is great. Option 1 becomes tedious when the DAG is
large and there are multiple tasks to skip. Option 2 requires changing
the DAG file, which is inconvenient when all I want to do is
troubleshoot.

Therefore, I would love to discuss how we can provide an easy way for
users to skip tasks when triggering a DAG.

Things to consider are:
1) We should allow users to specify all tasks to skip at once when
triggering a DAG
2) We should retain the dependencies between non-skipped tasks (in the
above example, "task-c" won't start until "task-a" completes, even
though "task-b" is skipped)
3) We should mark skipped tasks as `SKIPPED` instead of `SUCCESS` to
make the outcome more intuitive
4) The implementation should be simple, clean and low-risk
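To make requirements 2) and 3) concrete, here is a toy Python model of one DAG run. It is not Airflow code: `State` and `simulate_run` are hypothetical names. It assumes a skipped upstream task counts as "finished" for its downstream tasks, roughly like Airflow's `none_failed` trigger rule. Under the default `all_success` rule, a skipped upstream would normally cause "task-c" to be skipped as well, so the real change would need to account for that.

```python
from enum import Enum


class State(Enum):
    # Toy task states; the names mirror Airflow's, but this is not Airflow code.
    NONE = "none"
    SUCCESS = "success"
    SKIPPED = "skipped"


def simulate_run(upstream, skip_tasks):
    """Run a toy DAG in dependency order, marking tasks in skip_tasks as SKIPPED.

    upstream maps each task_id to the list of task_ids it depends on.
    Returns (task_id, final_state) pairs in the order the tasks finished.
    """
    states = {task_id: State.NONE for task_id in upstream}
    finished = []
    while any(state is State.NONE for state in states.values()):
        # A task is ready once all of its upstream tasks are finished. SUCCESS
        # and SKIPPED both count as finished (an assumed none_failed-like rule),
        # so skipping a task does not break the ordering around it.
        ready = [
            task_id
            for task_id, state in states.items()
            if state is State.NONE
            and all(states[u] is not State.NONE for u in upstream[task_id])
        ]
        if not ready:
            raise ValueError("DAG contains a cycle")
        for task_id in sorted(ready):
            # Requested skips are marked SKIPPED (not SUCCESS) instead of running.
            if task_id in skip_tasks:
                states[task_id] = State.SKIPPED
            else:
                states[task_id] = State.SUCCESS
            finished.append((task_id, states[task_id]))
    return finished


dag = {"task-a": [], "task-b": ["task-a"], "task-c": ["task-b"]}
for task_id, state in simulate_run(dag, skip_tasks={"task-b"}):
    print(task_id, state.name)
# task-a SUCCESS
# task-b SKIPPED
# task-c SUCCESS
```

Because "task-b" is only marked SKIPPED once "task-a" has finished, "task-c" still waits for "task-a", which is the behavior requirement 2) asks for.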

Here is my proposed solution (tested locally):
Today, Airflow allows users to pass a JSON object to the DagRun,
available as {{dag_run.conf}}, when triggering a DAG. The idea is
that, before queuing the task instances that satisfy their
dependencies, `scheduler_job.py` (with some changes) will filter out
the task instances to skip based on the `dag_run.conf` the user passes
in (e.g. {"skip_tasks": ["task-b"]}), then mark them as SKIPPED.
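A minimal sketch of that filtering step, written in plain Python rather than against Airflow's actual scheduler code. `get_skip_tasks` and `split_ready_tasks` are hypothetical helpers, task-id strings stand in for task instances, and the `skip_tasks` key follows the example conf above. In the real change, the tasks in `to_skip` would be set to SKIPPED in the metadata database instead of being queued.

```python
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple


def get_skip_tasks(conf: Optional[Dict[str, Any]], dag_task_ids: Iterable[str]) -> Set[str]:
    """Read the "skip_tasks" key from a dag_run.conf-style dict and validate it."""
    if not conf:
        return set()
    skip_tasks = conf.get("skip_tasks", [])
    if not isinstance(skip_tasks, list) or not all(isinstance(t, str) for t in skip_tasks):
        raise ValueError('"skip_tasks" must be a list of task ids')
    # Reject unknown ids so a typo fails loudly instead of silently running the task.
    unknown = set(skip_tasks) - set(dag_task_ids)
    if unknown:
        raise ValueError(f"unknown task ids in skip_tasks: {sorted(unknown)}")
    return set(skip_tasks)


def split_ready_tasks(
    ready_task_ids: Iterable[str], skip_tasks: Set[str]
) -> Tuple[List[str], List[str]]:
    """Partition tasks whose dependencies are met into (to_queue, to_skip)."""
    to_queue: List[str] = []
    to_skip: List[str] = []
    for task_id in ready_task_ids:
        if task_id in skip_tasks:
            to_skip.append(task_id)
        else:
            to_queue.append(task_id)
    return to_queue, to_skip


# After "task-a" succeeds, "task-b" is the only task ready to be queued.
skip = get_skip_tasks({"skip_tasks": ["task-b"]}, ["task-a", "task-b", "task-c"])
to_queue, to_skip = split_ready_tasks(["task-b"], skip)
print(to_queue, to_skip)  # [] ['task-b']
```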

Things I would love to discuss:
- What do you think about this feature?
- What do you think about the proposed solution?
- Did I miss anything that you want to discuss?
- Is it necessary to introduce a new state (e.g. MANUAL_SKIPPED) to
distinguish user-requested skips from regular SKIPPED tasks?

Howie
