Hello everyone, I'd like to propose a new feature in Airflow: allowing users to specify tasks to skip when triggering a DAG run.
From our own experience, this feature can be very useful when running experiments, troubleshooting, or re-running existing DAGs, and I believe it can benefit many Airflow users. To illustrate the use case, I'll use the example below:

task-a ☐ -> task-b ☑ -> task-c ☐

Suppose we have a DAG containing 3 tasks. To troubleshoot "task-a" and "task-c", I want to trigger a manual DAG run and skip "task-b" (so I can save time and resources and focus on the other two tasks). Today I have two options to do this:

Option 1: Trigger the DAG, then manually mark "task-b" as `SUCCESS`.
Option 2: Remove "task-b" from my DAG, then trigger the DAG.

Neither option is great. Option 1 can be troublesome when the DAG is large and there are multiple tasks I want to skip. Option 2 requires changing the DAG file, which is inconvenient when I'm just troubleshooting.

Therefore, I would love to discuss how we can provide an easy way for users to skip tasks when triggering a DAG. Things to consider are:

1) We should allow users to specify all the tasks to skip at once when triggering a DAG.
2) We should retain the dependencies between non-skipped tasks (in the example above, "task-c" won't start until "task-a" completes, even though we skipped "task-b").
3) We should mark skipped tasks as `SKIPPED` instead of `SUCCESS` to make it more intuitive.
4) The implementation should be easy, clean, and low risk.

Here is my proposed solution (tested locally):

Today, Airflow allows users to pass a JSON object to the DAG run when triggering a DAG, which is then available as {{dag_run.conf}}. The idea is that, before queuing the task instances whose dependencies are satisfied, `scheduler_job.py` (after we make some changes) will filter out the task instances to skip based on the `dag_run.conf` the user passes in (e.g. {"skip_tasks": ["task-b"]}) and mark them as SKIPPED.

Things I would love to discuss:
- What do you think about this feature?
- What do you think about the proposed solution?
- Did I miss anything that you want to discuss?
- Is it necessary to introduce a new state (e.g. MANUAL_SKIPPED) to differentiate manually skipped tasks from regular SKIPPED ones?

Howie
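For reference, here is a minimal, plain-Python sketch of the filtering step in the proposal. `TaskInstanceStub` and `split_skip_tasks` are hypothetical names for illustration only; they are not Airflow classes or functions, and the real change would live inside `scheduler_job.py`. The sketch assumes the scheduler has already computed the task instances whose dependencies are satisfied, and that `dag_run.conf` is a dict (or None) that may contain a `"skip_tasks"` list.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical state strings and task-instance stand-in; not Airflow's real classes.
SKIPPED = "skipped"


@dataclass
class TaskInstanceStub:
    task_id: str
    state: Optional[str] = None


def split_skip_tasks(
    schedulable: List[TaskInstanceStub], conf: Optional[Dict[str, Any]]
) -> List[TaskInstanceStub]:
    """Mark task instances listed in conf["skip_tasks"] as SKIPPED and
    return only the ones that should still be queued."""
    # A run triggered without conf (or without "skip_tasks") skips nothing.
    skip_ids = set((conf or {}).get("skip_tasks", []))
    to_queue = []
    for ti in schedulable:
        if ti.task_id in skip_ids:
            ti.state = SKIPPED  # user asked to skip this task for this run
        else:
            to_queue.append(ti)
    return to_queue


if __name__ == "__main__":
    # Simulate the point where task-b's upstream (task-a) has finished,
    # so task-b is now schedulable.
    run_conf = {"skip_tasks": ["task-b"]}
    ready = [TaskInstanceStub("task-b")]
    queued = split_skip_tasks(ready, run_conf)
    print([ti.task_id for ti in queued])  # []
    print(ready[0].state)  # skipped
```

Because the filter only runs on task instances that are already schedulable, "task-b" is considered (and skipped) only after "task-a" finishes. The sketch does not model how downstream tasks' trigger rules react to a SKIPPED upstream.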