> [Cons]
> 1) Not scalable / Inconvenient
> To make a task skippable, one needs to modify the existing DAG (to set
> `pre_execute`). It seems not difficult, but when your Airflow hosts a
> thousand DAGs owned by different teams/users, it can be challenging.

You can use task_policy to apply (or chain, I would think) pre_execute callables globally on your cluster:

https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html

> So the skipped task needs to have a State identical to "Success", other
> than the name. Hence a new state may be needed for this feature Howie is
> proposing.
> This is also one of the reasons why the *pre_execute* method may not
> fully resolve the question, because marking the skipped task as "*Skipped*"
> may lead to something we don't expect (marking it as "*Success*" is also
> a bad practice, because it will cause confusion when a user checks the history
> later).

Yes, our states and trigger rules might need to become more expressive. But it could still be done from pre_execute, because we could have an `AirflowSkipSuccess` exception or something like that (not that this is the only reasonable solution).

*A note on states*

This touches directly on a related thread that I have been thinking about a lot of late. We might want to separate "states" from "resolutions". That is, running / queued / scheduled / done could be thought of as "execution states", whereas skipped / failed / success / upstream failed might be better represented as "resolutions". Similarly, when a task is paused (e.g. deferred, up for reschedule), we might track those reasons separately.

This came up recently with a bug where start_date kept being updated for a reschedule-mode sensor because, at the time start_date is updated, the task no longer knows it was "up for reschedule". It's possible that refactoring "states" could give us better knowledge about what's going on with the task when we have to make decisions like "should we update the start date".
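
P.S. To make the task_policy suggestion at the top of this message concrete, here is a minimal sketch of what a cluster-wide, chained pre_execute could look like in `airflow_local_settings.py`. Several things here are assumptions on my part and not an existing feature: the `should_skip` helper, the `skip_tasks` key in the run conf, and the idea that wrapping `task.pre_execute` by plain attribute assignment is acceptable for every operator. It raises the existing `AirflowSkipException`, so the task ends up "skipped"; an `AirflowSkipSuccess` does not exist today.

```python
# airflow_local_settings.py (sketch, not a tested production policy)
try:
    from airflow.exceptions import AirflowSkipException
except ImportError:
    # Stand-in so the sketch can be run without Airflow installed.
    class AirflowSkipException(Exception):
        pass


def should_skip(task_id, context):
    """Hypothetical rule: skip tasks listed under 'skip_tasks' in the run conf."""
    dag_run = context.get("dag_run")
    conf = getattr(dag_run, "conf", None) or {}
    return task_id in conf.get("skip_tasks", [])


def task_policy(task):
    """Cluster policy hook: chain a skip check in front of each task's pre_execute."""
    # Keep whatever pre_execute behaviour the DAG author already configured.
    original_pre_execute = task.pre_execute

    def chained_pre_execute(context):
        if should_skip(task.task_id, context):
            raise AirflowSkipException(f"{task.task_id} skipped by cluster policy")
        return original_pre_execute(context)

    task.pre_execute = chained_pre_execute
```

Because the policy wraps the existing callable instead of replacing it, DAG authors would not have to touch their DAGs, and any pre_execute they already set would still run for tasks that are not skipped.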