Re: [DISCUSSION] Specify tasks to skip when triggering DAG

Daniel Standish Sat, 05 Feb 2022 08:07:45 -0800

>
> [Cons]
> 1) Not scalable / Inconvenient
> To make a task skippable, one needs to modify the existing DAG (to set
> `pre_execute`). It does not seem difficult, but when your Airflow hosts a
> thousand DAGs owned by different teams/users, it can be challenging.

You can use task_policy to apply (or chain, I would think) pre_execute
callables globally on your cluster.
https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html
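A minimal sketch of what that could look like, assuming the policy lives in `airflow_local_settings.py` and that the task IDs to skip arrive in a hypothetical `skip_tasks` key of `dag_run.conf` (not an existing Airflow convention). The policy wraps whatever `pre_execute` the task already has, so a hook the DAG author set still runs when the task is not skipped. `DemoOperator` and `demo_run` are illustrative stand-ins so the snippet runs without a scheduler; untested against a real deployment.

```python
from types import SimpleNamespace

try:
    from airflow.exceptions import AirflowSkipException
except ImportError:  # stand-in so the sketch also runs without Airflow installed
    class AirflowSkipException(Exception):
        """Stand-in for airflow.exceptions.AirflowSkipException."""

# Hypothetical dag_run.conf key holding the task_ids to skip (not an Airflow convention).
SKIP_CONF_KEY = "skip_tasks"


def task_policy(task):
    """Cluster policy: chain a skip check in front of the task's existing pre_execute."""
    original_pre_execute = task.pre_execute  # bound method, or a hook chained earlier

    def pre_execute(context):
        dag_run = context.get("dag_run")
        conf = getattr(dag_run, "conf", None) or {}
        if task.task_id in conf.get(SKIP_CONF_KEY, ()):
            raise AirflowSkipException(
                f"{task.task_id} listed in dag_run.conf[{SKIP_CONF_KEY!r}]"
            )
        # Not skipped: run whatever pre_execute the DAG author already set up.
        original_pre_execute(context)

    task.pre_execute = pre_execute


# --- Illustrative stand-ins (not Airflow classes) to exercise the policy ---
class DemoOperator:
    def __init__(self, task_id):
        self.task_id = task_id
        self.calls = []

    def pre_execute(self, context):
        self.calls.append("author pre_execute")


def demo_run(task, skip_tasks):
    """Call pre_execute with a fake context; report whether the task was skipped."""
    context = {"dag_run": SimpleNamespace(conf={SKIP_CONF_KEY: skip_tasks})}
    try:
        task.pre_execute(context=context)
    except AirflowSkipException:
        return "skipped"
    return "ran"
```

In a real deployment only the import and `task_policy` would go into `airflow_local_settings.py`. Wrapping rather than replacing `pre_execute` is what would let such a policy coexist with hooks that teams already set in their own DAGs.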

> So the skipped task needs to have a state identical to "Success", other
> than the name. Hence a new state may be needed for this feature Howie is
> proposing.
> This is also one of the reasons why the *pre_execute* method may not
> fully resolve the question: marking the skipped task as "*Skipped*"
> may lead to something we don't expect (and marking it as "*Success*" is also
> bad practice, because it will cause confusion when users check the history
> later).
>

Yes, our states and trigger rules might need to become more expressive.
But it could still be done from pre_execute, because we could have an
`AirflowSkipSuccess` exception or something like that (not that this is
the only reasonable solution).
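To make the `AirflowSkipSuccess` idea concrete, here is a toy model (not Airflow's actual task runner) of how exceptions raised from `pre_execute` or `execute` could map onto a final state, with a hypothetical `SKIPPED_SUCCESS` state that an `all_success`-style check accepts just as it accepts `SUCCESS`. `AirflowSkipSuccess`, `Outcome`, `run_task`, `all_success` and `raiser` are illustrative names only, and `AirflowSkipException` here is a local stand-in for the real class.

```python
from enum import Enum


class AirflowSkipException(Exception):
    """Local stand-in for airflow.exceptions.AirflowSkipException."""


class AirflowSkipSuccess(Exception):
    """Hypothetical: 'deliberately not run, but treat it like success'."""


class Outcome(Enum):
    SUCCESS = "success"
    SKIPPED = "skipped"
    SKIPPED_SUCCESS = "skipped_success"  # hypothetical new state
    FAILED = "failed"


def run_task(pre_execute, execute, context):
    """Toy runner: map what the hooks raise onto a final outcome."""
    try:
        pre_execute(context)
        execute(context)
    except AirflowSkipSuccess:
        return Outcome.SKIPPED_SUCCESS
    except AirflowSkipException:
        return Outcome.SKIPPED
    except Exception:
        return Outcome.FAILED
    return Outcome.SUCCESS


def all_success(upstream_outcomes):
    """all_success-style check that accepts SKIPPED_SUCCESS like SUCCESS."""
    accepted = {Outcome.SUCCESS, Outcome.SKIPPED_SUCCESS}
    return all(outcome in accepted for outcome in upstream_outcomes)


def raiser(exc_type):
    """Build a hook that always raises exc_type (handy for trying run_task)."""
    def hook(context):
        raise exc_type()
    return hook
```

The point of a separate value is that the history still records that the task was deliberately not run, while a downstream task with an `all_success`-style rule is not skipped along with it.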

*A note on states*

This touches directly on a related thread that I have been thinking about a
lot of late.

We might want to separate "states" from "resolutions". That is, running /
queued / scheduled / done could be thought of as "execution states", whereas
skipped / failed / success / upstream failed might be better represented as
"resolutions". Similarly, when a task is paused (e.g. deferred or up for
reschedule), we might track the reason separately. This came up recently
with a bug where start_date kept being updated for a reschedule-mode
sensor because, at the time start_date is updated, the task no
longer knows it was "up for reschedule". It's possible that refactoring
"states" could give us better knowledge of what's going on with the task
when we have to make decisions like "should we update the start date?"
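One way to picture that split, as a rough sketch: a task attempt carries an execution state, a resolution that is only set once it is done, and a pause reason that survives until the task runs again. All class and field names here are hypothetical and do not reflect Airflow's current `TaskInstance` model. The `start_running` rule shows the kind of decision from the reschedule bug: because the pause reason is still known, a rescheduled sensor keeps its original `start_date`.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ExecutionState(Enum):
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    PAUSED = "paused"  # why it is paused lives in PauseReason
    DONE = "done"


class Resolution(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    UPSTREAM_FAILED = "upstream_failed"


class PauseReason(Enum):
    DEFERRED = "deferred"
    UP_FOR_RESCHEDULE = "up_for_reschedule"


@dataclass
class TaskAttempt:
    execution_state: ExecutionState = ExecutionState.SCHEDULED
    resolution: Optional[Resolution] = None  # only meaningful once DONE
    pause_reason: Optional[PauseReason] = None  # kept until the task runs again
    start_date: Optional[datetime] = None

    def pause(self, reason: PauseReason) -> None:
        self.execution_state = ExecutionState.PAUSED
        self.pause_reason = reason

    def start_running(self, now: datetime) -> None:
        # The pause reason is still known here, so a rescheduled poke can keep
        # the original start_date. In this sketch only UP_FOR_RESCHEDULE does.
        resuming_reschedule = self.pause_reason is PauseReason.UP_FOR_RESCHEDULE
        if self.start_date is None or not resuming_reschedule:
            self.start_date = now
        self.execution_state = ExecutionState.RUNNING
        self.pause_reason = None

    def finish(self, resolution: Resolution) -> None:
        self.execution_state = ExecutionState.DONE
        self.resolution = resolution
```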
