>
> [Cons]
> 1) Not scalable / Inconvenient
> To make a task skippable, one needs to modify existing DAGs (to set
> `pre_execute`). It may not seem difficult, but when your Airflow hosts a
> thousand DAGs owned by different teams/users, it can be challenging.


You can use task_policy to apply (or chain, I would think) pre_execute
callables globally on your cluster.
https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html
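A rough sketch of what such a policy could look like, assuming Airflow 2.x cluster policies (a `task_policy(task)` function in `airflow_local_settings.py`). To keep it runnable without an Airflow install, `StubOperator` and `AirflowSkipException` below are stand-ins for `BaseOperator` and `airflow.exceptions.AirflowSkipException`, and `TASKS_TO_SKIP` is a hypothetical skip list. The policy wraps whatever `pre_execute` the task already has rather than replacing it, so a DAG author's own hook still runs.

```python
from typing import Any, Callable, Dict, Optional

Context = Dict[str, Any]


class AirflowSkipException(Exception):
    """Stand-in for airflow.exceptions.AirflowSkipException."""


class StubOperator:
    """Minimal stand-in for BaseOperator: a task_id plus an optional pre_execute hook."""

    def __init__(self, task_id: str, pre_execute: Optional[Callable[[Context], None]] = None):
        self.task_id = task_id
        self._user_pre_execute = pre_execute

    def pre_execute(self, context: Context) -> None:
        # Run the DAG author's hook, if one was given.
        if self._user_pre_execute is not None:
            self._user_pre_execute(context)


# Hypothetical: task ids an admin wants skipped, e.g. loaded from some config store.
TASKS_TO_SKIP = {"load_to_warehouse"}


def task_policy(task: StubOperator) -> None:
    """Cluster policy sketch: chain a skip check in front of the existing pre_execute."""
    original = task.pre_execute  # bound method (or an earlier wrapper), captured first

    def chained_pre_execute(context: Context) -> None:
        if task.task_id in TASKS_TO_SKIP:
            raise AirflowSkipException(f"{task.task_id} skipped by cluster policy")
        original(context)  # fall through to whatever pre_execute was there before

    # Instance attribute shadows the class method, so callers get the wrapper.
    task.pre_execute = chained_pre_execute
```

Whether reassigning `pre_execute` on a real operator instance behaves the same way depends on the Airflow version, so treat that part as an assumption to check.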

> So the skipped task needs to have a State identical to "Success", other
> than the name. Hence a new state may be needed for this feature Howie is
> proposing.
> This is also one of the reasons why the *pre_execute* method may not
> fully resolve the question, because marking the skipped task as "*Skipped*"
> may lead to something we don't expect (marking it as "*Success*" is also
> bad practice, because it will cause confusion when users check the history
> later).
>

Yes, our states and trigger rules might need to become more expressive.
But it could still be done from pre_execute, because we could have an
`AirflowSkipSuccess` exception or something like that (not that this is
the only reasonable solution).
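To make the `AirflowSkipSuccess` idea a bit more concrete, here is a toy model. It is not Airflow's task runner or trigger-rule code: the exception, the `SKIPPED_SUCCESS` state, `run_task`, and the simplified `all_success` check are all hypothetical. It shows a hook raising the new exception producing a distinct state that a success-style trigger rule could accept, while a plain skip still does not.

```python
from enum import Enum
from typing import Any, Callable, Dict, Iterable

Context = Dict[str, Any]
Hook = Callable[[Context], None]


class TaskState(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    SKIPPED_SUCCESS = "skipped_success"  # hypothetical new state


class AirflowSkipException(Exception):
    """Stand-in for airflow.exceptions.AirflowSkipException."""


class AirflowSkipSuccess(Exception):
    """Hypothetical: skip the work, but let downstream treat it as a success."""


def run_task(pre_execute: Hook, execute: Hook, context: Context) -> TaskState:
    """Toy runner: map what pre_execute/execute raise onto a final state."""
    try:
        pre_execute(context)
        execute(context)
    except AirflowSkipSuccess:  # most specific cases first
        return TaskState.SKIPPED_SUCCESS
    except AirflowSkipException:
        return TaskState.SKIPPED
    except Exception:
        return TaskState.FAILED
    return TaskState.SUCCESS


def all_success(upstream_states: Iterable[TaskState]) -> bool:
    """Simplified ALL_SUCCESS rule that also accepts the hypothetical SKIPPED_SUCCESS."""
    accepted = {TaskState.SUCCESS, TaskState.SKIPPED_SUCCESS}
    return all(state in accepted for state in upstream_states)
```

The skip decision still lives in `pre_execute` (so a cluster policy could inject it); only the resulting state and how trigger rules read it would change.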

*A note on states*

This touches directly on a related thread that I have been thinking about a
lot of late.

We might want to separate "states" from "resolutions". That is, running /
queued / scheduled / done could be thought of as "execution states", whereas
skipped / failed / success / upstream failed might be better represented as
"resolutions". Similarly, when a task is paused (e.g. deferred, up for
reschedule) we might track those reasons separately. This came up recently
with a bug where start_date kept being updated for a reschedule-mode
sensor because, at the time start_date is updated, the task no
longer knows it was "up for reschedule". It's possible that refactoring
"states" could give us better knowledge about what's going on with the task
when we have to make decisions like "should we update the start date".
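One hypothetical way to model that split, purely as illustration (none of these classes exist in Airflow): execution state, resolution, and the last pause reason are separate fields, and the pause reason is kept until the task resolves. So when a reschedule-mode sensor starts running again, it can still see that it was up for reschedule and leave `start_date` alone.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ExecutionState(Enum):
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


class Resolution(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    UPSTREAM_FAILED = "upstream_failed"


class PauseReason(Enum):
    DEFERRED = "deferred"
    UP_FOR_RESCHEDULE = "up_for_reschedule"


@dataclass
class TaskStatus:
    execution_state: ExecutionState = ExecutionState.SCHEDULED
    resolution: Optional[Resolution] = None  # set only once execution is DONE
    last_pause_reason: Optional[PauseReason] = None  # kept until the task resolves
    start_date: Optional[datetime] = None

    def start(self, now: datetime) -> None:
        # The pause reason is still known here, so a rescheduled sensor
        # keeps its original start_date instead of overwriting it.
        if self.last_pause_reason is not PauseReason.UP_FOR_RESCHEDULE:
            self.start_date = now
        self.execution_state = ExecutionState.RUNNING

    def reschedule(self) -> None:
        # Back to SCHEDULED, but remember why.
        self.execution_state = ExecutionState.SCHEDULED
        self.last_pause_reason = PauseReason.UP_FOR_RESCHEDULE

    def finish(self, resolution: Resolution) -> None:
        self.execution_state = ExecutionState.DONE
        self.resolution = resolution
        self.last_pause_reason = None
```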
