Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

Brent Bovenzi Thu, 24 Apr 2025 09:57:16 -0700

Yeah, if we do a similar endpoint we should filter it to only include
unpaused Dags. We do check if the dag is paused during auto refresh in a
lot of places.
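
A minimal sketch of that filter, in plain Python rather than actual Airflow
code. RunRow, run_duration and longest_runs are hypothetical names used only
for illustration; a real endpoint would apply the same condition in its
database query. It assumes each dag run row carries its Dag's paused flag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class RunRow:
    """Hypothetical stand-in for a dag run joined with its Dag's paused flag."""
    dag_id: str
    start_date: datetime
    end_date: Optional[datetime]
    dag_is_paused: bool


def run_duration(run: RunRow, now: datetime) -> timedelta:
    # No end date yet: fall back to "now", as described in the thread.
    # For a running run of a paused Dag this value keeps growing.
    return (run.end_date or now) - run.start_date


def longest_runs(runs: List[RunRow], now: datetime, limit: int = 5,
                 include_paused: bool = False) -> List[RunRow]:
    # Drop runs of paused Dags before ranking, so their inflated
    # durations do not crowd out the runs that are actually executing.
    candidates = [r for r in runs if include_paused or not r.dag_is_paused]
    return sorted(candidates, key=lambda r: run_duration(r, now),
                  reverse=True)[:limit]


now = datetime(2025, 4, 24, tzinfo=timezone.utc)
runs = [
    RunRow("paused_dag", now - timedelta(days=10), None, True),
    RunRow("active_dag", now - timedelta(hours=2), None, False),
    RunRow("done_dag", now - timedelta(hours=5), now - timedelta(hours=4), False),
]
print([r.dag_id for r in longest_runs(runs, now)])
```

With include_paused left at its default, the ten-day-old run of the paused
Dag is excluded from the ranking even though its computed duration is the
largest.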


On Fri, Apr 18, 2025 at 3:44 PM Pedro Nunes Leal
<pedro.n.l...@tecnico.ulisboa.pt.invalid> wrote:

> On 2025-04-03 19:28, Brent Bovenzi wrote:
> > The issue is that duration is based off of start and end dates. If there is
> > no end date we usually default to now. But that is misleading when a dag
> > run is running but the dag is paused.
> > Let me take a look at where we use duration in the 3.0 UI and see if we can
> > reduce that confusion. We don't have the "5 longest dag runs" in our new
> > dashboard page, which replaces cluster activity. If we wanted that feature
> > again, we should be mindful of this and filter out paused dags in the API
> > request.
> >
> >
> >
> > On Thu, Apr 3, 2025, 1:27 PM Pedro Nunes Leal
> > <pedro.n.l...@tecnico.ulisboa.pt.invalid> wrote:
> >
> >> On 2025-03-31 22:26, Jens Scheffler wrote:
> >> > Hi,
> >> >
> >> > thanks for working on the bug and raising a PR to fix it.
> >> >
> >> > As other committers also commented, I think from a product view I'd
> >> > expect a different resolution. We use "Pause DAG" in most cases for
> >> > administrative or infrastructure problems, to prevent further failures
> >> > and/or to drain infra to switch some backend.
> >> >
> >> > I assume when we pause a long-running DAG that is in-between execution
> >> > of tasks we really want to "pause" scheduling; we don't want to set it
> >> > to failed. That would also not be correct, because once we un-pause, the
> >> > running DAGs should continue to work. I see no reason for marking this
> >> > failed and then manually chasing after it to reset the state later.
> >> >
> >> > My view on this is that, as also proposed in the discussion of the bug,
> >> > we should rather filter paused DAGs from cluster activity reporting
> >> > such that paused DAGs are not reported with excessive runtime. Also,
> >> > later if un-paused it would be "right" that the overall DAG runtime was
> >> > longer than normal (I would not expect to deduct the paused time from
> >> > the runtime of the DAG).
> >> >
> >> > If I want (as operator/admin) to really terminate existing running
> >> > instances I'd rather walk through Browse -> DAG Runs --> Filter for
> >> > running with paused DAG id and mark them as failed explicitly.
> >> >
> >> > Jens
> >> >
> >> > On 31.03.25 20:50, Pedro Nunes Leal wrote:
> >> >> Hello everyone,
> >> >>
> >> >> Currently, I'm trying to fix this bug:
> >> >> https://github.com/apache/airflow/issues/44443
> >> >>
> >> >> Basically, the issue is that the DAGs would be stuck on running even
> >> >> though they were paused.
> >> >> Consequently, the duration of the dag run will keep on increasing
> >> >> even though the DAG is paused.
> >> >>
> >> >> My proposal to solve this problem is changing the DAGs state from
> >> >> running to failed, when paused, to avoid the increment of their
> >> >> duration.
> >> >>
> >> >> Since this can be an impactful change, I would like to hear what
> >> >> others think about it.
> >> >>
> >> >> Link for the Pull Request:
> >> >> https://github.com/apache/airflow/pull/47557
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >> >>
> >> >
> >> That can be a better approach.
> >>
> >> However, if I'm not mistaken, the code related to the cluster activity
> >> page doesn't exist in Airflow 3 (the version where I'm trying to make
> >> the changes).
> >>
> >> So what should I do in this case?
> >> Is there any other way not involving cluster activity to solve this
> >> problem?
> >>
> >> Changing to the queued state instead of failed was my proposal at the
> >> beginning, and it really pauses the DAG.
> >> This is the type of solution I was thinking of because, as I said before
> >> in the pull request, I feel that the cluster activity behavior is just a
> >> symptom of a bigger problem (the DAGs don't really pause, they just
> >> keep running).
> >>
> >>
> >>
> Hello,
>
> Any update related to the use of duration in the 3.0 UI?
>
> Maybe this bug isn't really an issue if cluster activity was removed in
> the newer version, and it's just something to keep in mind in case
> something similar to cluster activity is implemented in the 3.0 UI.
>
> From what I understand, the current behavior of staying in running while
> the duration keeps increasing is what is expected from the pause
> functionality.
>
>
>
