Yeah, if we add a similar endpoint, we should filter it to only include unpaused Dags. We already check whether the Dag is paused during auto-refresh in a lot of places.
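
To make the filtering idea concrete, here is a rough sketch, not the actual Airflow API or UI code. The `DagRun` dataclass, its `dag_is_paused` field, and the `duration` and `longest_runs` helpers are all hypothetical names made up for illustration. It assumes, as discussed in this thread, that duration is derived from start and end dates and that a missing end date falls back to "now".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DagRun:
    # Hypothetical stand-in for illustration; not Airflow's DagRun model.
    dag_id: str
    start_date: datetime
    end_date: Optional[datetime]
    dag_is_paused: bool


def duration(run: DagRun, now: datetime) -> timedelta:
    # With no end date we fall back to "now", which is why a running
    # run of a paused Dag appears to keep getting longer.
    return (run.end_date or now) - run.start_date


def longest_runs(runs: List[DagRun], now: datetime, limit: int = 5) -> List[DagRun]:
    # Drop runs whose Dag is paused, then rank the rest by duration.
    active = [run for run in runs if not run.dag_is_paused]
    return sorted(active, key=lambda run: duration(run, now), reverse=True)[:limit]
```

In a real endpoint the filter would more likely live in the database query than in Python, but the effect would be the same: runs of paused Dags never reach a "longest dag runs" style list.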

On Fri, Apr 18, 2025 at 3:44 PM Pedro Nunes Leal
<pedro.n.l...@tecnico.ulisboa.pt.invalid> wrote:

> On 2025-04-03 19:28, Brent Bovenzi wrote:
> > The issue is that duration is based off of start and end dates. If
> > there is no end date we usually default to now. But that is
> > misleading when a dag run is running but the dag is paused.
> > Let me take a look at where we use duration in the 3.0 UI and see if
> > we can reduce that confusion. We don't have the "5 longest dag runs"
> > in our new dashboard page, which replaces cluster activity. If we
> > wanted that feature again, we should be mindful of this and filter
> > out paused dags in the API request.
> >
> > On Thu, Apr 3, 2025, 1:27 PM Pedro Nunes Leal
> > <pedro.n.l...@tecnico.ulisboa.pt.invalid> wrote:
> >
> >> On 2025-03-31 22:26, Jens Scheffler wrote:
> >> > Hi,
> >> >
> >> > thanks for working on the bug and raising a PR to fix it.
> >> >
> >> > As other committers also commented, I think from a product view I'd
> >> > expect a different resolution. We use the "Pause DAG" in most
> >> > cases for administrative or infrastructure problems to prevent
> >> > further failures and/or to drain infra to switch some backend.
> >> >
> >> > I assume when we pause a long-running DAG that is in-between
> >> > execution of tasks we want to really "pause" scheduling, we don't
> >> > want to set it to failed. That would also not be correct because
> >> > once we un-pause, the running DAGs should continue to work. I see
> >> > no reason for marking this failed and then manually running behind
> >> > to reset the state later.
> >> >
> >> > My view on this is that, as also proposed in the discussion of the
> >> > bug, we should rather filter the paused DAG from cluster activity
> >> > reporting such that paused DAGs are not reported with excessive
> >> > runtime. Also later if un-paused it would be "right" that the
> >> > overall DAG runtime was longer than normal (would not expect to
> >> > deduct the paused time from runtime of the DAG.)
> >> >
> >> > If I want (as operator/admin) to really terminate existing running
> >> > instances I'd rather walk through Browse -> DAG Runs --> Filter
> >> > for running with paused DAG id and mark them as failed explicitly.
> >> >
> >> > Jens
> >> >
> >> > On 31.03.25 20:50, Pedro Nunes Leal wrote:
> >> >> Hello everyone,
> >> >>
> >> >> Currently, I'm trying to fix this bug:
> >> >> https://github.com/apache/airflow/issues/44443
> >> >>
> >> >> Basically, the issue is that the DAGs would be stuck on running
> >> >> even though they were paused.
> >> >> Consequently, the duration of the dag run will keep on increasing
> >> >> even though the DAG is paused.
> >> >>
> >> >> My proposal to solve this problem is changing the DAGs state from
> >> >> running to failed, when paused, to avoid the increment of their
> >> >> duration.
> >> >>
> >> >> Since this can be an impactful change, I would like to hear what
> >> >> others think about it.
> >> >>
> >> >> Link for the Pull Request:
> >> >> https://github.com/apache/airflow/pull/47557
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >> >>
> >> >
> >> That can be a better approach.
> >>
> >> However, if I'm not mistaken, the code related to the cluster
> >> activity page doesn't exist in Airflow 3 (the version where I'm
> >> trying to do the changes).
> >>
> >> So what should I do in this case?
> >> Is there any other way not involving cluster activity to solve this
> >> problem?
> >>
> >> The change to queued state instead of failed was my proposal at the
> >> beginning, and it really pauses the DAG.
> >> This is the type of solution I was thinking of because, as I said
> >> before in the pull request, I feel that the cluster activity
> >> behavior is just a symptom of a bigger problem (the DAGs don't
> >> really pause, they just keep running).
> >>
> >
> Hello,
>
> Any update related to the use of duration in the UI 3.0?
>
> Maybe this bug isn't really an issue if cluster activity was removed
> in the newer version, and it's just something to have in mind in case
> something similar to cluster activity is implemented in the 3.0 UI.
>
> From what I understand, the current behavior of staying on running and
> the duration increasing is what is expected from the pause
> functionality.
>