It's an interesting discussion and I remember struggling with this when I
started working with Airflow.
But, I also agree with the viewpoint of it being an established concept now
regardless of the origin.

I am personally leaning towards the perspective best expressed by Daniel
Standish and Brent of using Dag as a word, rather than the computer science
concept.

Best regards,
Vikram


On Tue, Oct 22, 2024 at 9:46 AM Oliveira, Niko <oniko...@amazon.com.invalid>
wrote:

> I agree with the general sentiment of: You're right Ryan, DAG isn't great
> and I'd rather workflow, but changing it will cause much more wreckage than
> it solves.
>
> Also agree with the idea to just move away from defining DAG. I think
> we've been naturally doing that as a community for a while now anyway, so
> that feels like a natural step.
>
> Cheers,
> Niko
>
> ________________________________
> From: Ash Berlin-Taylor <a...@apache.org>
> Sent: Tuesday, October 22, 2024 9:06:39 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXT] Airflow should deprecate the term "DAG" for end users
>
> Best argument in favour of keeping “dags” as a term — getting to re-use
> puns like https://i.imgflip.com/1xhtwh.jpg
>
> In all seriousness: I don’t mind either way, both sides have good reasons
> presented.
>
> -a
>
> > On 22 Oct 2024, at 17:03, Daniel Standish
> <daniel.stand...@astronomer.io.INVALID> wrote:
> >
> > Yeah just say, when asked where the name comes from, "well, no one
> actually
> > knows but..." and then make something up.
> >
> > On Tue, Oct 22, 2024 at 8:31 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >> Just to clarify - "directed acyclic graph" is the tongue-twister,
> >>
> >> On Tue, Oct 22, 2024 at 5:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >>> I like what both Daniel and Brent wrote. I would very much want to be
> >> able
> >>> to say just "dag" without explaining it further.
> >>>
> >>> For me every time I explain "DAG" at a talk it's a tongue-twister, and
> I
> >>> almost stutter on trying to recall how to pronounce it properly.
> >>>
> >>> J.
> >>>
> >>>
> >>> On Tue, Oct 22, 2024 at 5:27 PM Brent Bovenzi
> >> <br...@astronomer.io.invalid>
> >>> wrote:
> >>>
> >>>> I remember we explored renaming "DAG" when starting on AIP-38 to
> >> modernize
> >>>> the UI. Both "pipeline" or "workflow" are more descriptive of what one
> >> is
> >>>> actually doing while Directed Acyclic Graph is an implementation
> detail.
> >>>> But I agree with Daniel Standish, at this point "DAG" has become "dag"
> >> , a
> >>>> word in its own right.
> >>>>
> >>>> Examples of "dag" abound in community discussion, Airflow Summit
> >>>> talks, documentation and even in the UI. Let's embrace "dag". A user
> >> just
> >>>> needs to learn one new word vs the technical concept behind that
> word. I
> >>>> think that is much less effort than refactoring so much code,
> >>>> documentation, blog posts, stack overflow questions, etc.
> >>>>
> >>>> On Tue, Oct 22, 2024 at 10:51 AM Daniel Standish
> >>>> <daniel.stand...@astronomer.io.invalid> wrote:
> >>>>
> >>>>> I am skeptical.  Seems like introducing a lot of pain for
> questionable
> >>>>> benefit.  But, I am def sympathetic to the idea.  I agree the
> >>>> association
> >>>>> with "directed acyclic graph" is not helpful.
> >>>>>
> >>>>> And along those lines, I offer here some less invasive mitigations.
> >>>>>
> >>>>> One thing we can do no matter what is to de-emphasize the math nerd
> >>>> origins
> >>>>> of the name.  That is to say, in docs / website / etc, *never define*
> >>>>> airflow's "dag" concept as a directed acyclic graph.  Always define
> it
> >>>> as a
> >>>>> pipeline, collection of tasks, workflow etc.
> >>>>>
> >>>>> The "directed acyclic graph" part of it is like a historical
> footnote,
> >>>> and
> >>>>> we could make one mention of it somewhere hidden.
> >>>>>
> >>>>> We could also start using lowercase in the docs in general e.g.
> >> writing
> >>>>> "dag" / "dags" instead of writing "DAG" / "DAGs" etc.  The upper case
> >>>> part
> >>>>> of it makes it look like an acronym; but "dag" in airflow is just an
> >>>>> airflow concept and the association with "DAGs" is not really
> >> helpful.
> >>>>>
> >>>>> In other words embrace that "dag" in airflow is its own thing, is
> >>>>> *not* strictly
> >>>>> speaking a directed acyclic graph (which nobody knows about anyway),
> >> and
> >>>>> tell them what it is in simple terms that normal people understand.
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 22, 2024 at 7:27 AM Jarek Potiuk <ja...@potiuk.com>
> >> wrote:
> >>>>>
> >>>>>> DAG is so embedded into what we do that it will be extremely
> >>>> difficult to
> >>>>>> get rid of it completely. Also I think it will make a lot of
> >> "google"
> >>>>>> searches and "stack overflow" searches not finding the right
> >> answers.
> >>>>> This
> >>>>>> is one of the strengths of Airflow - besides the community and ideas
> >>>> that
> >>>>>> Bernd mentioned - is the vast number of examples, problems and
> >>>> solutions
> >>>>>> you can so easily find (and we have to remember that all the AI
> >>>> trained
> >>>>> on
> >>>>>> past data will also match people's queries rather poorly).
> >>>>>>
> >>>>>> I am not too attached to DAG. I could easily switch. And if we do -
> >> I
> >>>>>> would be for using workflow or pipeline instead of `dag` if not for the
> >>>> above
> >>>>>> reason, but I think I am here with Igor that it might cause more
> >>>> problems
> >>>>>> than it solves.
> >>>>>>
> >>>>>> But I am not 100% against - if others will think it's a good idea, I
> >>>> am
> >>>>> ok
> >>>>>> with it.
> >>>>>>
> >>>>>> J,
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Oct 22, 2024 at 3:12 PM Abhishek Bhakat
> >>>>>> <abhishek.bha...@astronomer.io.invalid> wrote:
> >>>>>>
> >>>>>>> Agreed that the word DAG makes very little sense to someone new to
> >>>>> workflow
> >>>>>>> orchestration. But it does also show the nature of being acyclic.
> >>>> Sure,
> >>>>>> as
> >>>>>>> Bas mentioned, there are ways to work around it. Still, in my
> >>>> opinion,
> >>>>>> there
> >>>>>>> is generally no need for cyclic behavior in workflow
> >> orchestration.
> >>>>> Most
> >>>>>>> (*if
> >>>>>>> not all*) cases can in some way be covered in an acyclic
> >>>>> manner
> >>>>>>> with multiple runs. Hence, the idempotency. So I would want the
> >>>>> "acyclic"
> >>>>>>> word to stick.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Avi
> >>>>>>>
> >>>>>>> On Tue, Oct 22, 2024 at 12:41 PM <bernd.stroe...@kosakya.de>
> >> wrote:
> >>>>>>>
> >>>>>>>> Brilliant, I am on the way to becoming an Airflow Fan; so many new
> >>>>> ideas.
> >>>>>>>>
> >>>>>>>> The Term DAG is misleading; it should be replaced by the more
> >>>> general
> >>>>>>> Term
> >>>>>>>> Airflow (Workflow) Graph (AFG) or Airflow (Petri) Net (AFN)
> >> (maybe
> >>>>>>> without
> >>>>>>>> a direction);
> >>>>>>>> and ... these Graphs should be stored in a Graph Database.
> >>>>>>>>
> >>>>>>>> Every Node or Sub-Graph of an Airflow Graph (AFG) might be
> >>>> assigned
> >>>>> to
> >>>>>> an
> >>>>>>>> executable (Python-, Rust-, ... ) member of a library.
> >>>>>>>>
> >>>>>>>> A running Graph might have a different structure than a
> >>>> configuration
> >>>>>>>> Graph.
> >>>>>>>>
> >>>>>>>> Forget that if you think it's bullshit.
> >>>>>>>>
> >>>>>>>> Best Regards
> >>>>>>>>
> >>>>>>>> Bernd Ströhle
> >>>>>>>> M: +49 171 5357916
> >>>>>>>> E: bernd.stroe...@gmail.com
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Igor Kholopov <ikholo...@google.com.INVALID>
> >>>>>>>> Sent: Tuesday, October 22, 2024 12:02 PM
> >>>>>>>> To: dev@airflow.apache.org
> >>>>>>>> Subject: Re: Airflow should deprecate the term "DAG" for end
> >> users
> >>>>>>>>
> >>>>>>>> Even though the term "DAG" is clearly suboptimal, it is part of
> >>>>> Airflow
> >>>>>>>> DAG definition interface at so many levels, that any attempt to
> >>>>> change
> >>>>>> it
> >>>>>>>> will only introduce more chaos, not reduce it. The only thing
> >>>> that is
> >>>>>>> worse
> >>>>>>>> than a poorly chosen name in the code is when there are two ways
> >>>> to
> >>>>>>> define
> >>>>>>>> the same thing. Countless articles and tutorials will suddenly
> >>>> become
> >>>>>>>> confusing as they all refer to workflows as "DAG"s.
> >>>>>>>>
> >>>>>>>> We are already at risk of scaring the users away with a number
> >> of
> >>>>>>> breaking
> >>>>>>>> changes in Airflow 3, promising even more breaking changes for
> >> the
> >>>>> most
> >>>>>>>> basic things is not something that people are looking for.
> >>>> Attempting
> >>>>>> to
> >>>>>>>> change the fundamental terms will be interpreted as an even
> >>>> stronger
> >>>>>>> signal
> >>>>>>>> of project immaturity.
> >>>>>>>>
> >>>>>>>> Given that, I oppose the idea of changing the term in the long
> >>>> run. I
> >>>>>>> even
> >>>>>>>> more strongly oppose the idea of deprecating it in the DAG definition
> >>>>>>> interface.
> >>>>>>>> We better put our time and efforts in other places in Airflow,
> >> of
> >>>>> which
> >>>>>>>> there are plenty.
> >>>>>>>>
> >>>>>>>> Kind regards,
> >>>>>>>> Igor
> >>>>>>>>
> >>>>>>>> On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak
> >>>>>> <b...@astronomer.io.invalid
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Couple of thoughts:
> >>>>>>>>>
> >>>>>>>>> 1. The boundaries/properties of “DAG” have already faded over
> >>>> time,
> >>>>>>>>> for example there are now several ways to create cyclic
> >> graphs,
> >>>>> e.g.
> >>>>>>>>> using the @continuous schedule. I imagine these properties
> >>>>> vanishing
> >>>>>>>>> even more in the future, so from that perspective I support
> >>>>> changing
> >>>>>>>>> “DAG” to a more generic name.
> >>>>>>>>>
> >>>>>>>>> 2. How other orchestration frameworks do naming:
> >>>>>>>>> Dagster: pipeline
> >>>>>>>>> Prefect: flow
> >>>>>>>>> Flyte: workflow
> >>>>>>>>> Temporal: workflow
> >>>>>>>>> Kestra: flow
> >>>>>>>>>
> >>>>>>>>>        I think “workflow” is the most fitting name.
> >>>>>>>>>
> >>>>>>>>> 3. Given the large impact of this change, I suggest defining a
> >>>>> clear
> >>>>>>>>> path forward. Would we first introduce the deprecation in
> >>>> Airflow
> >>>>> 3,
> >>>>>>>>> and remove “DAG” in Airflow 4?
> >>>>>>>>>
> >>>>>>>>> Bas
> >>>>>>>>>
> >>>>>>>>>> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I don't see a problem with the term DAG, especially when
> >> most
> >>>>> other
> >>>>>>>>>> platforms embrace the term wholeheartedly.
> >>>>>>>>>> I don't see anything intimidating or confusing about it at
> >>>> all,
> >>>>>>>>>> changing the term though would be fairly confusing to most
> >>>> users
> >>>>>> who
> >>>>>>>>>> have been
> >>>>>>>>> using
> >>>>>>>>>> the term for years.
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung
> >>>>>>>>>> <t...@astronomer.io.invalid
> >>>>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I totally agree with doing away with the term DAG. The only
> >>>>>> problem
> >>>>>>>>> (aside
> >>>>>>>>>>> from actually telling people—including myself—to stop using
> >>>> the
> >>>>>>>>>>> term)
> >>>>>>>>> is to
> >>>>>>>>>>> come up with a reasonable alternative.
> >>>>>>>>>>>
> >>>>>>>>>>> I can’t recall who, but someone mentioned “workflow” is not
> >>>> very
> >>>>>>>>> accurate
> >>>>>>>>>>> for Airflow. The term “definition” was proposed, but it’s a
> >>>> bit
> >>>>>>>>>>> broad; I tried to use it in a few places and kept finding
> >>>> myself
> >>>>>>>>>>> doubting “what definition?” and wanting to clarify “DAG
> >>>>>> definition”
> >>>>>>>>>>> (defeating the purpose).
> >>>>>>>>>>>
> >>>>>>>>>>> TP
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 22 Oct 2024, at 13:07, Jens Scheffler
> >>>>>>>>>>>> <j_scheff...@gmx.de.INVALID>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Ryan,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for posting. I share exactly the same observation,
> >>>> had a
> >>>>>>>>>>>> short
> >>>>>>>>>>> laugh because the DAG question is always an introduction
> >> if
> >>>>>>>>>>> someone
> >>>>>>>>> joins
> >>>>>>>>>>> the party. I think a global renaming makes sense.
> >> Especially
> >>>>> when
> >>>>>>>>>>> we
> >>>>>>>>> also
> >>>>>>>>>>> rename Dataset to Asset this is also a reasonable step.
> >>>> Concepts
> >>>>>>>>>>> still
> >>>>>>>>> can
> >>>>>>>>>>> stay the same.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So I hope I don’t need to join hiding below the desk with
> >>>> you
> >>>>> and
> >>>>>>>>>>>> +1
> >>>>>>>>> for
> >>>>>>>>>>> raising the discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Technically we can still think if we keep details of
> >> python
> >>>>> names
> >>>>>>>>>>>> the
> >>>>>>>>>>> same because the execution is still a DAG… but user facing
> >> it
> >>>>> is a
> >>>>>>>>> workflow.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jens
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my Smartphone
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 21. Oct 2024, at 23:56, Ryan Hatter <
> >>>>>> ryan.hat...@astronomer.io
> >>>>>>>>> .invalid>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Everyone please sheathe your swords... at least for now.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The term "DAG" has very little meaning to Airflow users.
> >>>>> Indeed,
> >>>>>>>>>>>>> it
> >>>>>>>>> has
> >>>>>>>>>>>>> little meaning outside of some mathematicians and
> >> software
> >>>>>>>>>>>>> engineers
> >>>>>>>>> for
> >>>>>>>>>>>>> whom the properties of a DAG actually matter. For someone
> >>>> new
> >>>>> to
> >>>>>>>>>>>>> data engineering or workflow orchestration, one of the
> >>>> first
> >>>>>>>>>>>>> questions they
> >>>>>>>>>>> will
> >>>>>>>>>>>>> likely have is, "what on earth is a DAG?" The answer is
> >>>> almost
> >>>>>>>>>>>>> always, "It's a directed acyclic graph. You don't need to
> >>>>> worry
> >>>>>>>>>>>>> about what
> >>>>>>>>> that
> >>>>>>>>>>>>> means; it's just a term for your workflow."
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The term "DAG" is problematic for at least a couple
> >>>> important
> >>>>>>>> reasons:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Complexity for New Users*: As mentioned above, "DAG" is
> >>>>>>>>>>>>> unnecessarily intimidating and confusing. We want Airflow
> >>>> to
> >>>>> be
> >>>>>>>>>>>>> approachable, and
> >>>>>>>>>>> using
> >>>>>>>>>>>>> technical jargon like "DAG" right off the bat creates an
> >>>>> initial
> >>>>>>>>>>> barrier to
> >>>>>>>>>>>>> understanding.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG
> >> is
> >>>>> just
> >>>>>>>>>>>>> one component of an Airflow workflow. The workflow
> >> includes
> >>>>> its
> >>>>>>>>>>>>> schedule, retries, timeouts, a dozen other parameters,
> >> and
> >>>>> other
> >>>>>>>>>>>>> metadata that
> >>>>>>>>> the
> >>>>>>>>>>>>> DAG component doesn’t account for.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Consider the following from the Airflow homepage
> >>>>>>>>>>>>> <https://airflow.apache.org/>.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Apache Airflow® is a platform created by the community to
> >>>>>>>>>>> programmatically
> >>>>>>>>>>>>> author, schedule and monitor workflows.
> >>>>>>>>>>>>> Then, if we look at the "What is Airflow?" docs page
> >>>>>>>>>>>>> <
> >>>>>> https://airflow.apache.org/docs/apache-airflow/stable/index.html
> >>>>>>>>>>>>>> ,
> >>>>>>>>> we
> >>>>>>>>>>> can
> >>>>>>>>>>>>> see that the docs explain what Airflow is without using
> >>>> "DAG."
> >>>>>>>>>>>>> It's
> >>>>>>>>>>> only in
> >>>>>>>>>>>>> the *workflow* Python code that the term is introduced
> >> out
> >>>> of
> >>>>>>>>>>>>> nowhere
> >>>>>>>>>>> as a
> >>>>>>>>>>>>> comment that awkwardly tries to explain it:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> # A DAG represents a workflow, a collection of tasks
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It makes sense to not refer to DAGs in these
> >> introductions
> >>>> to
> >>>>>>>>>>>>> Airflow, because *Airflow doesn't orchestrate DAGs; it
> >>>>>>> orchestrates
> >>>>>>>> workflows*.
> >>>>>>>>>>> The
> >>>>>>>>>>>>> DAG is the model that, for reasons irrelevant to almost
> >>>> every
> >>>>>>>>>>>>> user, workflows must adhere to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So, I propose at least adding an alias for the term "DAG"
> >>>> and
> >>>>>>>>>>>>> updating documentation to replace "DAG" with "workflow".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For example, instead of...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @dag(
> >>>>>>>>>>>>> schedule="@daily",
> >>>>>>>>>>>>> ...
> >>>>>>>>>>>>> dagrun_timeout=timedelta(hours=1)
> >>>>>>>>>>>>> )
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Users could do...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @workflow(
> >>>>>>>>>>>>> schedule="@daily",
> >>>>>>>>>>>>> ...
> >>>>>>>>>>>>> run_timeout=timedelta(hours=1)
> >>>>>>>>>>>>> )
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And with that... I will start running away.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>> ------------------------------------------------------------------
> >>>>>>>>>>>> --- To unsubscribe, e-mail:
> >>>> dev-unsubscr...@airflow.apache.org
> >>>>>>>>>>>> For additional commands, e-mail:
> >>>> dev-h...@airflow.apache.org
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>
