I agree with the general sentiment: you're right, Ryan, "DAG" isn't great and 
I'd rather use "workflow", but changing it will cause far more wreckage than 
it prevents.

Also agree with the idea to just move away from defining DAG. I think we've 
been naturally doing that as a community for a while now anyway, so that feels 
like a natural step.

Cheers,
Niko

________________________________
From: Ash Berlin-Taylor <a...@apache.org>
Sent: Tuesday, October 22, 2024 9:06:39 AM
To: dev@airflow.apache.org
Subject: RE: [EXT] Airflow should deprecate the term "DAG" for end users


Best argument in favour of keeping “dags” as a term — getting to re-use puns 
like https://i.imgflip.com/1xhtwh.jpg

In all seriousness: I don’t mind either way; both sides have presented good 
reasons.

-a

> On 22 Oct 2024, at 17:03, Daniel Standish 
> <daniel.stand...@astronomer.io.INVALID> wrote:
>
> Yeah just say, when asked where the name comes from, "well, no one actually
> knows but..." and then make something up.
>
> On Tue, Oct 22, 2024 at 8:31 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> >> Just to clarify - "directed acyclic graph" is the tongue-twister.
>>
>> On Tue, Oct 22, 2024 at 5:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
> >>> I like what both Daniel and Brent wrote. I would very much want to be
> >>> able to say just "dag" without explaining it further.
>>>
>>> For me every time I explain "DAG" at a talk it's a tongue-twister, and I
>>> almost stutter on trying to recall how to pronounce it properly.
>>>
>>> J.
>>>
>>>
> >>> On Tue, Oct 22, 2024 at 5:27 PM Brent Bovenzi
> >>> <br...@astronomer.io.invalid> wrote:
>>>
> >>>> I remember we explored renaming "DAG" when starting on AIP-38 to
> >>>> modernize the UI. Both "pipeline" and "workflow" are more descriptive of
> >>>> what one is actually doing, while "directed acyclic graph" is an
> >>>> implementation detail. But I agree with Daniel Standish: at this point
> >>>> "DAG" has become "dag", a word in its own right.
> >>>>
> >>>> Examples of "dag" abound in community discussion, Airflow Summit
> >>>> talks, documentation and even in the UI. Let's embrace "dag". A user
> >>>> just needs to learn one new word rather than the technical concept
> >>>> behind that word. I think that is much less effort than refactoring so
> >>>> much code, documentation, blog posts, Stack Overflow questions, etc.
>>>>
>>>> On Tue, Oct 22, 2024 at 10:51 AM Daniel Standish
>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>
> >>>>> I am skeptical.  Seems like introducing a lot of pain for questionable
> >>>>> benefit.  But, I am def sympathetic to the idea.  I agree the
> >>>>> association with "directed acyclic graph" is not helpful.
>>>>>
>>>>> And along those lines, I offer here some less invasive mitigations.
>>>>>
> >>>>> One thing we can do no matter what is to de-emphasize the math nerd
> >>>>> origins of the name.  That is to say, in docs / website / etc, *never
> >>>>> define* airflow's "dag" concept as a directed acyclic graph.  Always
> >>>>> define it as a pipeline, collection of tasks, workflow etc.
> >>>>>
> >>>>> The "directed acyclic graph" part of it is like a historical footnote,
> >>>>> and we could make one mention of it somewhere hidden.
> >>>>>
> >>>>> We could also start using lowercase in the docs in general e.g.
> >>>>> writing "dag" / "dags" instead of writing "DAG" / "DAGs" etc.  The upper
> >>>>> case part of it makes it look like an acronym; but "dag" in airflow is
> >>>>> just an airflow concept and the association with "DAGs" is not really
> >>>>> helpful.
> >>>>>
> >>>>> In other words embrace that "dag" in airflow is its own thing, is
> >>>>> *not* strictly speaking a directed acyclic graph (which nobody knows
> >>>>> about anyway), and tell them what it is in simple terms that normal
> >>>>> people understand.
>>>>>
>>>>>
> >>>>> On Tue, Oct 22, 2024 at 7:27 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>
> >>>>>> DAG is so embedded into what we do that it will be extremely
> >>>>>> difficult to get rid of it completely. Also I think it will make a lot
> >>>>>> of "google" searches and "stack overflow" searches not find the right
> >>>>>> answers. One of the strengths of Airflow - besides the community and
> >>>>>> ideas that Bernd mentioned - is the vast number of examples, problems
> >>>>>> and solutions you can so easily find (and we have to remember that all
> >>>>>> the AI trained on past data will also match people's queries rather
> >>>>>> poorly).
> >>>>>>
> >>>>>> I am not too attached to DAG. I could easily switch. And if we do, I
> >>>>>> would be for using workflow or pipeline instead of `dag` if not for the
> >>>>>> above reason, but I think I am here with Igor that it might cause more
> >>>>>> problems than it solves.
> >>>>>>
> >>>>>> But I am not 100% against - if others think it's a good idea, I am ok
> >>>>>> with it.
> >>>>>>
> >>>>>> J.
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2024 at 3:12 PM Abhishek Bhakat
>>>>>> <abhishek.bha...@astronomer.io.invalid> wrote:
>>>>>>
> >>>>>>> Agreed that the word DAG makes very little sense to someone new to
> >>>>>>> workflow orchestration. But it does also show the nature of being
> >>>>>>> acyclic. Sure, as Bas mentioned, there are ways to work around it.
> >>>>>>> Still, in my opinion, there is generally no need for cyclic behavior
> >>>>>>> in workflow orchestration. Most (*if not all*) cases can in some way
> >>>>>>> be covered in an acyclic manner with multiple runs. Hence, the
> >>>>>>> idempotency. So I would want the "acyclic" word to stick.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Avi
>>>>>>>
> >>>>>>> On Tue, Oct 22, 2024 at 12:41 PM <bernd.stroe...@kosakya.de> wrote:
>>>>>>>
> >>>>>>>> Brilliant, I am on the way to becoming an Airflow fan; so many new
> >>>>>>>> ideas.
> >>>>>>>>
> >>>>>>>> The term DAG is misleading; it should be replaced by the more general
> >>>>>>>> term Airflow (Workflow) Graph (AFG) or Airflow (Petri) Net (AFN)
> >>>>>>>> (maybe without a direction);
> >>>>>>>> and ... these graphs should be stored in a graph database.
> >>>>>>>>
> >>>>>>>> Every node or sub-graph of an Airflow Graph (AFG) might be assigned
> >>>>>>>> to an executable (Python-, Rust-, ... ) member of a library.
> >>>>>>>>
> >>>>>>>> A running graph might have a different structure than a configuration
> >>>>>>>> graph.
>>>>>>>>
>>>>>>>> Forget that if you think it's bullshit.
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Bernd Ströhle
>>>>>>>> M: +49 171 5357916
>>>>>>>> E: bernd.stroe...@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Igor Kholopov <ikholo...@google.com.INVALID>
>>>>>>>> Sent: Tuesday, October 22, 2024 12:02 PM
>>>>>>>> To: dev@airflow.apache.org
> >>>>>>>> Subject: Re: Airflow should deprecate the term "DAG" for end users
>>>>>>>>
> >>>>>>>> Even though the term "DAG" is clearly suboptimal, it is part of the
> >>>>>>>> Airflow DAG definition interface at so many levels that any attempt
> >>>>>>>> to change it will only introduce more chaos, not reduce it. The only
> >>>>>>>> thing that is worse than a poorly chosen name in the code is when
> >>>>>>>> there are two ways to define the same thing. Countless articles and
> >>>>>>>> tutorials will suddenly become confusing as they all refer to
> >>>>>>>> workflows as "DAG"s.
> >>>>>>>>
> >>>>>>>> We are already at risk of scaring users away with a number of
> >>>>>>>> breaking changes in Airflow 3; promising even more breaking changes
> >>>>>>>> for the most basic things is not something that people are looking
> >>>>>>>> for. Attempting to change the fundamental terms will be interpreted
> >>>>>>>> as an even stronger signal of project immaturity.
> >>>>>>>>
> >>>>>>>> Given that, I oppose the idea of changing the term in the long run. I
> >>>>>>>> oppose even more strongly the idea of deprecating it in the DAG
> >>>>>>>> definition interface. We had better put our time and effort into
> >>>>>>>> other places in Airflow, of which there are plenty.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Igor
>>>>>>>>
> >>>>>>>> On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak
> >>>>>>>> <b...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> Couple of thoughts:
>>>>>>>>>
> >>>>>>>>> 1. The boundaries/properties of “DAG” have already faded over time,
> >>>>>>>>> for example there are now several ways to create cyclic graphs, e.g.
> >>>>>>>>> using the @continuous schedule. I imagine these properties vanishing
> >>>>>>>>> even more in the future, so from that perspective I support changing
> >>>>>>>>> “DAG” to a more generic name.
>>>>>>>>>
>>>>>>>>> 2. How other orchestration frameworks do naming:
>>>>>>>>> Dagster: pipeline
>>>>>>>>> Prefect: flow
>>>>>>>>> Flyte: workflow
>>>>>>>>> Temporal: workflow
>>>>>>>>> Kestra: flow
>>>>>>>>>
>>>>>>>>>        I think “workflow” is the most fitting name.
>>>>>>>>>
> >>>>>>>>> 3. Given the large impact of this change, I suggest defining a clear
> >>>>>>>>> path forward. Would we first introduce the deprecation in Airflow 3,
> >>>>>>>>> and remove “DAG” in Airflow 4?
>>>>>>>>>
>>>>>>>>> Bas
>>>>>>>>>
>>>>>>>>>> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
>>>>>>>>>>
> >>>>>>>>>> I don't see a problem with the term DAG, especially when most other
> >>>>>>>>>> platforms embrace the term wholeheartedly.
> >>>>>>>>>> I don't see anything intimidating or confusing about it at all;
> >>>>>>>>>> changing the term, though, would be fairly confusing to most users
> >>>>>>>>>> who have been using the term for years.
>>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung
> >>>>>>>>>> <t...@astronomer.io.invalid> wrote:
>>>>>>>>>>
> >>>>>>>>>>> I totally agree with doing away with the term DAG. The only
> >>>>>>>>>>> problem (aside from actually telling people, including myself, to
> >>>>>>>>>>> stop using the term) is to come up with a reasonable alternative.
> >>>>>>>>>>>
> >>>>>>>>>>> I can’t recall who, but someone mentioned “workflow” is not very
> >>>>>>>>>>> accurate for Airflow. The term “definition” was proposed, but it’s
> >>>>>>>>>>> a bit broad; I tried to use it in a few places and kept finding
> >>>>>>>>>>> myself doubting “what definition?” and wanting to clarify “DAG
> >>>>>>>>>>> definition” (defeating the purpose).
>>>>>>>>>>>
>>>>>>>>>>> TP
>>>>>>>>>>>
>>>>>>>>>>>
> >>>>>>>>>>>> On 22 Oct 2024, at 13:07, Jens Scheffler
> >>>>>>>>>>>> <j_scheff...@gmx.de.INVALID> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for posting. I share exactly the same observation, and had
> >>>>>>>>>>>> a short laugh because the DAG question is always an introduction
> >>>>>>>>>>>> if someone joins the party. I think a global renaming makes
> >>>>>>>>>>>> sense. Especially when we also rename Dataset to Asset, this is a
> >>>>>>>>>>>> reasonable step. Concepts can still stay the same.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So I hope I don't need to join you hiding below the desk, and +1
> >>>>>>>>>>>> for raising the discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Technically we can still consider keeping the Python names the
> >>>>>>>>>>>> same, because the execution is still a DAG… but user-facing it is
> >>>>>>>>>>>> a workflow.
>>>>>>>>>>>>
>>>>>>>>>>>> Jens
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from my Smartphone
>>>>>>>>>>>>
> >>>>>>>>>>>>> On 21. Oct 2024, at 23:56, Ryan Hatter
> >>>>>>>>>>>>> <ryan.hat...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Everyone please sheathe your swords... at least for now.
>>>>>>>>>>>>>
> >>>>>>>>>>>>> The term "DAG" has very little meaning to Airflow users. Indeed,
> >>>>>>>>>>>>> it has little meaning outside of some mathematicians and
> >>>>>>>>>>>>> software engineers for whom the properties of a DAG actually
> >>>>>>>>>>>>> matter. For someone new to data engineering or workflow
> >>>>>>>>>>>>> orchestration, one of the first questions they will likely have
> >>>>>>>>>>>>> is, "what on earth is a DAG?" The answer is almost always, "It's
> >>>>>>>>>>>>> a directed acyclic graph. You don't need to worry about what
> >>>>>>>>>>>>> that means; it's just a term for your workflow."
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The term "DAG" is problematic for at least a couple important
> >>>>>>>>>>>>> reasons:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Complexity for New Users*: As mentioned above, "DAG" is
> >>>>>>>>>>>>> unnecessarily intimidating and confusing. We want Airflow to be
> >>>>>>>>>>>>> approachable, and using technical jargon like "DAG" right off
> >>>>>>>>>>>>> the bat creates an initial barrier to understanding.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just
> >>>>>>>>>>>>> one component of an Airflow workflow. The workflow includes its
> >>>>>>>>>>>>> schedule, retries, timeouts, a dozen other parameters, and other
> >>>>>>>>>>>>> metadata that the DAG component doesn’t account for.
>>>>>>>>>>>>>
> >>>>>>>>>>>>> Consider the following from the Airflow homepage
> >>>>>>>>>>>>> <https://airflow.apache.org/>.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Apache Airflow® is a platform created by the community to
> >>>>>>>>>>>>> programmatically author, schedule and monitor workflows.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Then, if we look at the "What is Airflow?" docs page
> >>>>>>>>>>>>> <https://airflow.apache.org/docs/apache-airflow/stable/index.html>,
> >>>>>>>>>>>>> we can see that the docs explain what Airflow is without using
> >>>>>>>>>>>>> "DAG." It's only in the *workflow* Python code that the term is
> >>>>>>>>>>>>> introduced out of nowhere as a comment that awkwardly tries to
> >>>>>>>>>>>>> explain it:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # A DAG represents a workflow, a collection of tasks
>>>>>>>>>>>>>
> >>>>>>>>>>>>> It makes sense to not refer to DAGs in these introductions to
> >>>>>>>>>>>>> Airflow, because *Airflow doesn't orchestrate DAGs; it
> >>>>>>>>>>>>> orchestrates workflows*. The DAG is the model that, for reasons
> >>>>>>>>>>>>> irrelevant to almost every user, workflows must adhere to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So, I propose at least adding an alias for the term "DAG" and
> >>>>>>>>>>>>> updating documentation to replace "DAG" with "workflow".
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, instead of...
>>>>>>>>>>>>>
>>>>>>>>>>>>> @dag(
>>>>>>>>>>>>> schedule="@daily",
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> dagrun_timeout=timedelta(hours=1)
>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>> Users could do...
>>>>>>>>>>>>>
>>>>>>>>>>>>> @workflow(
>>>>>>>>>>>>> schedule="@daily",
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> run_timeout=timedelta(hours=1)
>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> And with that... I will start running away.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>> ------------------------------------------------------------------
>>>>>>>>>>>> --- To unsubscribe, e-mail:
>>>> dev-unsubscr...@airflow.apache.org
>>>>>>>>>>>> For additional commands, e-mail:
>>>> dev-h...@airflow.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> -------------------------------------------------------------------
>>>>>>>>>>> -- To unsubscribe, e-mail:
>>>> dev-unsubscr...@airflow.apache.org
>>>>>>>>>>> For additional commands, e-mail:
>> dev-h...@airflow.apache.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
