Best argument in favour of keeping “dags” as a term: getting to re-use puns
like https://i.imgflip.com/1xhtwh.jpg

In all seriousness: I don’t mind either way; both sides have presented good
reasons.

-a

> On 22 Oct 2024, at 17:03, Daniel Standish 
> <daniel.stand...@astronomer.io.INVALID> wrote:
> 
> Yeah just say, when asked where the name comes from, "well, no one actually
> knows but..." and then make something up.
> 
> On Tue, Oct 22, 2024 at 8:31 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> 
>> Just to clarify: "directed acyclic graph" is the tongue-twister.
>> 
>> On Tue, Oct 22, 2024 at 5:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> 
>>> I like what both Daniel and Brent wrote. I would very much want to be able
>>> to say just "dag" without explaining it further.
>>> 
>>> For me, every time I explain "DAG" at a talk, it's a tongue-twister, and I
>>> almost stutter trying to recall how to pronounce it properly.
>>> 
>>> J.
>>> 
>>> 
>>> On Tue, Oct 22, 2024 at 5:27 PM Brent Bovenzi <br...@astronomer.io.invalid> wrote:
>>> 
>>>> I remember we explored renaming "DAG" when starting on AIP-38 to modernize
>>>> the UI. Both "pipeline" and "workflow" are more descriptive of what one is
>>>> actually doing, while "directed acyclic graph" is an implementation detail.
>>>> But I agree with Daniel Standish: at this point "DAG" has become "dag", a
>>>> word in its own right.
>>>> 
>>>> Examples of "dag" abound in community discussions, Airflow Summit talks,
>>>> documentation, and even the UI. Let's embrace "dag". A user just needs to
>>>> learn one new word rather than the technical concept behind it. I think
>>>> that is much less effort than refactoring so much code, documentation,
>>>> blog posts, Stack Overflow questions, etc.
>>>> 
>>>> On Tue, Oct 22, 2024 at 10:51 AM Daniel Standish
>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>> 
>>>>> I am skeptical. It seems like introducing a lot of pain for questionable
>>>>> benefit. But I am def sympathetic to the idea. I agree the association
>>>>> with "directed acyclic graph" is not helpful.
>>>>> 
>>>>> And along those lines, I offer here some less invasive mitigations.
>>>>> 
>>>>> One thing we can do no matter what is to de-emphasize the math-nerd origins
>>>>> of the name. That is to say, in docs / website / etc., *never define*
>>>>> Airflow's "dag" concept as a directed acyclic graph. Always define it as a
>>>>> pipeline, collection of tasks, workflow, etc.
>>>>> 
>>>>> The "directed acyclic graph" part of it is like a historical footnote, and
>>>>> we could make one mention of it somewhere hidden.
>>>>> 
>>>>> We could also start using lowercase in the docs in general, e.g. writing
>>>>> "dag" / "dags" instead of "DAG" / "DAGs". The uppercase makes it look like
>>>>> an acronym, but "dag" in Airflow is just an Airflow concept, and the
>>>>> association with "DAGs" is not really helpful.
>>>>> 
>>>>> In other words, embrace that "dag" in Airflow is its own thing and is
>>>>> *not*, strictly speaking, a directed acyclic graph (which nobody knows
>>>>> about anyway), and tell users what it is in simple terms that normal
>>>>> people understand.
>>>>> 
>>>>> 
>>>>> On Tue, Oct 22, 2024 at 7:27 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>> 
>>>>>> DAG is so embedded in what we do that it will be extremely difficult to
>>>>>> get rid of it completely. I also think it will cause a lot of Google and
>>>>>> Stack Overflow searches to miss the right answers. One of the strengths
>>>>>> of Airflow, besides the community and the ideas Bernd mentioned, is the
>>>>>> vast number of examples, problems, and solutions you can so easily find
>>>>>> (and we have to remember that all the AI models trained on past data will
>>>>>> also match people's queries rather poorly).
>>>>>> 
>>>>>> I am not too attached to DAG; I could easily switch. And if we do, I would
>>>>>> be for using workflow or pipeline instead of `dag`. Were it not for the
>>>>>> above reason I would support the change, but I think I am with Igor here:
>>>>>> it might cause more problems than it solves.
>>>>>> 
>>>>>> But I am not 100% against it; if others think it's a good idea, I am OK
>>>>>> with it.
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>>> 
>>>>>> On Tue, Oct 22, 2024 at 3:12 PM Abhishek Bhakat
>>>>>> <abhishek.bha...@astronomer.io.invalid> wrote:
>>>>>> 
>>>>>>> Agreed that the word DAG makes very little sense to someone new to
>>>>>>> workflow orchestration. But it does convey the acyclic nature of the
>>>>>>> workflow. Sure, as Bas mentioned, there are ways to work around it.
>>>>>>> Still, in my opinion, there is generally no need for cyclic behavior in
>>>>>>> workflow orchestration. Most (*if not all*) cases can be covered in an
>>>>>>> acyclic manner with multiple runs, hence the importance of idempotency.
>>>>>>> So I would want the word "acyclic" to stick.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Avi
>>>>>>> 
>>>>>>> On Tue, Oct 22, 2024 at 12:41 PM <bernd.stroe...@kosakya.de> wrote:
>>>>>>> 
>>>>>>>> Brilliant, I am on my way to becoming an Airflow fan; so many new ideas.
>>>>>>>>
>>>>>>>> The term DAG is misleading; it should be replaced by the more general
>>>>>>>> term Airflow (Workflow) Graph (AFG) or Airflow (Petri) Net (AFN) (maybe
>>>>>>>> without a direction); and... these graphs should be stored in a graph
>>>>>>>> database.
>>>>>>>>
>>>>>>>> Every node or sub-graph of an Airflow Graph (AFG) might be assigned to an
>>>>>>>> executable (Python, Rust, ...) member of a library.
>>>>>>>>
>>>>>>>> A running graph might have a different structure than a configuration
>>>>>>>> graph.
>>>>>>>> 
>>>>>>>> Forget that if you think it's bullshit.
>>>>>>>> 
>>>>>>>> Best Regards
>>>>>>>> 
>>>>>>>> Bernd Ströhle
>>>>>>>> M: +49 171 5357916
>>>>>>>> E: bernd.stroe...@gmail.com
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Igor Kholopov <ikholo...@google.com.INVALID>
>>>>>>>> Sent: Tuesday, October 22, 2024 12:02 PM
>>>>>>>> To: dev@airflow.apache.org
>>>>>>>> Subject: Re: Airflow should deprecate the term "DAG" for end users
>>>>>>>> 
>>>>>>>> Even though the term "DAG" is clearly suboptimal, it is part of the Airflow
>>>>>>>> DAG definition interface at so many levels that any attempt to change it
>>>>>>>> will only introduce more chaos, not reduce it. The only thing worse than a
>>>>>>>> poorly chosen name in the code is having two ways to define the same
>>>>>>>> thing. Countless articles and tutorials would suddenly become confusing,
>>>>>>>> as they all refer to workflows as "DAGs".
>>>>>>>> 
>>>>>>>> We are already at risk of scaring users away with a number of breaking
>>>>>>>> changes in Airflow 3; promising even more breaking changes for the most
>>>>>>>> basic things is not something people are looking for. Attempting to
>>>>>>>> change the fundamental terms would be interpreted as an even stronger
>>>>>>>> signal of project immaturity.
>>>>>>>> 
>>>>>>>> Given that, I oppose the idea of changing the term in the long run. I
>>>>>>>> oppose even more strongly the idea of deprecating it in the DAG
>>>>>>>> definition interface. We had better put our time and effort into other
>>>>>>>> places in Airflow, of which there are plenty.
>>>>>>>> 
>>>>>>>> Kind regards,
>>>>>>>> Igor
>>>>>>>> 
>>>>>>>> On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak <b...@astronomer.io.invalid> wrote:
>>>>>>>> 
>>>>>>>>> A couple of thoughts:
>>>>>>>>>
>>>>>>>>> 1. The boundaries/properties of “DAG” have already faded over time;
>>>>>>>>> there are now several ways to create cyclic graphs, e.g. using the
>>>>>>>>> @continuous schedule. I imagine these properties vanishing even more in
>>>>>>>>> the future, so from that perspective I support changing “DAG” to a more
>>>>>>>>> generic name.
>>>>>>>>>
>>>>>>>>> 2. How other orchestration frameworks name this concept:
>>>>>>>>> Dagster: pipeline
>>>>>>>>> Prefect: flow
>>>>>>>>> Flyte: workflow
>>>>>>>>> Temporal: workflow
>>>>>>>>> Kestra: flow
>>>>>>>>>
>>>>>>>>> I think “workflow” is the most fitting name.
>>>>>>>>>
>>>>>>>>> 3. Given the large impact of this change, I suggest defining a clear path
>>>>>>>>> forward. Would we first introduce the deprecation in Airflow 3, and
>>>>>>>>> remove “DAG” in Airflow 4?
>>>>>>>>> 
>>>>>>>>> Bas
>>>>>>>>> 
>>>>>>>>>> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I don't see a problem with the term DAG, especially when most other
>>>>>>>>>> platforms embrace the term wholeheartedly.
>>>>>>>>>> I don't see anything intimidating or confusing about it at all;
>>>>>>>>>> changing the term, though, would be fairly confusing to most users who
>>>>>>>>>> have been using the term for years.
>>>>>>>>>> 
>>>>>>>>>> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung <t...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> I totally agree with doing away with the term DAG. The only problem
>>>>>>>>>>> (aside from actually telling people, including myself, to stop using
>>>>>>>>>>> the term) is to come up with a reasonable alternative.
>>>>>>>>>>>
>>>>>>>>>>> I can’t recall who, but someone mentioned “workflow” is not very
>>>>>>>>>>> accurate for Airflow. The term “definition” was proposed, but it’s a bit
>>>>>>>>>>> broad; I tried to use it in a few places and kept finding myself
>>>>>>>>>>> doubting “what definition?” and wanting to clarify “DAG definition”
>>>>>>>>>>> (defeating the purpose).
>>>>>>>>>>> 
>>>>>>>>>>> TP
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 22 Oct 2024, at 13:07, Jens Scheffler <j_scheff...@gmx.de.INVALID> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for posting. I share exactly the same observation and had a short
>>>>>>>>>>>> laugh, because the DAG question always comes up as an introduction when
>>>>>>>>>>>> someone joins the party. I think a global renaming makes sense,
>>>>>>>>>>>> especially since we are also renaming Dataset to Asset; this is a
>>>>>>>>>>>> reasonable step as well. The concepts can still stay the same.
>>>>>>>>>>>>
>>>>>>>>>>>> So I hope I don’t need to join you hiding under the desk, and +1 for
>>>>>>>>>>>> raising the discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> Technically, we could still consider keeping the Python names the same,
>>>>>>>>>>>> because the execution is still a DAG… but user-facing, it is a workflow.
>>>>>>>>>>>> 
>>>>>>>>>>>> Jens
>>>>>>>>>>>> 
>>>>>>>>>>>> Sent from my Smartphone
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 21. Oct 2024, at 23:56, Ryan Hatter <ryan.hat...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Everyone, please sheathe your swords... at least for now.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The term "DAG" has very little meaning to Airflow users. Indeed, it has
>>>>>>>>>>>>> little meaning to anyone outside the mathematicians and software
>>>>>>>>>>>>> engineers for whom the properties of a DAG actually matter. For someone
>>>>>>>>>>>>> new to data engineering or workflow orchestration, one of the first
>>>>>>>>>>>>> questions they will likely have is, "What on earth is a DAG?" The answer
>>>>>>>>>>>>> is almost always, "It's a directed acyclic graph. You don't need to
>>>>>>>>>>>>> worry about what that means; it's just a term for your workflow."
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The term "DAG" is problematic for at least a couple of important reasons:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Complexity for New Users*: As mentioned above, "DAG" is unnecessarily
>>>>>>>>>>>>> intimidating and confusing. We want Airflow to be approachable, and
>>>>>>>>>>>>> using technical jargon like "DAG" right off the bat creates an initial
>>>>>>>>>>>>> barrier to understanding.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just one
>>>>>>>>>>>>> component of an Airflow workflow. The workflow includes its schedule,
>>>>>>>>>>>>> retries, timeouts, a dozen other parameters, and other metadata that
>>>>>>>>>>>>> the DAG component doesn’t account for.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Consider the following from the Airflow homepage
>>>>>>>>>>>>> <https://airflow.apache.org/>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Apache Airflow® is a platform created by the community to
>>>>>>>>>>>>> programmatically author, schedule and monitor workflows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, if we look at the "What is Airflow?" docs page
>>>>>>>>>>>>> <https://airflow.apache.org/docs/apache-airflow/stable/index.html>, we
>>>>>>>>>>>>> can see that the docs explain what Airflow is without using "DAG." It's
>>>>>>>>>>>>> only in the *workflow* Python code that the term is introduced, out of
>>>>>>>>>>>>> nowhere, as a comment that awkwardly tries to explain it:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # A DAG represents a workflow, a collection of tasks
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It makes sense to not refer to DAGs in these introductions to Airflow,
>>>>>>>>>>>>> because *Airflow doesn't orchestrate DAGs; it orchestrates workflows*.
>>>>>>>>>>>>> The DAG is the model that, for reasons irrelevant to almost every user,
>>>>>>>>>>>>> workflows must adhere to.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So, I propose at least adding an alias for the term "DAG" and updating
>>>>>>>>>>>>> documentation to replace "DAG" with "workflow".
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For example, instead of...
>>>>>>>>>>>>>
>>>>>>>>>>>>> @dag(
>>>>>>>>>>>>>     schedule="@daily",
>>>>>>>>>>>>>     # ...
>>>>>>>>>>>>>     dagrun_timeout=timedelta(hours=1),
>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>> Users could do...
>>>>>>>>>>>>>
>>>>>>>>>>>>> @workflow(
>>>>>>>>>>>>>     schedule="@daily",
>>>>>>>>>>>>>     # ...
>>>>>>>>>>>>>     run_timeout=timedelta(hours=1),
>>>>>>>>>>>>> )
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And with that... I will start running away.
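
For illustration, the alias Ryan proposes could be sketched roughly as follows. This is a minimal, self-contained Python sketch under stated assumptions, not Airflow code. `dag` here is a hypothetical stand-in that only records its arguments; Airflow's real decorator is neither imported nor reproduced. `workflow` and `run_timeout` are the names from the proposal and do not exist in Airflow. The alias just renames `run_timeout` to `dagrun_timeout` and forwards everything else, so both spellings end up with the same configuration.

```python
from datetime import timedelta


def dag(schedule=None, dagrun_timeout=None, **kwargs):
    """Hypothetical stand-in for a DAG decorator (NOT Airflow's real API).

    It only records the arguments it was called with, so the alias below
    can be exercised without installing Airflow.
    """
    def decorator(func):
        func.dag_kwargs = {
            "schedule": schedule,
            "dagrun_timeout": dagrun_timeout,
            **kwargs,
        }
        return func

    return decorator


def workflow(*, run_timeout=None, **kwargs):
    """Hypothetical alias from the proposal: forwards to `dag`,
    renaming the user-facing `run_timeout` to `dagrun_timeout`."""
    if run_timeout is not None:
        if kwargs.get("dagrun_timeout") is not None:
            raise TypeError("pass run_timeout or dagrun_timeout, not both")
        kwargs["dagrun_timeout"] = run_timeout
    return dag(**kwargs)


# Usage: both spellings record the same configuration.
@dag(schedule="@daily", dagrun_timeout=timedelta(hours=1))
def old_style():
    ...


@workflow(schedule="@daily", run_timeout=timedelta(hours=1))
def new_style():
    ...


print(old_style.dag_kwargs == new_style.dag_kwargs)  # True
```

Because the alias is a thin forwarding wrapper, there is still only one underlying definition path. The deprecation phase Bas asks about could later add a `warnings.warn(..., DeprecationWarning)` call to the old name, but that step is not shown here.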
