It's an interesting discussion and I remember struggling with this when I started working with Airflow. But, I also agree with the viewpoint of it being an established concept now regardless of the origin.
I am personally leaning towards the perspective best expressed by Daniel Standish and Brent of using Dag as a word, rather than the computer science concept.

Best regards,
Vikram

On Tue, Oct 22, 2024 at 9:46 AM Oliveira, Niko <oniko...@amazon.com.invalid> wrote:

> I agree with the general sentiment of: You're right Ryan, DAG isn't great and I'd rather workflow, but changing it will cause much more wreckage than it solves.
>
> Also agree with the idea to just move away from defining DAG. I think we've been naturally doing that as a community for a while now anyway, so that feels like a natural step.
>
> Cheers,
> Niko
>
> ________________________________
> From: Ash Berlin-Taylor <a...@apache.org>
> Sent: Tuesday, October 22, 2024 9:06:39 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXT] Airflow should deprecate the term "DAG" for end users
>
> Best argument in favour of keeping “dags” as a term: getting to re-use puns like https://i.imgflip.com/1xhtwh.jpg
>
> In all seriousness: I don’t mind either way, both sides have good reasons presented.
>
> -a
>
> > On 22 Oct 2024, at 17:03, Daniel Standish <daniel.stand...@astronomer.io.INVALID> wrote:
> >
> > Yeah just say, when asked where the name comes from, "well, no one actually knows but..." and then make something up.
> >
> > On Tue, Oct 22, 2024 at 8:31 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >> Just to clarify - "directed acyclic graph" is the tongue-twister.
> >>
> >> On Tue, Oct 22, 2024 at 5:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >>> I like what both Daniel and Brent wrote. I would very much want to be able to say just "dag" without explaining it further.
> >>>
> >>> For me every time I explain "DAG" at a talk it's a tongue-twister, and I almost stutter on trying to recall how to pronounce it properly.
> >>>
> >>> J.
> >>>
> >>> On Tue, Oct 22, 2024 at 5:27 PM Brent Bovenzi <br...@astronomer.io.invalid> wrote:
> >>>
> >>>> I remember we explored renaming "DAG" when starting on AIP-38 to modernize the UI. Both "pipeline" and "workflow" are more descriptive of what one is actually doing, while Directed Acyclic Graph is an implementation detail. But I agree with Daniel Standish, at this point "DAG" has become "dag", a word in its own right.
> >>>>
> >>>> Examples of "dag" abound in community discussion, Airflow Summit talks, documentation and even in the UI. Let's embrace "dag". A user just needs to learn one new word vs the technical concept behind that word. I think that is much less effort than refactoring so much code, documentation, blog posts, stack overflow questions, etc.
> >>>>
> >>>> On Tue, Oct 22, 2024 at 10:51 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:
> >>>>
> >>>>> I am skeptical. Seems like introducing a lot of pain for questionable benefit. But, I am def sympathetic to the idea. I agree the association with "directed acyclic graph" is not helpful.
> >>>>>
> >>>>> And along those lines, I offer here some less invasive mitigations.
> >>>>>
> >>>>> One thing we can do no matter what is to de-emphasize the math nerd origins of the name.
> >>>>> That is to say, in docs / website / etc, *never define* airflow's "dag" concept as a directed acyclic graph. Always define it as a pipeline, collection of tasks, workflow etc.
> >>>>>
> >>>>> The "directed acyclic graph" part of it is like a historical footnote, and we could make one mention of it somewhere hidden.
> >>>>>
> >>>>> We could also start using lowercase in the docs in general, e.g. writing "dag" / "dags" instead of writing "DAG" / "DAGs" etc. The upper case part of it makes it look like an acronym; but "dag" in airflow is just an airflow concept and the association with "DAGs" is not really helpful.
> >>>>>
> >>>>> In other words embrace that "dag" in airflow is its own thing, is *not* strictly speaking a directed acyclic graph (which nobody knows about anyway), and tell them what it is in simple terms that normal people understand.
> >>>>>
> >>>>> On Tue, Oct 22, 2024 at 7:27 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>>
> >>>>>> DAG is so embedded into what we do that it will be extremely difficult to get rid of it completely. Also I think it will stop a lot of "google" and "stack overflow" searches from finding the right answers. One of the strengths of Airflow, besides the community and ideas that Bernd mentioned, is the vast number of examples, problems and solutions you can so easily find (and we have to remember that all the AI trained on past data will also rather poorly match people's queries).
> >>>>>>
> >>>>>> I am not too attached to DAG. I could easily switch. And if we did, I would be for using workflow or pipeline instead of `dag`, if not for the reason above; but I think I am with Igor here that it might cause more problems than it solves.
> >>>>>>
> >>>>>> But I am not 100% against - if others think it's a good idea, I am ok with it.
> >>>>>>
> >>>>>> J,
> >>>>>>
> >>>>>> On Tue, Oct 22, 2024 at 3:12 PM Abhishek Bhakat <abhishek.bha...@astronomer.io.invalid> wrote:
> >>>>>>
> >>>>>>> Agreed that the word DAG makes very little sense to someone new to workflow orchestration. But it does also show the nature of being acyclic. Sure, as Bas mentioned, there are ways to work around it. Still, in my opinion, there is generally no need for cyclic behavior in workflow orchestration. Most (*if not all*) cases can in some way be covered in an acyclic manner with multiple runs. Hence, the idempotency. So I would want the "acyclic" word to stick.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Avi
> >>>>>>>
> >>>>>>> On Tue, Oct 22, 2024 at 12:41 PM <bernd.stroe...@kosakya.de> wrote:
> >>>>>>>
> >>>>>>>> Brilliant, I am on the way to become an Airflow Fan; so many new ideas.
> >>>>>>>>
> >>>>>>>> The Term DAG is misleading; it should be replaced by the more general Term Airflow (Workflow) Graph (AFG) or Airflow (Petri) Net (AFN) (maybe without a direction);
> >>>>>>>> and ... these Graphs should be stored in a Graph Database.
> >>>>>>>>
> >>>>>>>> Every Node or Sub-Graph of an Airflow Graph (AFG) might be assigned to an executable (Python-, Rust-, ... ) member of a library.
> >>>>>>>>
> >>>>>>>> A running Graph might have a different structure than a configuration Graph.
> >>>>>>>>
> >>>>>>>> Forget that if you think it's bullshit.
> >>>>>>>>
> >>>>>>>> Best Regards
> >>>>>>>>
> >>>>>>>> Bernd Ströhle
> >>>>>>>> M: +49 171 5357916
> >>>>>>>> E: bernd.stroe...@gmail.com
> >>>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Igor Kholopov <ikholo...@google.com.INVALID>
> >>>>>>>> Sent: Tuesday, October 22, 2024 12:02 PM
> >>>>>>>> To: dev@airflow.apache.org
> >>>>>>>> Subject: Re: Airflow should deprecate the term "DAG" for end users
> >>>>>>>>
> >>>>>>>> Even though the term "DAG" is clearly suboptimal, it is part of the Airflow DAG definition interface at so many levels that any attempt to change it will only introduce more chaos, not reduce it. The only thing that is worse than a poorly chosen name in the code is when there are two ways to define the same thing. Countless articles and tutorials will suddenly become confusing as they all refer to workflows as "DAG"s.
> >>>>>>>>
> >>>>>>>> We are already at risk of scaring the users away with a number of breaking changes in Airflow 3; promising even more breaking changes for the most basic things is not something that people are looking for. Attempting to change the fundamental terms will be interpreted as an even stronger signal of project immaturity.
> >>>>>>>>
> >>>>>>>> Given that, I oppose the idea of changing the term in the long run. I even more strongly oppose the idea of deprecating it in the DAG definition interface. We had better put our time and effort into other places in Airflow, of which there are plenty.
> >>>>>>>>
> >>>>>>>> Kind regards,
> >>>>>>>> Igor
> >>>>>>>>
> >>>>>>>> On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak <b...@astronomer.io.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> Couple of thoughts:
> >>>>>>>>>
> >>>>>>>>> 1. The boundaries/properties of “DAG” have already faded over time, for example there are now several ways to create cyclic graphs, e.g. using the @continuous schedule. I imagine these properties vanishing even more in the future, so from that perspective I support changing “DAG” to a more generic name.
> >>>>>>>>>
> >>>>>>>>> 2. How other orchestration frameworks do naming:
> >>>>>>>>> Dagster: pipeline
> >>>>>>>>> Prefect: flow
> >>>>>>>>> Flyte: workflow
> >>>>>>>>> Temporal: workflow
> >>>>>>>>> Kestra: flow
> >>>>>>>>>
> >>>>>>>>> I think “workflow” is the most fitting name.
> >>>>>>>>>
> >>>>>>>>> 3. Given the large impact of this change, I suggest defining a clear path forward. Would we first introduce the deprecation in Airflow 3, and remove “DAG” in Airflow 4?
> >>>>>>>>>
> >>>>>>>>> Bas
> >>>>>>>>>
> >>>>>>>>>> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I don't see a problem with the term DAG, especially when most other platforms embrace the term wholeheartedly.
> >>>>>>>>>> I don't see anything intimidating or confusing about it at all; changing the term, though, would be fairly confusing to most users who have been using the term for years.
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung <t...@astronomer.io.invalid> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I totally agree with doing away with the term DAG.
> >>>>>>>>>>> The only problem (aside from actually telling people, including myself, to stop using the term) is to come up with a reasonable alternative.
> >>>>>>>>>>>
> >>>>>>>>>>> I can’t recall who, but someone mentioned “workflow” is not very accurate for Airflow. The term “definition” was proposed, but it’s a bit broad; I tried to use it in a few places and kept finding myself doubting “what definition?” and wanting to clarify “DAG definition” (defeating the purpose).
> >>>>>>>>>>>
> >>>>>>>>>>> TP
> >>>>>>>>>>>
> >>>>>>>>>>>> On 22 Oct 2024, at 13:07, Jens Scheffler <j_scheff...@gmx.de.INVALID> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Ryan,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for posting. I share exactly the same observation, and had a short laugh because the DAG question always comes up as an introduction when someone joins the party. I think a global renaming makes sense. Especially when we also rename Dataset to Asset, this is a reasonable step too. Concepts can still stay the same.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So I hope I don’t need to join hiding below the desk with you and +1 for raising the discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Technically we can still consider keeping the details of the Python names the same, because the execution is still a DAG… but user-facing it is a workflow.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jens
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my Smartphone
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 21.
Oct 2024, at 23:56, Ryan Hatter <ryan.hat...@astronomer.io.invalid> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Everyone please sheathe your swords... at least for now.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The term "DAG" has very little meaning to Airflow users. Indeed, it has little meaning outside of some mathematicians and software engineers for whom the properties of a DAG actually matter. For someone new to data engineering or workflow orchestration, one of the first questions they will likely have is, "what on earth is a DAG?" The answer is almost always, "It's a directed acyclic graph. You don't need to worry about what that means; it's just a term for your workflow."
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The term "DAG" is problematic for at least a couple important reasons:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Complexity for New Users*: As mentioned above, "DAG" is unnecessarily intimidating and confusing. We want Airflow to be approachable, and using technical jargon like "DAG" right off the bat creates an initial barrier to understanding.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just one component of an Airflow workflow. The workflow includes its schedule, retries, timeouts, a dozen other parameters, and other metadata that the DAG component doesn’t account for.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Consider the following from the Airflow homepage <https://airflow.apache.org/>.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Apache Airflow® is a platform created by the community to programmatically author, schedule and monitor workflows.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Then, if we look at the "What is Airflow?" docs page <https://airflow.apache.org/docs/apache-airflow/stable/index.html>, we can see that the docs explain what Airflow is without using "DAG." It's only in the *workflow* Python code that the term is introduced out of nowhere as a comment that awkwardly tries to explain it:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> # A DAG represents a workflow, a collection of tasks
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It makes sense to not refer to DAGs in these introductions to Airflow, because *Airflow doesn't orchestrate DAGs; it orchestrates workflows*. The DAG is the model that, for reasons irrelevant to almost every user, workflows must adhere to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So, I propose at least adding an alias for the term "DAG" and updating documentation to replace "DAG" with "workflow".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For example, instead of...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @dag(
> >>>>>>>>>>>>>     schedule="@daily",
> >>>>>>>>>>>>>     ...
> >>>>>>>>>>>>>     dagrun_timeout=timedelta(hours=1)
> >>>>>>>>>>>>> )
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Users could do...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @workflow(
> >>>>>>>>>>>>>     schedule="@daily",
> >>>>>>>>>>>>>     ...
> >>>>>>>>>>>>>     run_timeout=timedelta(hours=1)
> >>>>>>>>>>>>> )
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And with that... I will start running away.
> >>>>>>>>>>>>
> >>>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
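
For anyone trying to picture the alias proposed in Ryan's original message (a `@workflow` decorator accepting `run_timeout` in place of `dagrun_timeout`), here is a rough sketch of how such a shim might look. To be clear about what is assumed: the `dag` function below is a small stand-in written only for this example, not Airflow's decorator, and the `_RENAMED_KWARGS` table and the returned dict are hypothetical. Nothing here reflects how Airflow would actually implement it.

```python
from datetime import timedelta
from functools import wraps


def dag(**dag_kwargs):
    """Stand-in for a DAG decorator (assumption: NOT Airflow's real one)."""
    def decorator(func):
        @wraps(func)
        def factory(*args, **kwargs):
            # Return a plain dict so the result can be inspected without Airflow.
            return {"name": func.__name__, "settings": dict(dag_kwargs)}
        return factory
    return decorator


# Hypothetical mapping from "workflow"-style names to the existing "dag"-style names.
_RENAMED_KWARGS = {"run_timeout": "dagrun_timeout"}


def workflow(**workflow_kwargs):
    """Alias for `dag` that translates renamed keyword arguments."""
    translated = {}
    for key, value in workflow_kwargs.items():
        target = _RENAMED_KWARGS.get(key, key)
        if target in translated:
            # Both the old and the new spelling were passed for the same setting.
            raise TypeError(f"got both old and new spelling for {target!r}")
        translated[target] = value
    return dag(**translated)


@workflow(schedule="@daily", run_timeout=timedelta(hours=1))
def my_pipeline():
    pass


result = my_pipeline()
```

The point of routing everything through the existing decorator is the one Igor and Jens raise in the thread: there is still a single definition path underneath, and only the user-facing spelling changes.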