In my experience, when you ask those with Airflow experience what a dag is, they'll start talking about workflow attributes - stuff like dags being a series of steps or tasks with owners. The structure doesn't come up.
Echoing others, at this point my vote is to embrace the name and de-emphasize the mathematical structure aspect.

On Tue, Oct 22, 2024 at 3:47 PM Vikram Koka <vik...@astronomer.io.invalid> wrote:

> It's an interesting discussion and I remember struggling with this when I
> started working with Airflow.
> But, I also agree with the viewpoint of it being an established concept now
> regardless of the origin.
>
> I am personally leaning towards the perspective best expressed by Daniel
> Standish and Brent of using Dag as a word, rather than the computer science
> concept.
>
> Best regards,
> Vikram
>
>
> On Tue, Oct 22, 2024 at 9:46 AM Oliveira, Niko <oniko...@amazon.com.invalid>
> wrote:
>
> > I agree with the general sentiment of: You're right Ryan, DAG isn't great
> > and I'd rather workflow, but changing it will cause much more wreckage than
> > it solves.
> >
> > Also agree with the idea to just move away from defining DAG. I think
> > we've been naturally doing that as a community for a while now anyway, so
> > that feels like a natural step.
> >
> > Cheers,
> > Niko
> >
> > ________________________________
> > From: Ash Berlin-Taylor <a...@apache.org>
> > Sent: Tuesday, October 22, 2024 9:06:39 AM
> > To: dev@airflow.apache.org
> > Subject: RE: [EXT] Airflow should deprecate the term "DAG" for end users
> >
> > Best argument in favour of keeping “dags” as a term - getting to re-use
> > puns like https://i.imgflip.com/1xhtwh.jpg
> >
> > In all seriousness: I don’t mind either way, both sides have good reasons
> > presented.
> >
> > -a
> >
> > > On 22 Oct 2024, at 17:03, Daniel Standish
> > > <daniel.stand...@astronomer.io.INVALID> wrote:
> > >
> > > Yeah just say, when asked where the name comes from, "well, no one actually
> > > knows but..." and then make something up.
> > >
> > > On Tue, Oct 22, 2024 at 8:31 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > >> Just to clarify - "directed acyclic graph" is the tongue-twister.
> > >>
> > >> On Tue, Oct 22, 2024 at 5:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>
> > >>> I like what both Daniel and Brent wrote. I would very much want to be able
> > >>> to say just "dag" without explaining it further.
> > >>>
> > >>> For me every time I explain "DAG" at a talk it's a tongue-twister, and I
> > >>> almost stutter on trying to recall how to pronounce it properly.
> > >>>
> > >>> J.
> > >>>
> > >>>
> > >>> On Tue, Oct 22, 2024 at 5:27 PM Brent Bovenzi <br...@astronomer.io.invalid>
> > >>> wrote:
> > >>>
> > >>>> I remember we explored renaming "DAG" when starting on AIP-38 to modernize
> > >>>> the UI. Both "pipeline" and "workflow" are more descriptive of what one is
> > >>>> actually doing while Directed Acyclic Graph is an implementation detail.
> > >>>> But I agree with Daniel Standish, at this point "DAG" has become "dag",
> > >>>> a word in its own right.
> > >>>>
> > >>>> Examples of "dag" abound in community discussion, Airflow Summit
> > >>>> talks, documentation and even in the UI. Let's embrace "dag". A user just
> > >>>> needs to learn one new word vs the technical concept behind that word.
> > >>>> I think that is much less effort than refactoring so much code,
> > >>>> documentation, blog posts, stack overflow questions, etc.
> > >>>>
> > >>>> On Tue, Oct 22, 2024 at 10:51 AM Daniel Standish
> > >>>> <daniel.stand...@astronomer.io.invalid> wrote:
> > >>>>
> > >>>>> I am skeptical. Seems like introducing a lot of pain for questionable
> > >>>>> benefit. But, I am def sympathetic to the idea. I agree the association
> > >>>>> with "directed acyclic graph" is not helpful.
> > >>>>>
> > >>>>> And along those lines, I offer here some less invasive mitigations.
> > >>>>>
> > >>>>> One thing we can do no matter what is to de-emphasize the math nerd origins
> > >>>>> of the name. That is to say, in docs / website / etc, *never define*
> > >>>>> airflow's "dag" concept as a directed acyclic graph. Always define it as a
> > >>>>> pipeline, collection of tasks, workflow etc.
> > >>>>>
> > >>>>> The "directed acyclic graph" part of it is like a historical footnote, and
> > >>>>> we could make one mention of it somewhere hidden.
> > >>>>>
> > >>>>> We could also start using lowercase in the docs in general e.g. writing
> > >>>>> "dag" / "dags" instead of writing "DAG" / "DAGs" etc. The upper case part
> > >>>>> of it makes it look like an acronym; but "dag" in airflow is just an
> > >>>>> airflow concept and the association with "DAGs" is not really helpful.
> > >>>>>
> > >>>>> In other words embrace that "dag" in airflow is its own thing, is
> > >>>>> *not* strictly speaking a directed acyclic graph (which nobody knows
> > >>>>> about anyway), and tell them what it is in simple terms that normal
> > >>>>> people understand.
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Oct 22, 2024 at 7:27 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>>>
> > >>>>>> DAG is so embedded into what we do that it will be extremely difficult to
> > >>>>>> get rid of it completely.
> > >>>>>> Also I think it will make a lot of "google"
> > >>>>>> searches and "stack overflow" searches fail to find the right answers.
> > >>>>>> One of the strengths of Airflow - besides the community and ideas that
> > >>>>>> Bernd mentioned - is the vast number of examples, problems and solutions
> > >>>>>> you can so easily find (and we have to remember that all the AI trained on
> > >>>>>> past data will also match people's queries rather poorly).
> > >>>>>>
> > >>>>>> I am not too attached to DAG. I could easily switch. And if we do - I
> > >>>>>> would be for using workflow or pipeline instead of `dag` if not for the
> > >>>>>> above reason, but I think I am here with Igor that it might cause more
> > >>>>>> problems than it solves.
> > >>>>>>
> > >>>>>> But I am not 100% against - if others think it's a good idea, I am ok
> > >>>>>> with it.
> > >>>>>>
> > >>>>>> J,
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Oct 22, 2024 at 3:12 PM Abhishek Bhakat
> > >>>>>> <abhishek.bha...@astronomer.io.invalid> wrote:
> > >>>>>>
> > >>>>>>> Agreed that the word DAG makes very little sense to someone new to workflow
> > >>>>>>> orchestration. But it does also show the nature of being acyclic. Sure, as
> > >>>>>>> Bas mentioned, there are ways to work around it. Still, in my opinion, there
> > >>>>>>> is generally no need for cyclic behavior in workflow orchestration. Most
> > >>>>>>> (*if not all*) cases can in some way be covered in an acyclic manner
> > >>>>>>> with multiple runs. Hence, the idempotency. So I would want the "acyclic"
> > >>>>>>> word to stick.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Avi
> > >>>>>>>
> > >>>>>>> On Tue, Oct 22, 2024 at 12:41 PM <bernd.stroe...@kosakya.de> wrote:
> > >>>>>>>
> > >>>>>>>> Brilliant, I am on the way to become an Airflow Fan; so many new ideas.
> > >>>>>>>>
> > >>>>>>>> The Term DAG is misleading; it should be replaced by the more general Term
> > >>>>>>>> Airflow (Workflow) Graph (AFG) or Airflow (Petri) Net (AFN) (maybe without
> > >>>>>>>> a direction);
> > >>>>>>>> and ... these Graphs should be stored in a Graph Database.
> > >>>>>>>>
> > >>>>>>>> Every Node or Sub-Graph of an Airflow Graph (AFG) might be assigned to an
> > >>>>>>>> executable (Python-, Rust-, ... ) member of a library.
> > >>>>>>>>
> > >>>>>>>> A running Graph might have a different structure than a configuration
> > >>>>>>>> Graph.
> > >>>>>>>>
> > >>>>>>>> Forget that if you think it's bullshit.
> > >>>>>>>>
> > >>>>>>>> Best Regards
> > >>>>>>>>
> > >>>>>>>> Bernd Ströhle
> > >>>>>>>> M: +49 171 5357916
> > >>>>>>>> E: bernd.stroe...@gmail.com
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> -----Original Message-----
> > >>>>>>>> From: Igor Kholopov <ikholo...@google.com.INVALID>
> > >>>>>>>> Sent: Tuesday, October 22, 2024 12:02 PM
> > >>>>>>>> To: dev@airflow.apache.org
> > >>>>>>>> Subject: Re: Airflow should deprecate the term "DAG" for end users
> > >>>>>>>>
> > >>>>>>>> Even though the term "DAG" is clearly suboptimal, it is part of Airflow
> > >>>>>>>> DAG definition interface at so many levels, that any attempt to change it
> > >>>>>>>> will only introduce more chaos, not reduce it. The only thing that is worse
> > >>>>>>>> than a poorly chosen name in the code is when there are two ways to define
> > >>>>>>>> the same thing.
> > >>>>>>>> Countless articles and tutorials will suddenly become
> > >>>>>>>> confusing as they all refer to workflows as "DAG"s.
> > >>>>>>>>
> > >>>>>>>> We are already at risk of scaring the users away with a number of breaking
> > >>>>>>>> changes in Airflow 3; promising even more breaking changes for the most
> > >>>>>>>> basic things is not something that people are looking for. Attempting to
> > >>>>>>>> change the fundamental terms will be interpreted as an even stronger signal
> > >>>>>>>> of project immaturity.
> > >>>>>>>>
> > >>>>>>>> Given that, I oppose the idea of changing the term in the long run. I even
> > >>>>>>>> more strongly oppose the idea of deprecating it in the DAG definition
> > >>>>>>>> interface. We better put our time and efforts in other places in Airflow,
> > >>>>>>>> of which there are plenty.
> > >>>>>>>>
> > >>>>>>>> Kind regards,
> > >>>>>>>> Igor
> > >>>>>>>>
> > >>>>>>>> On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak <b...@astronomer.io.invalid>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Couple of thoughts:
> > >>>>>>>>>
> > >>>>>>>>> 1. The boundaries/properties of “DAG” have already faded over time,
> > >>>>>>>>> for example there are now several ways to create cyclic graphs, e.g.
> > >>>>>>>>> using the @continuous schedule. I imagine these properties vanishing
> > >>>>>>>>> even more in the future, so from that perspective I support changing
> > >>>>>>>>> “DAG” to a more generic name.
> > >>>>>>>>>
> > >>>>>>>>> 2. How other orchestration frameworks do naming:
> > >>>>>>>>> Dagster: pipeline
> > >>>>>>>>> Prefect: flow
> > >>>>>>>>> Flyte: workflow
> > >>>>>>>>> Temporal: workflow
> > >>>>>>>>> Kestra: flow
> > >>>>>>>>>
> > >>>>>>>>> I think “workflow” is the most fitting name.
> > >>>>>>>>>
> > >>>>>>>>> 3. Given the large impact of this change, I suggest defining a clear
> > >>>>>>>>> path forward. Would we first introduce the deprecation in Airflow 3,
> > >>>>>>>>> and remove “DAG” in Airflow 4?
> > >>>>>>>>>
> > >>>>>>>>> Bas
> > >>>>>>>>>
> > >>>>>>>>>> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> I don't see a problem with the term DAG, especially when most other
> > >>>>>>>>>> platforms embrace the term wholeheartedly.
> > >>>>>>>>>> I don't see anything intimidating or confusing about it at all,
> > >>>>>>>>>> changing the term though would be fairly confusing to most users who
> > >>>>>>>>>> have been using the term for years.
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung
> > >>>>>>>>>> <t...@astronomer.io.invalid> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> I totally agree with doing away with the term DAG. The only problem
> > >>>>>>>>>>> (aside from actually telling people, including myself, to stop using
> > >>>>>>>>>>> the term) is to come up with a reasonable alternative.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I can’t recall who, but someone mentioned “workflow” is not very accurate
> > >>>>>>>>>>> for Airflow. The term “definition” was proposed, but it’s a bit
> > >>>>>>>>>>> broad; I tried to use it in a few places and kept finding myself
> > >>>>>>>>>>> doubting “what definition?” and wanting to clarify “DAG definition”
> > >>>>>>>>>>> (defeating the purpose).
> > >>>>>>>>>>>
> > >>>>>>>>>>> TP
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On 22 Oct 2024, at 13:07, Jens Scheffler
> > >>>>>>>>>>>> <j_scheff...@gmx.de.INVALID> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Hi Ryan,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks for posting.
> > >>>>>>>>>>>> I share exactly the same observation, had a short laugh because
> > >>>>>>>>>>>> the DAG question is always an introduction if someone joins the
> > >>>>>>>>>>>> party. I think a global renaming makes sense. Especially when we
> > >>>>>>>>>>>> also rename Dataset to Asset this is also a reasonable step.
> > >>>>>>>>>>>> Concepts still can stay the same.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> So I hope I don’t need to join hiding below the desk with you and
> > >>>>>>>>>>>> +1 for raising the discussion.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Technically we can still think if we keep details of python names
> > >>>>>>>>>>>> the same because the execution is still a DAG… but user facing it
> > >>>>>>>>>>>> is a workflow.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Jens
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On 21. Oct 2024, at 23:56, Ryan Hatter
> > >>>>>>>>>>>>> <ryan.hat...@astronomer.io.invalid> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Everyone please sheathe your swords... at least for now.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The term "DAG" has very little meaning to Airflow users. Indeed,
> > >>>>>>>>>>>>> it has little meaning outside of some mathematicians and software
> > >>>>>>>>>>>>> engineers for whom the properties of a DAG actually matter. For
> > >>>>>>>>>>>>> someone new to data engineering or workflow orchestration, one of
> > >>>>>>>>>>>>> the first questions they will likely have is, "what on earth is a
> > >>>>>>>>>>>>> DAG?" The answer is almost always, "It's a directed acyclic graph.
> > >>>>>>>>>>>>> You don't need to worry about what that means; it's just a term
> > >>>>>>>>>>>>> for your workflow."
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The term "DAG" is problematic for at least a couple important reasons:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *Complexity for New Users*: As mentioned above, "DAG" is
> > >>>>>>>>>>>>> unnecessarily intimidating and confusing. We want Airflow to be
> > >>>>>>>>>>>>> approachable, and using technical jargon like "DAG" right off the
> > >>>>>>>>>>>>> bat creates an initial barrier to understanding.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just
> > >>>>>>>>>>>>> one component of an Airflow workflow. The workflow includes its
> > >>>>>>>>>>>>> schedule, retries, timeouts, a dozen other parameters, and other
> > >>>>>>>>>>>>> metadata that the DAG component doesn’t account for.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Consider the following from the Airflow homepage
> > >>>>>>>>>>>>> <https://airflow.apache.org/>.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Apache Airflow® is a platform created by the community to
> > >>>>>>>>>>>>> programmatically author, schedule and monitor workflows.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Then, if we look at the "What is Airflow?" docs page
> > >>>>>>>>>>>>> <https://airflow.apache.org/docs/apache-airflow/stable/index.html>,
> > >>>>>>>>>>>>> we can see that the docs explain what Airflow is without using "DAG."
> > >>>>>>>>>>>>> It's only in the *workflow* Python code that the term is
> > >>>>>>>>>>>>> introduced out of nowhere as a comment that awkwardly tries to
> > >>>>>>>>>>>>> explain it:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> # A DAG represents a workflow, a collection of tasks
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It makes sense to not refer to DAGs in these introductions to
> > >>>>>>>>>>>>> Airflow, because *Airflow doesn't orchestrate DAGs; it orchestrates
> > >>>>>>>>>>>>> workflows*. The DAG is the model that, for reasons irrelevant to
> > >>>>>>>>>>>>> almost every user, workflows must adhere to.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> So, I propose at least adding an alias for the term "DAG" and
> > >>>>>>>>>>>>> updating documentation to replace "DAG" with "workflow".
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> For example, instead of...
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> @dag(
> > >>>>>>>>>>>>>     schedule="@daily",
> > >>>>>>>>>>>>>     ...
> > >>>>>>>>>>>>>     dagrun_timeout=timedelta(hours=1)
> > >>>>>>>>>>>>> )
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Users could do...
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> @workflow(
> > >>>>>>>>>>>>>     schedule="@daily",
> > >>>>>>>>>>>>>     ...
> > >>>>>>>>>>>>>     run_timeout=timedelta(hours=1)
> > >>>>>>>>>>>>> )
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> And with that... I will start running away.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >>>>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
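
For anyone trying to picture the alias proposed in the original message at the bottom of this thread, here is a minimal sketch of how a `@workflow` decorator could forward to `@dag`. It deliberately does not import Airflow: the `dag` function below is a stand-in that only records its arguments, and `workflow` and `run_timeout` are the hypothetical names from the proposal, not an existing Airflow API.

```python
from datetime import timedelta


def dag(schedule=None, dagrun_timeout=None, **kwargs):
    """Stand-in for a @dag decorator (assumption: it just records its arguments)."""
    def decorator(func):
        # Attach the settings to the function so they can be inspected later.
        func.dag_args = {
            "schedule": schedule,
            "dagrun_timeout": dagrun_timeout,
            **kwargs,
        }
        return func
    return decorator


def workflow(schedule=None, run_timeout=None, **kwargs):
    """Hypothetical alias: same behaviour as dag(), with run_timeout renamed."""
    return dag(schedule=schedule, dagrun_timeout=run_timeout, **kwargs)


@workflow(schedule="@daily", run_timeout=timedelta(hours=1))
def my_pipeline():
    """Placeholder body; a real definition would declare tasks here."""


print(my_pipeline.dag_args)
```

Because the alias only translates the keyword and delegates, both spellings would end up with the same underlying settings, which is roughly the "two ways to define the same thing" trade-off discussed above.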