Yeah just say, when asked where the name comes from, "well, no one actually knows but..." and then make something up.

On Tue, Oct 22, 2024 at 8:31 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Just to clarify - "directed acyclic graph" is the tongue-twister.
>
> On Tue, Oct 22, 2024 at 5:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I like what both Daniel and Brent wrote. I would very much want to be
> > able to say just "dag" without explaining it further.
> >
> > For me every time I explain "DAG" at a talk it's a tongue-twister, and I
> > almost stutter on trying to recall how to pronounce it properly.
> >
> > J.
> >
> > On Tue, Oct 22, 2024 at 5:27 PM Brent Bovenzi
> > <br...@astronomer.io.invalid> wrote:
> >
> > > I remember we explored renaming "DAG" when starting on AIP-38 to
> > > modernize the UI. Both "pipeline" and "workflow" are more descriptive
> > > of what one is actually doing, while Directed Acyclic Graph is an
> > > implementation detail. But I agree with Daniel Standish: at this point
> > > "DAG" has become "dag", a word in its own right.
> > >
> > > Examples of "dag" abound in community discussion, Airflow Summit
> > > talks, documentation and even in the UI. Let's embrace "dag". A user
> > > just needs to learn one new word vs the technical concept behind that
> > > word. I think that is much less effort than refactoring so much code,
> > > documentation, blog posts, stack overflow questions, etc.
> > >
> > > On Tue, Oct 22, 2024 at 10:51 AM Daniel Standish
> > > <daniel.stand...@astronomer.io.invalid> wrote:
> > >
> > > > I am skeptical. Seems like introducing a lot of pain for questionable
> > > > benefit. But, I am def sympathetic to the idea. I agree the
> > > > association with "directed acyclic graph" is not helpful.
> > > >
> > > > And along those lines, I offer here some less invasive mitigations.
> > > >
> > > > One thing we can do no matter what is to de-emphasize the math nerd
> > > > origins of the name. That is to say, in docs / website / etc, *never
> > > > define* airflow's "dag" concept as a directed acyclic graph.
> > > > Always define it as a pipeline, collection of tasks, workflow etc.
> > > >
> > > > The "directed acyclic graph" part of it is like a historical
> > > > footnote, and we could make one mention of it somewhere hidden.
> > > >
> > > > We could also start using lowercase in the docs in general, e.g.
> > > > writing "dag" / "dags" instead of writing "DAG" / "DAGs" etc. The
> > > > upper case part of it makes it look like an acronym; but "dag" in
> > > > airflow is just an airflow concept and the association with "DAGs"
> > > > is not really helpful.
> > > >
> > > > In other words embrace that "dag" in airflow is its own thing, is
> > > > *not* strictly speaking a directed acyclic graph (which nobody knows
> > > > about anyway), and tell them what it is in simple terms that normal
> > > > people understand.
> > > >
> > > > On Tue, Oct 22, 2024 at 7:27 AM Jarek Potiuk <ja...@potiuk.com>
> > > > wrote:
> > > >
> > > > > DAG is so embedded into what we do that it will be extremely
> > > > > difficult to get rid of it completely. Also I think it will cause
> > > > > a lot of "google" searches and "stack overflow" searches to stop
> > > > > finding the right answers. One of the strengths of Airflow,
> > > > > besides the community and ideas that Bernd mentioned, is the vast
> > > > > number of examples, problems and solutions you can so easily find
> > > > > (and we have to remember that all the AI trained on past data
> > > > > will also match people's queries rather poorly).
> > > > >
> > > > > I am not too attached to DAG. I could easily switch. And if we do,
> > > > > I would be for using workflow or pipeline instead of `dag` if not
> > > > > for the above reason, but I think I am here with Igor that it
> > > > > might cause more problems than it solves.
> > > > >
> > > > > But I am not 100% against - if others think it's a good idea, I am
> > > > > ok with it.
> > > > >
> > > > > J,
> > > > >
> > > > > On Tue, Oct 22, 2024 at 3:12 PM Abhishek Bhakat
> > > > > <abhishek.bha...@astronomer.io.invalid> wrote:
> > > > >
> > > > > > Agreed that the word DAG makes very little sense to someone new
> > > > > > to workflow orchestration. But it does also show the nature of
> > > > > > being acyclic. Sure, as Bas mentioned, there are ways to work
> > > > > > around it. Still, in my opinion, there is generally no need for
> > > > > > cyclic behavior in workflow orchestration. Most (*if not all*)
> > > > > > cases can in some way be covered in an acyclic manner with
> > > > > > multiple runs. Hence, the idempotency. So I would want the
> > > > > > "acyclic" word to stick.
> > > > > >
> > > > > > Regards,
> > > > > > Avi
> > > > > >
> > > > > > On Tue, Oct 22, 2024 at 12:41 PM <bernd.stroe...@kosakya.de>
> > > > > > wrote:
> > > > > >
> > > > > > > Brilliant, I am on the way to becoming an Airflow Fan; so many
> > > > > > > new ideas.
> > > > > > >
> > > > > > > The Term DAG is misleading; it should be replaced by the more
> > > > > > > general Term Airflow (Workflow) Graph (AFG) or Airflow (Petri)
> > > > > > > Net (AFN) (maybe without a direction);
> > > > > > > and ... these Graphs should be stored in a Graph Database.
> > > > > > >
> > > > > > > Every Node or Sub-Graph of an Airflow Graph (AFG) might be
> > > > > > > assigned to an executable (Python-, Rust-, ...) member of a
> > > > > > > library.
> > > > > > >
> > > > > > > A running Graph might have a different structure than a
> > > > > > > configuration Graph.
> > > > > > >
> > > > > > > Forget that if you think it's bullshit.
> > > > > > >
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Bernd Ströhle
> > > > > > > M: +49 171 5357916
> > > > > > > E: bernd.stroe...@gmail.com
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Igor Kholopov <ikholo...@google.com.INVALID>
> > > > > > > Sent: Tuesday, October 22, 2024 12:02 PM
> > > > > > > To: dev@airflow.apache.org
> > > > > > > Subject: Re: Airflow should deprecate the term "DAG" for end
> > > > > > > users
> > > > > > >
> > > > > > > Even though the term "DAG" is clearly suboptimal, it is part of
> > > > > > > Airflow's DAG definition interface at so many levels that any
> > > > > > > attempt to change it will only introduce more chaos, not reduce
> > > > > > > it. The only thing that is worse than a poorly chosen name in
> > > > > > > the code is when there are two ways to define the same thing.
> > > > > > > Countless articles and tutorials will suddenly become confusing
> > > > > > > as they all refer to workflows as "DAG"s.
> > > > > > >
> > > > > > > We are already at risk of scaring the users away with a number
> > > > > > > of breaking changes in Airflow 3; promising even more breaking
> > > > > > > changes for the most basic things is not something that people
> > > > > > > are looking for. Attempting to change the fundamental terms
> > > > > > > will be interpreted as an even stronger signal of project
> > > > > > > immaturity.
> > > > > > >
> > > > > > > Given that, I oppose the idea of changing the term in the long
> > > > > > > run. I oppose even more strongly the idea of deprecating it in
> > > > > > > the DAG definition interface. We had better put our time and
> > > > > > > effort into other places in Airflow, of which there are plenty.
> > > > > > >
> > > > > > > Kind regards,
> > > > > > > Igor
> > > > > > >
> > > > > > > On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak
> > > > > > > <b...@astronomer.io.invalid> wrote:
> > > > > > >
> > > > > > > > Couple of thoughts:
> > > > > > > >
> > > > > > > > 1. The boundaries/properties of “DAG” have already faded over
> > > > > > > > time; for example, there are now several ways to create
> > > > > > > > cyclic graphs, e.g. using the @continuous schedule. I imagine
> > > > > > > > these properties vanishing even more in the future, so from
> > > > > > > > that perspective I support changing “DAG” to a more generic
> > > > > > > > name.
> > > > > > > >
> > > > > > > > 2. How other orchestration frameworks do naming:
> > > > > > > > Dagster: pipeline
> > > > > > > > Prefect: flow
> > > > > > > > Flyte: workflow
> > > > > > > > Temporal: workflow
> > > > > > > > Kestra: flow
> > > > > > > >
> > > > > > > > I think “workflow” is the most fitting name.
> > > > > > > >
> > > > > > > > 3. Given the large impact of this change, I suggest defining
> > > > > > > > a clear path forward. Would we first introduce the
> > > > > > > > deprecation in Airflow 3, and remove “DAG” in Airflow 4?
> > > > > > > >
> > > > > > > > Bas
> > > > > > > >
> > > > > > > > > On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > I don't see a problem with the term DAG, especially when
> > > > > > > > > most other platforms embrace the term wholeheartedly.
> > > > > > > > > I don't see anything intimidating or confusing about it at
> > > > > > > > > all; changing the term, though, would be fairly confusing
> > > > > > > > > to most users who have been using the term for years.
> > > > > > > > >
> > > > > > > > > On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung
> > > > > > > > > <t...@astronomer.io.invalid> wrote:
> > > > > > > > >
> > > > > > > > > > I totally agree with doing away with the term DAG. The
> > > > > > > > > > only problem (aside from actually telling people,
> > > > > > > > > > including myself, to stop using the term) is to come up
> > > > > > > > > > with a reasonable alternative.
> > > > > > > > > >
> > > > > > > > > > I can’t recall who, but someone mentioned “workflow” is
> > > > > > > > > > not very accurate for Airflow. The term “definition” was
> > > > > > > > > > proposed, but it’s a bit broad; I tried to use it in a
> > > > > > > > > > few places and kept finding myself doubting “what
> > > > > > > > > > definition?” and wanting to clarify “DAG definition”
> > > > > > > > > > (defeating the purpose).
> > > > > > > > > >
> > > > > > > > > > TP
> > > > > > > > > >
> > > > > > > > > > > On 22 Oct 2024, at 13:07, Jens Scheffler
> > > > > > > > > > > <j_scheff...@gmx.de.INVALID> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Ryan,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for posting. I share exactly the same
> > > > > > > > > > > observation, had a short laugh because the DAG question
> > > > > > > > > > > is always an introduction if someone joins the party. I
> > > > > > > > > > > think a global renaming makes sense. Especially when we
> > > > > > > > > > > also rename Dataset to Asset, this is also a reasonable
> > > > > > > > > > > step. Concepts can still stay the same.
> > > > > > > > > > >
> > > > > > > > > > > So I hope I don't need to join you hiding below the
> > > > > > > > > > > desk, and +1 for raising the discussion.
> > > > > > > > > > >
> > > > > > > > > > > Technically we can still consider keeping the Python
> > > > > > > > > > > names the same, because the execution is still a DAG...
> > > > > > > > > > > but user-facing it is a workflow.
> > > > > > > > > > >
> > > > > > > > > > > Jens
> > > > > > > > > > >
> > > > > > > > > > > Sent from my Smartphone
> > > > > > > > > > >
> > > > > > > > > > > > On 21. Oct 2024, at 23:56, Ryan Hatter
> > > > > > > > > > > > <ryan.hat...@astronomer.io.invalid> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Everyone please sheathe your swords... at least for
> > > > > > > > > > > > now.
> > > > > > > > > > > >
> > > > > > > > > > > > The term "DAG" has very little meaning to Airflow
> > > > > > > > > > > > users. Indeed, it has little meaning outside of some
> > > > > > > > > > > > mathematicians and software engineers for whom the
> > > > > > > > > > > > properties of a DAG actually matter. For someone new
> > > > > > > > > > > > to data engineering or workflow orchestration, one of
> > > > > > > > > > > > the first questions they will likely have is, "what
> > > > > > > > > > > > on earth is a DAG?" The answer is almost always,
> > > > > > > > > > > > "It's a directed acyclic graph. You don't need to
> > > > > > > > > > > > worry about what that means; it's just a term for
> > > > > > > > > > > > your workflow."
> > > > > > > > > > > >
> > > > > > > > > > > > The term "DAG" is problematic for at least a couple
> > > > > > > > > > > > of important reasons:
> > > > > > > > > > > >
> > > > > > > > > > > > *Complexity for New Users*: As mentioned above, "DAG"
> > > > > > > > > > > > is unnecessarily intimidating and confusing. We want
> > > > > > > > > > > > Airflow to be approachable, and using technical
> > > > > > > > > > > > jargon like "DAG" right off the bat creates an
> > > > > > > > > > > > initial barrier to understanding.
> > > > > > > > > > > >
> > > > > > > > > > > > *Disconnect Between DAG and Workflow Concepts*: The
> > > > > > > > > > > > DAG is just one component of an Airflow workflow. The
> > > > > > > > > > > > workflow includes its schedule, retries, timeouts, a
> > > > > > > > > > > > dozen other parameters, and other metadata that the
> > > > > > > > > > > > DAG component doesn’t account for.
> > > > > > > > > > > >
> > > > > > > > > > > > Consider the following from the Airflow homepage
> > > > > > > > > > > > <https://airflow.apache.org/>.
> > > > > > > > > > > >
> > > > > > > > > > > > Apache Airflow® is a platform created by the
> > > > > > > > > > > > community to programmatically author, schedule and
> > > > > > > > > > > > monitor workflows.
> > > > > > > > > > > >
> > > > > > > > > > > > Then, if we look at the "What is Airflow?" docs page
> > > > > > > > > > > > <https://airflow.apache.org/docs/apache-airflow/stable/index.html>,
> > > > > > > > > > > > we can see that the docs explain what Airflow is
> > > > > > > > > > > > without using "DAG." It's only in the *workflow*
> > > > > > > > > > > > Python code that the term is introduced out of
> > > > > > > > > > > > nowhere as a comment that awkwardly tries to explain
> > > > > > > > > > > > it:
> > > > > > > > > > > >
> > > > > > > > > > > > # A DAG represents a workflow, a collection of tasks
> > > > > > > > > > > >
> > > > > > > > > > > > It makes sense to not refer to DAGs in these
> > > > > > > > > > > > introductions to Airflow, because *Airflow doesn't
> > > > > > > > > > > > orchestrate DAGs; it orchestrates workflows*. The DAG
> > > > > > > > > > > > is the model that, for reasons irrelevant to almost
> > > > > > > > > > > > every user, workflows must adhere to.
> > > > > > > > > > > >
> > > > > > > > > > > > So, I propose at least adding an alias for the term
> > > > > > > > > > > > "DAG" and updating documentation to replace "DAG"
> > > > > > > > > > > > with "workflow".
> > > > > > > > > > > >
> > > > > > > > > > > > For example, instead of...
> > > > > > > > > > > >
> > > > > > > > > > > > @dag(
> > > > > > > > > > > >     schedule="@daily",
> > > > > > > > > > > >     ...
> > > > > > > > > > > >     dagrun_timeout=timedelta(hours=1)
> > > > > > > > > > > > )
> > > > > > > > > > > >
> > > > > > > > > > > > Users could do...
> > > > > > > > > > > >
> > > > > > > > > > > > @workflow(
> > > > > > > > > > > >     schedule="@daily",
> > > > > > > > > > > >     ...
> > > > > > > > > > > >     run_timeout=timedelta(hours=1)
> > > > > > > > > > > > )
> > > > > > > > > > > >
> > > > > > > > > > > > And with that... I will start running away.
> > > > > > > > > > >
> > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > > > > > > > For additional commands, e-mail: dev-h...@airflow.apache.org
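
For what it's worth, here is a minimal sketch of how the alias proposed in the original message might behave, assuming a hypothetical `workflow` wrapper that renames `run_timeout` to `dagrun_timeout` and forwards everything else. The `dag` function below is only a stand-in that records its keyword arguments so the snippet runs without Airflow installed; it is not Airflow's real decorator, and neither `workflow` nor `run_timeout` is an existing Airflow name as far as this thread shows.

```python
from datetime import timedelta


def dag(**kwargs):
    """Stand-in for Airflow's @dag decorator (assumption): records its arguments."""
    def decorator(func):
        func.dag_kwargs = kwargs
        return func
    return decorator


def workflow(*, run_timeout=None, **kwargs):
    """Hypothetical alias for @dag that accepts run_timeout instead of dagrun_timeout."""
    if run_timeout is not None:
        if "dagrun_timeout" in kwargs:
            # Refuse ambiguous input rather than silently picking one value.
            raise TypeError("pass either run_timeout or dagrun_timeout, not both")
        kwargs["dagrun_timeout"] = run_timeout
    # Everything else is forwarded unchanged, so old and new spellings coexist.
    return dag(**kwargs)


@workflow(schedule="@daily", run_timeout=timedelta(hours=1))
def daily_report():
    return "ran"
```

Under these assumptions the old `@dag(...)` spelling keeps working untouched, which is the "two ways to define the same thing" trade-off raised earlier in the thread.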