Best argument in favour of keeping “dags” as a term — getting to re-use puns like https://i.imgflip.com/1xhtwh.jpg
In all seriousness: I don’t mind either way; both sides have presented good reasons.

-a

> On 22 Oct 2024, at 17:03, Daniel Standish <daniel.stand...@astronomer.io.INVALID> wrote:
>
> Yeah, just say, when asked where the name comes from, "well, no one actually knows, but..." and then make something up.
>
> On Tue, Oct 22, 2024 at 8:31 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Just to clarify: "directed acyclic graph" is the tongue-twister.
>>
>> On Tue, Oct 22, 2024 at 5:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> I like what both Daniel and Brent wrote. I would very much like to be able to say just "dag" without explaining it further.
>>>
>>> Every time I explain "DAG" at a talk, it's a tongue-twister for me, and I almost stutter trying to recall how to pronounce it properly.
>>>
>>> J.
>>>
>>>
>>> On Tue, Oct 22, 2024 at 5:27 PM Brent Bovenzi <br...@astronomer.io.invalid> wrote:
>>>
>>>> I remember we explored renaming "DAG" when starting on AIP-38 to modernize the UI. Both "pipeline" and "workflow" are more descriptive of what one is actually doing, while Directed Acyclic Graph is an implementation detail. But I agree with Daniel Standish: at this point "DAG" has become "dag", a word in its own right.
>>>>
>>>> Examples of "dag" abound in community discussion, Airflow Summit talks, documentation and even the UI. Let's embrace "dag". A user just needs to learn one new word rather than the technical concept behind it. I think that is much less effort than refactoring so much code, documentation, blog posts, Stack Overflow questions, etc.
>>>>
>>>> On Tue, Oct 22, 2024 at 10:51 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:
>>>>
>>>>> I am skeptical. It seems like introducing a lot of pain for questionable benefit. But I am definitely sympathetic to the idea. I agree the association with "directed acyclic graph" is not helpful.
>>>>>
>>>>> Along those lines, I offer here some less invasive mitigations.
>>>>>
>>>>> One thing we can do no matter what is to de-emphasize the math-nerd origins of the name. That is to say, in the docs, website, etc., *never define* Airflow's "dag" concept as a directed acyclic graph. Always define it as a pipeline, a collection of tasks, a workflow, etc.
>>>>>
>>>>> The "directed acyclic graph" part is like a historical footnote, and we could mention it once somewhere out of the way.
>>>>>
>>>>> We could also start using lowercase in the docs in general, e.g. writing "dag" / "dags" instead of "DAG" / "DAGs". The uppercase makes it look like an acronym, but "dag" in Airflow is just an Airflow concept, and the association with "DAGs" is not really helpful.
>>>>>
>>>>> In other words, embrace that "dag" in Airflow is its own thing, is *not* strictly speaking a directed acyclic graph (which nobody knows about anyway), and tell users what it is in simple terms that normal people understand.
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2024 at 7:27 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>
>>>>>> DAG is so embedded into what we do that it will be extremely difficult to get rid of it completely. Also, I think it would cause a lot of Google and Stack Overflow searches to miss the right answers. This is one of the strengths of Airflow, besides the community and ideas that Bernd mentioned: the vast number of examples, problems and solutions you can so easily find (and we have to remember that any AI trained on past data will also match people's queries rather poorly).
>>>>>>
>>>>>> I am not too attached to DAG. I could easily switch.
>>>>>> And if we did, I would be for using "workflow" or "pipeline" instead of `dag`, were it not for the reason above; but I think I am with Igor here that it might cause more problems than it solves.
>>>>>>
>>>>>> But I am not 100% against it: if others think it's a good idea, I am OK with it.
>>>>>>
>>>>>> J.
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2024 at 3:12 PM Abhishek Bhakat <abhishek.bha...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> Agreed that the word DAG makes very little sense to someone new to workflow orchestration. But it does also convey the nature of being acyclic. Sure, as Bas mentioned, there are ways to work around it. Still, in my opinion, there is generally no need for cyclic behavior in workflow orchestration. Most (*if not all*) cases can be covered in an acyclic manner with multiple runs; hence the need for idempotency. So I would want the word "acyclic" to stick.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Avi
>>>>>>>
>>>>>>> On Tue, Oct 22, 2024 at 12:41 PM <bernd.stroe...@kosakya.de> wrote:
>>>>>>>
>>>>>>>> Brilliant, I am on the way to becoming an Airflow fan; so many new ideas.
>>>>>>>>
>>>>>>>> The term DAG is misleading; it should be replaced by the more general term Airflow (Workflow) Graph (AFG) or Airflow (Petri) Net (AFN) (maybe without a direction), and ... these graphs should be stored in a graph database.
>>>>>>>>
>>>>>>>> Every node or sub-graph of an Airflow Graph (AFG) might be assigned to an executable (Python, Rust, ...) member of a library.
>>>>>>>>
>>>>>>>> A running graph might have a different structure than a configuration graph.
>>>>>>>>
>>>>>>>> Forget that if you think it's bullshit.
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Bernd Ströhle
>>>>>>>> M: +49 171 5357916
>>>>>>>> E: bernd.stroe...@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Igor Kholopov <ikholo...@google.com.INVALID>
>>>>>>>> Sent: Tuesday, October 22, 2024 12:02 PM
>>>>>>>> To: dev@airflow.apache.org
>>>>>>>> Subject: Re: Airflow should deprecate the term "DAG" for end users
>>>>>>>>
>>>>>>>> Even though the term "DAG" is clearly suboptimal, it is part of Airflow's DAG definition interface at so many levels that any attempt to change it will only introduce more chaos, not reduce it. The only thing worse than a poorly chosen name in the code is having two ways to define the same thing. Countless articles and tutorials will suddenly become confusing, as they all refer to workflows as "DAG"s.
>>>>>>>>
>>>>>>>> We are already at risk of scaring users away with a number of breaking changes in Airflow 3; promising even more breaking changes for the most basic things is not something people are looking for. Attempting to change the fundamental terms will be interpreted as an even stronger signal of project immaturity.
>>>>>>>>
>>>>>>>> Given that, I oppose the idea of changing the term in the long run. I even more strongly oppose the idea of deprecating it in the DAG definition interface. We would do better to put our time and effort into other parts of Airflow, of which there are plenty.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Igor
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak <b...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> Couple of thoughts:
>>>>>>>>>
>>>>>>>>> 1. The boundaries/properties of “DAG” have already faded over time; for example, there are now several ways to create cyclic graphs, such as using the @continuous schedule. I imagine these properties vanishing even more in the future, so from that perspective I support changing “DAG” to a more generic name.
>>>>>>>>>
>>>>>>>>> 2. How other orchestration frameworks do naming:
>>>>>>>>> Dagster: pipeline
>>>>>>>>> Prefect: flow
>>>>>>>>> Flyte: workflow
>>>>>>>>> Temporal: workflow
>>>>>>>>> Kestra: flow
>>>>>>>>>
>>>>>>>>> I think “workflow” is the most fitting name.
>>>>>>>>>
>>>>>>>>> 3. Given the large impact of this change, I suggest defining a clear path forward. Would we first introduce the deprecation in Airflow 3, and remove “DAG” in Airflow 4?
>>>>>>>>>
>>>>>>>>> Bas
>>>>>>>>>
>>>>>>>>>> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I don't see a problem with the term DAG, especially when most other platforms embrace the term wholeheartedly. I don't see anything intimidating or confusing about it at all; changing the term, though, would be fairly confusing to most users, who have been using it for years.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung <t...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> I totally agree with doing away with the term DAG. The only problem (aside from actually telling people, including myself, to stop using the term) is coming up with a reasonable alternative.
>>>>>>>>>>>
>>>>>>>>>>> I can’t recall who, but someone mentioned that “workflow” is not very accurate for Airflow.
>>>>>>>>>>> The term “definition” was proposed, but it’s a bit broad; I tried to use it in a few places and kept finding myself doubting “what definition?” and wanting to clarify “DAG definition” (defeating the purpose).
>>>>>>>>>>>
>>>>>>>>>>> TP
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 22 Oct 2024, at 13:07, Jens Scheffler <j_scheff...@gmx.de.INVALID> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for posting. I share exactly the same observation and had a short laugh, because the DAG question always comes up as an introduction when someone joins the party. I think a global renaming makes sense, especially since we are also renaming Dataset to Asset, so this is a reasonable step as well. The concepts can still stay the same.
>>>>>>>>>>>>
>>>>>>>>>>>> So I hope I don't need to join you hiding under the desk, and +1 for raising the discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> Technically, we could still consider keeping the Python names the same, because the execution is still a DAG… but user-facing, it is a workflow.
>>>>>>>>>>>>
>>>>>>>>>>>> Jens
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from my Smartphone
>>>>>>>>>>>>
>>>>>>>>>>>>> On 21. Oct 2024, at 23:56, Ryan Hatter <ryan.hat...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Everyone, please sheathe your swords... at least for now.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The term "DAG" has very little meaning to Airflow users. Indeed, it has little meaning outside of some mathematicians and software engineers for whom the properties of a DAG actually matter.
>>>>>>>>>>>>> For someone new to data engineering or workflow orchestration, one of the first questions they will likely have is, "What on earth is a DAG?" The answer is almost always, "It's a directed acyclic graph. You don't need to worry about what that means; it's just a term for your workflow."
>>>>>>>>>>>>>
>>>>>>>>>>>>> The term "DAG" is problematic for at least a couple of important reasons:
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Complexity for New Users*: As mentioned above, "DAG" is unnecessarily intimidating and confusing. We want Airflow to be approachable, and using technical jargon like "DAG" right off the bat creates an initial barrier to understanding.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just one component of an Airflow workflow. The workflow includes its schedule, retries, timeouts, a dozen other parameters, and other metadata that the DAG component doesn’t account for.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consider the following from the Airflow homepage <https://airflow.apache.org/>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Apache Airflow® is a platform created by the community to programmatically author, schedule and monitor workflows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, if we look at the "What is Airflow?" docs page <https://airflow.apache.org/docs/apache-airflow/stable/index.html>, we can see that the docs explain what Airflow is without using "DAG."
>>>>>>>>>>>>> It's only in the *workflow* Python code that the term is introduced, out of nowhere, as a comment that awkwardly tries to explain it:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # A DAG represents a workflow, a collection of tasks
>>>>>>>>>>>>>
>>>>>>>>>>>>> It makes sense not to refer to DAGs in these introductions to Airflow, because *Airflow doesn't orchestrate DAGs; it orchestrates workflows*. The DAG is the model that, for reasons irrelevant to almost every user, workflows must adhere to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, I propose at least adding an alias for the term "DAG" and updating the documentation to replace "DAG" with "workflow".
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, instead of...
>>>>>>>>>>>>>
>>>>>>>>>>>>> @dag(
>>>>>>>>>>>>>     schedule="@daily",
>>>>>>>>>>>>>     ...
>>>>>>>>>>>>>     dagrun_timeout=timedelta(hours=1)
>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>> Users could write...
>>>>>>>>>>>>>
>>>>>>>>>>>>> @workflow(
>>>>>>>>>>>>>     schedule="@daily",
>>>>>>>>>>>>>     ...
>>>>>>>>>>>>>     run_timeout=timedelta(hours=1)
>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> And with that... I will start running away.
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org