Even though the term "DAG" is clearly suboptimal, it is part of the Airflow DAG definition interface at so many levels that any attempt to change it will only introduce more chaos, not reduce it. The only thing worse than a poorly chosen name in the code is having two ways to define the same thing. Countless articles and tutorials will suddenly become confusing, as they all refer to workflows as "DAG"s.
We are already at risk of scaring users away with the number of breaking changes in Airflow 3; promising even more breaking changes for the most basic things is not something people are looking for. Attempting to change the fundamental terms will be interpreted as an even stronger signal of project immaturity.

Given that, I oppose the idea of changing the term in the long run. I oppose even more strongly the idea of deprecating it in the DAG definition interface. We had better put our time and effort into other areas of Airflow, of which there are plenty.

Kind regards,
Igor

On Tue, Oct 22, 2024 at 10:36 AM Bas Harenslak <b...@astronomer.io.invalid> wrote:

> Couple of thoughts:
>
> 1. The boundaries/properties of “DAG” have already faded over time, for
> example there are now several ways to create cyclic graphs, e.g. using the
> @continuous schedule. I imagine these properties vanishing even more in the
> future, so from that perspective I support changing “DAG” to a more generic
> name.
>
> 2. How other orchestration frameworks do naming:
> Dagster: pipeline
> Prefect: flow
> Flyte: workflow
> Temporal: workflow
> Kestra: flow
>
> I think “workflow” is the most fitting name.
>
> 3. Given the large impact of this change, I suggest defining a clear path
> forward. Would we first introduce the deprecation in Airflow 3, and remove
> “DAG” in Airflow 4?
>
> Bas
>
>
> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
> >
> > I don't see a problem with the term DAG, especially when most other
> > platforms embrace the term wholeheartedly.
> > I don't see anything intimidating or confusing about it at all, changing
> > the term though would be fairly confusing to most users who have been using
> > the term for years.
> >
> > On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung <t...@astronomer.io.invalid>
> > wrote:
> >
> >> I totally agree with doing away with the term DAG.
> >> The only problem (aside
> >> from actually telling people, including myself, to stop using the term) is to
> >> come up with a reasonable alternative.
> >>
> >> I can’t recall who, but someone mentioned “workflow” is not very accurate
> >> for Airflow. The term “definition” was proposed, but it’s a bit broad; I
> >> tried to use it in a few places and kept finding myself doubting “what
> >> definition?” and wanting to clarify “DAG definition” (defeating the
> >> purpose).
> >>
> >> TP
> >>
> >>
> >>> On 22 Oct 2024, at 13:07, Jens Scheffler <j_scheff...@gmx.de.INVALID>
> >>> wrote:
> >>>
> >>> Hi Ryan,
> >>>
> >>> Thanks for posting. I share exactly the same observation, had a short
> >>> laugh because the DAG question is always an introduction if someone joins
> >>> the party. I think a global renaming makes sense. Especially when we also
> >>> rename Dataset to Asset this is also a reasonable step. Concepts still can
> >>> stay the same.
> >>>
> >>> So I hope I don’t need to join hiding below the desk with you and +1 for
> >>> raising the discussion.
> >>>
> >>> Technically we can still think if we keep details of python names the
> >>> same because the execution is still a DAG… but user facing it is a workflow.
> >>>
> >>> Jens
> >>>
> >>> Sent from my Smartphone
> >>>
> >>>> On 21. Oct 2024, at 23:56, Ryan Hatter <ryan.hat...@astronomer.io.invalid>
> >>>> wrote:
> >>>>
> >>>> Everyone please sheathe your swords... at least for now.
> >>>>
> >>>> The term "DAG" has very little meaning to Airflow users. Indeed, it has
> >>>> little meaning outside of some mathematicians and software engineers for
> >>>> whom the properties of a DAG actually matter. For someone new to data
> >>>> engineering or workflow orchestration, one of the first questions they will
> >>>> likely have is, "what on earth is a DAG?" The answer is almost always,
> >>>> "It's a directed acyclic graph.
> >>>> You don't need to worry about what that
> >>>> means; it's just a term for your workflow."
> >>>>
> >>>> The term "DAG" is problematic for at least a couple important reasons:
> >>>>
> >>>> *Complexity for New Users*: As mentioned above, "DAG" is unnecessarily
> >>>> intimidating and confusing. We want Airflow to be approachable, and using
> >>>> technical jargon like "DAG" right off the bat creates an initial barrier to
> >>>> understanding.
> >>>>
> >>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just one
> >>>> component of an Airflow workflow. The workflow includes its schedule,
> >>>> retries, timeouts, a dozen other parameters, and other metadata that the
> >>>> DAG component doesn’t account for.
> >>>>
> >>>> Consider the following from the Airflow homepage
> >>>> <https://airflow.apache.org/>.
> >>>>
> >>>> Apache Airflow® is a platform created by the community to programmatically
> >>>> author, schedule and monitor workflows.
> >>>>
> >>>> Then, if we look at the "What is Airflow?" docs page
> >>>> <https://airflow.apache.org/docs/apache-airflow/stable/index.html>, we can
> >>>> see that the docs explain what Airflow is without using "DAG." It's only in
> >>>> the *workflow* Python code that the term is introduced out of nowhere as a
> >>>> comment that awkwardly tries to explain it:
> >>>>
> >>>> # A DAG represents a workflow, a collection of tasks
> >>>>
> >>>> It makes sense to not refer to DAGs in these introductions to Airflow,
> >>>> because *Airflow doesn't orchestrate DAGs; it orchestrates workflows*. The
> >>>> DAG is the model that, for reasons irrelevant to almost every user,
> >>>> workflows must adhere to.
> >>>>
> >>>> So, I propose at least adding an alias for the term "DAG" and updating
> >>>> documentation to replace "DAG" with "workflow".
> >>>>
> >>>> For example, instead of...
> >>>>
> >>>> @dag(
> >>>>     schedule="@daily",
> >>>>     ...
> >>>>     dagrun_timeout=timedelta(hours=1)
> >>>> )
> >>>>
> >>>> Users could do...
> >>>>
> >>>> @workflow(
> >>>>     schedule="@daily",
> >>>>     ...
> >>>>     run_timeout=timedelta(hours=1)
> >>>> )
> >>>>
> >>>>
> >>>> And with that... I will start running away.
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>
> >>
> >>
> >