Couple of thoughts:

1. The boundaries/properties of “DAG” have already faded over time; for example, there are now several ways to create cyclic graphs, e.g. using the @continuous schedule. I imagine these properties vanishing even more in the future, so from that perspective I support changing “DAG” to a more generic name.
2. How other orchestration frameworks do naming:

Dagster: pipeline
Prefect: flow
Flyte: workflow
Temporal: workflow
Kestra: flow

I think “workflow” is the most fitting name.

3. Given the large impact of this change, I suggest defining a clear path forward. Would we first introduce the deprecation in Airflow 3, and remove “DAG” in Airflow 4?

Bas

> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
>
> I don't see a problem with the term DAG, especially when most other
> platforms embrace the term wholeheartedly.
> I don't see anything intimidating or confusing about it at all, changing
> the term though would be fairly confusing to most users who have been using
> the term for years.
>
> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung <t...@astronomer.io.invalid>
> wrote:
>
>> I totally agree with doing away with the term DAG. The only problem (aside
>> from actually telling people—including myself—to stop using the term) is to
>> come up with a reasonable alternative.
>>
>> I can’t recall who, but someone mentioned “workflow” is not very accurate
>> for Airflow. The term “definition” was proposed, but it’s a bit broad; I
>> tried to use it in a few places and kept finding myself doubting “what
>> definition?” and wanting to clarify “DAG definition” (defeating the
>> purpose).
>>
>> TP
>>
>>
>>> On 22 Oct 2024, at 13:07, Jens Scheffler <j_scheff...@gmx.de.INVALID> wrote:
>>>
>>> Hi Ryan,
>>>
>>> Thanks for posting. I share exactly the same observation, had a short
>>> laugh because the DAG question is always an introduction if someone joins
>>> the party. I think a global renaming makes sense. Especially when we also
>>> rename Dataset to Asset this is also a reasonable step. Concepts still can
>>> stay the same.
>>>
>>> So I hope I don’t need to join hiding below the desk with you and +1 for
>>> raising the discussion.
>>>
>>> Technically we can still think if we keep details of python names the
>>> same because the execution is still a DAG… but user facing it is a workflow.
>>>
>>> Jens
>>>
>>> Sent from my Smartphone
>>>
>>>> On 21. Oct 2024, at 23:56, Ryan Hatter <ryan.hat...@astronomer.io.invalid> wrote:
>>>>
>>>> Everyone please sheathe your swords... at least for now.
>>>>
>>>> The term "DAG" has very little meaning to Airflow users. Indeed, it has
>>>> little meaning outside of some mathematicians and software engineers for
>>>> whom the properties of a DAG actually matter. For someone new to data
>>>> engineering or workflow orchestration, one of the first questions they will
>>>> likely have is, "what on earth is a DAG?" The answer is almost always,
>>>> "It's a directed acyclic graph. You don't need to worry about what that
>>>> means; it's just a term for your workflow."
>>>>
>>>> The term "DAG" is problematic for at least a couple important reasons:
>>>>
>>>> *Complexity for New Users*: As mentioned above, "DAG" is unnecessarily
>>>> intimidating and confusing. We want Airflow to be approachable, and using
>>>> technical jargon like "DAG" right off the bat creates an initial barrier to
>>>> understanding.
>>>>
>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just one
>>>> component of an Airflow workflow. The workflow includes its schedule,
>>>> retries, timeouts, a dozen other parameters, and other metadata that the
>>>> DAG component doesn’t account for.
>>>>
>>>> Consider the following from the Airflow homepage
>>>> <https://airflow.apache.org/>.
>>>>
>>>> Apache Airflow® is a platform created by the community to programmatically
>>>> author, schedule and monitor workflows.
>>>>
>>>> Then, if we look at the "What is Airflow?" docs page
>>>> <https://airflow.apache.org/docs/apache-airflow/stable/index.html>, we can
>>>> see that the docs explain what Airflow is without using "DAG."
>>>> It's only in the *workflow* Python code that the term is introduced out of
>>>> nowhere as a comment that awkwardly tries to explain it:
>>>>
>>>> # A DAG represents a workflow, a collection of tasks
>>>>
>>>> It makes sense to not refer to DAGs in these introductions to Airflow,
>>>> because *Airflow doesn't orchestrate DAGs; it orchestrates workflows*. The
>>>> DAG is the model that, for reasons irrelevant to almost every user,
>>>> workflows must adhere to.
>>>>
>>>> So, I propose at least adding an alias for the term "DAG" and updating
>>>> documentation to replace "DAG" with "workflow".
>>>>
>>>> For example, instead of...
>>>>
>>>> @dag(
>>>>     schedule="@daily",
>>>>     ...
>>>>     dagrun_timeout=timedelta(hours=1)
>>>> )
>>>>
>>>> Users could do...
>>>>
>>>> @workflow(
>>>>     schedule="@daily",
>>>>     ...
>>>>     run_timeout=timedelta(hours=1)
>>>> )
>>>>
>>>>
>>>> And with that... I will start running away.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>> For additional commands, e-mail: dev-h...@airflow.apache.org
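
To make the alias-plus-deprecation path discussed in this thread a bit more concrete, here is a minimal sketch of how a @workflow decorator could coexist with a deprecated @dag alias. This is not Airflow code: both decorators below are hypothetical stand-ins written in plain Python, the workflow_config attribute is invented purely so the sketch has something observable, and the only names taken from the thread are schedule, dagrun_timeout and run_timeout.

```python
import warnings
from datetime import timedelta


def workflow(schedule=None, run_timeout=None, **kwargs):
    """Hypothetical new-style decorator (a stand-in, not the Airflow API)."""
    def decorator(func):
        # Store the settings on the function; a real implementation would
        # build whatever object the scheduler needs instead.
        func.workflow_config = {
            "schedule": schedule,
            "run_timeout": run_timeout,
            **kwargs,
        }
        return func
    return decorator


def dag(schedule=None, dagrun_timeout=None, **kwargs):
    """Hypothetical deprecated alias: warn, map the old name, delegate."""
    warnings.warn(
        "'dag' is deprecated; use 'workflow' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # dagrun_timeout is the old spelling of run_timeout in this sketch.
    return workflow(schedule=schedule, run_timeout=dagrun_timeout, **kwargs)


@workflow(schedule="@daily", run_timeout=timedelta(hours=1))
def new_style():
    pass


# Capture warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @dag(schedule="@daily", dagrun_timeout=timedelta(hours=1))
    def old_style():
        pass
```

In this sketch both spellings end up with the same configuration, and only the old one emits a DeprecationWarning, which is roughly what a "deprecate in one major version, remove in the next" path would need. Whether the Python names should change at all, or only the user-facing wording, is still the open question raised above.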