Couple of thoughts:

1. The boundaries/properties of “DAG” have already faded over time. For example,
there are now several ways to create cyclic graphs, such as using the @continuous
schedule. I imagine these properties vanishing even more in the future, so from
that perspective I support changing “DAG” to a more generic name.

2. How other orchestration frameworks do naming:
Dagster: job (formerly pipeline)
Prefect: flow
Flyte: workflow
Temporal: workflow
Kestra: flow

I think “workflow” is the most fitting name.

3. Given the large impact of this change, I suggest defining a clear path 
forward. Would we first introduce the deprecation in Airflow 3, and remove 
“DAG” in Airflow 4?
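
To make the question concrete, here is a minimal sketch of what such a
transition could look like in plain Python. It does not use Airflow's actual
code: `workflow` and `run_timeout` are borrowed from Ryan's example quoted
below, while `workflow_config` and the warning text are names I made up for
illustration. The old `dag` name keeps working but emits a DeprecationWarning
and forwards to the new one.

```python
import warnings
from datetime import timedelta


def workflow(schedule=None, run_timeout=None):
    """Hypothetical new-style decorator; a stand-in, not Airflow's real API."""

    def decorator(func):
        # Attach the settings to the function so the sketch stays self-contained.
        func.workflow_config = {"schedule": schedule, "run_timeout": run_timeout}
        return func

    return decorator


def dag(schedule=None, dagrun_timeout=None):
    """Hypothetical deprecated alias that forwards to workflow()."""
    warnings.warn(
        "'dag' is deprecated; use 'workflow' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Map the old parameter name onto the new one.
    return workflow(schedule=schedule, run_timeout=dagrun_timeout)


@workflow(schedule="@daily", run_timeout=timedelta(hours=1))
def new_style():
    pass


# Record warnings so the deprecation is observable in this sketch.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @dag(schedule="@daily", dagrun_timeout=timedelta(hours=1))
    def old_style():
        pass
```

Under this sketch, both spellings end up with the same configuration, so
existing code would keep running during the deprecation window and the alias
could be dropped in a later major release.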

Bas

> On 22 Oct 2024, at 09:22, Neil <neil4r...@gmail.com> wrote:
> 
> I don't see a problem with the term DAG, especially when most other
> platforms embrace the term wholeheartedly.
> I don't see anything intimidating or confusing about it at all. Changing
> the term, though, would be fairly confusing to most users who have been
> using it for years.
> 
> On Tue, Oct 22, 2024 at 1:18 AM Tzu-ping Chung <t...@astronomer.io.invalid>
> wrote:
> 
>> I totally agree with doing away with the term DAG. The only problem (aside
>> from actually telling people—including myself—to stop using the term) is to
>> come up with a reasonable alternative.
>> 
>> I can’t recall who, but someone mentioned “workflow” is not very accurate
>> for Airflow. The term “definition” was proposed, but it’s a bit broad; I
>> tried to use it in a few places and kept finding myself doubting “what
>> definition?” and wanting to clarify “DAG definition” (defeating the
>> purpose).
>> 
>> TP
>> 
>> 
>>> On 22 Oct 2024, at 13:07, Jens Scheffler <j_scheff...@gmx.de.INVALID>
>>> wrote:
>>> 
>>> Hi Ryan,
>>> 
>>> Thanks for posting. I share exactly the same observation, and had a short
>>> laugh because the DAG question always comes up as an introduction when
>>> someone joins the party. I think a global renaming makes sense. Especially
>>> since we are also renaming Dataset to Asset, this is a reasonable step. The
>>> concepts can still stay the same.
>>> 
>>> So I hope I don’t need to join you in hiding below the desk, and +1 for
>>> raising the discussion.
>>> 
>>> Technically, we could still consider keeping the Python names the same,
>>> because the execution is still a DAG… but user-facing, it is a workflow.
>>> 
>>> Jens
>>> 
>>>> On 21. Oct 2024, at 23:56, Ryan Hatter <ryan.hat...@astronomer.io.invalid>
>>>> wrote:
>>>> 
>>>> Everyone please sheathe your swords... at least for now.
>>>> 
>>>> The term "DAG" has very little meaning to Airflow users. Indeed, it has
>>>> little meaning outside of some mathematicians and software engineers for
>>>> whom the properties of a DAG actually matter. For someone new to data
>>>> engineering or workflow orchestration, one of the first questions they
>>>> will likely have is, "what on earth is a DAG?" The answer is almost always,
>>>> "It's a directed acyclic graph. You don't need to worry about what that
>>>> means; it's just a term for your workflow."
>>>> 
>>>> The term "DAG" is problematic for at least a couple of important reasons:
>>>> 
>>>> *Complexity for New Users*: As mentioned above, "DAG" is unnecessarily
>>>> intimidating and confusing. We want Airflow to be approachable, and using
>>>> technical jargon like "DAG" right off the bat creates an initial barrier to
>>>> understanding.
>>>> 
>>>> *Disconnect Between DAG and Workflow Concepts*: The DAG is just one
>>>> component of an Airflow workflow. The workflow includes its schedule,
>>>> retries, timeouts, a dozen other parameters, and other metadata that the
>>>> DAG component doesn’t account for.
>>>> 
>>>> Consider the following from the Airflow homepage
>>>> <https://airflow.apache.org/>.
>>>> 
>>>> Apache Airflow® is a platform created by the community to programmatically
>>>> author, schedule and monitor workflows.
>>>> 
>>>> Then, if we look at the "What is Airflow?" docs page
>>>> <https://airflow.apache.org/docs/apache-airflow/stable/index.html>, we can
>>>> see that the docs explain what Airflow is without using "DAG." It's only in
>>>> the *workflow* Python code that the term is introduced out of nowhere as a
>>>> comment that awkwardly tries to explain it:
>>>> 
>>>> # A DAG represents a workflow, a collection of tasks
>>>> 
>>>> It makes sense to not refer to DAGs in these introductions to Airflow,
>>>> because *Airflow doesn't orchestrate DAGs; it orchestrates workflows*. The
>>>> DAG is the model that, for reasons irrelevant to almost every user,
>>>> workflows must adhere to.
>>>> 
>>>> So, I propose at least adding an alias for the term "DAG" and updating
>>>> documentation to replace "DAG" with "workflow".
>>>> 
>>>> For example, instead of...
>>>> 
>>>> @dag(
>>>>     schedule="@daily",
>>>>     ...
>>>>     dagrun_timeout=timedelta(hours=1)
>>>> )
>>>> 
>>>> Users could do...
>>>> 
>>>> @workflow(
>>>>     schedule="@daily",
>>>>     ...
>>>>     run_timeout=timedelta(hours=1)
>>>> )
>>>> 
>>>> 
>>>> And with that... I will start running away.
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>> 
>> 
