Everyone please sheathe your swords... at least for now.

The term "DAG" means very little to most Airflow users. Indeed, it means
little to anyone other than the mathematicians and software engineers for
whom the properties of a DAG actually matter. For someone new to data
engineering or workflow orchestration, one of the first questions is likely
to be, "What on earth is a DAG?" The answer is almost always, "It's a
directed acyclic graph. You don't need to worry about what that means; it's
just a term for your workflow."

The term "DAG" is problematic for at least a couple of important reasons:

*Complexity for New Users*: As mentioned above, "DAG" is unnecessarily
intimidating and confusing. We want Airflow to be approachable, and using
technical jargon like "DAG" right off the bat creates an initial barrier to
understanding.

*Disconnect Between DAG and Workflow Concepts*: The DAG is just one
component of an Airflow workflow. The workflow includes its schedule,
retries, timeouts, a dozen other parameters, and other metadata that the
DAG component doesn’t account for.

Consider the following from the Airflow homepage
<https://airflow.apache.org/>:

    Apache Airflow® is a platform created by the community to
    programmatically author, schedule and monitor workflows.

Then, if we look at the "What is Airflow?" docs page
<https://airflow.apache.org/docs/apache-airflow/stable/index.html>, we can
see that the docs explain what Airflow is without using "DAG." It's only in
the *workflow* Python code that the term is introduced out of nowhere as a
comment that awkwardly tries to explain it:

    # A DAG represents a workflow, a collection of tasks

It makes sense not to refer to DAGs in these introductions to Airflow,
because *Airflow doesn't orchestrate DAGs; it orchestrates workflows*. The
DAG is the model that, for reasons irrelevant to almost every user,
workflows must adhere to.

So, I propose at least adding an alias for the term "DAG" and updating
documentation to replace "DAG" with "workflow".

For example, instead of...

    @dag(
        schedule="@daily",
        ...
        dagrun_timeout=timedelta(hours=1)
    )

Users could do...

    @workflow(
        schedule="@daily",
        ...
        run_timeout=timedelta(hours=1)
    )
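
To make the alias idea a bit more concrete, here is a rough sketch of how a
thin wrapper could translate the proposed names into the existing ones. It
is only an illustration under stated assumptions, not how Airflow is or
would be implemented: the `dag` function below is a stand-in so the example
runs without Airflow installed, and `_PARAMETER_ALIASES` is a hypothetical
helper. Only `workflow` and `run_timeout` come from the proposal above.

```python
import functools
from datetime import timedelta

# Hypothetical mapping from proposed "workflow" parameter names to the
# existing "dag" parameter names. Only run_timeout appears in the proposal.
_PARAMETER_ALIASES = {"run_timeout": "dagrun_timeout"}


def dag(**dag_kwargs):
    """Stand-in for Airflow's @dag decorator (an assumption for this sketch).

    It just records the keyword arguments it receives so the translation
    can be inspected without Airflow installed.
    """
    def decorator(func):
        @functools.wraps(func)
        def factory(*args, **kwargs):
            func(*args, **kwargs)
            return {"dag_id": func.__name__, **dag_kwargs}
        return factory
    return decorator


def workflow(**workflow_kwargs):
    """Hypothetical alias for @dag that accepts the renamed parameters."""
    translated = {}
    for name, value in workflow_kwargs.items():
        target = _PARAMETER_ALIASES.get(name, name)
        if target in translated:
            # Both the old and the new spelling were passed.
            raise TypeError(f"{name!r} conflicts with {target!r}")
        translated[target] = value
    return dag(**translated)


@workflow(schedule="@daily", run_timeout=timedelta(hours=1))
def daily_report():
    pass


result = daily_report()
print(result)
```

With a wrapper along these lines, existing `@dag` code would keep working
unchanged, and the new spelling would simply be a second entry point that
forwards to it.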


And with that... I will start running away.
