Naming things: What should the imports in dag files for DAG etc. be?

Ash Berlin-Taylor Fri, 30 Aug 2024 02:48:25 -0700

Hi everyone,

It’s time to have another discussion about everyone's favourite topic - 
naming things!


Tl;dr if you have all of AIP-72 and its implications loaded in your head 
already:

##
Where should DAG, TaskGroup, Labels, decorators etc for authoring be imported 
from inside the DAG files? Similarly for DagRun, TaskInstance etc. (these 
likely won’t be created directly by users, but just used for reference 
docs/type hints/editor completion)
##

Assuming most people don’t fall into that category, read on :)

Right now users import things into their DAG files from a few places. Some/most 
of these are (now) documented in 
https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html

```
from airflow import DAG
from airflow.decorators import task, task_group
from airflow.utils.task_group import TaskGroup
from airflow.utils.edgemodifier import Label  # For adding labels between nodes on graph
```

The following packages are linked to from that doc too, so I guess they are 
considered quasi-public:


    airflow.exceptions
    airflow.models.dag
    airflow.models.dagbag
    airflow.models.param
    airflow.models.dagrun
    airflow.models.connection
    airflow.models.variable
    airflow.models.xcom
    airflow.utils.state
    airflow.hooks

So as part of my work on AIP-72/Task Execution interface and SDK I want to tidy 
these up and “unify” the imports.

My thinking is as follows:

1. Users should never import things from airflow.models (and in Airflow 3 it 
will be impossible to do so outside of compatibility shims)
2. “TaskGroup” and the state enums should not be imported by users from utils 
(More generally I don’t like “utils” as a namespace/package as I find it’s 
where code just gets dumped, but that’s a separate point.)


On the subject of Hooks, I think we should consider moving `get_connection` off 
BaseHook and onto a class method on Connection (it’ll be implemented totally 
differently behind an API anyway).
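
To make the `get_connection` idea concrete, here is a minimal sketch of what a 
class method on Connection could look like. It is not the real Airflow class: 
the `Connection` dataclass, its fields and the `_FAKE_BACKEND` dict are all 
hypothetical stand-ins, and the dict only takes the place of whatever API call 
the real implementation would make.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Connection:
    """Hypothetical stand-in for the real Connection class."""

    conn_id: str
    host: Optional[str] = None

    @classmethod
    def get_connection(cls, conn_id: str) -> "Connection":
        # Assumption: the real lookup would go through an API; a dict
        # plays that role here so the sketch is runnable on its own.
        try:
            return _FAKE_BACKEND[conn_id]
        except KeyError:
            raise LookupError(f"Connection {conn_id!r} is not defined") from None


# Hypothetical in-memory store standing in for the API behind the method.
_FAKE_BACKEND: Dict[str, Connection] = {
    "my_db": Connection(conn_id="my_db", host="db.example.com"),
}

conn = Connection.get_connection("my_db")
print(conn.host)
```

With this shape, user code asks Connection for a connection directly and does 
not need to go through a hook class to look one up.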

So now to the crux of the naming debate, and repeating the question from the 
top:

Where should DAG, TaskGroup, Labels, decorators etc for authoring be imported 
from inside the DAG files? Similarly for DagRun, TaskInstance (these two likely 
won’t be created directly by users, but just used for reference docs/type hints)

We don’t have to worry about breaking things or needing every dag to be 
re-written as I already have a way of maintaining backwards-compatibility via a 
shim, so please think of this as “Given a Greenfield, where should these 
imports live for our users”/“What makes most sense to see in DAG files”.
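
For readers wondering how such a shim can work at all, one common approach is 
a module-level `__getattr__` (PEP 562) that forwards old names to their new 
home and emits a DeprecationWarning. The sketch below is an assumption about 
the general technique, not the shim referred to above; the module names 
`my_sdk` and `legacy_models` and the stand-in `DAG` class are invented for 
illustration.

```python
import importlib
import sys
import types
import warnings


class DAG:
    """Stand-in for the real DAG class."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id


# Hypothetical new import location ("my_sdk" is an invented name).
new_mod = types.ModuleType("my_sdk")
new_mod.DAG = DAG
sys.modules["my_sdk"] = new_mod

# Old import location, kept alive as a thin shim.
_MOVED = {"DAG": "my_sdk"}  # old attribute name -> module it now lives in


def _shim_getattr(name: str):
    # Called only when normal attribute lookup on the old module fails.
    target = _MOVED.get(name)
    if target is None:
        raise AttributeError(f"module 'legacy_models' has no attribute {name!r}")
    warnings.warn(
        f"Importing {name} from legacy_models is deprecated; use {target} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(importlib.import_module(target), name)


old_mod = types.ModuleType("legacy_models")
old_mod.__getattr__ = _shim_getattr  # PEP 562 module-level __getattr__
sys.modules["legacy_models"] = old_mod

# Existing DAG files keep working unchanged:
from legacy_models import DAG as LegacyDAG

print(LegacyDAG is DAG)
```

The point for this thread is that the old paths can stay importable whatever 
new location is chosen, so the naming decision does not need to be constrained 
by them.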

I have some rough ideas but would like to get other people's views here first.

Cheers,
Ash
