I'm partial to making everything we expect users to use importable directly from `airflow`, but would love to hear other people's thoughts.
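
To make that concrete, here is a rough sketch of how a single top-level home and an old-path shim could coexist. This is not Airflow code and not necessarily how the shim mentioned below works: `demo_pkg`, `_MOVED` and the stand-in `TaskGroup` class are all made up for illustration. The only real mechanism used is the module-level `__getattr__` hook from PEP 562.

```python
import sys
import types
import warnings

# Hypothetical package names; nothing here touches a real Airflow install.
new_home = types.ModuleType("demo_pkg")
old_home = types.ModuleType("demo_pkg.utils.task_group")


class TaskGroup:
    """Stand-in for an authoring class that now lives at the top level."""


# The class is defined once and exposed from the top-level namespace.
new_home.TaskGroup = TaskGroup

# Old attribute name -> (new module, attribute) for everything that moved.
_MOVED = {"TaskGroup": ("demo_pkg", "TaskGroup")}


def _old_home_getattr(name):
    """PEP 562 hook: called only when normal attribute lookup fails."""
    try:
        module_name, attr = _MOVED[name]
    except KeyError:
        raise AttributeError(
            f"module {old_home.__name__!r} has no attribute {name!r}"
        ) from None
    warnings.warn(
        f"Import {name} from {module_name} instead of {old_home.__name__}",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(sys.modules[module_name], attr)


# Install the hook and register both modules so import statements find them.
old_home.__getattr__ = _old_home_getattr
sys.modules["demo_pkg"] = new_home
sys.modules["demo_pkg.utils.task_group"] = old_home
```

With something like this in place, `from demo_pkg import TaskGroup` is the one documented path, while the old `demo_pkg.utils.task_group.TaskGroup` still resolves to the same class and only adds a `DeprecationWarning`.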

On Fri, Aug 30, 2024 at 5:48 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> Hi everyone,
>
> It’s time to have another discussion about everyone's favourite
> discussion - naming things!
>
> Tl;dr if you have all of AIP-72 and its implications loaded in your head
> already:
>
> ##
> Where should DAG, TaskGroup, Labels, decorators etc for authoring be
> imported from inside the DAG files? Similarly for DagRun, TaskInstance etc.
> (these likely won’t be created directly by users, but just used for
> reference docs/type hints/editor completion)
> ##
>
> Assuming most people don’t fall into that category, read on :)
>
> Right now users import things into their DAG files from a few places.
> Some/most of these are (now) documented in
> https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html
>
> ```
> from airflow import DAG
> from airflow.decorators import task, task_group
> from airflow.utils.task_group import TaskGroup
> from airflow.utils.edgemodifier import Label  # For adding labels between nodes on graph
> ```
>
> The following packages are linked to from that doc too, so they are I
> guess considered quasi-public:
>
> airflow.exceptions
> airflow.models.dag
> airflow.models.dagbag
> airflow.models.param
> airflow.models.dagrun
> airflow.models.connection
> airflow.models.variable
> airflow.models.xcom
> airflow.utils.state
> airflow.hooks
>
> So as part of my work on AIP-72/Task Execution interface and SDK I want to
> tidy these up and “unify” the imports.
>
> My thinking is as follows:
>
> 1. Users should never import things from airflow.models (and in Airflow 3
> it will be impossible to do so outside of compatibility shims)
> 2. “TaskGroup” and the state enums should not be imported by users from
> utils (More generally I don’t like “utils” as a namespace/package as I find
> it’s where code just gets dumped, but that’s a separate point.)
>
> On the subject of Hooks, I think we should consider moving
> `get_connection` off of BaseHook anyway (it’ll be implemented totally
> differently behind an API anyway) on to a class method on Connection.
>
> So now to the crux of the naming debate, and repeating the question from
> the top:
>
> Where should DAG, TaskGroup, Labels, decorators etc for authoring be
> imported from inside the DAG files? Similarly for DagRun, TaskInstance
> (these two likely won’t be created directly by users, but just used for
> reference docs/type hints)
>
> We don’t have to worry about breaking things or needing every dag to be
> re-written as I already have a way of maintaining backwards-compatibility
> via a shim, so please think of this as “Given a Greenfield, where
> should these imports live for our users”/“What makes most sense to see in
> DAG files”.
>
> I have some rough ideas but would like to get other people's views here
> first.
>
> Cheers,
> Ash