Re: Naming things: What should the imports in dag files for DAG etc. be?

Jarek Potiuk Sat, 31 Aug 2024 15:05:53 -0700

Personally for me "airflow.sdk" is best and very straightforward. And we
have not yet used that for other things before, so it's free to use.

"Models" and similar carried more (often misleading) information - they
were sometimes database models, sometimes they were not. This caused a lot
of confusion.

IMHO explicitly calling something "sdk" is a clear indication "this is what
you are expected to use". And makes it very clear what is and what is not a
public interface. We should aim to make everything in "airflow.<sdk>" (or
whatever we choose) "public" and everything else "private". That should
also reduce the need of having to have a separate description of "what is
public and what is not".

Actually - if we continue doing import initialization as we do today - I
would even go as far as the "airflow_sdk" package - unless we do something
else that we have had a problem with for a long time - getting rid of side
effects of "airflow" import.

It's a bit tangential but actually related - as part of this work we should
IMHO get rid of all side-effects of "import airflow" that we currently
have. If we stick to sub-package of airflow  - it is almost a given thing
since "airflow.sdk"  (or whatever we choose) will be available to "worker",
"dag file processor" and "triggerer" but the rest of the
"airlfow","whatever" will not be, and they won't be able to use DB, where
scheduler, api_server will.

So having side effects - such as connecting to the DB, configuring
settings, plugin manager initialization when you do "import" caused a lot
of pain, cyclic imports and a number of other problems.

I think we should aim to  make "initialization" code explicit rather than
implicit (Python zen) - and (possibly via decorators) simply initialize
what is needed and in the right sequence explicitly for each command. If we
will be able to do it "airflow.sdk" is ok, if we will still have "import
airflow" side-effects, The "airflow_sdk" (or similar) is in this case
better, because otherwise we will have to have some ugly conditional code -
when you have and when you do not have database access.

As an example - If we go for "airflow.sdk" I'd love to see something like
that:

```
@configure_db
@configure_settings
def cli_db():
    pass

@configure_db
@configure_settings
@configure_ui_plugins
def cli_webserver():
    pass

@configure_settings
@configure_ui_plugins
def cli_worker():
    pass
```

Rather than that:

```
import airflow <-- here everything gets initialized
```

J

On Sat, Aug 31, 2024 at 10:17 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
wrote:

> Hi Ash,
>
> I was thinking hard... was setting the email aside and still have no
> real _good_ ideas. I am still good with "models" and "sdk".
>
> Actually what we want to define is an "execution interface" to which the
> structual model as API in Python/or other language gives bindings and
> helper methods. For the application it is around DAGs - but naming it
> DAGs is not good because other non-DAG parts as side objects also need
> to belong there.
>
> Other terms which came into my mind were "Schema", "System" and "Plan"
> but all of there are not as good as the previous "models" or "SDK".
>
> API by the way is too brad and generic and smells like remote. So it
> should _not_ be "API".
>
> The term "Definitions" is a bit too long in my view.
>
> So... TLDR... this email is not much of help other than saying that I'd
> propose to use "airflow.models" or "airflow.sdk". If there are no other
> / better ideas coming :-D
>
> Jens
>
> On 30.08.24 19:03, Ash Berlin-Taylor wrote:
> >> As a side note, I wonder if we should do the user-internal separation
> better for DagRun and TaskInstance
> > Yes, that is a somewhat inevitable side effect of making it be behind an
> API, and one I am looking forward to. There are almost just plain-data
> classes (but not using data classes per se) so we have two different
> classes — one that is the API representation, and an separate internal one
> used by scheduler etc that will have all of the scheduling logic methods.
> >
> > -ash
> >
> >> On 30 Aug 2024, at 17:55, Tzu-ping Chung <t...@astronomer.io.INVALID>
> wrote:
> >>
> >>
> >>
> >>> On 30 Aug 2024, at 17:48, Ash Berlin-Taylor <a...@apache.org> wrote:
> >>>
> >>> Where should DAG, TaskGroup, Labels, decorators etc for authoring be
> imported from inside the DAG files? Similarly for DagRun, TaskInstance
> (these two likely won’t be created directly by users, but just used for
> reference docs/type hints)
> >>>
> >> How about airflow.definitions? When discussing assets there’s a
> question raised on how we should call “DAG files” going forward (because
> those files now may not contain user-defined DAGs at all). “Definition
> files” was raised as a choice, but there’s no existing usage and it might
> be a bit to catch on. If we put all these things into airflow.definitions,
> maybe people will start using that term?
> >>
> >> As a side note, I wonder if we should do the user-internal separation
> better for DagRun and TaskInstance. We already have that separation for
> DAG/DagModel, Dataset/DatasetModel, and more. Maybe we should also have
> constructs that users only see, and are converted to “real” objects (i.e.
> exists in the db) for the scheduler. We already sort of have those in
> DagRunPydantic and TaskInstancePydantic, we just need to name them better
> and expose them at the right places.
> >>
> >> TP
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>

Re: Naming things: What should the imports in dag files for DAG etc. be?

Reply via email to