Yep. So, if we do not have side-effects from `import airflow` -> my vote
would be "airflow.sdk" :)

On Mon, Sep 2, 2024 at 10:29 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> Yes, strongly agreed on the “no side-effects from `import airflow`”.
>
> To summarise the options so far:
>
> 1. `from airflow import DAG, TaskGroup` — have the imports be from the top
> level airflow module
> 2. `from airflow.definitions import DAG, TaskGroup`
> 3. `from airflow.sdk import DAG, TaskGroup`
>
> > On 31 Aug 2024, at 23:07, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > Should be:
> >
> > ```
> > @configure_settings
> > @configure_worker_plugins
> > def cli_worker():
> >    pass
> > ```
> >
> > On Sun, Sep 1, 2024 at 12:05 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >> Personally, for me "airflow.sdk" is best and very straightforward. And
> >> we have not used that name for anything else before, so it's free to use.
> >>
> >> "Models" and similar carried more (often misleading) information - they
> >> were sometimes database models, sometimes they were not. This caused a
> >> lot of confusion.
> >>
> >> IMHO explicitly calling something "sdk" is a clear indication "this is
> >> what you are expected to use". And it makes it very clear what is and
> >> what is not a public interface. We should aim to make everything in
> >> "airflow.<sdk>" (or whatever we choose) "public" and everything else
> >> "private". That should also reduce the need for a separate description
> >> of "what is public and what is not".
> >>
> >> Actually - if we continue doing import initialization as we do today - I
> >> would even go as far as a separate "airflow_sdk" package - unless we do
> >> something else that we have had a problem with for a long time - getting
> >> rid of the side effects of the "airflow" import.
> >>
> >> It's a bit tangential but actually related - as part of this work we
> >> should IMHO get rid of all side-effects of "import airflow" that we
> >> currently have. If we stick to a sub-package of airflow, it is almost a
> >> given, since "airflow.sdk" (or whatever we choose) will be available to
> >> the "worker", "dag file processor" and "triggerer", but the rest of
> >> "airflow" will not be - and those components won't be able to use the
> >> DB, whereas the scheduler and api_server will.
> >>
> >> So having side effects on "import" - such as connecting to the DB,
> >> configuring settings, or plugin manager initialization - has caused a
> >> lot of pain, cyclic imports and a number of other problems.
> >>
> >> I think we should aim to make "initialization" code explicit rather
> >> than implicit (Python zen) - and (possibly via decorators) simply
> >> initialize what is needed, in the right sequence, explicitly for each
> >> command. If we are able to do that, "airflow.sdk" is fine; if we will
> >> still have "import airflow" side-effects, then "airflow_sdk" (or
> >> similar) is better in this case, because otherwise we will have to have
> >> some ugly conditional code - for when you do and do not have database
> >> access.
> >>
> >> As an example - if we go for "airflow.sdk" I'd love to see something
> >> like this:
> >>
> >> ```
> >> @configure_db
> >> @configure_settings
> >> def cli_db():
> >>    pass
> >>
> >> @configure_db
> >> @configure_settings
> >> @configure_ui_plugins
> >> def cli_webserver():
> >>    pass
> >>
> >> @configure_settings
> >> @configure_ui_plugins
> >> def cli_worker():
> >>    pass
> >> ```
> >>
> >> Rather than that:
> >>
> >> ```
> >> import airflow <-- here everything gets initialized
> >> ```
> >>
> >> J
> >>
> >>
> >> On Sat, Aug 31, 2024 at 10:17 PM Jens Scheffler
> >> <j_scheff...@gmx.de.invalid> wrote:
> >>
> >>> Hi Ash,
> >>>
> >>> I was thinking hard... I set the email aside and still have no
> >>> real _good_ ideas. I am still good with "models" and "sdk".
> >>>
> >>> Actually, what we want to define is an "execution interface" to which
> >>> the structural model - as an API in Python or another language - gives
> >>> bindings and helper methods. For the application it is around DAGs -
> >>> but naming it DAGs is not good because other non-DAG parts, as side
> >>> objects, also need to belong there.
> >>>
> >>> Other terms which came to my mind were "Schema", "System" and "Plan",
> >>> but all of those are not as good as the previous "models" or "SDK".
> >>>
> >>> API, by the way, is too broad and generic and smells like remote. So it
> >>> should _not_ be "API".
> >>>
> >>> The term "Definitions" is a bit too long in my view.
> >>>
> >>> So... TL;DR... this email is not much help other than saying that I'd
> >>> propose to use "airflow.models" or "airflow.sdk", if no other / better
> >>> ideas come up :-D
> >>>
> >>> Jens
> >>>
> >>> On 30.08.24 19:03, Ash Berlin-Taylor wrote:
> >>>>> As a side note, I wonder if we should do the user-internal separation
> >>>>> better for DagRun and TaskInstance
> >>>> Yes, that is a somewhat inevitable side effect of making it be behind
> >>>> an API, and one I am looking forward to. These are almost just
> >>>> plain-data classes (but not using dataclasses per se), so we would
> >>>> have two different classes - one that is the API representation, and a
> >>>> separate internal one used by the scheduler etc. that will have all of
> >>>> the scheduling logic methods.
> >>>>
> >>>> -ash
> >>>>
> >>>>> On 30 Aug 2024, at 17:55, Tzu-ping Chung <t...@astronomer.io.INVALID>
> >>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 30 Aug 2024, at 17:48, Ash Berlin-Taylor <a...@apache.org> wrote:
> >>>>>>
> >>>>>> Where should DAG, TaskGroup, Labels, decorators etc. for authoring
> >>>>>> be imported from inside the DAG files? Similarly for DagRun and
> >>>>>> TaskInstance (these two likely won’t be created directly by users,
> >>>>>> but just used for reference docs/type hints)
> >>>>>>
> >>>>> How about airflow.definitions? When discussing assets, a question
> >>>>> was raised about what we should call “DAG files” going forward
> >>>>> (because those files now may not contain user-defined DAGs at all).
> >>>>> “Definition files” was raised as a choice, but there’s no existing
> >>>>> usage and it might take a while to catch on. If we put all these
> >>>>> things into airflow.definitions, maybe people will start using that
> >>>>> term?
> >>>>>
> >>>>> As a side note, I wonder if we should do the user-internal separation
> >>>>> better for DagRun and TaskInstance. We already have that separation
> >>>>> for DAG/DagModel, Dataset/DatasetModel, and more. Maybe we should
> >>>>> also have constructs that users only see, which are converted to
> >>>>> “real” objects (i.e. ones that exist in the db) for the scheduler. We
> >>>>> already sort of have those in DagRunPydantic and TaskInstancePydantic;
> >>>>> we just need to name them better and expose them at the right places.
> >>>>>
> >>>>> TP
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>>>
> >>>>
> >>>
> >>>
>
>
