Yep, so: if we do not have side-effects from `import airflow`, my vote would be "airflow.sdk" :)
On Mon, Sep 2, 2024 at 10:29 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> Yes, strongly agreed on the "no side-effects from `import airflow`".
>
> To summarise the options so far:
>
> 1. `from airflow import DAG, TaskGroup` — have the imports be from the top
> level airflow module
> 2. `from airflow.definitions import DAG, TaskGroup`
> 3. `from airflow.sdk import DAG, TaskGroup`
>
> > On 31 Aug 2024, at 23:07, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > Should be:
> >
> > ```
> > @configure_settings
> > @configure_worker_plugins
> > def cli_worker():
> >     pass
> > ```
> >
> > On Sun, Sep 1, 2024 at 12:05 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >> Personally, for me "airflow.sdk" is best and very straightforward. And we
> >> have not used it for other things before, so it's free to use.
> >>
> >> "Models" and similar names carried more (often misleading) information -
> >> they were sometimes database models, sometimes they were not. This
> >> caused a lot of confusion.
> >>
> >> IMHO explicitly calling something "sdk" is a clear indication that "this
> >> is what you are expected to use", and it makes it very clear what is and
> >> what is not a public interface. We should aim to make everything in
> >> "airflow.sdk" (or whatever we choose) public and everything else
> >> private. That should also reduce the need for a separate description of
> >> what is public and what is not.
> >>
> >> Actually, if we continue doing import initialization as we do today, I
> >> would even go as far as a separate "airflow_sdk" package - unless we fix
> >> something that has been a problem for a long time: getting rid of the
> >> side effects of importing "airflow".
> >>
> >> It's a bit tangential but actually related - as part of this work we
> >> should IMHO get rid of all side-effects of "import airflow" that we
> >> currently have.
> >> If we stick to a sub-package of airflow, that is almost a given, since
> >> "airflow.sdk" (or whatever we choose) will be available to the worker,
> >> dag file processor and triggerer, but the rest of "airflow" will not
> >> be, and those components won't be able to use the DB, whereas the
> >> scheduler and api_server will.
> >>
> >> So having side effects - such as connecting to the DB, configuring
> >> settings, plugin manager initialization - when you do "import" caused a
> >> lot of pain, cyclic imports and a number of other problems.
> >>
> >> I think we should aim to make "initialization" code explicit rather
> >> than implicit (Python zen) - and (possibly via decorators) simply
> >> initialize what is needed, in the right sequence, explicitly for each
> >> command. If we are able to do that, "airflow.sdk" is OK; if we still
> >> have "import airflow" side-effects, "airflow_sdk" (or similar) is
> >> better in this case, because otherwise we will need some ugly
> >> conditional code for when you do and do not have database access.
> >>
> >> As an example, if we go for "airflow.sdk" I'd love to see something
> >> like this:
> >>
> >> ```
> >> @configure_db
> >> @configure_settings
> >> def cli_db():
> >>     pass
> >>
> >> @configure_db
> >> @configure_settings
> >> @configure_ui_plugins
> >> def cli_webserver():
> >>     pass
> >>
> >> @configure_settings
> >> @configure_ui_plugins
> >> def cli_worker():
> >>     pass
> >> ```
> >>
> >> Rather than this:
> >>
> >> ```
> >> import airflow  # <-- here everything gets initialized
> >> ```
> >>
> >> J
> >>
> >> On Sat, Aug 31, 2024 at 10:17 PM Jens Scheffler
> >> <j_scheff...@gmx.de.invalid> wrote:
> >>
> >>> Hi Ash,
> >>>
> >>> I was thinking hard... I set the email aside and still have no real
> >>> _good_ ideas. I am still good with "models" and "sdk".
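The explicit, per-command initialization Jarek sketches above could be implemented with stacked decorators along these lines. This is a minimal sketch with hypothetical names: `_make_configurator` and the step bodies are illustrative only, and `configure_db`/`configure_settings`/`configure_ui_plugins` are proposed, not existing, Airflow APIs.

```python
import functools

_initialized = []  # records which initialization steps have run, in order


def _make_configurator(step_name):
    # Build a decorator that runs one explicit initialization step before
    # the wrapped CLI entrypoint. In real code the step would e.g. load
    # settings or set up a DB session; here it only records its name.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if step_name not in _initialized:  # run each step at most once
                _initialized.append(step_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


configure_settings = _make_configurator("settings")
configure_db = _make_configurator("db")
configure_ui_plugins = _make_configurator("ui_plugins")


@configure_db
@configure_settings
def cli_db():
    return list(_initialized)


@configure_settings
@configure_ui_plugins
def cli_worker():
    return list(_initialized)
```

Note that at call time the topmost decorator's step runs first, so `cli_db()` performs the "db" step before "settings" with this stacking. The point of the pattern is that each command declares exactly the initialization it needs, in an explicit order, instead of everything happening implicitly on `import airflow`.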
> >>>
> >>> Actually, what we want to define is an "execution interface" to which
> >>> the structural model, as an API in Python or another language, gives
> >>> bindings and helper methods. For the application it is around DAGs,
> >>> but naming it "DAGs" is not good because other non-DAG parts (side
> >>> objects) also need to belong there.
> >>>
> >>> Other terms which came to my mind were "Schema", "System" and "Plan",
> >>> but none of them are as good as the previous "models" or "SDK".
> >>>
> >>> "API", by the way, is too broad and generic, and smells like something
> >>> remote. So it should _not_ be "API".
> >>>
> >>> The term "Definitions" is a bit too long in my view.
> >>>
> >>> So... TL;DR... this email is not much help, other than saying that I'd
> >>> propose to use "airflow.models" or "airflow.sdk" if no other/better
> >>> ideas come up :-D
> >>>
> >>> Jens
> >>>
> >>> On 30.08.24 19:03, Ash Berlin-Taylor wrote:
> >>>>> As a side note, I wonder if we should do the user-internal
> >>>>> separation better for DagRun and TaskInstance
> >>>>
> >>>> Yes, that is a somewhat inevitable side effect of making it be behind
> >>>> an API, and one I am looking forward to. These are almost just
> >>>> plain-data classes (but not using dataclasses per se), so we will
> >>>> have two different classes - one that is the API representation, and
> >>>> a separate internal one used by the scheduler etc. that will have all
> >>>> of the scheduling logic methods.
> >>>>
> >>>> -ash
> >>>>
> >>>>> On 30 Aug 2024, at 17:55, Tzu-ping Chung <t...@astronomer.io.INVALID>
> >>>>> wrote:
> >>>>>
> >>>>>> On 30 Aug 2024, at 17:48, Ash Berlin-Taylor <a...@apache.org> wrote:
> >>>>>>
> >>>>>> Where should DAG, TaskGroup, Labels, decorators etc. for authoring
> >>>>>> be imported from inside the DAG files? Similarly for DagRun and
> >>>>>> TaskInstance (these two likely won't be created directly by users,
> >>>>>> but just used for reference docs/type hints).
> >>>>>
> >>>>> How about airflow.definitions?
> >>>>> When discussing assets, a question was raised about what we should
> >>>>> call "DAG files" going forward (because those files now may not
> >>>>> contain user-defined DAGs at all). "Definition files" was raised as
> >>>>> a choice, but there's no existing usage and it might take a while to
> >>>>> catch on. If we put all these things into airflow.definitions,
> >>>>> maybe people will start using that term?
> >>>>>
> >>>>> As a side note, I wonder if we should do the user-internal
> >>>>> separation better for DagRun and TaskInstance. We already have that
> >>>>> separation for DAG/DagModel, Dataset/DatasetModel, and more. Maybe
> >>>>> we should also have constructs that users only see, which are
> >>>>> converted to "real" objects (i.e. ones that exist in the db) for the
> >>>>> scheduler. We already sort of have those in DagRunPydantic and
> >>>>> TaskInstancePydantic; we just need to name them better and expose
> >>>>> them at the right places.
> >>>>>
> >>>>> TP
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
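The user-facing vs. internal split TP and Ash discuss above could look roughly like this. An illustrative sketch only: the class names `DagRunInfo`/`DagRunModel` and the `to_public` helper are hypothetical (the real interim constructs mentioned in the thread are `DagRunPydantic` and `TaskInstancePydantic`).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DagRunInfo:
    """Plain-data view handed to user code: no DB access, no scheduling
    logic, immutable so task code cannot mutate scheduler state."""
    dag_id: str
    run_id: str
    logical_date: datetime


class DagRunModel:
    """Internal object used by the scheduler: owns DB-backed state and
    would carry the scheduling-logic methods."""

    def __init__(self, dag_id: str, run_id: str, logical_date: datetime):
        self.dag_id = dag_id
        self.run_id = run_id
        self.logical_date = logical_date

    def to_public(self) -> DagRunInfo:
        # Convert the internal model into the read-only representation
        # that crosses the task-execution API boundary.
        return DagRunInfo(self.dag_id, self.run_id, self.logical_date)
```

The design point is the one made in the thread: users only ever see the plain-data class, while the "real" object exists in the db and is used by the scheduler, with an explicit conversion at the API boundary.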