Yes, strongly agreed on the “no side-effects from `import airflow`”. To summarise the options so far:

1. `from airflow import DAG, TaskGroup` — have the imports be from the top-level airflow module
2. `from airflow.definitions import DAG, TaskGroup`
3. `from airflow.sdk import DAG, TaskGroup`
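For concreteness, here is a minimal sketch of what a DAG file could look like under option 3. It is purely illustrative and assumes `airflow.sdk` would re-export `DAG` and the `task` decorator, which is not decided yet:

```
# Hypothetical DAG file under option 3. Importing airflow.sdk would have
# no side effects (no DB connection, no settings initialisation).
from airflow.sdk import DAG, task

with DAG(dag_id="example") as dag:

    @task
    def hello():
        print("hello")

    hello()
```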
> On 31 Aug 2024, at 23:07, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Should be:
>
> ```
> @configure_settings
> @configure_worker_plugins
> def cli_worker():
>     pass
> ```
>
> On Sun, Sep 1, 2024 at 12:05 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Personally for me "airflow.sdk" is best and very straightforward. And we
>> have not yet used it for anything else, so it's free to use.
>>
>> "Models" and similar carried more (often misleading) information - they
>> were sometimes database models, sometimes they were not. This caused a lot
>> of confusion.
>>
>> IMHO explicitly calling something "sdk" is a clear indication that "this
>> is what you are expected to use", and it makes very clear what is and what
>> is not a public interface. We should aim to make everything in
>> "airflow.sdk" (or whatever we choose) public and everything else private.
>> That should also reduce the need for a separate description of "what is
>> public and what is not".
>>
>> Actually - if we continue doing import initialization as we do today - I
>> would even go as far as a separate "airflow_sdk" package - unless we fix
>> something that has been a problem for a long time: getting rid of the side
>> effects of importing "airflow".
>>
>> It's a bit tangential but actually related - as part of this work we
>> should IMHO get rid of all side effects of "import airflow" that we
>> currently have. If we stick to a sub-package of airflow it is almost a
>> given, since "airflow.sdk" (or whatever we choose) will be available to
>> the "worker", "dag file processor" and "triggerer", but the rest of
>> "airflow" will not be, and those components won't be able to use the DB,
>> whereas the scheduler and api_server will.
>>
>> Having side effects on "import" - such as connecting to the DB,
>> configuring settings and initializing the plugin manager - has caused a
>> lot of pain, cyclic imports and a number of other problems.
>>
>> I think we should aim to make "initialization" code explicit rather than
>> implicit (Python zen) - and (possibly via decorators) simply initialize
>> what is needed, in the right sequence, explicitly for each command. If we
>> are able to do that, "airflow.sdk" is ok; if we still have "import
>> airflow" side effects, then "airflow_sdk" (or similar) is better, because
>> otherwise we will need some ugly conditional code for when you do and do
>> not have database access.
>>
>> As an example - if we go for "airflow.sdk" I'd love to see something like
>> this:
>>
>> ```
>> @configure_db
>> @configure_settings
>> def cli_db():
>>     pass
>>
>> @configure_db
>> @configure_settings
>> @configure_ui_plugins
>> def cli_webserver():
>>     pass
>>
>> @configure_settings
>> @configure_ui_plugins
>> def cli_worker():
>>     pass
>> ```
>>
>> Rather than this:
>>
>> ```
>> import airflow  # <-- here everything gets initialized
>> ```
>>
>> J
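To make the explicit-initialization idea above concrete, here is a minimal sketch of how such decorators could be built. Every name here (`make_configurer`, `configure_settings`, `configure_db`) is hypothetical and only illustrates the pattern, not an actual Airflow API:

```
import functools

def make_configurer(init_fn):
    """Build a decorator that runs init_fn once, before the wrapped command."""
    done = False

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal done
            if not done:
                init_fn()  # e.g. load settings, set up DB connections, ...
                done = True
            return func(*args, **kwargs)
        return wrapper

    return decorator

# Stub initializers; real ones would call into settings / DB / plugin code.
configure_settings = make_configurer(lambda: print("settings configured"))
configure_db = make_configurer(lambda: print("DB configured"))

# Note: initializers run outermost-first, so order the decorators accordingly.
@configure_db
@configure_settings
def cli_db():
    pass
```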
>> On Sat, Aug 31, 2024 at 10:17 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
>> wrote:
>>
>>> Hi Ash,
>>>
>>> I was thinking hard... was setting the email aside and still have no
>>> real _good_ ideas. I am still good with "models" and "sdk".
>>>
>>> Actually what we want to define is an "execution interface", for which
>>> the structural model - as an API in Python or another language - gives
>>> bindings and helper methods. For the application it is around DAGs - but
>>> naming it DAGs is not good, because other non-DAG parts also need to
>>> belong there as side objects.
>>>
>>> Other terms which came to mind were "Schema", "System" and "Plan", but
>>> none of them are as good as the previous "models" or "SDK".
>>>
>>> API, by the way, is too broad and generic and smells like remote. So it
>>> should _not_ be "API".
>>>
>>> The term "Definitions" is a bit too long in my view.
>>>
>>> So... TLDR... this email is not much help other than saying that I'd
>>> propose to use "airflow.models" or "airflow.sdk", if no other / better
>>> ideas come up :-D
>>>
>>> Jens
>>>
>>> On 30.08.24 19:03, Ash Berlin-Taylor wrote:
>>>>> As a side note, I wonder if we should do the user-internal separation
>>>>> better for DagRun and TaskInstance
>>>> Yes, that is a somewhat inevitable side effect of making it be behind
>>>> an API, and one I am looking forward to. These are almost just
>>>> plain-data classes (but not using dataclasses per se), so we will have
>>>> two different classes — one that is the API representation, and a
>>>> separate internal one, used by the scheduler etc., that will have all of
>>>> the scheduling logic methods.
>>>>
>>>> -ash
>>>>
>>>>> On 30 Aug 2024, at 17:55, Tzu-ping Chung <t...@astronomer.io.INVALID>
>>>>> wrote:
>>>>>
>>>>>> On 30 Aug 2024, at 17:48, Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>>
>>>>>> Where should DAG, TaskGroup, Labels, decorators etc. for authoring be
>>>>>> imported from inside the DAG files? Similarly for DagRun and
>>>>>> TaskInstance (these two likely won’t be created directly by users, but
>>>>>> just used for reference docs/type hints).
>>>>>>
>>>>> How about airflow.definitions? When discussing assets, a question was
>>>>> raised about what we should call “DAG files” going forward (because
>>>>> those files now may not contain user-defined DAGs at all). “Definition
>>>>> files” was raised as a choice, but there’s no existing usage and it
>>>>> might take a while to catch on. If we put all these things into
>>>>> airflow.definitions, maybe people will start using that term?
>>>>>
>>>>> As a side note, I wonder if we should do the user-internal separation
>>>>> better for DagRun and TaskInstance. We already have that separation for
>>>>> DAG/DagModel, Dataset/DatasetModel, and more. Maybe we should also have
>>>>> constructs that only users see, which are converted to “real” objects
>>>>> (i.e. ones that exist in the DB) for the scheduler. We already sort of
>>>>> have those in DagRunPydantic and TaskInstancePydantic; we just need to
>>>>> name them better and expose them in the right places.
>>>>>
>>>>> TP
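As an editorial illustration of the user-internal split discussed above, here is a minimal sketch with hypothetical names (neither class exists in Airflow under these names): a plain-data user-facing class sitting next to an internal model that owns the DB-side and scheduling behaviour.

```
from dataclasses import dataclass

# User-facing, plain-data view of a task instance: safe to expose in the
# SDK because it carries no DB session and no scheduling logic.
@dataclass(frozen=True)
class TaskInstanceRef:
    dag_id: str
    task_id: str
    run_id: str
    try_number: int = 1

# Internal model (stubbed here): would own DB access and scheduling
# methods; the scheduler converts between the two representations.
class TaskInstanceModel:
    def __init__(self, dag_id: str, task_id: str, run_id: str, try_number: int = 1):
        self.dag_id = dag_id
        self.task_id = task_id
        self.run_id = run_id
        self.try_number = try_number

    def to_public(self) -> TaskInstanceRef:
        return TaskInstanceRef(self.dag_id, self.task_id, self.run_id, self.try_number)
```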