Hi everyone! First of all, I’d like to thank all the participants in this discussion.
Based on what I’ve read, am I understanding correctly that in order for this PR to be merged into the main Airflow codebase, I need to:

- add usage examples to the Airflow documentation,
- display this variable in the UI (the main open question being where it should go)?

I also have a question: should the UI only show the current state of backend_order, without allowing it to be edited?

At the moment, I’m maintaining this PR and I’m ready to make the necessary improvements.

Best regards,
Anton Nitochkin

On Mon, 7 Jul 2025 at 11:19, Amogh Desai <amoghdesai....@gmail.com> wrote:

> I also think it would be beneficial for users / someone editing / accessing connections or variables from the UI to know "where" they are editing it.
>
> Right now it's the metadata DB but with the proposed PR that probably could change (for cases when DB is highest priority / middle priority?)
>
> But generally speaking, DAG authors at any point need not know where they are getting connections / variables from in a happy path scenario, but things will change if something starts to fail and it really depends on who is debugging the failure :)! The deployment manager can go and run the `airflow config get-value` command, but I am guessing most DAG authors wouldn't / shouldn't be able to do that.
>
> So in short, the idea makes sense theoretically to me, but it needs much more work, mainly in terms of:
> - Doc clarification
> - Debugging assistance (how to know the order?): it's a more general problem not due to the task but similar / related to this
> - Considering the worker backend angle
>
> Thanks & Regards,
> Amogh Desai
>
> On Sun, Jul 6, 2025 at 11:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I think the only real "behavioural" change that you might expect from the user if they "know" what is the sequence is at the connection / variable UI.
> > This is where the user (with connection/variable editing capabilities or connection/variable viewing capabilities) might actually make a different decision or draw a different conclusion. So my proposal would be to explain the sequence - in possibly some concise way - at the connection/variable screen.
> >
> > And that seems both natural and obvious.
> >
> > Is that "enough" for you? Or do you think other places need "surfacing"? What other behaviour of the users (different actors) you see might be impacted by lack / presence of the information?
> >
> > J.
> >
> > On Sun, Jul 6, 2025 at 5:42 PM Elad Kalif <elad...@apache.org> wrote:
> >
> > > > This seems like organisation-wide policy that simply all DAG authors in the organization should be made aware of
> > >
> > > One among several other things that the admin expects users to remember. We should reduce it, not increase it.
> > > From my point of view this setting adds a blind spot. I am not happy with this.
> > > I have similar feelings towards cluster policies, yet another blind spot that dag authors should be aware of but no actual tools provided to see the override in their side.
> > >
> > > I initially shared my thoughts on 31 March in https://github.com/apache/airflow/pull/45931#discussion_r2021018760
> > > So far I haven't seen any comments that explain why we can't implement such a mechanism. Is it technically complicated? Is it high effort? or the assumption is that it serves little value?
> > >
> > > On Sun, Jul 6, 2025 at 3:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > > I am missing the part of how can DAG Author be aware of the backend order the cluster admin chooses?
> > > > > This is a crucial part
> > > >
> > > > I am not sure there is a special need for it.
> > > > This seems like organisation-wide policy that simply all DAG authors in the organization should be made aware of - it has 0 impact on how DAGs are written. If it would be different for different DAGs you'd surely need to communicate this, but I am not sure if any other indication is needed. It's largely transparent for `DAG authors` if you ask me - they want a connection by id and the "organizational policy" decides how this happens.
> > > >
> > > > J.
> > > >
> > > > On Sun, Jul 6, 2025 at 2:06 PM Elad Kalif <elad...@apache.org> wrote:
> > > >
> > > > > I am missing the part of how can DAG Author be aware of the backend order the cluster admin chooses?
> > > > > This is a crucial part.
> > > > >
> > > > > On Thu, Jul 3, 2025 at 12:14 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >
> > > > > > Sorry for typos - that was my mobile auto complete... I hope it is understandable anyway
> > > > > >
> > > > > > On Thu, 3 Jul 2025 at 11:13, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > >
> > > > > > > On Thu, 3 Jul 2025 at 10:14, Amogh Desai <amoghdesai....@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks for that angle, Jarek.
> > > > > > > >
> > > > > > > > Let's say DB lookup has higher precedence than that of say ENV backend. Wouldn't this be shooting ourselves in the foot by compromising the performance here? A DB lookup will be more expensive than an ENV lookup.
> > > > > > > >
> > > > > > > Oh absolutely.
> > > > > > > I think if we have this possibility of managing the order, those kinds of scenarios should also be explained in the docs so that users do not shoot themselves in the foot.
> > > > > > >
> > > > > > > Also, following my mail about multi-team: I started to think recently - looking at some other OSS software - that we sometimes take too much responsibility for our users and then suffer because we have to defend our opinionated choices when there are use cases that our choices do not enable.
> > > > > > >
> > > > > > > This is the reason why we have so many 'options' and config values: sometimes we do not want to make decisions for our users, but we can make it an option and configuration and clearly explain it to our users (and mostly I am talking about the Deployment Manager role from our security model). It's their responsibility to read all the information we provide and follow it when they make decisions on how to configure Airflow - knowing the consequences. And we should be 'harsh' with them - in the sense that if they did not read the docs and did not understand it - any time they ask us about something not working that is explained in the docs - we should send them to the doc with 'Read The Friendly Manual' advice - simply because this is the only job they have. And we should not do the job for them.
> > > > > > >
> > > > > > > Similarly, having options like that allows our managed service providers to make their opinionated choices and make some configuration options possible, some selected for their users in the context of the service managed. But again - that's their responsibility to manage and understand what the options are and what they mean. Same as individual deployment managers - they can make their own decisions - and if it does not cost us a lot we should make it possible for them to make those choices (and take responsibility for their choices)
> > > > > > >
> > > > > > > With great powers (of choice) you also have great responsibilities (of consequences of your choices) - and as long as we are aware of those consequences and communicate it to deployment managers - it's on their shoulders to make the choices and bear the consequences.
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > > > There could also be a few more side effects that we will have to fully uncover and come up with a detailed plan to allow this to be configurable.
> > > > > > > >
> > > > > > > > Thanks & Regards,
> > > > > > > > Amogh Desai
> > > > > > > >
> > > > > > > > On Wed, Jul 2, 2025 at 6:43 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > > > >
> > > > > > > > > I think this is a good idea - but as Ash mentioned, it has to be executed well with a lot of bells and whistles, so that users will not shoot themselves in their foot.
> > > > > > > > > For example, we recently had discussions on the new UI about whether/how to explain to users that their connections in the UI and API **only** show the DB connections (for good reasons) - and that is already difficult to explain. Now this change will also make it behave differently: for example, currently when you edit a connection via the UI it might **not** take effect if you have the same connection defined in a secret/env var. But if you make the DB first, this changes, and there are a few edge cases where it might have some unexpected effect.
> > > > > > > > >
> > > > > > > > > But there is one inevitable benefit of this approach that I like - the ability of turning the Airflow DB into an effective "shield" for secret usage. The big drawback of the current "sequence" is that Airflow generates a LOT of queries to the secrets manager, even if your connection is defined in the DB - because it will query secrets first. So currently it is not possible to say "for this highly frequently used connection, I want to keep it in the DB to save on secrets manager queries - both performance and cost wise" - because defining a connection in the DB does not limit the number of secrets manager queries.
> > > > > > > > > So in a number of scenarios, being able to reverse it and query the DB first might be very good for cost and network optimisation.
> > > > > > > > >
> > > > > > > > > I think if we describe it (as Ash wrote) well in the docs and explain those scenarios, and also clearly communicate it in the UI of Airflow (we likely need some way of explaining to the user what their currently configured sequence is and what they should expect to happen if they remove/add a connection) - then I see it as a really useful feature.
> > > > > > > > >
> > > > > > > > > J.
> > > > > > > > >
> > > > > > > > > On Wed, Jul 2, 2025 at 2:54 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > At a high level I’m good with allowing this to be fully configurable, as long as we document the possible warts (“Doctor, it hurts when I do this” “well don’t do that then!” etc), though as Amogh mentioned it is slightly complicated by the distinction between API Server/Scheduler and the execution time on the worker.
> > > > > > > > > >
> > > > > > > > > > (I haven’t looked at the specific implementation yet)
> > > > > > > > > >
> > > > > > > > > > -ash
> > > > > > > > > >
> > > > > > > > > > On 2 Jul 2025, at 11:56, Amogh Desai <amoghdesai....@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello Anton,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for kicking off this discussion. I’d love to understand your motivations a bit more on this front.
> > > > > > > > > > > From your PR, I am seeing that you are not just allowing addition of multiple custom backends but also changing the *default_backend* order. I am a bit torn on that part.
> > > > > > > > > > >
> > > > > > > > > > > The current design intentionally places the metadata DB backend at the lowest precedence in the order, since it’s meant to serve as the ultimate fallback source of truth. Any additional configured backends are prioritized higher than it by design.
> > > > > > > > > > >
> > > > > > > > > > > With your changes, we now allow configurations like:
> > > > > > > > > > >
> > > > > > > > > > >     @conf_vars({("secrets", "backends_order"): "metastore,environment_variable,unsupported"})
> > > > > > > > > > >     def test_backends_order_unsupported(self):
> > > > > > > > > > >         with pytest.raises(AirflowConfigException):
> > > > > > > > > > >             ensure_secrets_loaded()
> > > > > > > > > > >
> > > > > > > > > > > I don’t fully understand the motivation behind supporting this level of override, especially since it could allow unsupported or unintended configurations. Additionally, with Airflow 3.0+, we already support a multi-layered secret backend resolution capability with the introduction of secrets backend for workers.
> > > > > > > > > > > Order goes as:
> > > > > > > > > > >
> > > > > > > > > > > *secrets backend on worker directly (optional) > env vars on worker > reach out to api server [secrets backend defined here (optional) > env vars on api server > metadata DB].*
> > > > > > > > > > >
> > > > > > > > > > > You will have to consider this angle too.
> > > > > > > > > > >
> > > > > > > > > > > In my opinion, a more practical and realistic use case would be to have the ability to define multiple custom backends both on worker or the API server.
> > > > > > > > > > >
> > > > > > > > > > > Looking forward to hearing more from you.
> > > > > > > > > > >
> > > > > > > > > > > Thanks & Regards,
> > > > > > > > > > > Amogh Desai
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jul 2, 2025 at 3:59 PM Anton Nitochkin <ant.nitoch...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > I'd like to discuss a new option that can be added via this PR: https://github.com/apache/airflow/pull/45931.
> > > > > > > > > > > >
> > > > > > > > > > > > Recently, I asked developers in Slack for their thoughts on the new variable [secrets]backend_order. Long story short: this option will introduce the ability to configure the backend order and control it using this variable.
> > > > > > > > > > > > The default value will remain the same as in the current version, so for users who don't need it, things will stay as they are now.
> > > > > > > > > > > >
> > > > > > > > > > > > Jarek Potiuk advised starting a conversation and discussing the PR to reach a consensus with the community.
> > > > > > > > > > > >
> > > > > > > > > > > > Can you please share your thoughts on the option and its implementation?
> > > > > > > > > > > >
> > > > > > > > > > > > Anton Nitochkin
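
For anyone who wants to reason about the ordering question concretely, here is a minimal sketch of what a configurable lookup order means. It is not the PR's implementation and does not use Airflow's API: `make_backend`, `resolve`, the `custom` backend name, and the example connection are all hypothetical, and only the names `metastore` and `environment_variable` are borrowed from the test quoted in the thread. Each stand-in backend records when it is queried, so the effect of the order on the number of external lookups is visible.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for the lookup sources discussed in the thread.
# A backend maps a connection id to a URI, or returns None when it has no entry.
Backend = Callable[[str], Optional[str]]


def make_backend(name: str, store: Dict[str, str], calls: List[str]) -> Backend:
    """Build a lookup function that records every query it receives."""

    def lookup(conn_id: str) -> Optional[str]:
        calls.append(name)
        return store.get(conn_id)

    return lookup


def resolve(conn_id: str, order: List[str], backends: Dict[str, Backend]) -> Optional[str]:
    """Walk the backends in the configured order and return the first hit."""
    for name in order:
        if name not in backends:
            # Assumption: an unknown name is rejected rather than skipped.
            raise ValueError(f"unsupported backend in order: {name}")
        value = backends[name](conn_id)
        if value is not None:
            return value
    return None


calls: List[str] = []
backends = {
    "custom": make_backend("custom", {}, calls),
    "environment_variable": make_backend("environment_variable", {}, calls),
    "metastore": make_backend("metastore", {"my_db": "postgres://example"}, calls),
}

# Metadata DB as the last fallback: every other source is queried first.
default_hit = resolve("my_db", ["custom", "environment_variable", "metastore"], backends)
default_calls = list(calls)

# Metadata DB first: the external backend is never queried for this connection.
calls.clear()
db_first_hit = resolve("my_db", ["metastore", "custom", "environment_variable"], backends)
db_first_calls = list(calls)
```

With the DB as last fallback, all three sources are queried before the hit; with the DB first, only the DB is queried, which is the "shield" effect Jarek describes. The flip side Amogh raises also shows up here: whatever is placed first is queried on every lookup, including lookups for connections it does not hold.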