Hi all, I started a discussion on extracting user management from core Airflow here: https://github.com/apache/airflow/discussions/29986. Feel free to join the conversation if you're interested in the topic.
On 2023-02-21, 5:08 PM, "Mehta, Shubham" <shu...@amazon.com> wrote:

@Jarek - thank you for your initial deep dive on Keycloak. It looks very promising and is likely the open-source provider we should adopt for Multi-tenancy support. We can decide this at a later stage once we finish step 1.

>1) For existing users/those who want to keep all "in-airflow-ui" they could use FAB Provider (which will be separated from the Core). Same as today, but without the advanced management features for groups and tenants. We might consider dropping that altogether eventually.

As the next step, we will do a deep dive on separating the FAB provider and designing the Airflow Authorization API. We will share our findings with the community as a GitHub discussion and may even do a PoC if necessary.

@Community, if you are an expert in authorization or FAB and interested in collaborating on this effort, please contact me or @Beck, Vincent (here or on Airflow Slack). We will be happy to work together to make Multi-tenancy in Airflow a reality.

Shubham

On 2023-02-17, 5:45 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

And to Kaxil's mail: yep. What you wrote is exactly what I understood needs to be done.

On Fri, Feb 17, 2023 at 2:40 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow.
> > As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
>
> I do not think this will happen. I think part of the effort should be not only to implement the API but also to provide a fully fledged (though simple) implementation of such a provider which works with an open-source implementation of identity - KeyCloak is one that comes to my mind. It's possibly jumping ahead a bit to say "let's use KeyCloak as reference provider we can release", but I think KeyCloak has all we need:
> * integration with multiple authentication providers and protocols
> * User Management: https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/users/viewing.html
> * Role Management including user mapping: https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/roles/user-role-mappings.html
> * Group management: https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/groups/groups-vs-roles.html
>
> It comes with a management console, CLI and much more (auditing/session management etc. etc.)
>
> In a way it would be simply providing very much the same as what FAB Security Manager does, but with a much more complete scope and - most importantly - it would not be "part of Airflow as FAB is", it would be "outside" of it and the only thing Airflow would provide is merely pointers to the Docs of Keycloak on how to integrate it with Airflow as a proxy: https://wjw465150.gitbooks.io/keycloak-documentation/content/server_installation/topics/proxy.html (or it could be done by writing an Airflow KeyCloak Adapter - to be decided what would be easier to maintain). The users will be free to configure KeyCloak proxy as they see fit. No DB needed in Airflow to manage any of those, no UI, no API, no CLI - all that delegated out and integrated via incoming headers or adapter.
>
> The users will have several choices:
>
> 1) For existing users/those who want to keep all "in-airflow-ui" they could use FAB Provider (which will be separated from the Core). Same as today, but without the advanced management features for groups and tenants. We might consider dropping that altogether eventually.
> 2) If they are on premise - they can use KeyCloak Provider - by following our advice/suggestions/simple guidelines on how to integrate. They would have to manage their own KeyCloak instance (it won't be a "standard" part of Airflow).
> 3) If the user runs on AWS/Azure/GCP/others - each cloud would (hopefully) develop their own provider to integrate with IAM etc - they could use that provider directly. Or they could use and manage their KeyCloak in the cloud as they see fit (it supports all the clouds' OAuth integration). Or develop their own provider.
> 4) Those on managed services will have no choice but to use the provider installed by their service.
>
> I think that all gives the user the choice - if they want to go for role management and multi-tenant capabilities, fine, but they will have to manage the users outside of Airflow and integrate Airflow with it (and they can either integrate with what they have already or use KeyCloak). And it does not really impair them.
>
> J,
>
>
> On Thu, Feb 16, 2023 at 6:27 AM Mehta, Shubham <shu...@amazon.com.invalid> wrote:
> >
> > Thanks, Kaxil – that helped to clarify the proposal a bit more.
> >
> > > Replacing Access Control provided by FAB with a base/core security model (that is still resource-based)
> >
> > Are you suggesting that we build this resource-driven security model directly into Airflow, without relying on external dependencies like FAB?
> >
> > > Extend this to the other Airflow components (scheduler, workers, triggerer, cli)
> >
> > Are there cases where the scheduler or CLI would require the authorization API? Since they are considered trusted components, I assumed they would not need it.
> >
> >
> > Jarek - as always, I appreciate you sharing your thoughts and having an open discussion.
> >
> > > Which really explains what "Airflow as a Platform" is all about. I do not think we already know all the parts that should be converted into "Airflow extendability". It's more of an incremental effort like that where we have those bright ideas "Hey - this part can be removed and delegated to others". I think this has never been formulated explicitly but I think for quite a while we are really in the mode where we think much more about what we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.
> >
> > Understood. I like the idea of extensibility and "Airflow as a platform."
> > However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
> >
> > I am still unclear about other user scenarios related to user management, besides multi-tenancy, that Airflow customers are looking to enable. While the extensibility we aim for will enable this, is there a need for it? Also, @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested in building a custom user management provider that works with your platform? Have there been cases where your customers were limited by the current permissioning model, and you considered replacing FAB?
> >
> > I believe that the primary motivation for "user management provider" is driven by the excitement around getting rid of FAB, which I think we can still achieve while including multi-tenancy in the core Airflow. Both should be treated as separate problems.
> >
> > References:
> > 1. https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice
> >
> > On 2023-02-14, 12:44 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> >
> > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> >
> > Comment to Shubham's question:
> >
> > > In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
> >
> > I am glad you asked. I think this is one of the things I wanted to achieve by adding this page https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst - it will be live in 2.6 and one of the main parts is this one:
> >
> > https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities
> >
> > Which really explains what "Airflow as a Platform" is all about. I do not think we already know all the parts that should be converted into "Airflow extendability". It's more of an incremental effort like that where we have those bright ideas "Hey - this part can be removed and delegated to others". I think this has never been formulated explicitly but I think for quite a while we are really in the mode where we think much more about what we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.
> >
> > When you look at it, this is also the main idea behind the OpenLineage integration, for example - we are adding OpenLineage (which is really just an API) so that others can build "everything-lineage" on top of it.
> > So we are adding a minimum-possible set of APIs and integration so that we can expose the lineage capability so that all the lineage "UI" and other use cases that lineage exposes would be done outside. We are in a strong position to do it - being sure that when we expose it, others will implement the integration they care about.
> >
> > I think more and more (and it has been preached by Ash mostly, but also others) that we should be focusing solely on being an extremely powerful and robust scheduler and make sure we are exposing all of the possible things that can be exposed as an external API (while still providing a basic implementation that makes Airflow still a "finished" product that can be used to handle basic cases).
> >
> > BTW. We are now preparing for the Airflow Summit CFP (some announcements will follow shortly, I do not want to spill too many beans) and we have a very interesting broad category "Airflow and ....". And I think we should work in the direction that the `...` is far bigger than Airflow itself.
> >
> > J.
> >
> > On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > >
> > > Great idea Vikram, I love the idea of making this a provider/pluggable.
> > >
> > > In some ways, we already have a pluggable mechanism for Authentication with Auth Backends [1]. Where we will need a lot more work, I think, is:
> > >
> > > Replacing Access Control provided by FAB with a base/core security model (that is still resource-based) [2]
> > > Extend this to the other Airflow components (scheduler, workers, triggerer, cli) or make them all driven by a single API that takes care of Auth.
> > > This will also reduce a lot of duplication of code across many of the components
> > > For backwards compatibility, we could ship with a FAB provider that still uses Flask-AppBuilder in addition to our recommended provider that will have more features, and users/companies/stakeholders can build on top of that provider to extend it further.
> > >
> > >
> > > References:
> > > [1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
> > > [2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
> > >
> > > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <shu...@amazon.com.invalid> wrote:
> > >>
> > >> Hi Vikram,
> > >> Thank you for taking the time to review the proposal. I appreciate your insights. I will make sure to reach out to you directly in the future for feedback as that would've undoubtedly saved us some time and effort.
> > >>
> > >> In regards to the separation of user management, I understand your concerns and, on a high-level, I agree with you. However, I think it would be beneficial to have more details on how it will work. Here are a few questions that come to mind:
> > >> 1. How will the user-id/group-id interface interact with Airflow resource-level permissions? What parts of "John can-edit dag1 and can-view dag2" will be part of Airflow core? What will be exposed to the external system?
> > >> 2. Who will be responsible for managing the resource-level permissions? Will it be the external system?
> > >> 3. What are the limitations of this new pluggable model compared to FAB?
> > >> Will there be restrictions on the granularity of resource access that Airflow admins can provide to their users?
> > >> 4. As Jarek pointed out, with this change we want to make authorization externally driven. Will this have a significant impact on Airflow performance as authorization will be required for fetching variables, executing tasks, etc.?
> > >> 5. What will the migration process look like for existing users to this non-FAB pluggable model?
> > >>
> > >> In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
> > >>
> > >> Besides this, I would like to propose that we define the scope and long-term vision of "Airflow core". To achieve this, it may be helpful to first outline the perspectives of the Airflow PMCs. Recently, there have been discussions regarding the separation of executors into a separate package, the implementation of pluggable schedulers, and other related topics. Currently, these decisions and discussions are somewhat ad hoc and are made through the mailing list. I would be happy to collaborate and invest time in this effort.
> > >>
> > >> Regards
> > >> Shubham
> > >>
> > >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> > >>
> > >> Hey Vikram,
> > >>
> > >> I think it's brilliant and I wonder how it happened that it had not occurred to us earlier.
> > >> And I believe that is due to the natural tendency of "following as we always did" rather than thinking completely out-of-the-box. Thanks Vikram for bringing it up.
> > >>
> > >> The funny thing is that when I see this:
> > >>
> > >> > However, I don't agree that this level of user management belongs in "Core Airflow".
> > >>
> > >> I almost immediately think - NOOOOO, why, it's always been here, how can we remove it?
> > >>
> > >> But then if you look a bit closer:
> > >>
> > >> > think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.
> > >>
> > >> Then it starts to make way more sense. Way more.
> > >>
> > >> And when you look further:
> > >>
> > >> > Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
> > >>
> > >> My heart jumps and I am immediately sold on the idea.
> > >>
> > >> When I was commenting on the doc initially, something was not right. I had a feeling it was probably the 5th time I was looking at and commenting on a similar document. And, well, I did, actually. Most of the things we discussed there are already implemented out there. We just need to make sure we expose enough of the API to use them. For example we have Keycloak, which is an open source implementation of Identity and Access Management, with everything out there already integrated, and I've been part of the project that integrated just the authentication part. Now if we rethink the authorization and make it simpler and "externally driven", this will not only be faster IMHO, but also will allow enterprise users to integrate much better.
> > >>
> > >> I believe following the path that Vikram outlined will be a good direction for everyone in the community - including all the Managed Service providers, who will have a far easier job integrating Airflow into their authentication models.
> > >>
> > >> J.
> > >>
> > >> On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka <vik...@astronomer.io.invalid> wrote:
> > >> >
> > >> > Shubham and Vincent,
> > >> >
> > >> > Let me start by saying that I apologize for my delayed response to your original email.
> > >> >
> > >> > I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
> > >> >
> > >> > However, I don't agree that this level of user management belongs in "Core Airflow".
> > >> >
> > >> > I strongly believe that the core Airflow mission is for the community at large and for data practitioners, either individuals or teams within enterprises. And therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But, I think there is a never ending list of user management features which are needed to support Enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
> > >> >
> > >> > I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics.
> > >> > I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
> > >> >
> > >> > Best regards,
> > >> > Vikram
> > >> >
> > >> >
> > >> > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
> > >> >>
> > >> >> Thanks
> > >> >>
> > >> >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> > >> >>
> > >> >> Added.
> > >> >>
> > >> >> On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
> > >> >> >
> > >> >> > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
> > >> >> >
> > >> >> > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> > >> >> >
> > >> >> > What's your cwiki ID, Vincent? (I'll add you without going into details yet)
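
For anyone who wants something concrete to react to, the "user management provider" and "externally driven" authorization ideas from this thread could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing Airflow or Keycloak API: the `UserManagementProvider` interface, the `HeaderBasedProvider` class, the header names `X-Forwarded-User` and `X-Forwarded-Groups`, and the action names are all hypothetical. It assumes an authenticating proxy in front of Airflow passes the identity via incoming headers, and it keeps the grants in a plain dict purely for demonstration; who actually owns the resource-level permissions is one of the open questions raised above.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """Identity as handed over by the external system (hypothetical shape)."""

    user_id: str
    groups: frozenset[str] = field(default_factory=frozenset)


class UserManagementProvider(ABC):
    """Hypothetical pluggable interface; NOT an existing Airflow API."""

    @abstractmethod
    def get_user(self, headers: dict[str, str]) -> User | None:
        """Resolve the current user from an incoming request, if any."""

    @abstractmethod
    def is_authorized(self, user: User, action: str, resource: str) -> bool:
        """Answer questions like 'can John edit dag1?'."""


class HeaderBasedProvider(UserManagementProvider):
    """Trusts identity headers set by an authenticating proxy."""

    # Header names are assumptions for illustration only.
    USER_HEADER = "X-Forwarded-User"
    GROUPS_HEADER = "X-Forwarded-Groups"

    def __init__(self, grants: dict[str, set[tuple[str, str]]]) -> None:
        # Maps a user id or group name to the (action, resource) pairs granted.
        # A real provider would ask the external system instead of a dict.
        self._grants = grants

    def get_user(self, headers: dict[str, str]) -> User | None:
        # Plain dict lookup: this sketch ignores header case-insensitivity.
        user_id = headers.get(self.USER_HEADER)
        if not user_id:
            return None
        raw_groups = headers.get(self.GROUPS_HEADER, "")
        groups = frozenset(g.strip() for g in raw_groups.split(",") if g.strip())
        return User(user_id, groups)

    def is_authorized(self, user: User, action: str, resource: str) -> bool:
        # A user is allowed if they, or any of their groups, hold the grant.
        principals = {user.user_id, *user.groups}
        return any(
            (action, resource) in self._grants.get(principal, set())
            for principal in principals
        )


# "John can-edit dag1 and can-view dag2", as in the question earlier in the thread.
provider = HeaderBasedProvider(
    {"john": {("can_edit", "dag1"), ("can_view", "dag2")}}
)
john = provider.get_user({"X-Forwarded-User": "john"})
print(provider.is_authorized(john, "can_edit", "dag1"))  # True
print(provider.is_authorized(john, "can_edit", "dag2"))  # False
```

The point of the sketch is the shape of the boundary: Airflow core would only ask "who is this?" and "may they do X on Y?", while users, groups and roles live entirely in the external system. A FAB-based provider or a cloud IAM provider would be alternative implementations of the same two methods.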