And to Kaxil's mail: yep. What you wrote is exactly what I understood needs to be done.
On Fri, Feb 17, 2023 at 2:40 PM Jarek Potiuk <ja...@potiuk.com> wrote: > > > Understood. I like the idea of extensibility and "Airflow as a platform." > > However, we should make sure that we do not worsen the user experience with > > the extensibility. The "User Management Provider" is something that could > > potentially make the user experience worse, especially for customers who > > are self-hosting Airflow. Managed services will ensure that they dedicate > > resources to maintaining their user management providers. Multi-tenancy > > will end up becoming a feature for managed service customers, leaving the > > 74% of Airflow users [1] with a less powerful Airflow. As an example, > > Timetables is a very powerful feature, which, anecdotally, no customer ends > > up using due to its complexity. > > I do not think this will happen. I think part of the effort should be not > only to implement the API but also to provide a fully fledged (though > simple) implementation of such a provider which works with an > open-source implementation of identity - Keycloak is one that comes to > my mind. It's possibly jumping ahead a bit to say "let's use Keycloak > as the reference provider we can release", but I think Keycloak has all we > need: > * Integration with multiple authentication providers and protocols > * User Management: > https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/users/viewing.html > * Role Management including user mapping: > https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/roles/user-role-mappings.html > * Group management: > https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/groups/groups-vs-roles.html > > It comes with a management console, CLI and much more > (auditing, session management, etc.)
> > In a way it would simply provide much the same as what the FAB > Security Manager does, but with a much more complete scope and - most > importantly - it would not be "part of Airflow as FAB is", it would be > "outside" of it and the only thing Airflow would provide is merely > pointers to the Keycloak docs on how to integrate it with Airflow > as a proxy: > https://wjw465150.gitbooks.io/keycloak-documentation/content/server_installation/topics/proxy.html > (or it could be done by writing an Airflow Keycloak Adapter - to be > decided what would be easier to maintain). The users will be free to > configure the Keycloak proxy as they see fit. No DB needed in Airflow to > manage any of those, no UI, no API, no CLI - all that delegated out > and integrated via incoming headers or an adapter. > > The users will have several choices: > > 1) Existing users/those who want to keep everything "in-airflow-ui" > could use the FAB Provider (which will be separated from the Core). Same > as today, but without the advanced management features for groups and > tenants. We might consider dropping that altogether eventually. > 2) If they are on premise - they can use the Keycloak Provider - by > following our advice/suggestions/simple guidelines on how to > integrate. They would have to manage their own Keycloak instance (it > won't be a "standard" part of Airflow). > 3) If the user runs on AWS/Azure/GCP/others - each cloud would > (hopefully) develop their own provider to integrate with IAM etc. - > > they could use that provider directly. Or they could use and manage > their Keycloak in the cloud as they see fit (it supports OAuth > integration with all the clouds). Or develop their own provider.
> 4) Those on managed services will have no choice but to use the > provider installed by their service > > I think all of that gives the user the choice - if they want role > management and multi-tenant capabilities, fine, but they will have to > manage the users outside of Airflow and integrate Airflow with it (and > they can either integrate with what they have already or use > Keycloak). And it does not really impair them. > > J. > > > On Thu, Feb 16, 2023 at 6:27 AM Mehta, Shubham > <shu...@amazon.com.invalid> wrote: > > > > Thanks, Kaxil – that helped to clarify the proposal a bit more. > > > > > Replacing Access Control provided by FAB with a base/core security model > > > (that is still resource-based) > > > > Are you suggesting that we build this resource-driven security model > > directly into Airflow, without relying on external dependencies like FAB? > > > > > Extend this to the other Airflow components (scheduler, workers, > > > triggerer, cli) > > > > Are there cases where the scheduler or CLI would require the authorization > > API? Since they are considered trusted components, I assumed they would not > > need it. > > > > > > Jarek - as always, I appreciate you sharing your thoughts and having an > > open discussion. > > > > > Which really explains what "Airflow as a Platform" is all about. I do not > > > think we already know all the parts that should be converted into > > > "Airflow extendability". It's more of an incremental effort like that > > > where we have those bright ideas "Hey - this part can be removed and > > > delegated to others". I think this has never been formulated explicitly > > > but I think for quite a while we are really in the mode where we think > > > much more about what we can SPLIT OUT from Airflow rather than what we > > > can ADD to Airflow. > > > > Understood. I like the idea of extensibility and "Airflow as a platform."
> > However, we should make sure that we do not worsen the user experience with > > the extensibility. The "User Management Provider" is something that could > > potentially make the user experience worse, especially for customers who > > are self-hosting Airflow. Managed services will ensure that they dedicate > > resources to maintaining their user management providers. Multi-tenancy > > will end up becoming a feature for managed service customers, leaving the > > 74% of Airflow users [1] with a less powerful Airflow. As an example, > > Timetables is a very powerful feature, which, anecdotally, no customer ends > > up using due to its complexity. > > > > I am still unclear about other user scenarios related to user management, > > besides multi-tenancy, that Airflow customers are looking to enable. While > > the extensibility we aim for will enable this, is there a need for it? > > Also, @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you > > interested in building a custom user management provider that works with > > your platform? Have there been cases where your customers were limited by > > the current permissioning model, and you considered replacing FAB? > > > > I believe that the primary motivation for "user management provider" is > > driven by the excitement around getting rid of FAB, which I think we can > > still achieve while including multi-tenancy in the core Airflow. Both > > should be treated as separate problems. > > > > References: > > 1. > > https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice > > > > On 2023-02-14, 12:44 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote: > > > > CAUTION: This email originated from outside of the organization. Do not > > click links or open attachments unless you can confirm the sender and know > > the content is safe. 
> > > > > > > > Comment to Shubham's question: > > > > > In addition, are there any other user scenarios, beyond > > multi-tenancy, that Airflow users are looking to enable and that require > > this pluggability? Asking as I haven't come across them. Overall, I believe > > we need more information on your proposal before seeking feedback from the > > community. Could we work together during February to develop a concrete > > proposal? > > > > I am glad you asked. I think this is one of the things I wanted to > > achieve by adding this page > > > > https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst > > - it will be live in 2.6 and one of the main parts is this one: > > > > > > https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities > > > > Which really explains what "Airflow as a Platform" is all about. I do > > not think we already know all the parts that should be converted into > > "Airflow extendability". It's more of an incremental effort like that > > where we have those bright ideas "Hey - this part can be removed and > > delegated to others". I think this has never been formulated > > explicitly but I think for quite a while we are really in the mode > > where we think much more about what we can SPLIT OUT from Airflow > > rather than what we can ADD to Airflow. > > > > When you look at it, this is also the main idea behind Open Lineage > > integration for example - we are adding open lineage (which is really > > just an API) so that others can build "everything-lineage" on top of > > it. So we are adding a minimum-possible set of APIs and integration so > > that we can expose the lineage capability so that all the lineage "UI" > > and other use cases that lineage exposes would be done outside. We are > > in a strong position to do it - being sure that when we expose it, > > others will implement the integration they care about.
> > > > I think more and more (and it has been preached by Ash mostly, but > > also others) that we should be focusing solely on being an extremely > > powerful and robust scheduler and make sure we are exposing all of the > > possible things that can be exposed as an external API (while still > > providing a basic implementation that makes Airflow still a "finished" > > product that can be used to handle basic cases). > > > > BTW. We are now preparing for the Airflow Summit CFP (some > > announcements will follow shortly, I do not want to spill too many > > beans) and we have a very interesting broad category "Airflow and > > ...". And I think we should work in the direction that the `...` is > > far bigger than Airflow itself. > > > > J. > > > > On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <kaxiln...@gmail.com> wrote: > > > > > > Great idea Vikram, I love the idea of making this a > > provider/pluggable. > > > > > > In some ways, we already have a pluggable mechanism for > > Authentication with Auth Backends [1]. Where we will need a lot more work I > > think is: > > > > > > Replacing Access Control provided by FAB with a base/core security > > model (that is still resource-based) [2] > > > Extend this to the other Airflow components (scheduler, workers, > > triggerer, cli) or make them all driven by a single API that takes care of > > Auth. This will also reduce a lot of duplication of code across many of the > > components > > > For backwards compat, we could ship with a FAB provider that still > > uses Flask-AppBuilder in addition to our recommended provider that will > > have more features and users/companies/stakeholders can build on top of > > that provider to extend it further.
> > > > > > > > > References: > > > [1]: > > https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends > > > [2]: > > https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html > > > > > > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham > > <shu...@amazon.com.invalid> wrote: > > >> > > >> Hi Vikram, > > >> Thank you for taking the time to review the proposal. I appreciate > > your insights — I will make sure to reach out to you directly in the future > > for feedback as that would've undoubtedly saved us some time and effort. > > >> > > >> In regards to the separation of user management, I understand your > > concerns and, on a high-level, I agree with you. However, I think it would > > be beneficial to have more details on how it will work. Here are a few > > questions that come to mind: > > >> 1. How will the user-id/group-id interface interact with Airflow > > resource-level permissions? What parts of "John can-edit dag1 and can-view > > dag2" be part of Airflow core? What will be exposed to the external system? > > >> 2. Who will be responsible for managing the resource-level > > permissions? Will it be the external system? > > >> 3. What are the limitations of this new pluggable model compared to > > FAB? Will there be restrictions on the granularity of resource access that > > Airflow admins can provide to their users? > > >> 4. As Jarek pointed out, with this change we want to make > > authorization externally driven. Will this have a significant impact on > > Airflow performance as authorization will be required for fetching > > variables, executing tasks, etc.? > > >> 5. What will the migration process look like for existing users to > > this non-FAB pluggable model? > > >> > > >> In addition, are there any other user scenarios, beyond > > multi-tenancy, that Airflow users are looking to enable and that require > > this pluggability? Asking as I haven't come across them. 
Overall, I believe > > we need more information on your proposal before seeking feedback from the > > community. Could we work together during February to develop a concrete > > proposal? > > >> > > >> Besides this, I would like to propose that we define the scope and > > long-term vision of "Airflow core". To achieve this, it may be helpful to > > first outline the perspectives of the Airflow PMCs. Recently, there have > > been discussions regarding the separation of executors into a separate > > package, the implementation of pluggable schedulers, and other related > > topics. Currently, these decisions and discussions are somewhat ad hoc and > > are made through the mailing list. I would be happy to collaborate and > > invest time in this effort. > > >> > > >> Regards > > >> Shubham > > >> > > >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote: > > >> > > >> Hey Vikram, > > >> > > >> I think it's brilliant and I wonder how it happened that it had not > > >> occurred to us earlier. And I believe that is due to the natural > > >> tendency of "following as we always did" rather than thinking > > >> completely out-of-the-box. Thanks Vikram for bringing it up. > > >> > > >> The funny thing is that when I see this: > > >> > > >> > However, I don't agree that this level of user management > > belongs in "Core Airflow". > > >> > > >> I almost immediately think - NOOOOO, why, it's always been here, > > how > > >> can we remove it?
> > >> > > >> But then if you look a bit closer: > > >> > > >> > I think this is a time to consider the concept of a "user > > management provider" with a simple built-in implementation being the > > current Airflow functionality, enabling alternate more complex (but > > separate) implementations such as your proposal here as alternate user > > management providers. > > >> > > >> Then it starts to make way more sense. Way more. > > >> > > >> And when you look further: > > >> > > >> > Maybe, this also enables us to get rid of the Fab security > > manager from core Airflow? > > >> > > >> My heart jumps and I am immediately sold on the idea. > > >> > > >> When I was commenting on the doc initially, something was not > > right. > > >> I had a feeling it was probably the 5th time I was looking at and > > >> commenting on a similar document. And, well, it was, actually. > > Most of > > >> the things we discussed there are already implemented out there. > > We > > >> just need to make sure we expose enough of the API to use them. > > For > > >> example, we have Keycloak, which is an open source implementation of > > >> Identity and Access Management, with everything out there already > > >> integrated, and I've been part of a project that integrated > > just the > > >> authentication part. Now if we rethink the authorization and > > make it > > >> simpler and "externally driven", this will not only be faster > > IMHO, > > >> but also will allow enterprise users to integrate much better. > > >> > > >> I believe following the path that Vikram outlined will be a good > > >> direction for everyone in the community - including all the > > Managed > > >> Service providers, who will have a far easier job integrating > > >> Airflow into their authentication models. > > >> > > >> J.
> > >> > > >> > > >> > > >> On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka > > >> <vik...@astronomer.io.invalid> wrote: > > >> > > > >> > Shubham and Vincent, > > >> > > > >> > Let me start by saying that I apologize for my delayed > > response to your original email. > > >> > > > >> > I appreciate the detailed write-up and the thought behind it. > > I completely agree with your use case and understand how this is applicable > > to enterprises with multiple data teams using Airflow. > > >> > > > >> > However, I don't agree that this level of user management > > belongs in "Core Airflow". > > >> > > > >> > I strongly believe that the core Airflow mission is for the > > community at large and for data practitioners either individuals or teams > > within enterprises. And therefore, I don't disagree with the intent of > > making it easier for enterprise teams to adopt Airflow. But, I think there > > is a never ending list of user management features which are needed to > > support Enterprise needs. We have already struggled with this over time and > > faced challenges with the Fab security manager and its integration in > > Airflow. > > >> > > > >> > I think we should use this opportunity and your use case to > > "separate the user management" from Core Airflow outside of the absolute > > basics. I think this is a time to consider the concept of a "user > > management provider" with a simple built-in implementation being the > > current Airflow functionality, enabling alternate more complex (but > > separate) implementations such as your proposal here as alternate user > > management providers. Maybe, this also enables us to get rid of the Fab > > security manager from core Airflow? 
> > >> > > > >> > Best regards, > > >> > Vikram > > >> > > > >> > > > >> > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent > > <vincb...@amazon.com.invalid> wrote: > > >> >> > > >> >> Thanks > > >> >> > > >> >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> > > wrote: > > >> >> > > >> >> Added. > > >> >> > > >> >> On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent > > >> >> <vincb...@amazon.com.invalid> wrote: > > >> >> > > > >> >> > Thank you! > > https://cwiki.apache.org/confluence/display/~vin100.beck > > >> >> > > > >> >> > On 2023-02-02, 5:38 PM, "Jarek Potiuk" > > <ja...@potiuk.com> wrote: > > >> >> > > > >> >> > What's your cwiki ID, Vincent? (I'll add you without > > going into details yet.)