In general I support this suggestion for more clarity and separation, but on the assumption we at least protect the core sections as in point 2). With a scale large enough, someone, consciously or not, will break the gentleman's agreement.
On Fri, Jul 21, 2023 at 7:13 AM Jarek Potiuk <ja...@potiuk.com> wrote: > Hello everyone, > > TL;DR; I wanted to discuss one aspect of the "configuration" separation > between core and providers that I was working on while moving executors to > providers - namely : can providers contribute configuration options in > "core" sections of configuration. > > It is almost complete and we are gearing up to release it in 2.7.0 > together with executor separation - we have one final PR that moves > configuration for celery to the celery provider and it is at the last > stages of review [1]. > > The effect it is going to have: > > * it will keep 100% compatibility for the user when it comes to > configuring Airflow > * documentation of configuration for providers is moved to the provider's > docs > > In the PR (also attached) you can see screenshots on how the docs will > look like after separation (which is the only "user" visible change. The > documentation of configuration for providers will land in provider's docs > and we will only have links from the main configuration page of Airflow to > community provider's configuration (for those providers that will have its > own configuration). > > This is all nice and cool. > > You will see that "celery", "celery_broker_transport_options" and > "celery_kubernetes_executor" sections land in the Celery docs, disappear > from airflow config but there is a separate page that lists "celery" > provider as having configuration and linking between the core and provider > docs. > > But - we have a possibility that a provider could contribute a new option > or change default for an existing option into some of the "core" sections. > > And Jed asked a great and important question - do we want to do it or not? > > We don't need it. But I feel it can be useful and can remove the last > point where we have some configuration coupling between core and providers. > > We even have such a case even now and it's good to explain it: > > > https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#celery-logging-level > > * AIRFLOW__LOGGING__CELERY_LOGGING_LEVEL > > It is currently in the "logging" section. And I think it is quite right > that it is in logging. But technically - this one belongs to both logging > and celery. > > I did not not want to move it just yet to celery provider (I wanted to > make sure we want to keep it simple (and I think we do not have to decide > in order to merge this PR) and I did not want to complicate the whole > setup, but it feels like it **could be** contributed by celery: > > * it is only used by code in celery provider > * it is only useful when you use Celery executor > > So - purely theoretically, we could move it entirely to celery - and get > it described there, rather than in Airflow core. The only (small) thing - > it is also part of a "deprecated" array so we would have to add a > possibility of a provider contributing also deprecated entries to the core > configuration (and that would be a rather easy, small change). > > We do not "really" need it - Celery is anyhow going to be referred from > the core in the documentation and will come pre-installed, so this would > also be ok to treat this as part of "core" of airflow. > > Benefits: > > * core would have no knowledge of celery (except documentation for Celery > Executor > * we could easier separate the code if we decide providers go out to > separate repo > * documentation for the logging/celery_logging_level would automatically > go to celery provider > > I can think of three drawbacks there: > > * potential confusion for docs - where the configuration comes from > (though it is already largely addressed - in the PR you can run `airflow > config list --include-sources` and you will be able to see for each option > whether it was contributed from "defaults" or "defaults celery" > > * providers could also semi-randomly override each other's defaults - we > do not have a mechanism to make providers have separate "namespaces" for > sections - it's mostly a convention of section name. Maybe we can change it > in the future, but for now it's more of a "gentleman's agreement" to decide > what is the section name that each provider contributes. > > * potential security issue where 3rd-party provider could change "any" > default for core > > I am not sure if the last one is really a threat. We already assume that > providers are behaving "nicely". We already use entry points to discover > providers and plugins and this means that any package installed in the > system that provides the entrypoints, can execute arbitrary code while > airflow is starting (and it could do all kinds of nasty things). And this > is basically no different than installing and running any package in the > system. > > However, for this case, we could potentially easily add a protection and > refuse any provider (including celery) to contribute the options in > existing sections and "fail". > > We might want to add a section about it to our security model [2], but I > think providers should be treated as "trusted". > > I think - to summarise it, we have three options: > > 1) to allow providers to contribute "core" sections - total freedom, > trust, gentleman's agreement that providers do not do nasty things (current > situation) > > 2) to have a list of "core" sections that can be only contributed by the > core - (easy to add for 2.7 if we decide to as a small follow up after > 32064 ) > > 3) we could also have a way to "reserve" certain sections for certain > providers (but this is rather long term and we do not really need that yet). > > WDYT? > > [1] https://github.com/apache/airflow/pull/32604 > [2] > https://airflow.apache.org/docs/apache-airflow/stable/security/index.html > > J. > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org