Comment: I started to think that indeed option 2) is best. And as a way out from the celery logging case, we could also deprecate the celery/logging and move the logging option for celery to "[celery]" - that would solve the need to "merge" sections entirely for the current move of celery executor to provider.
J. On Fri, Jul 21, 2023 at 11:51 AM Michał Modras <michalmod...@google.com.invalid> wrote: > > In general I support this suggestion for more clarity and separation, but > on the assumption we at least protect the core sections as in point 2). > With a scale large enough, someone, consciously or not, will break the > gentleman's agreement. > > On Fri, Jul 21, 2023 at 7:13 AM Jarek Potiuk <ja...@potiuk.com> wrote: > > > Hello everyone, > > > > TL;DR; I wanted to discuss one aspect of the "configuration" separation > > between core and providers that I was working on while moving executors to > > providers - namely : can providers contribute configuration options in > > "core" sections of configuration. > > > > It is almost complete and we are gearing up to release it in 2.7.0 > > together with executor separation - we have one final PR that moves > > configuration for celery to the celery provider and it is at the last > > stages of review [1]. > > > > The effect it is going to have: > > > > * it will keep 100% compatibility for the user when it comes to > > configuring Airflow > > * documentation of configuration for providers is moved to the provider's > > docs > > > > In the PR (also attached) you can see screenshots on how the docs will > > look like after separation (which is the only "user" visible change. The > > documentation of configuration for providers will land in provider's docs > > and we will only have links from the main configuration page of Airflow to > > community provider's configuration (for those providers that will have its > > own configuration). > > > > This is all nice and cool. > > > > You will see that "celery", "celery_broker_transport_options" and > > "celery_kubernetes_executor" sections land in the Celery docs, disappear > > from airflow config but there is a separate page that lists "celery" > > provider as having configuration and linking between the core and provider > > docs. > > > > But - we have a possibility that a provider could contribute a new option > > or change default for an existing option into some of the "core" sections. > > > > And Jed asked a great and important question - do we want to do it or not? > > > > We don't need it. But I feel it can be useful and can remove the last > > point where we have some configuration coupling between core and providers. > > > > We even have such a case even now and it's good to explain it: > > > > > > https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#celery-logging-level > > > > * AIRFLOW__LOGGING__CELERY_LOGGING_LEVEL > > > > It is currently in the "logging" section. And I think it is quite right > > that it is in logging. But technically - this one belongs to both logging > > and celery. > > > > I did not not want to move it just yet to celery provider (I wanted to > > make sure we want to keep it simple (and I think we do not have to decide > > in order to merge this PR) and I did not want to complicate the whole > > setup, but it feels like it **could be** contributed by celery: > > > > * it is only used by code in celery provider > > * it is only useful when you use Celery executor > > > > So - purely theoretically, we could move it entirely to celery - and get > > it described there, rather than in Airflow core. The only (small) thing - > > it is also part of a "deprecated" array so we would have to add a > > possibility of a provider contributing also deprecated entries to the core > > configuration (and that would be a rather easy, small change). > > > > We do not "really" need it - Celery is anyhow going to be referred from > > the core in the documentation and will come pre-installed, so this would > > also be ok to treat this as part of "core" of airflow. > > > > Benefits: > > > > * core would have no knowledge of celery (except documentation for Celery > > Executor > > * we could easier separate the code if we decide providers go out to > > separate repo > > * documentation for the logging/celery_logging_level would automatically > > go to celery provider > > > > I can think of three drawbacks there: > > > > * potential confusion for docs - where the configuration comes from > > (though it is already largely addressed - in the PR you can run `airflow > > config list --include-sources` and you will be able to see for each option > > whether it was contributed from "defaults" or "defaults celery" > > > > * providers could also semi-randomly override each other's defaults - we > > do not have a mechanism to make providers have separate "namespaces" for > > sections - it's mostly a convention of section name. Maybe we can change it > > in the future, but for now it's more of a "gentleman's agreement" to decide > > what is the section name that each provider contributes. > > > > * potential security issue where 3rd-party provider could change "any" > > default for core > > > > I am not sure if the last one is really a threat. We already assume that > > providers are behaving "nicely". We already use entry points to discover > > providers and plugins and this means that any package installed in the > > system that provides the entrypoints, can execute arbitrary code while > > airflow is starting (and it could do all kinds of nasty things). And this > > is basically no different than installing and running any package in the > > system. > > > > However, for this case, we could potentially easily add a protection and > > refuse any provider (including celery) to contribute the options in > > existing sections and "fail". > > > > We might want to add a section about it to our security model [2], but I > > think providers should be treated as "trusted". > > > > I think - to summarise it, we have three options: > > > > 1) to allow providers to contribute "core" sections - total freedom, > > trust, gentleman's agreement that providers do not do nasty things (current > > situation) > > > > 2) to have a list of "core" sections that can be only contributed by the > > core - (easy to add for 2.7 if we decide to as a small follow up after > > 32064 ) > > > > 3) we could also have a way to "reserve" certain sections for certain > > providers (but this is rather long term and we do not really need that yet). > > > > WDYT? > > > > [1] https://github.com/apache/airflow/pull/32604 > > [2] > > https://airflow.apache.org/docs/apache-airflow/stable/security/index.html > > > > J. > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org