In general I support this suggestion for more clarity and separation, but
on the assumption we at least protect the core sections as in point 2).
With a scale large enough, someone, consciously or not, will break the
gentleman's agreement.

On Fri, Jul 21, 2023 at 7:13 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hello everyone,
>
> TL;DR; I wanted to discuss one aspect of the "configuration" separation
> between core and providers that I was working on while moving executors to
> providers - namely : can providers contribute configuration options in
> "core" sections of configuration.
>
> It is almost complete and we are gearing up to release it in 2.7.0
> together with executor separation - we have one final PR that moves
> configuration for celery to the celery provider and it is at the last
> stages of review [1].
>
> The effect it is going to have:
>
> * it will keep 100% compatibility for the user when it comes to
> configuring Airflow
> * documentation of configuration for providers is moved to the provider's
> docs
>
> In the PR (also attached) you can see screenshots on how the docs will
> look like after separation (which is the only "user" visible change. The
> documentation of configuration for providers will land in provider's docs
> and we will only have links from the main configuration page of Airflow to
> community provider's configuration (for those providers that will have its
> own configuration).
>
> This is all nice and cool.
>
> You will see that "celery", "celery_broker_transport_options" and
> "celery_kubernetes_executor" sections land in the Celery docs, disappear
> from airflow config but there is a separate page that lists "celery"
> provider as having configuration and linking between the core and provider
> docs.
>
> But - we have a possibility that a provider could contribute a new  option
> or change default for an existing option into some of the "core" sections.
>
> And Jed asked a great and important question - do we want to do it or not?
>
> We don't need it. But I feel it can be useful and can remove the last
> point where we have some configuration coupling between core and providers.
>
> We even have such a case even now and it's good to explain it:
>
>
> https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#celery-logging-level
>
> * AIRFLOW__LOGGING__CELERY_LOGGING_LEVEL
>
> It is currently in the "logging" section. And I think it is quite right
> that it is in logging. But technically - this one belongs to both logging
> and celery.
>
> I did not not want to move it just yet to celery provider (I wanted to
> make sure we want to keep it simple (and I think we do not have to decide
> in order to merge this PR) and I did not want to complicate the whole
> setup, but it feels like it **could be** contributed by celery:
>
> * it is only used by code in celery provider
> * it is only useful when you use Celery executor
>
> So - purely theoretically, we could move it entirely to celery - and get
> it described there, rather than in Airflow core.  The only (small) thing -
> it is also part of a "deprecated" array so we would have to add a
> possibility of a provider contributing also deprecated entries to the core
> configuration (and that would be a rather easy, small change).
>
> We do not "really" need it - Celery is anyhow going to be referred from
> the core in the documentation and will come pre-installed, so this would
> also be ok to treat this as part of "core" of airflow.
>
> Benefits:
>
> * core would have no knowledge of celery (except documentation for Celery
> Executor
> * we could easier separate the code if we decide providers go out to
> separate repo
> * documentation for the logging/celery_logging_level would automatically
> go to celery provider
>
> I can think of three drawbacks there:
>
> * potential confusion for docs - where the configuration comes from
> (though it is already largely addressed - in the PR you can run `airflow
> config list --include-sources` and you will be able to see for each option
> whether it was contributed from "defaults" or "defaults celery"
>
> * providers could also semi-randomly override each other's defaults - we
> do not have a mechanism to make providers have separate "namespaces" for
> sections - it's mostly a convention of section name. Maybe we can change it
> in the future, but for now it's more of a "gentleman's agreement" to decide
> what is the section name that each provider contributes.
>
> * potential security issue where 3rd-party provider could change "any"
> default for core
>
> I am not sure if the last one is really a threat. We already assume that
> providers are behaving "nicely". We already use entry points to discover
> providers and plugins and this means that any package installed in the
> system that provides the entrypoints, can execute arbitrary code while
> airflow is starting (and it could do all kinds of nasty things). And this
> is basically no different than installing and running any package in the
> system.
>
> However, for this case, we could potentially easily add a protection and
> refuse any provider (including celery) to contribute the options in
> existing sections and "fail".
>
> We might want to add a section about it to our security model [2], but I
> think providers should be treated as "trusted".
>
> I think - to summarise it, we have three options:
>
> 1) to allow providers to contribute "core" sections - total freedom,
> trust, gentleman's agreement that providers do not do nasty things (current
> situation)
>
> 2) to have a list of "core" sections that can be only contributed by the
> core - (easy to add for 2.7 if we decide to as a small follow up after
> 32064 )
>
> 3) we could also have a way to "reserve" certain sections for certain
> providers (but this is rather long term and we do not really need that yet).
>
> WDYT?
>
> [1] https://github.com/apache/airflow/pull/32604
> [2]
> https://airflow.apache.org/docs/apache-airflow/stable/security/index.html
>
> J.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to