Comment:

I started to think that indeed option 2) is best. And as a way out
from the celery logging case, we could also deprecate the
celery/logging and move the logging option for celery to "[celery]" -
that would solve the need to "merge" sections entirely for the current
move of celery executor to provider.

J.

On Fri, Jul 21, 2023 at 11:51 AM Michał Modras
<michalmod...@google.com.invalid> wrote:
>
> In general I support this suggestion for more clarity and separation, but
> on the assumption we at least protect the core sections as in point 2).
> With a scale large enough, someone, consciously or not, will break the
> gentleman's agreement.
>
> On Fri, Jul 21, 2023 at 7:13 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Hello everyone,
> >
> > TL;DR; I wanted to discuss one aspect of the "configuration" separation
> > between core and providers that I was working on while moving executors to
> > providers - namely : can providers contribute configuration options in
> > "core" sections of configuration.
> >
> > It is almost complete and we are gearing up to release it in 2.7.0
> > together with executor separation - we have one final PR that moves
> > configuration for celery to the celery provider and it is at the last
> > stages of review [1].
> >
> > The effect it is going to have:
> >
> > * it will keep 100% compatibility for the user when it comes to
> > configuring Airflow
> > * documentation of configuration for providers is moved to the provider's
> > docs
> >
> > In the PR (also attached) you can see screenshots on how the docs will
> > look like after separation (which is the only "user" visible change. The
> > documentation of configuration for providers will land in provider's docs
> > and we will only have links from the main configuration page of Airflow to
> > community provider's configuration (for those providers that will have its
> > own configuration).
> >
> > This is all nice and cool.
> >
> > You will see that "celery", "celery_broker_transport_options" and
> > "celery_kubernetes_executor" sections land in the Celery docs, disappear
> > from airflow config but there is a separate page that lists "celery"
> > provider as having configuration and linking between the core and provider
> > docs.
> >
> > But - we have a possibility that a provider could contribute a new  option
> > or change default for an existing option into some of the "core" sections.
> >
> > And Jed asked a great and important question - do we want to do it or not?
> >
> > We don't need it. But I feel it can be useful and can remove the last
> > point where we have some configuration coupling between core and providers.
> >
> > We even have such a case even now and it's good to explain it:
> >
> >
> > https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#celery-logging-level
> >
> > * AIRFLOW__LOGGING__CELERY_LOGGING_LEVEL
> >
> > It is currently in the "logging" section. And I think it is quite right
> > that it is in logging. But technically - this one belongs to both logging
> > and celery.
> >
> > I did not not want to move it just yet to celery provider (I wanted to
> > make sure we want to keep it simple (and I think we do not have to decide
> > in order to merge this PR) and I did not want to complicate the whole
> > setup, but it feels like it **could be** contributed by celery:
> >
> > * it is only used by code in celery provider
> > * it is only useful when you use Celery executor
> >
> > So - purely theoretically, we could move it entirely to celery - and get
> > it described there, rather than in Airflow core.  The only (small) thing -
> > it is also part of a "deprecated" array so we would have to add a
> > possibility of a provider contributing also deprecated entries to the core
> > configuration (and that would be a rather easy, small change).
> >
> > We do not "really" need it - Celery is anyhow going to be referred from
> > the core in the documentation and will come pre-installed, so this would
> > also be ok to treat this as part of "core" of airflow.
> >
> > Benefits:
> >
> > * core would have no knowledge of celery (except documentation for Celery
> > Executor
> > * we could easier separate the code if we decide providers go out to
> > separate repo
> > * documentation for the logging/celery_logging_level would automatically
> > go to celery provider
> >
> > I can think of three drawbacks there:
> >
> > * potential confusion for docs - where the configuration comes from
> > (though it is already largely addressed - in the PR you can run `airflow
> > config list --include-sources` and you will be able to see for each option
> > whether it was contributed from "defaults" or "defaults celery"
> >
> > * providers could also semi-randomly override each other's defaults - we
> > do not have a mechanism to make providers have separate "namespaces" for
> > sections - it's mostly a convention of section name. Maybe we can change it
> > in the future, but for now it's more of a "gentleman's agreement" to decide
> > what is the section name that each provider contributes.
> >
> > * potential security issue where 3rd-party provider could change "any"
> > default for core
> >
> > I am not sure if the last one is really a threat. We already assume that
> > providers are behaving "nicely". We already use entry points to discover
> > providers and plugins and this means that any package installed in the
> > system that provides the entrypoints, can execute arbitrary code while
> > airflow is starting (and it could do all kinds of nasty things). And this
> > is basically no different than installing and running any package in the
> > system.
> >
> > However, for this case, we could potentially easily add a protection and
> > refuse any provider (including celery) to contribute the options in
> > existing sections and "fail".
> >
> > We might want to add a section about it to our security model [2], but I
> > think providers should be treated as "trusted".
> >
> > I think - to summarise it, we have three options:
> >
> > 1) to allow providers to contribute "core" sections - total freedom,
> > trust, gentleman's agreement that providers do not do nasty things (current
> > situation)
> >
> > 2) to have a list of "core" sections that can be only contributed by the
> > core - (easy to add for 2.7 if we decide to as a small follow up after
> > 32064 )
> >
> > 3) we could also have a way to "reserve" certain sections for certain
> > providers (but this is rather long term and we do not really need that yet).
> >
> > WDYT?
> >
> > [1] https://github.com/apache/airflow/pull/32604
> > [2]
> > https://airflow.apache.org/docs/apache-airflow/stable/security/index.html
> >
> > J.
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to