I love this proposal; it would indeed simplify many things and make them cleaner. +1
On 2025/12/25 17:28:56 Jens Scheffler wrote:
> Hi Jason,
>
> thanks for raising the discussion. +1 also from me.
>
> Also, we always had the idea to move the widget definitions from hook
> forms into some descriptive structure or store them in the DB ... which
> would replace the rather ugly mocks that prevent loading all the
> form UI widgets from the past. But irrespective of this, a CLI
> performance improvement would be beneficial.
>
> Jens
>
> On 12/25/25 17:05, Jarek Potiuk wrote:
> > I am all for it.
> >
> > There were earlier concerns about the performance of provider discovery, but
> > I think that provider discovery alone is "fast enough". Initially, when
> > ProvidersManager was introduced, it discovered everything, including
> > importing Hooks and finding out the connection definitions, with Widgets
> > and everything related. Also, the circular import complexity of our
> > settings/configuration (things partially imported when airflow help was
> > being loaded and commands were initialized and imported) made it very
> > brittle: one innocent import added here or there caused cyclic imports.
> > However, a lot has changed since then:
> >
> > * ProvidersManager discovery is very much optimized and lazy-loads
> >   whatever is needed, only when it is needed (so connections are no longer
> >   imported when ProvidersManager is initialized)
> > * For Airflow 3 we've introduced mocking of the Widget classes, so we don't
> >   even import the whole flask module hierarchy if connections are not used
> >   from providers
> > * We are close to finishing the separation of "shared" libraries as part of
> >   the task isolation work that we are doing now (we pay a lot of
> >   attention to it with Amogh and others). This includes adding prek hooks
> >   that will guard proper imports (WP -
> >   https://github.com/apache/airflow/pull/58825).
> >
> > So I hope that eventually the gains in those benchmark results will be even
> > more impressive, but even now your results show that it's better to do
> > discovery like we do now.
> >
> > Also, it was raised quite a few times that not seeing the "celery" and
> > "kubernetes" commands when you have no executor configured is misleading:
> > people expect the commands to be visible when you "install" a provider,
> > not only when you "configure an executor". That was a big limitation so far
> > and we had at least a few related issues about it. It's not intuitive at all.
> >
> > Plus, it solves one more problem: currently some Kubernetes CLI commands
> > (generate-dag-yaml, cleanup-pods) are useful even without the Kubernetes*
> > family of executors.
> >
> > So ... Big +1 from me.
> >
> > J.
> >
> > On Thu, Dec 25, 2025 at 3:50 PM Zhe-You(Jason) Liu <[email protected]>
> > wrote:
> >
> >> Hi all,
> >>
> >> First of all, I’d like to wish everyone a Merry Christmas and a happy
> >> holiday season 🎄!
> >>
> >> I’d like to start a discussion about introducing a new `cli` section in
> >> provider metadata, with the goals of:
> >>
> >> 1. **Improving Airflow CLI startup and response time**
> >>    According to the PoC benchmark, this change provides a noticeable
> >>    performance improvement.
> >> 2. **Unlocking the ability for all providers to expose commands in the
> >>    Airflow CLI**
> >>    Currently, only AuthManager and Executor can expose commands.
> >>
> >> This change is probably not large enough to justify a full AIP, so I
> >> believe a discussion followed by lazy consensus should be sufficient.
> >> ------------------------------
> >> Why
> >>
> >> Before the recent refactor, regardless of which `airflow` command is
> >> executed, `cli_parser` imports the actual AuthManager and Executor in use
> >> in order to call `get_cli_commands` and collect optional CLI commands [1].
> >>
> >> This means that **every** CLI invocation, including something as simple as
> >> `airflow --help`, will import heavy modules such as `kubernetes`,
> >> `flask_appbuilder`, etc., depending on the values of
> >> `AIRFLOW__CORE__AUTH_MANAGER` and `AIRFLOW__CORE__EXECUTOR`.
> >>
> >> In the worst case (e.g. `FabAuthManager` + `CeleryKubernetesExecutor`), it
> >> takes **~5 seconds** just to display `airflow --help` based on the
> >> benchmark results.
> >> ------------------------------
> >> How
> >>
> >> The refactor includes:
> >>
> >> 1. Adding a `cli` section to provider metadata (`provider.yaml` /
> >>    `def get_provider_info`) that points to `get_cli_commands`
> >> 2. Moving `get_cli_commands` into a **clean** module that does not import
> >>    any heavy dependencies
> >>    - It should only import from `airflow.cli.cli_config`
> >>    - It should rely on `lazy_load_command`
> >>
> >> ------------------------------
> >> What
> >>
> >> The main behavioral change is that, after this refactor, **any installed
> >> provider that exposes CLI commands will have those commands available in
> >> the Airflow CLI**, even if it is not configured as the active AuthManager
> >> or Executor.
> >>
> >> For example:
> >>
> >> - If both the Celery and Kubernetes providers are installed
> >> - And `AIRFLOW__CORE__EXECUTOR=LocalExecutor`
> >>
> >> The Celery and Kubernetes command groups will still appear in `airflow
> >> --help`.
> >>
> >> If there are no strong drawbacks to introducing the cli section in provider
> >> metadata, I can either:
> >>
> >> - Break the change down provider by provider, or
> >> - Submit one larger atomic change covering all providers
> >>
> >> ------------------------------
> >> PoC & Summary
> >>
> >> I’ve completed a PoC [2] along with a benchmark script [3] and result [4].
> >> Below is a summary of the CLI response time improvements:
> >>
> >> - *Overall average*: 3.117s, down from 4.048s (*~23.0% improvement*)
> >> - *Fastest run*: 3.092s, down from 3.566s (*~13.3% improvement*)
> >> - *Slowest run*: 3.155s, down from 5.006s (*~37.0% improvement*)
> >>
> >> ------------------------------
> >>
> >> *References*
> >>
> >> [1] https://github.com/apache/airflow/blob/ac085a425652d16b5fff17f8e937938c7d47b868/airflow-core/src/airflow/cli/cli_parser.py#L62-L86
> >> [2] https://github.com/apache/airflow/pull/59805
> >> [3] https://github.com/apache/airflow/pull/59805/changes#diff-7ef81cd1589183b63d2452f64809adcba5f6b14e5ee337be02524d83ad6698e4
> >> [4] https://github.com/apache/airflow/pull/59805#benchmark-result
> >>
> >> I’d really appreciate any feedback, suggestions, or concerns about this
> >> approach. Thanks!
> >>
> >> Best regards,
> >> Jason
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
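
For anyone skimming the thread, here is a rough sketch of the idea behind the "How" part of the quoted proposal: the module that declares the commands stays cheap to import, and the heavy module is imported only when a command actually runs. This is not the code from the PoC. `lazy_load_command` below is a hand-written stand-in with the same name as the Airflow helper, `Command` is a hypothetical placeholder for Airflow's command classes, and the stdlib module `colorsys` merely plays the role of a "heavy" dependency such as `kubernetes`.

```python
from __future__ import annotations

import importlib
import sys
from dataclasses import dataclass
from typing import Any, Callable


def lazy_load_command(import_path: str) -> Callable[..., Any]:
    """Stand-in helper (assumption): defer importing `import_path` until call time."""
    module_name, _, attr_name = import_path.rpartition(".")

    def command(*args: Any, **kwargs: Any) -> Any:
        # The import happens here, on first use, not when the CLI is built.
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)(*args, **kwargs)

    command.__name__ = attr_name
    return command


@dataclass(frozen=True)
class Command:
    """Hypothetical placeholder for a CLI command definition."""

    name: str
    help: str
    func: Callable[..., Any]


def get_cli_commands() -> list[Command]:
    """A "clean" declaration: building this list imports nothing heavy."""
    return [
        Command(
            name="to-hsv",
            help="Illustrative command backed by a lazily imported module",
            func=lazy_load_command("colorsys.rgb_to_hsv"),
        )
    ]


if __name__ == "__main__":
    sys.modules.pop("colorsys", None)  # make sure the demo starts "cold"
    commands = get_cli_commands()
    print("imported after building commands:", "colorsys" in sys.modules)
    commands[0].func(1.0, 0.0, 0.0)
    print("imported after running command:", "colorsys" in sys.modules)
```

With this shape, something like `airflow --help` only needs the names and help texts, so listing commands from every installed provider does not have to pay for each provider's imports.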
