This part of the cleanup is now complete. Thanks to all who participated - again, the Airflow community is amazing!
I wanted to share some stats and what we were able to achieve in just a few days. Note that these are not "perfect" numbers - there are some outliers here and there, and likely things do not add up exactly - but I scraped them quickly from the output of CI jobs:

* Before the change: *11398* non-db tests, *5169* db tests
* After: *12960* non-db tests, *3705* db tests

This means that we were able to move ~*1500* tests from "db" to "non-db" (around *30%* of all db tests in providers). All this in under 5 days and *46* PRs.

We will have to do similar "mini" projects to move all of them eventually - but at least most "connection-only" tests are already cleaned.

We also now have 45 (out of 97) providers that are fully "clean" - i.e. they contain no db tests - and we protect those provider tests from adding new ones.

These are the 17 (!) people who made it happen (I hope I did not miss anyone):

[image: Screenshot 2025-06-27 at 01.12.55.png]

For those who do not see the embedded image: https://ibb.co/Vh9g97S

Stay tuned for the next stages of the "db test" cleanup. And kudos especially to Amogh - the new PMC member (!) - who implemented the Connection mocking approach that made it possible.

J.
-------

Appendix - those are the "db clean" providers:

* airbyte
* apache/beam
* apache/flink
* apache/iceberg
* apache/kafka
* arangodb
* asana
* cloudant
* cohere
* common/compat
* common/messaging
* datadog
* dingding
* discord
* exasol
* facebook
* ftp
* grpc
* hashicorp
* imap
* influxdb
* jdbc
* jenkins
* mongo
* microsoft/psrp
* microsoft/winrm
* neo4j
* odbc
* openai
* openfaas
* oracle
* pagerduty
* pgvector
* pinecone
* postgres
* presto
* segment
* sendgrid
* singularity
* tableau
* teradata
* trino
* vertica
* yandex
* zendesk

On Tue, Jun 24, 2025 at 11:40 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Cool :)
>
> On Tue, Jun 24, 2025 at 11:07 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>
>> Hello,
>>
>> Yeah I realised that there could probably be some places that require more attention and the pattern that I see is that those tests are mostly playing around or accessing the *default* connections.
>>
>> I am working on a task that can separate out the DB dependency while creating *default* connections using the ENV backend too: Separate out creation of default Connections for tests and non-tests <https://github.com/apache/airflow/pull/52129>.
>>
>> This should help us in the long run and will also make it easier to migrate provider code away from direct DB access!
>>
>> Thanks & Regards,
>> Amogh Desai
>>
>>
>> On Mon, Jun 23, 2025 at 11:57 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> > yep - there are a few providers that require more "thorough" changes - ssh for example :) ... We noticed when doing cleanup. But we already can clean-up and remove a lot of the pytest.mark.db_tests that were only there due to connections :)
>> >
>> > I am thrilled at the prospect of having all our "providers" tests eventually DB-less. This has been long in the making and is only possible now for Airflow 3 :).
>> >
>> > While it will not be possible for **ALL** of them - for example "Executors" like edge3 have to use the DB - the regular providers that only provide hooks/operators/triggers should all be db-less for tests. That will also help us when task.sdk is finally decoupled from "airflow-core", as we will be able to remove "airflow-core" from being a test dependency in those providers (and we will truly be able to see that there are no left-overs in providers and that they follow our "airflow 3" architecture).
>> >
>> > J.
>> >
>> >
>> > On Mon, Jun 23, 2025 at 7:57 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>> >
>> > > Thanks for the email, Jarek.
>> > >
>> > > A quick summary of the change: while working on moving BaseHook to the task SDK in #51873, I noticed that many providers rely on `db.merge_conn()` to set up test connections & there are thousands of occurrences across the codebase. Doing this comes with a few drawbacks:
>> > >
>> > > - It slows down test execution due to database transactions.
>> > > - It introduces complexity by requiring DB setup/teardown.
>> > > - It occasionally results in flaky tests due to DB access issues.
>> > >
>> > > So I replaced those with a fixture that creates these test connections in the ENV backend, as it is a pre-configured backend for Airflow.
>> > >
>> > > There are still a few TODOs I plan to handle in follow ups (if someone wants to contribute, I will be more than happy to review too):
>> > >
>> > > - Replace remaining occurrences in unit tests.
>> > > - Clean up some in-code TODO comments I’ve left as placeholders.
>> > > - Address similar usage in system tests.
>> > > - Possibly improve the fixture to support adding multiple connections at once.
>> > > - Update the Telegram provider tests (*providers/telegram/tests/unit/telegram/hooks/test_telegram.py*).
>> > >
>> > >
>> > > Thanks & Regards,
>> > > Amogh Desai
>> > >
>> > >
>> > > On Sun, Jun 22, 2025 at 6:37 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> > >
>> > > > Hello here,
>> > > >
>> > > > After #51930, where Amogh introduced a way for tests to define connections without the DB, and the follow-up in #52017, where I made an attempt to remove some of the *pytest.mark.db_test* marks, we should now remove those marks from providers where it is easy.
>> > > >
>> > > > Some providers still use the DB for other things, but likely there are many providers that only used the DB to create a Connection, and we can turn those tests into non-db tests.
>> > > >
>> > > > I created an issue - where I have checkboxes for providers that need review - and I have a kind request to the contributors and committers to - as usual - help.
>> > > >
>> > > > The issue is here:
>> > > >
>> > > > https://github.com/apache/airflow/issues/52020
>> > > >
>> > > > And I provided detailed instructions (they are rather easy) on how to do it.
>> > > >
>> > > > Looking forward to your help!
>> > > >
>> > > > J.
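
For readers who have not followed the PRs: the thread describes replacing `db.merge_conn()` with a fixture that defines test connections through the ENV backend. The snippet below is only a rough sketch of that general idea, not the fixture from #51930. It relies on Airflow's documented convention that a connection can be supplied in an environment variable named `AIRFLOW_CONN_<CONN_ID>` (upper-cased), with the value given as JSON. The helper names `conn_env_var_name` and `env_connection` are hypothetical and exist only for this illustration.

```python
import json
import os
from contextlib import contextmanager
from typing import Iterator, Optional


def conn_env_var_name(conn_id: str) -> str:
    """Return the env var name Airflow checks for a connection id."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


@contextmanager
def env_connection(
    conn_id: str,
    conn_type: str,
    host: Optional[str] = None,
    login: Optional[str] = None,
    password: Optional[str] = None,
    port: Optional[int] = None,
) -> Iterator[str]:
    """Hypothetical helper: expose a connection via the environment, no DB involved.

    The connection is serialized as JSON and removed (or restored) on exit,
    so tests do not leak state into each other.
    """
    name = conn_env_var_name(conn_id)
    fields = {
        "conn_type": conn_type,
        "host": host,
        "login": login,
        "password": password,
        "port": port,
    }
    # Only include the fields that were actually provided.
    value = json.dumps({k: v for k, v in fields.items() if v is not None})

    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield name
    finally:
        # Restore whatever was there before the test ran.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```

In a test, the hook call would be wrapped in `with env_connection("my_ftp", "ftp", host="example.com", port=21):`, and the connection lookup would then be expected to resolve from the environment rather than from the metadata database. With pytest, `monkeypatch.setenv` would give the same cleanup behaviour with less code.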