Thanks for pulling out the stats, Jarek. I am absolutely delighted by the engagement from the various contributors. Moving 1500 tests within a week is not an easy task, and I really love the issue that was created and the active engagement on it.
Time to move on to the next steps of cleaning up providers from using the DB! In case I come up with something that involves or can use community help, I will make sure to reach out.

Thanks for all you do, everyone!

Thanks & Regards,
Amogh Desai

On Fri, Jun 27, 2025 at 4:51 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> This part of the cleanup is now complete. Thanks to all who participated -
> again, the Airflow community is amazing!
>
> I wanted to share some stats and what we were able to achieve in just a
> few days. Note that those are not "perfect" numbers - there are some
> outliers here and there and likely things do not add up - but I scraped it
> quickly from the output of CI jobs:
>
> * Before the change: *11398* non-db tests, *5169* db tests
> * After: *12960* non-db tests, *3705* db tests
>
> This means that we were able to move ~*1500* tests from "db" to "non-db"
> (around *30%* of all db tests in providers). All this in fewer than 5 days
> and *46* PRs.
>
> We will have to do similar "mini" projects to move all of them eventually
> - but at least most "connection-only" tests are already cleaned.
> We also now have 45 (out of 97) providers that are fully "clean" - i.e.
> they contain no db tests, and we protect those provider tests from adding
> new ones.
> Those are the 17 (!) people who made it happen (I hope I did not miss
> anyone) https://ibb.co/Vh9g97S for those who do not see the embedded image:
>
> [image: Screenshot 2025-06-27 at 01.12.55.png]
>
> Stay tuned for the next stages of "db test" cleanup. And kudos especially to
> Amogh - the new PMC member (!) - who implemented the Connection mocking
> approach that made it possible.
>
> J.
>
> -------
>
> Appendix - those are the "db clean" providers:
>
> * airbyte
> * apache/beam
> * apache/flink
> * apache/iceberg
> * apache/kafka
> * arangodb
> * asana
> * cloudant
> * cohere
> * common/compat
> * common/messaging
> * datadog
> * dingding
> * discord
> * exasol
> * facebook
> * ftp
> * grpc
> * hashicorp
> * imap
> * influxdb
> * jdbc
> * jenkins
> * mongo
> * microsoft/psrp
> * microsoft/winrm
> * neo4j
> * odbc
> * openai
> * openfaas
> * oracle
> * pagerduty
> * pgvector
> * pinecone
> * postgres
> * presto
> * segment
> * sendgrid
> * singularity
> * tableau
> * teradata
> * trino
> * vertica
> * yandex
> * zendesk
>
>
> On Tue, Jun 24, 2025 at 11:40 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Cool :)
>>
>> On Tue, Jun 24, 2025 at 11:07 AM Amogh Desai <amoghdesai....@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Yeah I realised that there could probably be some places that require
>>> more attention and the pattern that I see is that those tests are
>>> mostly playing around or accessing the *default* connections.
>>>
>>> I am working on a task that can separate out the DB dependency while
>>> creating *default* connections using the ENV backend too: Separate out
>>> creation of default Connections for tests and non-tests
>>> <https://github.com/apache/airflow/pull/52129>.
>>>
>>> This should help us in the long run and will also make it easier to
>>> migrate provider code away from direct DB access!
>>>
>>> Thanks & Regards,
>>> Amogh Desai
>>>
>>>
>>> On Mon, Jun 23, 2025 at 11:57 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>> > yep - there are a few providers that require more "thorough" changes -
>>> > ssh for example :) ... We noticed when doing cleanup. But we already can
>>> > clean up and remove a lot of the pytest.mark.db_test marks that were
>>> > only there due to connections :)
>>> >
>>> > I am thrilled at the prospect of having all our "providers" tests
>>> > eventually DB-less.
>>> > This has been long in the making and is only possible
>>> > now for Airflow 3 :).
>>> >
>>> > While it will not be possible for **ALL** of them - for example
>>> > "Executors" like edge3 have to use the DB - the regular providers that
>>> > only provide hooks/operators/triggers should all be db-less for tests.
>>> > That will also help us when task.sdk is finally decoupled from
>>> > "airflow-core", as we will be able to remove "airflow-core" from being a
>>> > test dependency in those providers (and we will truly be able to see
>>> > that there are indeed no left-overs in providers, so that they follow
>>> > our "airflow 3" architecture).
>>> >
>>> > J.
>>> >
>>> >
>>> > On Mon, Jun 23, 2025 at 7:57 AM Amogh Desai <amoghdesai....@gmail.com>
>>> > wrote:
>>> >
>>> > > Thanks for the email, Jarek.
>>> > >
>>> > > A quick summary of the change: while working on moving BaseHook to the
>>> > > task SDK in #51873, I noticed that many providers rely on
>>> > > `db.merge_conn()` to set up test connections & there are thousands
>>> > > of occurrences across the codebase. Doing this comes with a few
>>> > > drawbacks:
>>> > >
>>> > > - It slows down test execution due to database transactions.
>>> > > - It introduces complexity by requiring DB setup/teardown.
>>> > > - It occasionally results in flaky tests due to DB access issues.
>>> > >
>>> > > So I replaced those with a fixture that creates these test connections
>>> > > in the ENV backend, as it is a pre-configured backend for Airflow.
>>> > >
>>> > > There are still a few TODOs I plan to handle in follow-ups (if someone
>>> > > wants to contribute, I will be more than happy to review too):
>>> > >
>>> > > - Replace remaining occurrences in unit tests.
>>> > > - Clean up some in-code TODO comments I’ve left as placeholders.
>>> > > - Address similar usage in system tests.
>>> > > - Possibly improve the fixture to support adding multiple connections
>>> > >   at once.
>>> > > - Update the Telegram provider tests
>>> > >   (*providers/telegram/tests/unit/telegram/hooks/test_telegram.py*).
>>> > >
>>> > >
>>> > > Thanks & Regards,
>>> > > Amogh Desai
>>> > >
>>> > >
>>> > > On Sun, Jun 22, 2025 at 6:37 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >
>>> > > > Hello here,
>>> > > >
>>> > > > After #51930, where Amogh introduced a way for tests to define
>>> > > > connections without the DB, and the follow-up in #52017, where I made
>>> > > > an attempt to remove some of the *pytest.mark.db_test* marks, we
>>> > > > should now remove those marks from providers, where it is easy.
>>> > > >
>>> > > > Some providers still use the DB for other things, but likely there
>>> > > > are many providers that only used the DB to create a Connection and
>>> > > > we can turn those tests into non-db tests.
>>> > > >
>>> > > > I created an issue - where I have checkboxes for providers that need
>>> > > > review - and I have a kind request to the contributors and committers
>>> > > > to - as usual - help.
>>> > > >
>>> > > > The issue is here:
>>> > > >
>>> > > > https://github.com/apache/airflow/issues/52020
>>> > > >
>>> > > > And I provided detailed instructions (they are rather easy) on how to
>>> > > > do it.
>>> > > >
>>> > > > Looking forward to your help!
>>> > > >
>>> > > > J.
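
A minimal sketch of the idea summarised in the thread above (defining test connections through the ENV backend instead of `db.merge_conn()`), assuming only Airflow's documented convention that a connection can be supplied through an `AIRFLOW_CONN_<CONN_ID>` environment variable holding a JSON (or URI) value. The helper names `conn_env_var` and `env_connection` are hypothetical and are not the fixture introduced in #51930; the sketch uses only the standard library, so it runs without Airflow installed:

```python
import json
import os
from contextlib import contextmanager


def conn_env_var(conn_id: str) -> str:
    """Return the environment variable name used for a connection ID."""
    # Assumption: the AIRFLOW_CONN_<CONN_ID> naming convention, upper-cased.
    return f"AIRFLOW_CONN_{conn_id.upper()}"


@contextmanager
def env_connection(conn_id: str, **fields):
    """Temporarily define a connection as JSON in the environment.

    Hypothetical helper for illustration only: no database session is
    opened, and the previous value (if any) is restored on exit.
    """
    name = conn_env_var(conn_id)
    previous = os.environ.get(name)
    os.environ[name] = json.dumps(fields)
    try:
        yield name
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


if __name__ == "__main__":
    # Example values are made up for the demonstration.
    with env_connection("my_conn", conn_type="http", host="example.com") as name:
        print(name, os.environ[name])
```

In a pytest suite the same effect would typically be wrapped in a fixture (for example with `monkeypatch.setenv`), so that a test which only needs a Connection no longer needs the `pytest.mark.db_test` mark.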