This part of the cleanup is now complete. Thanks to all who participated -
again, the Airflow community is amazing!

I wanted to share some stats on what we were able to achieve in just a few
days. Note that these are not "perfect" numbers - there are some outliers
here and there, and some things likely do not add up - but I scraped them
quickly from the output of CI jobs:

* Before the change: *11398* non-db tests, *5169* db tests
* After: *12960* non-db tests, *3705* db tests

This means that we were able to move ~*1500* tests from "db" to "non-db"
(around *30%* of all db tests in providers). All this in less than 5 days
and *46* PRs.
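
A quick way to double-check the arithmetic behind those figures (the counts are copied from the two bullet points above; nothing else is assumed):

```python
# Test counts copied from the CI summary above.
before = {"non_db": 11398, "db": 5169}
after = {"non_db": 12960, "db": 3705}

moved_out_of_db = before["db"] - after["db"]        # db tests that went away
gained_non_db = after["non_db"] - before["non_db"]  # non-db tests that appeared
share_of_db = moved_out_of_db / before["db"]        # fraction of the original db tests

print(moved_out_of_db, gained_non_db, f"{share_of_db:.0%}")  # 1464 1562 28%

# The grand totals differ by ~100 tests, one of the places where things "do not add up".
print(sum(before.values()), sum(after.values()))  # 16567 16665
```

So roughly 1,460 fewer db tests and 1,560 more non-db tests, i.e. ~28% of the db tests, consistent with the rounded "~1500" and "around 30%".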

We will have to do similar "mini" projects to move all of them eventually,
but at least most "connection-only" tests are already cleaned up.
We now also have 45 (out of 97) providers that are fully "clean", i.e.
they contain no db tests, and we protect those providers' tests from
having new ones added.
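
The mail does not describe how the "clean" providers are protected. As a hypothetical illustration only (not Airflow's actual CI or pre-commit check), such a guard can be as simple as failing when the `pytest.mark.db_test` marker reappears in a clean provider's tests. The function name, the input shape and the file paths below are made up for the example.

```python
import re

# Matches the marker both as a decorator and in a module-level `pytestmark = ...`.
DB_TEST_MARK = re.compile(r"\bpytest\.mark\.db_test\b")


def find_db_test_violations(clean_providers, test_sources):
    """Return (provider, path) pairs where a "db clean" provider's test uses db_test.

    clean_providers: set of provider ids, e.g. {"ftp", "apache/kafka"}
    test_sources:    mapping of (provider, path) -> test file contents
    """
    violations = []
    for (provider, path), source in sorted(test_sources.items()):
        if provider in clean_providers and DB_TEST_MARK.search(source):
            violations.append((provider, path))
    return violations


# Hypothetical example input: two ftp test files and one ssh test file.
sources = {
    ("ftp", "tests/unit/ftp/hooks/test_ftp.py"): "def test_hook():\n    pass\n",
    ("ftp", "tests/unit/ftp/operators/test_ftp.py"): "@pytest.mark.db_test\ndef test_op():\n    pass\n",
    ("ssh", "tests/unit/ssh/hooks/test_ssh.py"): "pytestmark = pytest.mark.db_test\n",
}
print(find_db_test_violations({"ftp"}, sources))
# [('ftp', 'tests/unit/ftp/operators/test_ftp.py')]
```

Only ftp is treated as clean here, so the ssh file is ignored. A real check would read the files from disk and exit non-zero on any violation.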
These are the 17 (!) people who made it happen (I hope I did not miss
anyone). For those who cannot see the embedded image: https://ibb.co/Vh9g97S

[image: Screenshot 2025-06-27 at 01.12.55.png]

Stay tuned for the next stages of the "db test" cleanup. And kudos especially
to Amogh - the new PMC member (!) - who implemented the Connection mocking
approach that made it possible.
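
For readers who skipped the thread below: the approach replaces `db.merge_conn()` in tests with connections defined through the environment-variable backend, so no database is touched. A minimal sketch of the idea, assuming Airflow's documented `AIRFLOW_CONN_<CONN_ID>` variable naming and JSON connection format; `env_connection` is a hypothetical helper, not the fixture that was actually added in #51930.

```python
import contextlib
import json
import os


@contextlib.contextmanager
def env_connection(conn_id, **fields):
    """Expose a test connection via an AIRFLOW_CONN_<CONN_ID> environment variable.

    Hypothetical helper: the value is the JSON form of the connection
    (conn_type, host, login, ...), so nothing is written to the metadata DB.
    """
    var_name = f"AIRFLOW_CONN_{conn_id.upper()}"
    previous = os.environ.get(var_name)
    os.environ[var_name] = json.dumps(fields)
    try:
        yield var_name
    finally:
        # Restore the previous environment so tests stay isolated from each other.
        if previous is None:
            os.environ.pop(var_name, None)
        else:
            os.environ[var_name] = previous


with env_connection("my_ftp", conn_type="ftp", host="localhost", login="user") as var:
    print(var, json.loads(os.environ[var])["host"])  # AIRFLOW_CONN_MY_FTP localhost

print("AIRFLOW_CONN_MY_FTP" in os.environ)  # False (assuming it was not set before)
```

Inside pytest, `monkeypatch.setenv(...)` gives the same set-and-restore behaviour. A hook that looks up `my_ftp` then resolves it from the environment-variable backend, which Airflow's default configuration checks before the metastore.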

J.

-------

Appendix - these are the "db clean" providers:

 * airbyte
 * apache/beam
 * apache/flink
 * apache/iceberg
 * apache/kafka
 * arangodb
 * asana
 * cloudant
 * cohere
 * common/compat
 * common/messaging
 * datadog
 * dingding
 * discord
 * exasol
 * facebook
 * ftp
 * grpc
 * hashicorp
 * imap
 * influxdb
 * jdbc
 * jenkins
 * microsoft/psrp
 * microsoft/winrm
 * mongo
 * neo4j
 * odbc
 * openai
 * openfaas
 * oracle
 * pagerduty
 * pgvector
 * pinecone
 * postgres
 * presto
 * segment
 * sendgrid
 * singularity
 * tableau
 * teradata
 * trino
 * vertica
 * yandex
 * zendesk

On Tue, Jun 24, 2025 at 11:40 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Cool :)
>
> On Tue, Jun 24, 2025 at 11:07 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>
>> Hello,
>>
>> Yeah, I realised that there are probably some places that require more
>> attention, and the pattern I see is that those tests mostly modify or
>> access the *default* connections.
>>
>> I am working on a change that separates out the DB dependency when
>> creating *default* connections, using the ENV backend too: Separate out
>> creation of default Connections for tests and non-tests
>> <https://github.com/apache/airflow/pull/52129>.
>>
>> This should help us in the long run and will also make it easier to
>> migrate provider code away from direct DB access!
>>
>> Thanks & Regards,
>> Amogh Desai
>>
>>
>> On Mon, Jun 23, 2025 at 11:57 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> > yep - there are a few providers that require more "thorough" changes -
>> > ssh for example :) ... We noticed that when doing the cleanup. But we can
>> > already clean up and remove a lot of the pytest.mark.db_test marks that
>> > were only there due to connections :)
>> >
>> > I am thrilled at the prospect of having all our "providers" tests
>> > eventually DB-less. This has been long in the making and is only possible
>> > now for Airflow 3 :).
>> >
>> > While it will not be possible for **ALL** of them (for example,
>> > "Executors" like edge3 have to use the DB), the regular providers that
>> > only provide hooks/operators/triggers should all have db-less tests. That
>> > will also help us when task.sdk is finally decoupled from "airflow-core",
>> > as we will be able to remove "airflow-core" as a test dependency of those
>> > providers (and we will truly be able to see that there are indeed no
>> > left-overs in providers, i.e. that they follow our "airflow 3"
>> > architecture).
>> >
>> > J.
>> >
>> >
>> > On Mon, Jun 23, 2025 at 7:57 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>> >
>> > > Thanks for the email, Jarek.
>> > >
>> > > A quick summary of the change: while working on moving BaseHook to the
>> > > task SDK in #51873, I noticed that many providers rely on
>> > > `db.merge_conn()` to set up test connections, and there are thousands of
>> > > occurrences across the codebase. Doing this comes with a few drawbacks:
>> > >
>> > >    - It slows down test execution due to database transactions.
>> > >    - It introduces complexity by requiring DB setup/teardown.
>> > >    - It occasionally results in flaky tests due to DB access issues.
>> > >
>> > > So I replaced those with a fixture that creates these test connections
>> > > in the ENV backend, as it is a pre-configured backend for Airflow.
>> > >
>> > > There are still a few TODOs I plan to handle in follow-ups (if someone
>> > > wants to contribute, I will be more than happy to review too):
>> > >
>> > >    - Replace remaining occurrences in unit tests.
>> > >    - Clean up some in-code TODO comments I’ve left as placeholders.
>> > >    - Address similar usage in system tests.
>> > >    - Possibly improve the fixture to support adding multiple connections
>> > >    at once.
>> > >    - Update the Telegram provider tests
>> > >    (*providers/telegram/tests/unit/telegram/hooks/test_telegram.py*).
>> > >
>> > >
>> > > Thanks & Regards,
>> > > Amogh Desai
>> > >
>> > >
>> > > On Sun, Jun 22, 2025 at 6:37 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> > >
>> > > > Hello here,
>> > > >
>> > > > After #51930, where Amogh introduced a way for tests to define
>> > > > connections without the DB, and the follow-up #52017, where I made an
>> > > > attempt to remove some of the *pytest.mark.db_test* marks, we should now
>> > > > remove those marks from providers where it is easy.
>> > > >
>> > > > Some providers still use the DB for other things, but there are likely
>> > > > many providers that only used the DB to create a Connection, and we can
>> > > > turn those tests into non-db tests.
>> > > >
>> > > > I created an issue with checkboxes for the providers that need review,
>> > > > and I have a kind request to the contributors and committers to help,
>> > > > as usual.
>> > > >
>> > > > The issue is here:
>> > > >
>> > > > https://github.com/apache/airflow/issues/52020
>> > > >
>> > > > And I provided detailed instructions (they are rather easy) on how to
>> > > > do it.
>> > > >
>> > > > Looking forward to your help!
>> > > >
>> > > > J.
>> > > >
>> > >
>> >
>>
>
