Re: [DISCUSS] Turn "tests_common" into separate distribution for development

2025-03-05 Thread Ferruzzi, Dennis
devel-common sounds reasonable


 - ferruzzi



From: Jarek Potiuk 
Sent: Tuesday, March 4, 2025 10:53 AM
To: dev@airflow.apache.org
Subject: RE: [EXT] [DISCUSS] Turn "tests_common" into separate distribution for 
development



I am doing a bit more cleanup, and I have found that the easiest way to fix some
of the remaining issues is to clean up (and remove) the remaining editable
devel dependencies and incorporate them all into the "tests-common" package.

You can take a look at the PR:  https://github.com/apache/airflow/pull/47281 - 
but basically what it means is:

* all "devel" dependencies are added as required dependencies of "tests-common"
(except "doc" - I will treat doc separately).
* I removed all "legacy" extras from the "airflow" package, including the
"bundle" extras ("devel-ci", "devel-db" and a few others), as they were pretty
useless. Apart from the editable-only "all" extra (see below), we will have no
more devel and bundle extras (Ash - I guess this is what you were looking
forward to :) )
* Instead, we have one "all" extra that is available only in editable mode -
it's not documented in the user documentation and it is really only useful to
install everything with `pip` (with uv you get the same with `uv sync
--all-extras`) - this is still used internally in the CI image to run `uv pip
install .[all] --constraints` until we switch to using "uv.lock" in the future
* hatch_build.py is significantly simpler and easier to understand now - with 
all the bundle removal and moving all dependencies to ./tests-common
* we still need dynamic dependencies and ./hatch_build.py - but less and less:
with PEP 735 (https://peps.python.org/pep-0735/) implemented in pip in April, we
will likely be able to turn our optional dependencies into static
pyproject.toml deps, and with https://peps.python.org/pep-0771/ (needs approval
and implementation) we will likely be able to have static pyproject.toml
required dependencies as well.
* I updated the install and contributing docs to be "uv first" - presenting
`uv` as the recommended, first option to go with - as it is becoming
surprisingly simple now to work with both airflow and providers (and it will
be even simpler after the next few PRs)
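For readers less familiar with PEP 735: it standardizes a `[dependency-groups]` table in pyproject.toml, where each group is a list of requirement strings and can pull in another group with `{include-group = "..."}`. Below is a minimal sketch of how a tool could expand such groups, assuming the table has already been parsed into a plain dict. The group names (`devel-common`, `devel_mongo`) and the requirement pins are hypothetical and are not taken from the Airflow PR or its screenshot.

```python
import re
from typing import Dict, List, Tuple, Union

# A parsed [dependency-groups] table: each entry is either a requirement
# string or a {"include-group": "<name>"} table, as described in PEP 735.
GroupEntry = Union[str, Dict[str, str]]


def normalize_name(name: str) -> str:
    # PEP 735 compares group names after PEP 503-style normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve_group(groups: Dict[str, List[GroupEntry]], group: str) -> List[str]:
    """Expand one dependency group, inlining any include-group entries in order."""
    normalized = {normalize_name(k): v for k, v in groups.items()}
    if len(normalized) != len(groups):
        raise ValueError("duplicate group names after normalization")

    def expand(name: str, stack: Tuple[str, ...]) -> List[str]:
        key = normalize_name(name)
        if key in stack:
            raise ValueError(f"cyclic include-group: {' -> '.join(stack + (key,))}")
        if key not in normalized:
            raise LookupError(f"unknown dependency group: {name}")
        requirements: List[str] = []
        for entry in normalized[key]:
            if isinstance(entry, str):
                requirements.append(entry)
            elif isinstance(entry, dict) and set(entry) == {"include-group"}:
                requirements.extend(expand(entry["include-group"], stack + (key,)))
            else:
                raise ValueError(f"invalid entry in group {name!r}: {entry!r}")
        return requirements

    return expand(group, ())


# Hypothetical groups: a shared "devel-common" set plus a provider-specific one.
dependency_groups = {
    "devel-common": ["pytest>=8", "pytest-asyncio"],
    "devel_mongo": [{"include-group": "devel-common"}, "mongomock"],
}

print(resolve_group(dependency_groups, "devel-mongo"))
# ['pytest>=8', 'pytest-asyncio', 'mongomock']
```

The includes are expanded in place and not de-duplicated, which keeps the expansion a literal substitution of the included group's contents.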

Now, THE BIG QUESTION - naming again. With all those changes,
`tests-common` is becoming more of a `devel-common` package, because it will
contribute all the development tooling that the other sub-projects need in
order to be developed.

Shall we name it "devel-common" instead of `tests-common`?

Part of why I think it makes sense is this specification in
`tests-common/pyproject.toml` (see attachment) - also https://ibb.co/1cwZQRm if
you do not see the attachment. Note that these are generally "common" devel
dependencies - and each of the packages can contribute its own (including
task-sdk, airflow-core (future), every provider etc.)

[Screenshot from 2025-03-04 19-50-32.png]


J.



On Mon, Mar 3, 2025 at 4:14 AM Jarek Potiuk <ja...@potiuk.com> wrote:
Hello everyone.

I created the PR for that https://github.com/apache/airflow/pull/47281.

It's even nicer than I anticipated. I love the new super-simple workflows this 
restructuring finally enabled.

With `uv` workspaces and the new structure of tests, developing and running
tests for providers, task-sdk or any other future sub-project becomes very,
very straightforward: we avoid duplicating pytest options, and switching
between airflow, task-sdk and provider tests will be simple.

1) First of all, with this change we remove the `devel-tests` extra. It was
always needed in the local venv to install all test dependencies, but it is
completely gone now. Test dependencies are installed automatically when you run
`uv sync` - you do not need to specify `--extra devel-tests`. If you are
still using `pip` (I strongly recommend switching to uv), you just run
`pip install -e ./tests-common`.

2) the previous way of syncing and running tests in "everything installed" mode 
works as it worked before:

uv sync --all-extras

This installs all possible extras of airflow and allows you to run tests for all
providers: activate the venv in `.venv` and run `pytest tests/always` or
`pytest providers/mongo/tests` or `pytest task_sdk/tests`, and it should all
work fine as it did before

3) you can also install dependencies of a selected provider (or a few of those) 
in your .venv

uv syn

Re: [DISCUSS] Turn "tests_common" into separate distribution for development

2025-03-05 Thread Jarek Potiuk
Seems like the PR is getting into the "green" zone - so one last push - and I am
changing it to "devel-common" unless I hear a strong NOOO!

On Tue, Mar 4, 2025 at 10:36 PM Vincent Beck  wrote:

> devel-common makes sense to me
>
> On 2025/03/04 21:13:47 "Oliveira, Niko" wrote:
> > +1 to devel-common from me
> >

Re: [DISCUSS] Auth backends

2025-03-05 Thread Pierre Jeambrun
I think this is a great way to move forward considering the Airflow 3 deadline.
Also, given that this is already implemented and merged in main, I'm not sure
an AIP makes sense at this point; I'll let others weigh in on this.

On Tue 4 Mar 2025 at 23:05, Vincent Beck  wrote:

> Option 1 seems to be the winning choice. (If you disagree, there is still
> time to bring it up.)
>
> Regarding the need for more details on the implementation and flow for
> creating and using the JWT token, what should be the next step? Jarek, you
> suggested creating an AIP, and I agree that this would help formalize a
> proposal with more details, enabling a more thoughtful decision. However,
> given the timeline, this will not be ready for Airflow 3.0.
>
> What will be the recommended way to call the Airflow 3 API? Currently,
> there is no documented solution for calling the Airflow 3 public API,
> including JWT token creation and API calls.
>
> One possible solution is to document the existing mechanism in main for
> creating a JWT token. This is the approach I described earlier in this
> thread: using APIs provided by auth managers. Each auth manager is
> responsible for creating the JWT token and then using it to make
> authorization decisions.
>
> Everything is already implemented except for the documentation—we need to
> create documentation to explain the flow to users so they can use it to
> call the Airflow 3 public API.
>
> To prepare for the future, we would also include a note in the
> documentation informing users that this approach is experimental/temporary
> and may change in the future.
>
> What do you think?
>
> On 2025/03/03 17:51:45 Vincent Beck wrote:
> > Yes, 100%.
> >
> > On 2025/03/03 17:33:55 Ash Berlin-Taylor wrote:
> > > So is the auth manager involved in interpreting the JWT token into
> > > something more meaningful in order to make permission decisions etc then?
> > >
> > > > On 3 Mar 2025, at 16:38, Vincent Beck  wrote:
> > > >
> > > > JWT token created by FAB auth manager:
> > > > ```
> > > > {
> > > >   "id": "12345789"
> > > > }
> > > > ```
> > > >
> > > > JWT token created by simple auth manager:
> > > > ```
> > > > {
> > > >   "username": "Test",
> > > >   "role": "Admin"
> > > > }
> > > > ```
> > >
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
>
>
>
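Vincent's two payloads show that each auth manager puts different claims into the token and is the only component that knows how to turn them into an authorization decision. Below is a minimal sketch of that split, using a stdlib-only HS256 JWT encoder/verifier. The class names, the `create_token`/`is_admin` methods, the role check and the secret are all hypothetical and are not Airflow's auth manager API; a real deployment would use a maintained JWT library (e.g. PyJWT) and also validate expiry and other registered claims.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWTs strip off.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def encode_jwt(claims: Dict[str, Any], secret: bytes) -> str:
    """Sign claims as a compact HS256 JWT (header.payload.signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_jwt(token: str, secret: bytes) -> Dict[str, Any]:
    """Verify the HS256 signature and return the claims (no exp/aud checks here)."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(payload_b64))


class SimpleAuthManagerSketch:
    """Hypothetical: the token carries the username and role directly."""

    def create_token(self, username: str, role: str, secret: bytes) -> str:
        return encode_jwt({"username": username, "role": role}, secret)

    def is_admin(self, token: str, secret: bytes) -> bool:
        return decode_jwt(token, secret).get("role") == "Admin"


class FabAuthManagerSketch:
    """Hypothetical: the token carries only a user id, resolved via a user store."""

    def __init__(self, users: Dict[str, Dict[str, Any]]) -> None:
        self.users = users  # stands in for the FAB user database

    def create_token(self, user_id: str, secret: bytes) -> str:
        return encode_jwt({"id": user_id}, secret)

    def is_admin(self, token: str, secret: bytes) -> bool:
        user = self.users.get(decode_jwt(token, secret)["id"], {})
        return "Admin" in user.get("roles", [])


SECRET = b"example-secret"  # hypothetical signing key
simple = SimpleAuthManagerSketch()
fab = FabAuthManagerSketch({"12345789": {"roles": ["Admin"]}})
print(simple.is_admin(simple.create_token("Test", "Admin", SECRET), SECRET))  # True
print(fab.is_admin(fab.create_token("12345789", SECRET), SECRET))  # True
```

The API server only needs to route the token to the configured auth manager; what the claims mean stays private to that manager.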


Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Michał Modras
I strongly disagree with the proposal of changing the default for all DAGs.
This requires every user that does not specify catchup to modify their
DAGs. As pointed out in other similar threads about changes requiring DAG
code changes:

>I am concerned simply because it is a physical code change, and one that
would affect a good percentage of DAGs out there. No matter how complex the
change is, it forces the users to modify their code, which is huge
operational overhead in bigger organizations. Imagine - there could be a
central platform team responsible for migrating to Airflow 3. But then,
before Airflow 3 deployments can be used, each team using particular
Airflow deployment would need to modify the code. At the same time, the
platform team might not be permitted to touch code of these teams -
these are different personas. It can easily become a very complex migration
procedure to expedite across the organization, even if the code change
itself is simple.

Let's prioritise Airflow 3 ease of adoption. The fewer breaking changes,
the faster it will be adopted across the industry.



[DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Tamara Fingerlin
Hey there, long time reader, first time poster here :)


*tl;dr:*

*As part of the 3.0 release, I would like to propose changing the default
for `catchup_by_default` from True to False.*

*This discussion asks for input and whether this can be a lazy consensus or
should be a vote.*

Time things are hard, especially for new Airflow users. When I first started
using Airflow, it took me a while (and one or two napkin sketches) to
understand how to set the start_date and trigger the DAG runs I wanted. To
this day, I still often just pick a date a couple of days in the past and
set catchup to False so I don't have to do the math on schedules that aren’t
straightforward.

As part of the Astronomer DevRel team, I teach users about Airflow. This
“gotcha” is especially common for new users to run into. Imagine that
you’re a new person writing a DAG with a start date of Jan 1st. You unpause
your DAG, and you unexpectedly see a large amount of DAG runs kicking off.
When we talk to practitioners in Airflow 101 webinars, many share that have
accidentally overflooded their Airflow deployment because they didn’t
understand the relationship between the start_date and DAG runs, by not
knowing about catchup, or by forgetting to add the line when writing new
dags.

This is why I propose changing the config catchup_by_default from True to
False.
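To make the "gotcha" concrete, here is a simplified sketch of how many runs are created when a daily DAG with a Jan 1st start date is unpaused. It assumes Airflow 2-style data intervals (one run per fully elapsed day since `start_date`, and only the latest complete interval when catchup is off) and ignores `end_date`, `max_active_runs` and runs that already exist. `runs_on_unpause` is a hypothetical helper, not the scheduler's code.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def runs_on_unpause(start_date: datetime, now: datetime, catchup: bool) -> List[datetime]:
    """Return the logical dates (interval starts) of daily runs created at unpause."""
    interval = timedelta(days=1)
    logical_dates = []
    current = start_date
    # A daily interval is only scheduled once it has fully elapsed.
    while current + interval <= now:
        logical_dates.append(current)
        current += interval
    if not catchup:
        # With catchup=False only the most recent complete interval is run.
        return logical_dates[-1:]
    return logical_dates


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
unpaused_at = datetime(2025, 3, 5, 12, 0, tzinfo=timezone.utc)
print(len(runs_on_unpause(start, unpaused_at, catchup=True)))   # 63
print(len(runs_on_unpause(start, unpaused_at, catchup=False)))  # 1
```

Under these assumptions, unpausing in early March with catchup enabled queues 63 runs at once, versus a single run with catchup disabled.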

Pro:

   - Fewer accidental DAG runs by beginners and by people who forget to add
     catchup=False. This is especially confusing for beginners.
   - One less parameter for beginners to learn when they write their first
     DAG, and one less line to write for most DAGs in the future.


Con:

   - Breaking change, but since it is a config value, a minor one that users
     who want the old behavior can easily adjust. We can add something to the
     config linter to highlight this change and prompt users to set the value
     back to True if they prefer the current behaviour.



Elad pointed out that there has been previous discussion on this including:


   - The suggestion to move away from a binary option to an enum to have more
     fine-grained control over when to catch up (only when the DAG is first
     turned on, only when the DAG is not first turned on, always, never…)
     #35392


This is a good idea, but there is more to figure out. As others have
pointed out in the PR, going this route means more configuration options.
I don’t think changing the default blocks us from going this route in the
future.

When the time comes, we could turn this into an enum. For migration
purposes and to avoid DAG code changes, we could add more options including
“always” and “never”, and map True to “always” and False to “never”. For
this feature, what we do at the global level should match what’s available
at the DAG level, meaning the DAG parameter will also need to be adjusted
accordingly. Even in this new model, defaulting to "False"/“never” is the
right way forward.
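A rough sketch of the enum idea described above: legacy booleans map onto "always"/"never", a DAG-level value (when set) overrides the global default, and the default is "never". `CatchupMode`, `to_catchup_mode` and `effective_catchup_mode` are hypothetical names; the in-between modes from #35392 are deliberately left out because their semantics are still open.

```python
from enum import Enum
from typing import Optional, Union


class CatchupMode(str, Enum):
    # Hypothetical enum; only the two modes that map to today's booleans.
    ALWAYS = "always"
    NEVER = "never"


def to_catchup_mode(value: Union[bool, str, CatchupMode]) -> CatchupMode:
    """Accept legacy booleans as well as enum members or their string values."""
    if isinstance(value, bool):
        return CatchupMode.ALWAYS if value else CatchupMode.NEVER
    return CatchupMode(value)  # raises ValueError for unknown strings


def effective_catchup_mode(
    dag_value: Optional[Union[bool, str, CatchupMode]],
    global_default: Union[bool, str, CatchupMode] = CatchupMode.NEVER,
) -> CatchupMode:
    """A DAG-level setting, when present, overrides the global default."""
    return to_catchup_mode(global_default if dag_value is None else dag_value)


print(effective_catchup_mode(None).value)                       # never
print(effective_catchup_mode(True).value)                       # always
print(effective_catchup_mode(None, global_default=True).value)  # always
```

Because True/False are still accepted and mapped, existing `catchup=...` arguments would keep working without DAG code changes.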


   - #38168 discussed/proposed the possibility of an option to disable the
     “catch up of the latest DAG run” behavior when unpausing a DAG with
     catchup=False.

     While it is closely related, I think this is a separate issue that
     merits its own discussion, i.e. we’re not talking about changing a
     default value, we’re talking about fundamentally changing what
     catchup=False means. It’s a lot less alarming for users to accidentally
     trigger one DAG run because they didn’t understand catchup behaviour
     than to trigger a large number of DAG runs. That is the confusing
     behaviour, and what I’m hoping to prevent with the default change.



I started a PR here for the most basic option for this change, just
changing the config variable from True to False:
https://github.com/apache/airflow/pull/47354

If there is general alignment I’d try for a lazy consensus, otherwise a
vote 🙂


Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Constance Martineau
> I strongly disagree with the proposal of changing the default for all DAGs.
> This requires every user that does not specify catchup to modify their
> DAGs.

I don't think that's accurate. Tamara's proposal is just to change the
default value of the global configuration that controls this, not to remove
the option entirely or force people to specify it at the DAG level. If teams
rely on the current behaviour, they can simply update the global config to
keep it as True.

I do understand the concern about operational overhead in large
organizations. It’s reasonable for platform teams to want to avoid
requiring DAG authors to modify their code just to migrate to Airflow 3.
But in this case, the solution is straightforward: Platform teams can set
the config back to True as part of their migration process, all without
touching DAG author code.

The real question is: should DAG authors or platform teams bear the
"default" burden? Keeping catchup enabled by default might make migrations
easier for platform teams, but it increases risk and unnecessary work for
DAG authors, especially those who may not understand the relationship
between start dates, logical dates, and catchup behaviour. Based on what we
see on Astro, the majority of DAGs on our platform have catchup set to
False. If we're prioritizing DAG authors, disabling catchup by default is
the right move.
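The point above is that compatibility can be restored centrally. In Airflow this is the `catchup_by_default` option in the `[scheduler]` section, which can also be set through the `AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT` environment variable (environment variables take precedence over `airflow.cfg`). Below is a stdlib-only sketch of that precedence, not Airflow's own config parser: `resolve_catchup_by_default` and `effective_catchup` are hypothetical helpers, and the `False` fallback assumes the proposal in this thread is accepted.

```python
import configparser
from typing import Mapping, Optional

ENV_VAR = "AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT"
PROPOSED_DEFAULT = False  # assumes the proposed Airflow 3 default


def _parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in ("t", "true", "1"):
        return True
    if value in ("f", "false", "0"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def resolve_catchup_by_default(cfg_text: str, environ: Mapping[str, str]) -> bool:
    """Environment variable first, then [scheduler] in the cfg file, then the default."""
    if ENV_VAR in environ:
        return _parse_bool(environ[ENV_VAR])
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_option("scheduler", "catchup_by_default"):
        return _parse_bool(parser.get("scheduler", "catchup_by_default"))
    return PROPOSED_DEFAULT


def effective_catchup(dag_catchup: Optional[bool], global_default: bool) -> bool:
    """An explicit catchup= on the DAG always wins over the installation-wide value."""
    return global_default if dag_catchup is None else dag_catchup


# A platform team restoring Airflow 2 behaviour without touching any DAG file:
platform_cfg = "[scheduler]\ncatchup_by_default = True\n"
print(resolve_catchup_by_default(platform_cfg, environ={}))  # True
print(resolve_catchup_by_default("", environ={}))  # False
print(effective_catchup(None, resolve_catchup_by_default(platform_cfg, {})))  # True
```

Only DAGs that leave `catchup` unset are affected by the global value; DAGs that already set it explicitly behave the same either way.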


Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Jarek Potiuk
Michał - I think you might not have entirely understood / realised the
impact. It's not as bad as you described it.

As I understand it - the idea here is to set "catchup_by_default" to False,
but you can still keep 100% compatibility by setting it to True
manually for the whole installation. Then all the dags will behave the same
way as in Airflow 2.

This might mean for example that if someone is migrating from Airflow 2, as
part of the migration (if they care about catchup_by_default) - either the
service provider, or someone who manages Airflow installation can set the
"catchup_by_default" to True (and we can even suggest it as part of
migration process) - and the compatibility is set.

And if my understanding is right, I am quite in favour of this proposal.

J.





Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Jens Scheffler

+1 - I would also favor this proposal.

On 05.03.25 21:25, Akash Sharma wrote:
> +1 We should change it from both global and DAG level.
>
> Best regards,
> Akash
>
> On Thu, 6 Mar, 2025, 01:01 Jed Cunningham,  wrote:
>
>> +1, this is one of the few configs that I change from the default
>> immediately.






Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Jarek Potiuk
Looks like a `[LAZY CONSENSUS]` candidate to me :)

On Wed, Mar 5, 2025 at 11:23 PM Oliveira, Niko 
wrote:

> +1 as well, I've never liked the default-on behaviour for this config.
>
> 
> From: Jens Scheffler 
> Sent: Wednesday, March 5, 2025 12:54:05 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXT] [DISCUSSION] Changing catchup_by_default from True to
> False
>
> +1 - I would also favor this proposal.
>
> On 05.03.25 21:25, Akash Sharma wrote:
> > +1 We should change it at both the global and DAG level.
> >
> > Best regards,
> > Akash
> >
> > On Thu, 6 Mar, 2025, 01:01 Jed Cunningham,  wrote:
> >
> >> +1, this is one of the few configs that I change from the default
> >> immediately.
> >>


[REMINDER]: Airflow 3 Dev call on 6th March - agenda

2025-03-05 Thread Vikram Koka
Hi everyone,

Here is a quick reminder that we have our Airflow 3 dev call tomorrow
Thursday, the 6th of March at 8AM PST (11 am EST | 4 pm UTC / GMT).

Proposed agenda:
1. Check in on action items from the last call
- Airflow 3.0 minimum version to be Python 3.10 (Ash Berlin-Taylor)

2. Development updates and presentations
- Update on AIP-72 Task Execution Interface aka Task SDK
  Short circuit and Branch operators (Shahar Epstein)
- Test plan update (Rahul Vats)
- Milestone update (Vikram Koka)
- TBD

3. Discussions
- Packaging update - open items (Ash Berlin-Taylor, Jed Cunningham)
- TBD


For additions to the agenda, please feel free to add a comment to the wiki
page or send me an email or Slack message.
The summary of the call will also be sent to the mailing list and posted on
the same wiki page.

Looking forward to seeing you all soon,
Vikram

-- 

Vikram Koka
Chief Strategy Officer
Email: vik...@astronomer.io





Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Jed Cunningham
+1, this is one of the few configs that I change from the default
immediately.


Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Michał Modras
If we can control the behaviour globally instead of specifying it on the
DAG level, I take back my concerns. Thanks for clarifying!

On Wed, Mar 5, 2025 at 8:26 PM Jarek Potiuk  wrote:

> I think, Michał, you might not have entirely understood / realised the
> impact. It's not as bad as you described it, I think.
>
> As I understand it - the idea here is to set "catchup_by_default" to False,
> but you can still keep 100% compatibility by setting it to True
> manually for the whole installation. Then all the dags will behave the same
> way as in Airflow 2.
>
> This might mean, for example, that if someone is migrating from Airflow 2,
> as part of the migration (if they care about catchup_by_default) either the
> service provider or someone who manages the Airflow installation can set
> "catchup_by_default" to True (and we can even suggest it as part of the
> migration process) - and compatibility is preserved.
>
> And if my understanding is right, I am quite in favour of this proposal.
>
> J.
>
>
>
>
> On Wed, Mar 5, 2025 at 7:54 PM Michał Modras
>  wrote:
>
> > I strongly disagree with the proposal of changing the default for all
> > DAGs. This requires every user that does not specify catchup to modify
> > their DAGs. As pointed out in other similar threads about changes
> > requiring DAG code changes:
> >
> > >I am concerned simply because it is a physical code change, and one that
> > would affect a good percentage of DAGs out there. No matter how complex
> > the change is, it forces the users to modify their code, which is a huge
> > operational overhead in bigger organizations. Imagine - there could be a
> > central platform team responsible for migrating to Airflow 3. But then,
> > before Airflow 3 deployments can be used, each team using a particular
> > Airflow deployment would need to modify the code. At the same time, the
> > platform team might not be permitted to touch the code of these teams -
> > these are different personas. It can easily become a very complex
> > migration procedure to expedite across the organization, even if the code
> > change itself is simple.
> >
> > Let's prioritise Airflow 3 ease of adoption. The fewer breaking changes,
> > the faster it will be adopted across the industry.
> >
> >
> > On Wed, Mar 5, 2025 at 7:43 PM Tamara Fingerlin
> >  wrote:
> >
> > > Hey there, long time reader, first time poster here :)
> > >
> > >
> > > *tl;dr:*
> > >
> > > *As part of the 3.0 release, I would like to propose changing the
> > > default for `catchup_by_default` from True to False.*
> > >
> > > *This discussion asks for input and whether this can be a lazy
> > > consensus or should be a vote.*
> > >
> > > Time things are hard. Especially for new Airflow users. When I first
> > > started using Airflow, it took me a while (and one or two napkin
> > > sketches) to understand how to set the start_date and trigger the DAG
> > > runs I wanted. To this day, I still often just pick a date a couple of
> > > days in the past and just set catchup to False to not have to do the
> > > math on schedules that aren’t straightforward.
> > >
> > > As part of the Astronomer DevRel team, I teach users about Airflow. This
> > > “gotcha” is especially common for new users to run into. Imagine that
> > > you’re a new person writing a DAG with a start date of Jan 1st. You
> > > unpause your DAG, and you unexpectedly see a large number of DAG runs
> > > kicking off. When we talk to practitioners in Airflow 101 webinars, many
> > > share that they have accidentally flooded their Airflow deployment
> > > because they didn’t understand the relationship between the start_date
> > > and DAG runs, didn’t know about catchup, or forgot to add the line when
> > > writing new dags.
> > >
> > > This is why I propose changing the config catchup_by_default from True
> > > to False.
> > >
> > > Pro:
> > >
> > > - Fewer accidental DAG runs by beginners and people accidentally
> > >   forgetting catchup=False. Especially for beginners this is confusing.
> > > - One parameter less for beginners to learn when they write their first
> > >   DAG, one line less to write for most DAGs in the future.
> > >
> > >
> > > Con:
> > >
> > > - Breaking change, but since it is a config value, a minor one that
> > >   users who want the old behavior can easily adjust. We can add
> > >   something to the config linter to highlight this change, and prompt
> > >   users to set the value back to True if they prefer the current
> > >   behaviour.
> > >
> > >
> > >
> > > Elad pointed out that there has been previous discussion on this,
> > > including:
> > >
> > > - The suggestion to move away from a binary option to an enum to have
> > >   more fine-grained control over when to catch up (only when the DAG is
> > >   first turned on, only when the DAG is not first turned on, always,
> > >   never…) #35392
> > >
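The start_date "gotcha" described in the proposal can be illustrated with a deliberately simplified Python model. This is an assumption for illustration, not Airflow's scheduler or timetable code: one run per completed fixed-size interval starting at `start_date`, with catchup deciding whether every missed interval is created on unpause or only the most recent one. `runs_on_unpause` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def runs_on_unpause(
    start_date: datetime,
    now: datetime,
    interval: timedelta,
    catchup: bool,
) -> List[datetime]:
    """Hypothetical helper: interval starts a simplified scheduler would run on unpause."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    completed = []
    interval_start = start_date
    # An interval counts as complete once its end is at or before `now`.
    while interval_start + interval <= now:
        completed.append(interval_start)
        interval_start += interval
    if catchup:
        # Backfill every interval missed since start_date.
        return completed
    # Without catchup, only the most recent completed interval is scheduled.
    return completed[-1:]


if __name__ == "__main__":
    utc = timezone.utc
    runs = runs_on_unpause(
        datetime(2025, 1, 1, tzinfo=utc),
        datetime(2025, 3, 5, tzinfo=utc),
        timedelta(days=1),
        catchup=True,
    )
    print(len(runs))
```

With a daily interval, a start_date of 1 January 2025 and an unpause at midnight UTC on 5 March 2025, this model creates 63 runs with `catchup=True` and a single run (for the interval starting 4 March) with `catchup=False`. Real timetables also deal with cron expressions, data intervals and time zones, which the model ignores.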

Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Akash Sharma
+1 We should change it at both the global and DAG level.

Best regards,
Akash

On Thu, 6 Mar, 2025, 01:01 Jed Cunningham,  wrote:

> +1, this is one of the few configs that I change from the default
> immediately.
>


Re: [DISCUSS] Turn "tests_common" into separate distribution for development

2025-03-05 Thread Jarek Potiuk
The PR is now merged... we have `devel-common` now.

The important bit now for everyone... It's easier than ever to have a dev env
running for Airflow:

1) uv sync -> your .venv allows you to run all tests for airflow/task-sdk
2) uv sync --all-packages -> your .venv also allows you to run tests for all
   providers
3)
   cd providers/PROVIDER
   uv sync

   gives you an env ready to run all tests for the provider of your choice, and

   uv run pytest

   runs them.

There are likely a few missing per-provider deps, but we will find and
squash them soon!

Have fun!

J.



On Wed, Mar 5, 2025 at 12:48 PM Jarek Potiuk  wrote:

> Seems like the PR is getting to the "green" zone - so one last push - and I am
> changing it to "devel-common" unless I hear a strong NOOO!
>
> On Tue, Mar 4, 2025 at 10:36 PM Vincent Beck  wrote:
>
>> devel-common makes sense to me
>>
>> On 2025/03/04 21:13:47 "Oliveira, Niko" wrote:
>> > +1 to devel-common from me
>> >
>> > 
>> > From: Ferruzzi, Dennis 
>> > Sent: Tuesday, March 4, 2025 11:21:20 AM
>> > To: dev@airflow.apache.org
>> > Subject: RE: [EXT] [DISCUSS] Turn "tests_common" into separate distribution for development
>> >
>> > devel-common sounds reasonable
>> >
>> >
>> >  - ferruzzi
>> >
>> >
>> > 
>> > From: Jarek Potiuk 
>> > Sent: Tuesday, March 4, 2025 10:53 AM
>> > To: dev@airflow.apache.org
>> > Subject: RE: [EXT] [DISCUSS] Turn "tests_common" into separate distribution for development
>> >
>> >
>> > I am doing a bit more cleanup, and I have found that the easier way to
>> > fix some of the remaining issues will be to clean up (and remove) the
>> > remaining editable devel dependencies and incorporate them all in the
>> > "tests-common" package.
>> >
>> > You can take a look at the PR:
>> > https://github.com/apache/airflow/pull/47281 - but basically what it
>> > means is:
>> >
>> > * all "devel" dependencies are added as required dependencies of
>> >   "tests-common" (except "doc" - I will treat doc separately).
>> > * I removed all "legacy" extras from the "airflow" package, including
>> >   the "bundle" extras ("devel-ci", "devel-db" and a few others), as they
>> >   were pretty useless. Apart from the editable-only "all" extra (see
>> >   below), we will have no more devel and bundle extras (Ash - I guess
>> >   this is what you were looking forward to :) )
>> > * Instead we have one "all" extra that is available only in editable
>> >   mode - it's not documented in user documentation and it is really only
>> >   useful to install everything with `pip` (with uv you get the same with
>> >   `uv sync --all-extras`) - this is still used internally in the CI image
>> >   to run `uv pip install .[all] --constraints` until we switch to using
>> >   "uv.lock" in the future
>> > * hatch_build.py is significantly simpler and easier to understand now,
>> >   with all the bundle removal and moving all dependencies to
>> >   ./tests-common
>> > * we still need dynamic dependencies and ./hatch_build.py - but less and
>> >   less: with PEP 735 (https://peps.python.org/pep-0735/) implemented in
>> >   pip in April, we will likely be able to turn our optional dependencies
>> >   into static pyproject.toml deps, and with
>> >   https://peps.python.org/pep-0771/ (needs approval and implementation)
>> >   we will likely be able to have static pyproject.toml required
>> >   dependencies as well.
>> > * I updated the install and contributing docs to be "uv first" -
>> >   presenting `uv` as the recommended and first option - as it is becoming
>> >   deceptively simple now to work with both airflow and providers (and it
>> >   will be even simpler after the next few PRs)
>> >
>> > Now, THE BIG QUESTION - naming again. With all those changes,
>> > `tests-common` is becoming more of a `devel-common` package, because it
>> > will provide all the other sub-projects with the development tooling
>> > they need to be developed.
>> >
>> > Shall we name it "devel-common" instead of `tests-common`?
>> >
>> > Part of why I think 

Re: [DISCUSSION] Changing catchup_by_default from True to False

2025-03-05 Thread Oliveira, Niko
+1 as well, I've never liked the default-on behaviour for this config.


From: Jens Scheffler 
Sent: Wednesday, March 5, 2025 12:54:05 PM
To: dev@airflow.apache.org
Subject: RE: [EXT] [DISCUSSION] Changing catchup_by_default from True to False

+1 - I would also favor this proposal.

On 05.03.25 21:25, Akash Sharma wrote:
> +1 We should change it at both the global and DAG level.
>
> Best regards,
> Akash
>
> On Thu, 6 Mar, 2025, 01:01 Jed Cunningham,  wrote:
>
>> +1, this is one of the few configs that I change from the default
>> immediately.
>>
