Feel free to ignore my last message, sorry about that.  I just came back from
vacation, it's Monday, and I didn't realize I had missed an entire thread of
discussion.  Outlook does not handle mailing-list message threading at
all.  Apologies, I don't intend to fork the conversation at this stage.

-ferruzzi

________________________________
From: Ferruzzi, Dennis <ferru...@amazon.com.INVALID>
Sent: Monday, March 27, 2023 12:43 PM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL][DISCUSS] AIP-52 updates - setup / teardown tasks




I really like the idea of this AIP and I'm looking forward to seeing how you
implement it.  I know you've put a lot of effort into it, and I'm not looking
to derail your plans.  It may be entirely too late at this point, but if you are
doing a rework, how would you feel about having the task instance accept
an optional list of tasks (callables maybe??) for setup and teardown?  Then
defining a task in a dag might look something like this (simplified):

====
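# Imports for Airflow 2.x; create_thing, delete_thing, delete_logs,
# SomeRandomOperator and myVar are placeholders.
from airflow import DAG
from airflow.decorators import task
from airflow.utils.trigger_rule import TriggerRule

@task
def setup1():
    create_thing()

@task(trigger_rule=TriggerRule.ALL_DONE)
def teardown1():
    delete_thing()

@task(trigger_rule=TriggerRule.ALL_DONE)
def teardown2():
    delete_logs()

with DAG(dag_id="example_dag") as dag:  # placeholder dag_id
    task1 = SomeRandomOperator(
        myVar=myVar,
        # Proposed parameters (this suggestion), not an existing Airflow API.
        setup_tasks=[setup1],
        teardown_tasks=[teardown1, teardown2],
    )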
====

One issue is that the @task-decorated functions would be added to the chain, so
there would need to be some way to stop that... maybe new decorators called
@setup and @teardown that inherit from @task but don't add to the chain, or a
parameter on @task like is_work_task, defaulting to True, that skips adding to
the chain when it's False, or something like that?
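To make the is_work_task idea concrete, here is a toy model in plain Python, not Airflow code: a hypothetical `task` decorator appends functions to a module-level `CHAIN` list unless `is_work_task=False`. The names `task`, `CHAIN` and `is_work_task` are assumptions for illustration only; this is not how Airflow's real `@task` decorator works.

```python
from typing import Callable, List

# Hypothetical stand-in for "the chain" of work tasks; not an Airflow object.
CHAIN: List[str] = []


def task(is_work_task: bool = True) -> Callable[[Callable], Callable]:
    """Toy decorator: register the function as a work task unless told not to."""
    def decorator(fn: Callable) -> Callable:
        if is_work_task:
            CHAIN.append(fn.__name__)
        # Record the role so a caller could later attach it as setup/teardown.
        fn.is_work_task = is_work_task
        return fn
    return decorator


@task(is_work_task=False)
def setup1():
    print("create thing")


@task()
def work1():
    print("do work")


@task(is_work_task=False)
def teardown1():
    print("delete thing")
```

After these definitions `CHAIN` holds only `["work1"]`; `setup1` and `teardown1` stay out of the chain and could instead be handed to something like the proposed `setup_tasks=` / `teardown_tasks=` parameters.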

-ferruzzi


________________________________
From: Daniel Standish <daniel.stand...@astronomer.io.INVALID>
Sent: Wednesday, March 22, 2023 11:57 PM
To: dev@airflow.apache.org
Subject: [EXTERNAL] [DISCUSS] AIP-52 updates - setup / teardown tasks




I’m part of a group working on the implementation of AIP-52.  We would like
to update the community on some changes to the implementation approach and
the planned roadmap, and to give everyone an opportunity to provide feedback.

First, though, let’s briefly recap the main benefits of adding setup and
teardown as concepts in Airflow:

   - By separating setup and teardown from "work" tasks, after the failure of
     a work task, we can stop the dag from proceeding (i.e. onto subsequent work
     tasks) while still allowing needed teardown operations to proceed (e.g.
     deleting a cluster).
   - This separation also lets us optionally *not* fail a dag run when
     perhaps the important work was completed successfully but a cleanup or
     teardown operation failed.
   - By associating work and setup tasks, we can clear the setups (and their
     respective teardowns) when clearing the work tasks.
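The second benefit can be pictured with a toy model in plain Python. `TaskInfo`, `dag_run_failed` and the `ignore_teardown_failures` flag are hypothetical names chosen only to illustrate that a failed teardown need not fail the run; they are not Airflow APIs, and the state strings are simplified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskInfo:
    name: str
    role: str   # "setup", "work" or "teardown" (illustrative labels)
    state: str  # "success", "failed", "skipped", ... (simplified)


def dag_run_failed(tasks: List[TaskInfo], ignore_teardown_failures: bool = True) -> bool:
    """Toy rule: the run fails if any task failed, optionally ignoring teardowns."""
    for t in tasks:
        if t.state != "failed":
            continue
        if t.role == "teardown" and ignore_teardown_failures:
            continue  # only cleanup failed; the work itself succeeded
        return True
    return False
```

For a run where setup and work succeed but the teardown fails, `dag_run_failed` returns False by default and True when `ignore_teardown_failures=False`; a failed work task always fails the run.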


After experimenting with some different implementation approaches and
reviewing and writing a lot of example dags, we reached some conclusions
that led us to change course somewhat, while still fulfilling the
principal goals of the AIP.

Perhaps most importantly, we believe it is essential that our design
choices leave room for multiple setup and teardown tasks in a given task
group or dag.  Dags don’t tend to do just one thing.  In a dag there could
be many tasks requiring their own “setup” and “teardown”.  Similarly, a
single “work” task may itself require multiple “setup” and “teardown” tasks.
For obvious reasons, combining the work of multiple operators into a single
task is not advisable.  And requiring a new task group for each task that
needs a setup also has pitfalls: it conflicts with the task group’s use
case as an arbitrary logical grouping of tasks, and as a task mapping
tool.  So we believe it will be necessary to support multiple setups
within a group and, moreover, to be able to set dependencies between them.

With that in mind, the main change we’d like to share is that we now
require users to specify the relationship between setup/teardown
tasks and “normal” tasks.  *(In the original proposal, users were not
required to set relationships between setup/teardown tasks and the other
tasks in the group.)*

So in the original AIP you could do this:

with TaskGroup("group1") as tg1:
    setup1 = my_setup("g1_setup")  # a setup task

    work1 = my_work("g1_work1")
    work2 = my_work("g1_work2")
    work1 >> work2

    teardown1 = my_teardown("g1_teardown")  # a teardown task

Then in effect you’d get setup1 >> work1 >> work2 >> teardown1.
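Roughly, that implicit wiring can be modeled as: every setup goes upstream of the group’s root work tasks, and every teardown goes downstream of its leaf work tasks. The helper `implicit_edges` below is a hypothetical plain-Python sketch of that reading of the original proposal, not Airflow code.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def implicit_edges(setups: List[str], works: List[str], teardowns: List[str],
                   work_edges: Set[Edge]) -> Set[Edge]:
    """Toy model: wire setups before root work tasks, teardowns after leaf ones."""
    has_upstream = {dst for _, dst in work_edges}
    has_downstream = {src for src, _ in work_edges}
    roots = [w for w in works if w not in has_upstream]
    leaves = [w for w in works if w not in has_downstream]
    edges = set(work_edges)
    edges |= {(s, root) for s in setups for root in roots}
    edges |= {(leaf, t) for leaf in leaves for t in teardowns}
    return edges
```

With one setup, `work1 >> work2` and one teardown, this yields exactly setup1 >> work1 >> work2 >> teardown1. With two setups and two teardowns it simply connects every setup and every teardown to the work tasks, so the toy rule has no way to know which setup belongs with which teardown.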

Now we require you to set those relationships explicitly.  Otherwise, if
you were to add a setup2 and a teardown2, it would not be clear what the
task sequencing should be.  Apart from this, we believe being explicit is
important for readability, because unless you are careful with object
naming in your dag it may not be obvious that setup1 and teardown1 are not
“normal” tasks, and therefore it might appear that they are free to run in
parallel as roots of the group.

Looking further ahead, while some of the design decisions are not yet
finalized, we’d like to give you a fuller preview of where we see this
going and how it should work.

At a high level, our approach is to make setup and teardown much more like
“normal” tasks, able to be organized and combined with all the flexibility
that Airflow users are accustomed to.  The behavior is mainly governed by a
few simple rules:

   - A teardown task will run if its setup has completed successfully and its
     upstreams are done.
   - The setup tasks “required by” a work task will be cleared when the work
     task is cleared.
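The first rule can be written as a small predicate. This is a sketch in plain Python that assumes “done” means any terminal state; the state names and the function `teardown_should_run` are illustrative, not Airflow’s scheduler code.

```python
from typing import Iterable

# Assumed set of terminal ("done") states for this sketch.
DONE_STATES = {"success", "failed", "skipped", "upstream_failed"}


def teardown_should_run(setup_state: str, upstream_states: Iterable[str]) -> bool:
    """Toy rule: run the teardown if its setup succeeded and all upstreams are done."""
    if setup_state != "success":
        return False  # the setup did not complete successfully
    return all(state in DONE_STATES for state in upstream_states)
```

Under this rule a failed work task does not block cleanup (`teardown_should_run("success", ["failed"])` is True), while a teardown still waits on running upstreams and is not run if its setup failed.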


When using multiple setups and teardowns, you will need to specify which
setup goes with which teardown.  The setup tasks “required by” a work task
can then be inferred from the work task’s location between a setup and its
teardown.
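One way to read “inferred from its location” is: a work task requires setup S when S is upstream of the work task and the work task is upstream of S’s teardown. The sketch below encodes that reading in plain Python over a simple edge list, together with the clearing rule (clearing a work task also clears its required setups and their teardowns). The function names and the `teardown_of` mapping are assumptions for illustration, not the AIP’s implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def reachable(edges: List[Edge], start: str) -> Set[str]:
    """Return every task downstream of `start` (excluding `start` itself)."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def required_setups(edges: List[Edge], teardown_of: Dict[str, str], work: str) -> Set[str]:
    """Setups S with S upstream of `work` and `work` upstream of teardown_of[S]."""
    downstream_of_work = reachable(edges, work)
    return {
        setup for setup, teardown in teardown_of.items()
        if work in reachable(edges, setup) and teardown in downstream_of_work
    }


def tasks_to_clear(edges: List[Edge], teardown_of: Dict[str, str], work: str) -> Set[str]:
    """Clearing a work task also clears its required setups and their teardowns."""
    setups = required_setups(edges, teardown_of, work)
    return {work} | setups | {teardown_of[s] for s in setups}
```

For example, with setup1 >> work1 >> work2 >> teardown1 and setup2 >> work2 >> teardown2, `required_setups` returns {"setup1"} for work1 and both setups for work2, and clearing work1 clears work1, setup1 and teardown1.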

OK, any more detail would be too much for one email.  If you are
interested in reviewing our progress in greater detail and commenting,
you can read our working draft update here:
https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+DRAFT+updates+to+AIP-52
We’ve added lots of examples with graph screenshots to help illustrate the
behavior, and there’s some discussion of the ways it differs from the
original.


Thanks for your consideration.


Daniel
