Can you describe a use case for the requested feature other than debugging?
This doesn't feel like the right approach to test a specific task in a
pipeline.

On Fri, Jan 28, 2022 at 11:44 PM Alex Begg <alex.b...@gmail.com> wrote:

> Actually, sorry, you can scratch some of what I just said. I thought you
> were talking about clearing states, but you are instead referring to
> triggering a DAG run. It does kind of make sense to have a way to trigger
> a DAG run but only run specific tasks.
>
> On Fri, Jan 28, 2022 at 1:41 PM Alex Begg <alex.b...@gmail.com> wrote:
>
>> I believe this is currently possible by just unselecting “downstream”
>> before you click “Clear” in the UI. It should only clear the one middle
>> task and not the downstream task(s).
>>
>> I would prefer not to have a more detailed UI for skipping (or, I want
>> to say, “bypassing”, as “skip” is itself a task state) specific downstream
>> tasks, as it might signal to users that specifying tasks to bypass is ideal
>> practice, when in reality it should only be done occasionally for
>> experiments or troubleshooting, as you mention, not as a common occurrence.
>>
>> What I can agree with, though, is that the list of buttons on the dialog
>> window for changing the state of a task looks a bit cluttered. There is
>> probably a better UI/UX for that, but I don't think being able to
>> check/uncheck downstream tasks is the way to go; that seems like it would
>> be just as cluttered.
>>
>> Alex Begg
>>
>> On Fri, Jan 28, 2022 at 11:46 AM Hongyi Wang <whyni...@gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I'd like to propose a new feature in Airflow -- allow users to specify
>>> tasks to skip when triggering a DAG run.
>>>
>>> From our own experience, this feature can be very useful when doing 
>>> experiments, troubleshooting or re-running existing DAGs. And I believe it 
>>> can benefit many Airflow users.
>>>
>>> To illustrate the use case, I am going to use this example below.
>>> task-a ☐ -> task-b ☑ -> task-c ☐
>>>
>>> Suppose we have a DAG containing 3 tasks. To troubleshoot "task-a" and
>>> "task-c", I want to trigger a manual DAG run and skip "task-b" (so I can
>>> save time and resources and focus on the other two tasks). To do so, today
>>> I have two options:
>>>
>>> Option 1: Trigger DAG, then manually mark "task-b" as `SUCCESS`
>>> Option 2: Remove "task-b" from my DAG, then trigger DAG
>>>
>>> Neither option is great. Option 1 can be troublesome when the DAG is
>>> large and there are multiple tasks I want to skip. Option 2 requires a
>>> change to the DAG file, which is not convenient for just troubleshooting.
>>>
>>> Therefore, I would love to discuss how we can provide an easy way for users
>>> to skip tasks when triggering a DAG.
>>>
>>> Things to consider are:
>>> 1) We should allow users to specify all tasks to skip at once when
>>> triggering a DAG
>>> 2) We should retain the dependencies between non-skipped tasks (in the
>>> above example, "task-c" won't start until "task-a" completes even if we
>>> skipped "task-b")
>>> 3) We should mark skipped tasks as `SKIPPED` instead of `SUCCESS` to make
>>> it more intuitive
>>> 4) The implementation should be easy, clean and low risk
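
Point 2 in the list above can be illustrated with a small standalone sketch. It uses no Airflow code; the function name `effective_upstream` and the plain-dict graph representation are assumptions made for illustration only, and this is not how the Airflow scheduler resolves dependencies.

```python
from typing import Dict, Set


def effective_upstream(
    upstream: Dict[str, Set[str]], skipped: Set[str]
) -> Dict[str, Set[str]]:
    """Return the upstream map for non-skipped tasks, bridging over skipped ones.

    `upstream` maps each task id to the ids of its direct upstream tasks.
    A skipped task is replaced by its own nearest non-skipped ancestors.
    """

    def resolve(task: str, seen: Set[str]) -> Set[str]:
        result: Set[str] = set()
        for parent in upstream.get(task, set()):
            if parent in seen:
                continue  # already handled on another path
            seen.add(parent)
            if parent in skipped:
                # Look through the skipped task to its own upstream tasks.
                result |= resolve(parent, seen)
            else:
                result.add(parent)
        return result

    return {t: resolve(t, set()) for t in upstream if t not in skipped}


# The three-task example from the proposal: task-a -> task-b -> task-c
upstream = {"task-a": set(), "task-b": {"task-a"}, "task-c": {"task-b"}}
bridged = effective_upstream(upstream, skipped={"task-b"})
```

With "task-b" skipped, `bridged` maps "task-c" directly to "task-a", so "task-c" would still wait for "task-a" to complete.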
>>>
>>> Here is my proposed solution (tested locally):
>>> Today, Airflow allows users to pass a JSON object to the DAG run as
>>> {{dag_run.conf}} when triggering a DAG. The idea is that, before queuing
>>> task instances that satisfy their dependencies, `scheduler_job.py` (after
>>> we make some changes) will filter the task instances to skip based on the
>>> `dag_run.conf` the user passes in (e.g. {"skip_tasks": ["task-b"]}), then
>>> mark them as SKIPPED.
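
The filtering step described above might look roughly like the sketch below. It is standalone Python rather than the actual `scheduler_job.py` change; the helper name `tasks_to_skip` is hypothetical, and only the `skip_tasks` key comes from the proposal. It assumes the conf is a plain dict (or None) and silently ignores ids that are not in the DAG.

```python
from typing import Any, Iterable, Mapping, Optional, Set


def tasks_to_skip(
    conf: Optional[Mapping[str, Any]], dag_task_ids: Iterable[str]
) -> Set[str]:
    """Pick the task ids to mark as SKIPPED from a dag_run.conf-style dict.

    Only ids that actually exist in the DAG are returned; a missing conf,
    a missing "skip_tasks" key, or a non-list value yields an empty set.
    """
    requested = (conf or {}).get("skip_tasks", [])
    if not isinstance(requested, (list, tuple)):
        return set()
    requested_ids = {t for t in requested if isinstance(t, str)}
    return requested_ids & set(dag_task_ids)


dag_task_ids = ["task-a", "task-b", "task-c"]
to_skip = tasks_to_skip({"skip_tasks": ["task-b"]}, dag_task_ids)
```

Here `to_skip` contains only "task-b"; the scheduler-side change would then set the matching task instances to SKIPPED before queuing the rest.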
>>>
>>> Things I would love to discuss:
>>> - What do you think about this feature?
>>> - What do you think about the proposed solution?
>>> - Did I miss anything that you want to discuss?
>>> - Is it necessary to introduce a new state (e.g. MANUAL_SKIPPED) to
>>> differentiate it from SKIPPED?
>>>
>>> Howie
>>>
>>>
