Hello everyone,

I'd like to initiate a vote on another new DAG authorship best practice.

## Proposal

When the output of one task is passed to another through a Jinja
template string that contains nothing but a single `xcom_pull` call,
such as `"{{ ti.xcom_pull(task_ids='some_task') }}"`, the **`.output`
attribute on the task object** should be preferred instead.

### Before

```python
task_1 = PythonOperator(task_id="task_1", python_callable=my_func)
task_2 = BashOperator(
    task_id="task_2",
    # The dependency on task_1 is hidden inside the template string.
    bash_command="{{ ti.xcom_pull(task_ids='task_1') }}",
)
```

### After

```python
task_1 = PythonOperator(task_id="task_1", python_callable=my_func)
task_2 = BashOperator(
    task_id="task_2",
    # task_1.output is an XComArg; using it also sets task_1 upstream of task_2.
    bash_command=task_1.output,
)
```

The same pattern applies to TaskFlow-decorated functions, and it covers
both the `ti.xcom_pull(...)` and `task_instance.xcom_pull(...)` forms.

## Rationale

- **Explicit dependencies:** Using `.output` makes the data dependency
  visible to the DAG parser, which enables correct scheduling and
  rendering in the Graph view. A Jinja template string, by contrast, is
  opaque to the parser until runtime.
- **Better IDE support:** `.output` provides autocompletion and
  go-to-definition; template strings do not.
- **Refactoring safety:** Renaming a task that is referenced through
  `.output` is caught by standard tooling, whereas a `task_ids='...'`
  string silently breaks.
- **Consistency with TaskFlow:** Passing task outputs as object
  references is already the idiomatic pattern in the TaskFlow API, and
  `.output` is its counterpart for classic operators, so adopting it
  broadly reduces two mental models to one.
- **Real-world impact:** An initial scan of the Airflow repository
  itself found multiple existing uses of the template-string form across
  provider example DAGs (Amazon, Google, Databricks), confirming that
  it is common and consequential in practice.
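
To make the "explicit dependencies" point concrete, here is a toy model in plain Python. It is not Airflow code: the `Task` and `OutputRef` classes are hypothetical stand-ins, and the only assumption borrowed from Airflow is that an object reference can be inspected when the task is defined, while a template string cannot.

```python
from dataclasses import dataclass, field


@dataclass
class OutputRef:
    """Hypothetical stand-in for a reference to another task's output."""
    producer: "Task"


@dataclass
class Task:
    """Hypothetical stand-in for an operator with one templated field."""
    task_id: str
    command: object = None  # either a plain string or an OutputRef
    upstream: set = field(default_factory=set)

    def __post_init__(self):
        # An object reference is inspectable at definition time, so the
        # dependency is recorded immediately. A string is left untouched.
        if isinstance(self.command, OutputRef):
            self.upstream.add(self.command.producer.task_id)

    @property
    def output(self):
        return OutputRef(self)


task_1 = Task("task_1")
templated = Task("task_2", command="{{ ti.xcom_pull(task_ids='task_1') }}")
explicit = Task("task_2", command=task_1.output)

print(templated.upstream)  # set()
print(explicit.upstream)   # {'task_1'}
```

In this sketch the template-string variant ends up with no recorded upstream task, which mirrors why the "Before" form needs a separately declared dependency to be scheduled correctly.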

## Cases NOT affected by this proposal

This best practice is deliberately scoped to the unambiguous case of a
full, standalone `xcom_pull` template. The following are out of scope:

- **Mixed-content strings** such as
  `"echo {{ ti.xcom_pull(task_ids='task_1') }}"`: the template is part
  of a larger string, so `.output` cannot be a drop-in replacement.
- **Non-default `key` arguments** such as
  `xcom_pull(task_ids='task_1', key='my_key')`: these target a specific
  named XCom push, not the default single output.
- **List `task_ids`** such as `xcom_pull(task_ids=['a', 'b'])`:
  aggregating multiple outputs is outside the scope of `.output`.
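
The scoping rules above can be sketched as a small matcher. This is only an illustration using a regular expression; it makes no claim about how the Ruff rule is implemented, the names `STANDALONE_XCOM_PULL` and `replaceable_task_id` are hypothetical, and it assumes `task_ids` is passed as a keyword with a string literal.

```python
import re

# Matches a string that is nothing but one xcom_pull template with a single
# string-valued task_ids keyword, in either the ti. or task_instance. form.
STANDALONE_XCOM_PULL = re.compile(
    r"""\{\{\s*
        (?:ti|task_instance)\.xcom_pull\(\s*
        task_ids\s*=\s*(?P<quote>['"])(?P<task_id>[^'"]+)(?P=quote)\s*
        \)\s*\}\}""",
    re.VERBOSE,
)


def replaceable_task_id(template: str):
    """Return the task id if `template` is in scope, otherwise None."""
    # fullmatch rejects any text before or after the template itself.
    match = STANDALONE_XCOM_PULL.fullmatch(template)
    return match.group("task_id") if match else None


examples = [
    "{{ ti.xcom_pull(task_ids='task_1') }}",
    '{{ task_instance.xcom_pull(task_ids="task_1") }}',
    "echo {{ ti.xcom_pull(task_ids='task_1') }}",
    "{{ ti.xcom_pull(task_ids='task_1', key='my_key') }}",
    "{{ ti.xcom_pull(task_ids=['a', 'b']) }}",
]
for template in examples:
    print(template, "->", replaceable_task_id(template))
```

The first two examples yield `task_1`; the remaining three yield `None`, one for each excluded case.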

## Call for Consensus

Please let me know if you have concerns, questions, or support.

Thank you,
Dev-iL

---

## See Also

1. [apache/airflow#43176 (comment)](https://github.com/apache/airflow/issues/43176#issuecomment-2667826944): original proposal in the static checks tracking issue
2. [astral-sh/ruff#23583](https://github.com/astral-sh/ruff/pull/23583): draft Ruff PR implementing `AIR004` (`airflow-xcom-pull-in-template-string`)
3. [apache/airflow#62529](https://github.com/apache/airflow/pull/62529): merged PR updating some of the matching patterns in the codebase