ShubhamGondane opened a new pull request, #62910:
URL: https://github.com/apache/airflow/pull/62910
## Root cause
`get_task_instances` combined explicit `join(TI.dag_run)` +
`outerjoin(TI.dag_version)` with `joinedload` options for the same tables
inside `eager_load_TI_and_TIH_for_validation()`. SQLAlchemy emits a separate
JOIN for each `joinedload` even when the table is already in the FROM clause,
producing duplicate JOINs that multiply the intermediate result set.
**Before — 7 JOINs (dag_run and dag_version each joined twice):**
```sql
FROM task_instance
JOIN dag_run ON ... -- explicit
LEFT OUTER JOIN dag_version ON ... -- explicit
LEFT OUTER JOIN dag_version AS dag_version_1 ON ... -- joinedload
duplicate
LEFT OUTER JOIN dag_bundle ON ...
JOIN dag_run AS dag_run_1 ON ... -- joinedload duplicate
LEFT OUTER JOIN dag AS dag_1 ON ...
LEFT OUTER JOIN task_instance_note ON ...
```
**After — 5 JOINs (each table once):**
```sql
FROM task_instance
JOIN dag_run ON ... -- reused by contains_eager
LEFT OUTER JOIN dag_version ON ... -- reused by contains_eager
LEFT OUTER JOIN dag_bundle ON ...
LEFT OUTER JOIN dag AS dag_1 ON ...
LEFT OUTER JOIN task_instance_note ON ...
```
## Fix
Replace `joinedload` options for already-joined tables with
`contains_eager`, which tells SQLAlchemy to reuse the existing JOIN result for
eager loading instead of adding a new one.
`eager_load_TI_and_TIH_for_validation()` is intentionally left unchanged —
its other callers (`get_mapped_task_instances`, `get_task_instances_history`)
have different join patterns and are not affected.
## Testing
- All 47 existing `TestGetTaskInstances` tests pass unchanged.
- Added `test_query_count_is_bounded` regression test asserting the endpoint
issues exactly 3 queries (1 auth + 1 COUNT + 1 main SELECT), preventing
duplicate JOINs from silently regressing.
closes: #62027
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]