shivaam opened a new pull request, #64089:
URL: https://github.com/apache/airflow/pull/64089

   Adds progress tracking to the backfill UI: the banner and Backfills list
   page now show a progress bar (green for success, red for failed, gray for
   remaining) with a completion count (e.g., "3/6"). Completed backfills show
   "Completed" text.
   
   ### Changes
   
   **Backend:**
   - Added `num_runs` and `dag_run_state_counts` to `BackfillResponse`
   - Single enrichment query (JOIN `backfill_dag_run` + `dag_run`, GROUP BY state)
   - `num_runs` derived by summing state counts (inner join excludes NULL `dag_run_id` rows)
   - All backfill endpoints enriched (list, get, pause, unpause, cancel)
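
   A minimal sketch of the grouped enrichment query described above, using
   an in-memory SQLite database rather than the Airflow ORM. The two-table
   schema, the sample rows, and the helper name `state_counts_by_backfill`
   are assumptions made for illustration and are not the code in this PR.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the sketch needs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE backfill_dag_run (
        id INTEGER PRIMARY KEY,
        backfill_id INTEGER NOT NULL,
        dag_run_id INTEGER  -- NULL when the slot was skipped
    );
    INSERT INTO dag_run (id, state) VALUES
        (1, 'success'), (2, 'success'), (3, 'failed'), (4, 'running');
    INSERT INTO backfill_dag_run (backfill_id, dag_run_id) VALUES
        (10, 1), (10, 2), (10, 3), (10, 4), (10, NULL), (10, NULL);
    """
)


def state_counts_by_backfill(conn, backfill_ids):
    """Return {backfill_id: {state: count}} from a single grouped query."""
    if not backfill_ids:
        return {}
    placeholders = ",".join("?" for _ in backfill_ids)
    rows = conn.execute(
        f"""
        SELECT bdr.backfill_id, dr.state, COUNT(*)
        FROM backfill_dag_run AS bdr
        JOIN dag_run AS dr ON dr.id = bdr.dag_run_id
        WHERE bdr.backfill_id IN ({placeholders})
        GROUP BY bdr.backfill_id, dr.state
        """,
        list(backfill_ids),
    ).fetchall()
    counts = {backfill_id: {} for backfill_id in backfill_ids}
    for backfill_id, state, n in rows:
        counts[backfill_id][state] = n
    return counts


dag_run_state_counts = state_counts_by_backfill(conn, [10])[10]
# The inner join drops the two rows with a NULL dag_run_id,
# so summing the per-state counts yields the number of created runs.
num_runs = sum(dag_run_state_counts.values())
```

   Because the query is grouped by backfill id as well as state, one round
   trip can enrich a whole page of backfills in the list endpoint.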
   
   **Frontend:**
   - New `BackfillProgressBar` shared component (used by both banner and table)
   - Completed backfills → "Completed" text, active → progress bar
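
   The segment arithmetic behind the bar can be sketched as follows. This is
   written in Python only to keep the examples in one language; the actual
   component is a frontend component. The function name `progress_segments`
   and the choice to count both success and failed runs as "completed" in the
   label are assumptions, not confirmed behaviour of `BackfillProgressBar`.

```python
def progress_segments(state_counts):
    """Split state counts into success / failed / remaining fractions.

    Assumption: every state other than success or failed (running, queued,
    and so on) is folded into the gray "remaining" segment.
    """
    total = sum(state_counts.values())
    success = state_counts.get("success", 0)
    failed = state_counts.get("failed", 0)
    remaining = total - success - failed
    if total == 0:
        # Nothing to draw yet; avoid dividing by zero.
        return {"label": "0/0", "success": 0.0, "failed": 0.0, "remaining": 0.0}
    return {
        "label": f"{success + failed}/{total}",
        "success": success / total,
        "failed": failed / total,
        "remaining": remaining / total,
    }


segments = progress_segments(
    {"success": 2, "failed": 1, "running": 2, "queued": 1}
)
```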
   
   **Tests:**
   - `test_get_backfill_with_dag_run_state_counts` covering mixed states
   - All existing assertions updated for new fields
   
   ### Open design questions — requesting reviewer input
   
   **Should this be a new API endpoint instead of enriching BackfillResponse?**
   
   We considered `GET /backfills/{id}/dagRuns` returning the linked dag runs with
   state, letting the UI compute counts client-side. Reasons it may be better:
   - Keeps `BackfillResponse` stable (no new fields on the public v2 API)
   - Enables linking individual dag runs to backfills in the UI (not possible today)
   - Supports future CLI `airflow backfill status` use case
   - UI can poll progress independently of backfill metadata
   
   The current approach (enriching `BackfillResponse`) was simpler to ship but
   couples progress data to every backfill fetch. Happy to refactor if reviewers
   prefer a separate endpoint.
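
   For comparison, a rough sketch of how a client could derive the same
   counts from the alternative endpoint. The endpoint does not exist today,
   so the payload shape (a `dag_runs` list whose items carry a `state` key)
   and the run ids shown are purely hypothetical.

```python
from collections import Counter

# Hypothetical response body for the proposed GET /backfills/{id}/dagRuns.
payload = {
    "dag_runs": [
        {"dag_run_id": "run_1", "state": "success"},
        {"dag_run_id": "run_2", "state": "success"},
        {"dag_run_id": "run_3", "state": "failed"},
        {"dag_run_id": "run_4", "state": "running"},
    ]
}


def count_states(payload):
    """Tally dag runs by state on the client side."""
    return Counter(run["state"] for run in payload["dag_runs"])


client_counts = count_states(payload)
client_num_runs = sum(client_counts.values())
```

   The tradeoff is payload size: the client receives one item per dag run
   instead of one small count per state.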
   
   **Other questions for reviewers:**
   
   1. **Should `num_runs` include skipped `BackfillDagRun` rows?** Currently it
      only counts rows with an actual DagRun (inner join). Skipped slots
      (`exception_reason = "already exists"` / `"in flight"`) are excluded.
      Including them would change the denominator from "progress of created
      runs" to "progress across entire date range." The tradeoff: including
      skipped slots gives a more complete picture, but a "Missing Runs" backfill
      that skipped all dates would show "0/6", which is confusing.
   
   2. **Should completed backfills show final stats?** Currently the UI shows
      "Completed" text and hides the bar, so a backfill that finished with 95
      success + 5 failed looks the same as one with 100 success. Showing the
      final breakdown would be more informative, but the underlying data has a
      limitation: when a newer backfill reprocesses the same dates,
      `DagRun.backfill_id` gets reassigned, so old backfills lose their state
      counts. "Completed" text avoids exposing this stale data.
   
   3. **Should running/queued be visually distinct?** The bar currently has
      three segments: green (success), red (failed), gray (remaining). Running
      and queued are lumped into gray. Splitting them out adds information but
      also visual complexity.
   
   4. **Should `_enrich_backfill_responses` move to a shared utility?** It is
      currently in `routes/public/backfills.py` and imported by
      `routes/ui/backfills.py`, creating a cross-dependency between route
      modules.
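
   To make the denominator choice in question 1 concrete, here is a small
   sketch of the all-skipped case. It uses a simplified, assumed
   `backfill_dag_run` table in SQLite, and the helper name `denominators` is
   hypothetical. `COUNT(*)` counts every slot, while `COUNT(dag_run_id)`
   ignores NULLs and so matches what the inner join sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE backfill_dag_run (
        id INTEGER PRIMARY KEY,
        backfill_id INTEGER NOT NULL,
        dag_run_id INTEGER,      -- NULL when no DagRun was created
        exception_reason TEXT
    );
    -- A backfill whose six slots were all skipped.
    INSERT INTO backfill_dag_run (backfill_id, dag_run_id, exception_reason)
    VALUES
        (20, NULL, 'already exists'), (20, NULL, 'already exists'),
        (20, NULL, 'already exists'), (20, NULL, 'already exists'),
        (20, NULL, 'in flight'),      (20, NULL, 'in flight');
    """
)


def denominators(conn, backfill_id):
    """Return both candidate denominators for one backfill."""
    all_slots, created_runs = conn.execute(
        "SELECT COUNT(*), COUNT(dag_run_id) "
        "FROM backfill_dag_run WHERE backfill_id = ?",
        (backfill_id,),
    ).fetchone()
    return {"created_runs": created_runs, "all_slots": all_slots}


result = denominators(conn, 20)
```

   With the current inner-join behaviour this backfill has zero runs to
   report; counting all slots would instead give the "0/6" display described
   in question 1.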
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (claude-opus-4-6)
   
   Generated-by: Claude Code (claude-opus-4-6) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   

