Just adding the [DISCUSS] prefix, which I forgot to add.

On Thu, Oct 3, 2024 at 4:23 PM Daniel Standish <daniel.stand...@astronomer.io> wrote:

> Ok so, I'm thinking through what makes sense re concurrency control in
> backfill.
>
> It was referred to
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627729#AIP78Schedulermanagedbackfill-Otherideasunderconsideration>
> in the AIP but I didn't define the behavior:
>
> Other ideas under consideration
>>
>> - Add extra concurrency control on dag run
>> - Apply max active dag runs separately for backfill
>> - Override any dag param in creating the backfill job and it's only
>>   applied in that scope
>
> As I have proceeded with implementation, here's what I went with:
>
> Each "backfill" gets its own concurrency control ("max_active_runs") that
> is evaluated completely separately from the DAG-scope max_active_runs.
>
> So if DAG max active runs is 2, and the backfill max active runs is 1,
> then you can have a max of 3 concurrent runs. Your non-backfill dag runs
> cannot starve out the backfill ones, and backfill dag runs cannot starve
> out the non-backfill ones.
>
> The other way to go is to say that DAG.max_active_runs is global. This
> does not feel quite right to me because it gets a bit murky. E.g. what
> happens if DAG.max is 10 and Backfill.max is 10? Do you allow it? What do
> you do to avoid starving out non-backfill runs?
>
> What do people think? Relevant PR is here
> <https://github.com/apache/airflow/pull/42686>.
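
To make the quoted proposal concrete, here is a minimal sketch of the two independent limits in Python. It is not Airflow code: `RunLimits`, `can_start`, `max_concurrent`, and the field names are hypothetical and only model the behavior described above (each limit is checked against its own kind of run, so neither kind can starve the other).

```python
from dataclasses import dataclass


@dataclass
class RunLimits:
    """Hypothetical model of the two limits; not an Airflow class."""

    dag_max_active_runs: int       # counted against non-backfill runs only
    backfill_max_active_runs: int  # counted against this backfill's runs only

    def can_start(self, is_backfill: bool, active_regular: int, active_backfill: int) -> bool:
        # Each kind of run is compared only with its own limit, so a full
        # regular queue never blocks a backfill run, and vice versa.
        if is_backfill:
            return active_backfill < self.backfill_max_active_runs
        return active_regular < self.dag_max_active_runs

    def max_concurrent(self) -> int:
        # Because the limits are independent, the worst case is their sum.
        return self.dag_max_active_runs + self.backfill_max_active_runs


# The example from the quoted message: DAG limit 2, backfill limit 1.
limits = RunLimits(dag_max_active_runs=2, backfill_max_active_runs=1)
print(limits.max_concurrent())
print(limits.can_start(is_backfill=True, active_regular=2, active_backfill=0))
```

Under the alternative where DAG.max_active_runs is global, `can_start` would have to compare `active_regular + active_backfill` against a single limit, which is where the starvation question raised above comes in.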