Exactly Jarek, and yes Daniel, I use that max_active_tis_per_dag parameter to control parallelism. The main reason for this solution is not the parallelism itself, but to be able to loop within one same task instance without overhead of Xcom's and task scheduling, which of course makes it (a lot) faster. By default, it will use parallelism in my implementation, but sometimes it's a necessity to not do it when doing IO but still want this approach as the processing will still be faster than when doing it through dynamic task mapping.
I was aware that if that feature would be added, it would have been Airflow 3.x anyway. As I mentioned before, I wanted a POC to know if it was easily feasible without too much changes and also I didn't wanted to wait until we have 3.0. That's why I tried to implement this generic approach at our company to see if it: 1. Would be easily feasible to implement this without too much changes in Airflow base code. 2. Would offer better performance (always need to measure to know it). 3. Would make the DAG code cleaner (less or no custom code required). Now that I know it's possible and all 3 points were check marked, we could think of a better implementation in Airflow 3.x, if this feature would be accepted of course. -----Original Message----- From: Jarek Potiuk <ja...@potiuk.com> Sent: Saturday, October 5, 2024 12:35 AM To: dev@airflow.apache.org Subject: Re: [PROPOSAL] Add streaming support to PartialOperator EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, stuur deze e-mail als bijlage naar ab...@infrabel.be<mailto:ab...@infrabel.be>. From the earlier discussions with David - this is also (and mainly) about optimisation. Those operators do very little, and when you add total overhead that Airflow adds for scheduling and running every task, then it turns out that looping such operator's execute in a single interpreter is many, many times faster (several orders of magnitude) than running them sequentially as tasks. On Fri, Oct 4, 2024 at 10:16 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote: > Well, it looks like we do have concurrency control for mapped tasks > after all. > > See max_active_tis_per_dagrun which was added in > https://github.com/apache/airflow/pull/29094. > > So this would allow you to map over your 3000 users in a single run, > but process only one at a time (or 5 or 10 at a time). Does that help > your use case? > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org