Not interesting :) ?

On Thu, Jul 7, 2022 at 10:41 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Hello everyone,
>
> We have just published a blog post on our Medium -
> https://medium.com/apache-airflow/airflows-magic-loop-ec424b05b629 - written
> by one of our users, Itay Bittan (thanks!), who was inspired by our
> discussion on Slack about how they struggled with delays in loading dynamic
> DAGs in their K8S setup.
>
> The problem they had was that their dynamic DAGs are created in a big loop
> (1000s of DAGs), and that caused ~2 minute delays when starting their tasks
> on K8S, because all the DAGs have to be created by the loop.
>
> What I proposed to try (since the DAGs were connected by the loop but really 
> isolated from each other) is to skip "all other" DAG creation in the loop 
> when it is parsed in the worker. That resulted in cutting the delay to ~ 
> 200ms.
>
> His case seems common enough that we could maybe even suggest this as a
> "general" solution - currently it is based on possibly several
> "non-documented" assumptions (that dag_id is passed in a certain way to the
> worker and that you can use it to filter such a loop).
>
> However, maybe it's a good idea to document it and turn it into a "best
> practice" for when you have similar dynamic DAGs.
>
> I can think of several caveats to such an approach - not all DAGs created in
> a loop can be isolated, and sometimes there might be side effects that make
> your DAG have a different structure if you skip other DAGs - but I think
> that if we add some "guidelines", this could be easily replicated by other
> users.
>
> WDYT?
>
> J.
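
For anyone skimming the thread, the loop-skipping idea in the quoted message can be sketched roughly as below. This is only an illustration, not the code from the blog post: it assumes the worker process is started as `airflow tasks run <dag_id> <task_id> ...` (one of the "non-documented" assumptions mentioned above), and the names `target_dag_id`, `build_dags` and `make_dag` are hypothetical helpers, with a plain dict standing in for a real DAG object.

```python
import sys


def target_dag_id(argv):
    """Return the dag_id the process was started for, or None.

    Assumption (not a documented contract): a worker is launched as
    `airflow tasks run <dag_id> <task_id> ...`, so the dag_id is argv[3].
    """
    if len(argv) > 3 and list(argv[1:3]) == ["tasks", "run"]:
        return argv[3]
    return None


def build_dags(dag_ids, argv, make_dag):
    """Create DAGs in a loop, skipping all "other" DAGs inside a worker."""
    wanted = target_dag_id(argv)
    dags = {}
    for dag_id in dag_ids:
        # In a worker only the requested DAG is built; anywhere else
        # (scheduler, CLI, ...) `wanted` is None and everything is built.
        if wanted is not None and dag_id != wanted:
            continue
        dags[dag_id] = make_dag(dag_id)
    return dags


def make_dag(dag_id):
    # Hypothetical stand-in for the (expensive) construction of a real DAG.
    return {"dag_id": dag_id}


all_ids = [f"dag_{i}" for i in range(1000)]

# In a real DAG file one would pass sys.argv and expose the result via
# globals(); explicit argv lists are used here so the sketch runs on its own.
in_worker = build_dags(all_ids, ["airflow", "tasks", "run", "dag_7", "t1"], make_dag)
in_scheduler = build_dags(all_ids, ["airflow", "scheduler"], make_dag)
print(len(in_worker), len(in_scheduler))
```

The caveats from the quoted message still apply: this only works when the DAGs in the loop are truly independent of each other.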
