Re: [DISCUSS] pools and deferrables

Constance Martineau Fri, 11 Oct 2024 09:38:07 -0700

The main reason for pools is to control task execution parallelism,
especially when tasks interact with systems like APIs or databases, to
avoid overwhelming them. For deferrable operators, if a trigger is just
'sleeping' or waiting, it’s fine to exclude the task from the pool while in
a deferred state. But if the trigger is polling the source system, for
example to check the state of something that is running, the trigger should
count towards the pool.
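
To make the distinction concrete, here is a toy sketch of the slot accounting
I have in mind. It is not Airflow's actual scheduler code; the ToyPool class,
its method names, and the include_deferred field name are assumptions made up
for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyPool:
    """Toy stand-in for a pool; not Airflow's real Pool model."""

    slots: int
    # Assumed name for the flag under discussion: whether deferred
    # tasks count against the pool's slots.
    include_deferred: bool

    def occupied(self, running: int, queued: int, deferred: int) -> int:
        """Slots considered in use for the given task counts."""
        used = running + queued
        if self.include_deferred:
            used += deferred
        return used

    def open_slots(self, running: int, queued: int, deferred: int) -> int:
        """Slots still available, never below zero."""
        return max(self.slots - self.occupied(running, queued, deferred), 0)


# Same workload, two settings: 2 running tasks and 3 deferred tasks
# whose triggers may still be polling the source system.
excluding = ToyPool(slots=5, include_deferred=False)
including = ToyPool(slots=5, include_deferred=True)

free_when_excluded = excluding.open_slots(running=2, queued=0, deferred=3)
free_when_included = including.open_slots(running=2, queued=0, deferred=3)
```

With deferred tasks excluded, the pool still reports free slots even though
five things may be talking to the source system; with them included, the pool
is full.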


Honestly, I don’t think the flag should have been introduced in the first
place, but I’m sure someone out there relies on it. If we assume that most
triggers are polling the same systems the operator is interacting with, the
compromise could be to keep the setting but default to including deferred
tasks in the pool.

> Is this configuration causing confusion or complicating the codebase?
> Otherwise, should we just keep it?
>
I think the confusion comes from who’s responsible for what. Most DAG
authors probably don’t know the exact limits on requests or connections an
API or database can handle. On the other hand, cluster or platform admins
likely don’t know the details about deferrable operators in Airflow—or even
which operators DAG authors are using to interact with resources. It’s
nuanced, but if we expect DAG authors to create and manage pools, then
having the setting is okay at that level. But if we think pools are mainly
managed by cluster admins for DAG authors to use, then the setting doesn’t
make much sense.

And yeah, I get that DAG authors can choose not to use a pool, making all of
this moot. That’s something that should come up during code reviews or be
enforced through a task policy.
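
As a rough sketch of the task policy idea: the check below rejects any task
that was left on the default pool. ToyTask and PolicyViolation are
hypothetical stand-ins so the snippet runs on its own, and the "default_pool"
name is an assumption about what a task gets when no pool is set. In a real
deployment the function would receive an actual operator and raise whatever
exception the cluster policy mechanism expects.

```python
from dataclasses import dataclass

# Assumed name of the pool a task lands in when the author sets none.
DEFAULT_POOL = "default_pool"


class PolicyViolation(Exception):
    """Stand-in for the exception a real cluster policy would raise."""


@dataclass
class ToyTask:
    """Hypothetical minimal task: just an id and a pool name."""

    task_id: str
    pool: str = DEFAULT_POOL


def task_policy(task: ToyTask) -> None:
    """Reject tasks that did not opt into an explicit pool."""
    if task.pool == DEFAULT_POOL:
        raise PolicyViolation(
            f"Task {task.task_id!r} must set an explicit pool"
        )


# A task with an explicit pool passes; one without is rejected.
task_policy(ToyTask("load_orders", pool="warehouse_api"))

try:
    task_policy(ToyTask("load_customers"))
    rejected = False
except PolicyViolation:
    rejected = True
```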




On Thu, Oct 10, 2024 at 10:43 PM Wei Lee <weilee...@gmail.com> wrote:

> I’m wondering why we want to remove this. The design seems to be
> reasonable, but yep, it might not be as helpful as mentioned.
>
> > is it useful to have it take up a slot at the first and last couple
> > seconds of its lifecycle?  methinks no.
>
> Some edge cases “might” be helpful, I guess? A deferrable operator that is
> not (or could not be) implemented well probably needs it.
>
> My main question is whether this configuration is causing confusion or
> complicating logic in the code base? Otherwise, we probably could just keep
> it?
>
> Best,
> Wei
>
>
> > On Oct 10, 2024, at 9:05 PM, Daniel Standish
> > <daniel.stand...@astronomer.io.INVALID> wrote:
> >
> > No worries.  The other issue is good to know about.  Just I'm trying to
> > keep the discussion focused on the main question.
> >
> > People don't have a lot of excess time to attend to things like this and
> > weigh in.  We're lucky if we get 3 people to weigh in on a discussion
> > item like this.
> >
> > So, I'm interested in your opinion.  Given that you've apparently thought
> > about the general topic, that's valuable.  So I want to invite you to
> > think about it and to share your take on it, or share that you don't have
> > an opinion on it.
>
>
